darts.cli
Entrypoint for the darts-pipeline CLI.
LoggingManager
module-attribute
LoggingManager = (
darts.utils.logging.LoggingManagerSingleton()
)
app
module-attribute
app = cyclopts.App(
version=darts.__version__,
console=rich.get_console(),
config=darts.cli.config_parser,
help_format="plaintext",
version_format="plaintext",
)
inference_app
module-attribute
inference_app = cyclopts.App(
name="inference",
group=darts.cli.subcommands_group,
help="Predefined inference pipelines",
)
inference_data_app
module-attribute
inference_data_app = cyclopts.App(
name="prep-data",
group=darts.cli.utilities_group,
help="Data preparation for offline use",
)
sequential_group
module-attribute
subcommands_group
module-attribute
training_app
module-attribute
training_app = cyclopts.App(
name="training",
group=darts.cli.subcommands_group,
help="Predefined training pipelines",
)
training_data_app
module-attribute
ConfigParser
Parser for cyclopts config.
A custom implementation is needed to select our own TOML structure and source. It is implemented as a class so that the config file can be provided as a parameter of the CLI.
Initialize the ConfigParser (no-op).
Source code in darts/src/darts/utils/config.py
__call__
__call__(
apps: list[cyclopts.App],
commands: tuple[str, ...],
arguments: cyclopts.ArgumentCollection,
)
Parser for cyclopts config. A custom implementation is needed to select our own TOML structure.
First, the configuration file at "config.toml" is loaded. The config is then flattened and mapped to the input arguments of the called function. Because of the flattening, parent keys are not considered.
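The flattening step described above can be illustrated with a short sketch. It assumes the parsed file has a top-level `darts` table (as `open_config` describes); the key names and the `flatten` helper are hypothetical and are not taken from `darts.utils.config`.

```python
# Stand-in for the dict a TOML parser would return for config.toml.
# All key names below are assumptions made for illustration.
parsed = {
    "darts": {
        "ee-project": "my-project",
        "segmentation": {
            "patch-size": 1024,
            "overlap": 256,
        },
    }
}


def flatten(nested: dict) -> dict:
    """Collapse nested tables into one level, dropping the parent keys."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            flat.update(flatten(value))  # the parent key is discarded
        else:
            flat[key] = value
    return flat


config = flatten(parsed["darts"])
print(config)
```

Because parent keys are dropped, two tables that define the same key collide; in this sketch the later entry wins.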
Parameters:
-
apps(list[cyclopts.App]) –The cyclopts apps. Unused, but must be provided for the cyclopts hook.
-
commands(tuple[str, ...]) –The commands. Unused, but must be provided for the cyclopts hook.
-
arguments(cyclopts.ArgumentCollection) –The arguments to apply the config to.
Examples:
Set up the cyclopts App
from pathlib import Path
from typing import Annotated

import cyclopts

from darts.utils.config import ConfigParser

config_parser = ConfigParser()
app = cyclopts.App(config=config_parser)


# Intercept the logging behavior to add a file handler.
# `add_logging_handlers` and `console` are not defined in this snippet;
# they are assumed to be provided by the surrounding application.
@app.meta.default
def launcher(
    *tokens: Annotated[str, cyclopts.Parameter(show=False, allow_leading_hyphen=True)],
    log_dir: Path = Path("logs"),
    config_file: Path = Path("config.toml"),
):
    command, bound, _ = app.parse_args(tokens)
    add_logging_handlers(command.__name__, console, log_dir)
    return command(*bound.args, **bound.kwargs)


if __name__ == "__main__":
    app.meta()
Usage
Config file ./config.toml:
Function signature which is called:
Calling the function from CLI:
Source code in darts/src/darts/utils/config.py
apply_config
Apply the loaded config to the cyclopts mapping.
Parameters:
-
arguments(cyclopts.ArgumentCollection) –The arguments to apply the config to.
Source code in darts/src/darts/utils/config.py
open_config
Opens the config file, takes the 'darts' key, flattens the resulting dict, and stores it as the config.
Parameters:
Source code in darts/src/darts/utils/config.py
PipelineV2Paths
dataclass
PipelineV2Paths(
model_files: list[pathlib.Path] = None,
default_dirs: darts_utils.paths.DefaultPaths = (
lambda: darts_utils.paths.DefaultPaths()
)(),
output_data_dir: pathlib.Path | None = None,
arcticdem_dir: pathlib.Path | None = None,
tcvis_dir: pathlib.Path | None = None,
orthotiles_dir: pathlib.Path | None = None,
scenes_dir: pathlib.Path | None = None,
sentinel2_grid_dir: pathlib.Path | None = None,
raw_data_store: pathlib.Path | None = None,
raw_data_source: typing.Literal["cdse", "gee"] = "cdse",
no_raw_data_store: bool = False,
)
Default paths for v2 pipelines.
default_dirs
class-attribute
instance-attribute
default_dirs: darts_utils.paths.DefaultPaths = dataclasses.field(
default_factory=lambda: darts_utils.paths.DefaultPaths()
)
raw_data_source
class-attribute
instance-attribute
sentinel2_grid_dir
class-attribute
instance-attribute
__post_init__
Source code in darts/src/darts/pipelines/sequential_v2.py
log
Log all paths managed.
Source code in darts/src/darts/pipelines/sequential_v2.py
PlanetPipeline
dataclass
PlanetPipeline(
model_files: list[pathlib.Path] = None,
default_dirs: darts_utils.paths.DefaultPaths = (
lambda: darts_utils.paths.DefaultPaths()
)(),
output_data_dir: pathlib.Path | None = None,
arcticdem_dir: pathlib.Path | None = None,
tcvis_dir: pathlib.Path | None = None,
device: typing.Literal["cuda", "cpu", "auto"]
| int
| None = None,
ee_project: str | None = None,
ee_use_highvolume: bool = True,
tpi_outer_radius: int = 100,
tpi_inner_radius: int = 0,
patch_size: int = 1024,
overlap: int = 256,
batch_size: int = 8,
reflection: int = 0,
binarization_threshold: float = 0.5,
mask_erosion_size: int = 10,
edge_erosion_size: int | None = None,
min_object_size: int = 32,
quality_level: int
| typing.Literal[
"high_quality", "low_quality", "none"
] = 1,
export_bands: list[str] = (
lambda: [
"probabilities",
"binarized",
"polygonized",
"extent",
"thumbnail",
]
)(),
write_model_outputs: bool = False,
overwrite: bool = False,
offline: bool = False,
debug_data: bool = False,
orthotiles_dir: pathlib.Path | None = None,
scenes_dir: pathlib.Path | None = None,
image_ids: list = None,
)
Bases: darts.pipelines.sequential_v2._BasePipeline
Pipeline for processing PlanetScope data.
Processes PlanetScope imagery (both orthotiles and scenes) for retrogressive thaw slump (RTS) segmentation. Supports both offline and online processing modes.
Data Structure
Expects PlanetScope data organized as:
- Orthotiles: orthotiles_dir/tile_id/scene_id/
- Scenes: scenes_dir/scene_id/
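To make the expected layout concrete, the following sketch walks both directory structures and collects the scene folders. The helper name `find_scene_dirs` and the example IDs are hypothetical; how the pipeline itself discovers scenes is not shown in this reference.

```python
import tempfile
from pathlib import Path


def find_scene_dirs(orthotiles_dir: Path, scenes_dir: Path) -> list[Path]:
    """Collect scene directories from both layouts (illustrative helper)."""
    found = []
    # Orthotiles are nested two levels deep: tile_id/scene_id/
    if orthotiles_dir.is_dir():
        found += [p for p in orthotiles_dir.glob("*/*") if p.is_dir()]
    # Scenes sit directly below the scenes directory: scene_id/
    if scenes_dir.is_dir():
        found += [p for p in scenes_dir.glob("*") if p.is_dir()]
    return sorted(found)


# Build a throwaway example tree with made-up IDs.
root = Path(tempfile.mkdtemp())
(root / "PSOrthoTile" / "tile_a" / "scene_1").mkdir(parents=True)
(root / "PSScene" / "scene_2").mkdir(parents=True)

scene_dirs = find_scene_dirs(root / "PSOrthoTile", root / "PSScene")
names = [p.name for p in scene_dirs]
print(names)
```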
Parameters:
-
orthotiles_dir(pathlib.Path | None, default:None) –Directory containing PlanetScope orthotiles. If None, uses default path from DARTS paths. Defaults to None.
-
scenes_dir(pathlib.Path | None, default:None) –Directory containing PlanetScope scenes. If None, uses default path from DARTS paths. Defaults to None.
-
image_ids(list | None, default:None) –List of image/scene IDs to process. If None, processes all images found in orthotiles_dir and scenes_dir. Defaults to None.
-
model_files(pathlib.Path | list[pathlib.Path] | None, default:None) –Path(s) to model file(s) for segmentation. Single Path implies write_model_outputs=False. If None, searches default model directory for all .pt files. Defaults to None.
-
output_data_dir(pathlib.Path | None, default:None) –Output directory for results. If None, uses {default_out}/planet. Defaults to None.
-
arcticdem_dir(pathlib.Path | None, default:None) –Directory for ArcticDEM datacube. Will be created/downloaded if needed. If None, uses default path. Defaults to None.
-
tcvis_dir(pathlib.Path | None, default:None) –Directory for TCVis data. If None, uses default path. Defaults to None.
-
device(typing.Literal['cuda', 'cpu', 'auto'] | int | None, default:None) –Computation device. "cuda" uses GPU 0, int specifies GPU index, "auto" selects free GPU. Defaults to None.
-
ee_project(str | None, default:None) –Earth Engine project ID. May be omitted if defined in persistent credentials. Defaults to None.
-
ee_use_highvolume(bool, default:True) –Whether to use EE high-volume server. Defaults to True.
-
tpi_outer_radius(int, default:100) –Outer radius (m) for TPI calculation. Defaults to 100.
-
tpi_inner_radius(int, default:0) –Inner radius (m) for TPI calculation. Defaults to 0.
-
patch_size(int, default:1024) –Patch size for inference. Defaults to 1024.
-
overlap(int, default:256) –Overlap between patches. Defaults to 256.
-
batch_size(int, default:8) –Batch size for inference. Defaults to 8.
-
reflection(int, default:0) –Reflection padding for inference. Defaults to 0.
-
binarization_threshold(float, default:0.5) –Threshold for binarizing probabilities. Defaults to 0.5.
-
mask_erosion_size(int, default:10) –Disk size for mask erosion and inner edge cropping. Defaults to 10.
-
edge_erosion_size(int | None, default:None) –Size for outer edge cropping. If None, uses mask_erosion_size. Defaults to None.
-
min_object_size(int, default:32) –Minimum object size (pixels) to keep. Defaults to 32.
-
quality_level(int | typing.Literal['high_quality', 'low_quality', 'none'], default:1) –Quality filtering level. 0="none", 1="low_quality", 2="high_quality". Defaults to 1.
-
export_bands(list[str], default:(lambda: ['probabilities', 'binarized', 'polygonized', 'extent', 'thumbnail'])()) –Bands to export. Can include "probabilities", "binarized", "polygonized", "extent", "thumbnail", "optical", "dem", "tcvis", "metadata", or specific band names. Defaults to ["probabilities", "binarized", "polygonized", "extent", "thumbnail"].
-
write_model_outputs(bool, default:False) –Save individual model outputs (not just ensemble). Defaults to False.
-
overwrite(bool, default:False) –Overwrite existing output files. Defaults to False.
-
offline(bool, default:False) –Skip downloading missing data. Defaults to False.
-
debug_data(bool, default:False) –Write intermediate debugging data. Defaults to False.
default_dirs
class-attribute
instance-attribute
default_dirs: darts_utils.paths.DefaultPaths = dataclasses.field(
default_factory=lambda: darts_utils.paths.DefaultPaths()
)
device
class-attribute
instance-attribute
export_bands
class-attribute
instance-attribute
export_bands: list[str] = dataclasses.field(
default_factory=lambda: [
"probabilities",
"binarized",
"polygonized",
"extent",
"thumbnail",
]
)
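The list default is declared through `default_factory` because dataclasses reject a plain mutable list as a default value. The sketch below uses a hypothetical stand-in class (not a DARTS class) to show that each instance then receives its own list.

```python
from dataclasses import dataclass, field


@dataclass
class ExportConfig:  # hypothetical stand-in, not a DARTS class
    export_bands: list[str] = field(
        default_factory=lambda: ["probabilities", "binarized"]
    )


a = ExportConfig()
b = ExportConfig()
a.export_bands.append("optical")  # only changes the list owned by `a`

print(a.export_bands)
print(b.export_bands)
```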
quality_level
class-attribute
instance-attribute
__post_init__
Source code in darts/src/darts/pipelines/sequential_v2.py
cli
staticmethod
cli(
*,
pipeline: darts.pipelines.sequential_v2.PlanetPipeline,
)
Run the sequential pipeline for PlanetScope data.
Parameters:
-
pipeline(darts.pipelines.sequential_v2.PlanetPipeline) –Configured PlanetPipeline instance.
cli_prepare_data
staticmethod
cli_prepare_data(
*,
pipeline: darts.pipelines.sequential_v2.PlanetPipeline,
aux: bool = False,
force: bool = False,
)
Download all necessary data for offline processing.
Parameters:
-
pipeline(darts.pipelines.sequential_v2.PlanetPipeline) –Configured PlanetPipeline instance.
-
aux(bool, default:False) –If True, downloads auxiliary data (ArcticDEM, TCVis). Defaults to False.
-
force(bool, default:False) –If True, downloads all possible data, independent of the aux flag or model needs. Defaults to False.
Source code in darts/src/darts/pipelines/sequential_v2.py
prepare_data
Download and prepare data for offline processing.
Validates configuration, determines data requirements from models, and downloads requested data (optical imagery and/or auxiliary data).
Parameters:
-
optical(bool, default:False) –If True, downloads optical imagery. Defaults to False.
-
aux(bool, default:False) –If True, downloads auxiliary data (ArcticDEM, TCVis) as needed. Defaults to False.
-
force(bool, default:False) –If True, downloads all possible data, independent of optical and aux flags or model needs. Defaults to False.
Raises:
-
KeyboardInterrupt–If user interrupts execution.
-
SystemExit–If the process is terminated.
-
SystemError–If a system error occurs.
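The interaction of the optical, aux, and force flags described above can be summarized in a small sketch. It is an interpretation of the parameter descriptions only: the function name and the `models_need_aux` input are assumptions, and how the real pipeline derives model requirements is not shown in this reference.

```python
def resolve_downloads(
    optical: bool, aux: bool, force: bool, models_need_aux: bool
) -> dict[str, bool]:
    """Decide what to download (sketch; `models_need_aux` is an assumed input)."""
    if force:
        # force ignores both flags and the model requirements
        return {"optical": True, "aux": True}
    # aux data is only fetched "as needed", i.e. when a model requires it
    return {"optical": optical, "aux": aux and models_need_aux}


print(resolve_downloads(optical=True, aux=True, force=False, models_need_aux=False))
print(resolve_downloads(optical=False, aux=False, force=True, models_need_aux=False))
```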
Source code in darts/src/darts/pipelines/sequential_v2.py
run
Run the complete segmentation pipeline.
Executes the full pipeline including:
1. Configuration validation and dumping
2. Loading ensemble models
3. Creating/loading auxiliary datacubes
4. Processing each tile:
   - Loading optical data
   - Loading auxiliary data (ArcticDEM, TCVis) as needed
   - Preprocessing
   - Segmentation
   - Postprocessing
   - Exporting results
5. Saving results and timing information
Results are saved to the output directory with timestamped configuration, results parquet file, and timing information.
Raises:
-
KeyboardInterrupt–If user interrupts execution.
Source code in darts/src/darts/pipelines/sequential_v2.py
PlanetRayPipeline
dataclass
PlanetRayPipeline(
model_files: list[pathlib.Path] = None,
output_data_dir: pathlib.Path = pathlib.Path(
"data/output"
),
arcticdem_dir: pathlib.Path = pathlib.Path(
"data/download/arcticdem"
),
tcvis_dir: pathlib.Path = pathlib.Path(
"data/download/tcvis"
),
num_cpus: int = 1,
devices: list[int] | None = None,
ee_project: str | None = None,
ee_use_highvolume: bool = True,
tpi_outer_radius: int = 100,
tpi_inner_radius: int = 0,
patch_size: int = 1024,
overlap: int = 256,
batch_size: int = 8,
reflection: int = 0,
binarization_threshold: float = 0.5,
mask_erosion_size: int = 10,
min_object_size: int = 32,
quality_level: int
| typing.Literal[
"high_quality", "low_quality", "none"
] = 1,
export_bands: list[str] = (
lambda: [
"probabilities",
"binarized",
"polygonized",
"extent",
"thumbnail",
]
)(),
write_model_outputs: bool = False,
overwrite: bool = False,
orthotiles_dir: pathlib.Path = pathlib.Path(
"data/input/planet/PSOrthoTile"
),
scenes_dir: pathlib.Path = pathlib.Path(
"data/input/planet/PSScene"
),
image_ids: list = None,
)
Bases: darts.pipelines.ray_v2._BaseRayPipeline
Pipeline for PlanetScope data.
Parameters:
-
orthotiles_dir(pathlib.Path, default:pathlib.Path('data/input/planet/PSOrthoTile')) –The directory containing the PlanetScope orthotiles.
-
scenes_dir(pathlib.Path, default:pathlib.Path('data/input/planet/PSScene')) –The directory containing the PlanetScope scenes.
-
image_ids(list, default:None) –The list of image ids to process. If None, all images in the directory will be processed.
-
model_files(pathlib.Path | list[pathlib.Path], default:None) –The path to the models to use for segmentation. Can also be a single Path to only use one model, which implies write_model_outputs=False. If a list is provided, an ensemble of the models is used.
-
output_data_dir(pathlib.Path, default:pathlib.Path('data/output')) –The "output" directory. Defaults to Path("data/output").
-
arcticdem_dir(pathlib.Path, default:pathlib.Path('data/download/arcticdem')) –The directory containing the ArcticDEM data (the datacube and the extent files). Will be created and downloaded if it does not exist. Defaults to Path("data/download/arcticdem").
-
tcvis_dir(pathlib.Path, default:pathlib.Path('data/download/tcvis')) –The directory containing the TCVis data. Defaults to Path("data/download/tcvis").
-
device(typing.Literal['cuda', 'cpu'] | int) –The device to run the model on. "cuda" selects the first device (0); an int selects the specified device; "auto" tries to automatically select a free GPU (<50% memory usage). Defaults to "cuda" if available, else "cpu".
-
ee_project(str, default:None) –The Earth Engine project ID or number to use. May be omitted if project is defined within persistent API credentials obtained via earthengine authenticate.
-
ee_use_highvolume(bool, default:True) –Whether to use the high volume server (https://earthengine-highvolume.googleapis.com).
-
tpi_outer_radius(int, default:100) –The outer radius of the annulus kernel for the tpi calculation in m. Defaults to 100m.
-
tpi_inner_radius(int, default:0) –The inner radius of the annulus kernel for the tpi calculation in m. Defaults to 0.
-
patch_size(int, default:1024) –The patch size to use for inference. Defaults to 1024.
-
overlap(int, default:256) –The overlap to use for inference. Defaults to 256.
-
batch_size(int, default:8) –The batch size to use for inference. Defaults to 8.
-
reflection(int, default:0) –The reflection padding to use for inference. Defaults to 0.
-
binarization_threshold(float, default:0.5) –The threshold to binarize the probabilities. Defaults to 0.5.
-
mask_erosion_size(int, default:10) –The size of the disk to use for mask erosion and the edge-cropping. Defaults to 10.
-
min_object_size(int, default:32) –The minimum object size to keep, in pixels. Defaults to 32.
-
quality_level(int | typing.Literal['high_quality', 'low_quality', 'none'], default:1) –The quality level to use for the segmentation. Can also be an int, in which case 0="none", 1="low_quality", 2="high_quality". Defaults to 1.
-
export_bands(list[str], default:(lambda: ['probabilities', 'binarized', 'polygonized', 'extent', 'thumbnail'])()) –The bands to export. Can be a list of "probabilities", "binarized", "polygonized", "extent", "thumbnail", "optical", "dem", "tcvis" or concrete band-names. Defaults to ["probabilities", "binarized", "polygonized", "extent", "thumbnail"].
-
write_model_outputs(bool, default:False) –Also save the model outputs, not only the ensemble result. Defaults to False.
-
overwrite(bool, default:False) –Whether to overwrite existing files. Defaults to False.
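The patch_size and overlap parameters above describe a sliding-window inference. As an illustration only (the actual tiling code is not shown in this reference), the sketch below computes patch start offsets along one image axis, assuming a stride of patch_size - overlap and a final patch aligned to the image edge.

```python
def patch_offsets(length: int, patch_size: int = 1024, overlap: int = 256) -> list[int]:
    """Start offsets of overlapping patches along one axis (illustrative)."""
    if overlap >= patch_size:
        raise ValueError("overlap must be smaller than patch_size")
    if length <= patch_size:
        return [0]  # a single patch covers the whole axis
    stride = patch_size - overlap
    offsets = list(range(0, length - patch_size + 1, stride))
    # Add an edge-aligned patch if the regular grid stops short of the end.
    if offsets[-1] != length - patch_size:
        offsets.append(length - patch_size)
    return offsets


print(patch_offsets(2560))
print(patch_offsets(3000))
```

With the default values, a larger overlap means a smaller stride and therefore more patches per image.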
arcticdem_dir
class-attribute
instance-attribute
export_bands
class-attribute
instance-attribute
export_bands: list[str] = dataclasses.field(
default_factory=lambda: [
"probabilities",
"binarized",
"polygonized",
"extent",
"thumbnail",
]
)
orthotiles_dir
class-attribute
instance-attribute
output_data_dir
class-attribute
instance-attribute
quality_level
class-attribute
instance-attribute
scenes_dir
class-attribute
instance-attribute
tcvis_dir
class-attribute
instance-attribute
cli
staticmethod
cli(*, pipeline: darts.pipelines.ray_v2.PlanetRayPipeline)
run
Source code in darts/src/darts/pipelines/ray_v2.py
Sentinel2Pipeline
dataclass
Sentinel2Pipeline(
model_files: list[pathlib.Path] = None,
default_dirs: darts_utils.paths.DefaultPaths = (
lambda: darts_utils.paths.DefaultPaths()
)(),
output_data_dir: pathlib.Path | None = None,
arcticdem_dir: pathlib.Path | None = None,
tcvis_dir: pathlib.Path | None = None,
device: typing.Literal["cuda", "cpu", "auto"]
| int
| None = None,
ee_project: str | None = None,
ee_use_highvolume: bool = True,
tpi_outer_radius: int = 100,
tpi_inner_radius: int = 0,
patch_size: int = 1024,
overlap: int = 256,
batch_size: int = 8,
reflection: int = 0,
binarization_threshold: float = 0.5,
mask_erosion_size: int = 10,
edge_erosion_size: int | None = None,
min_object_size: int = 32,
quality_level: int
| typing.Literal[
"high_quality", "low_quality", "none"
] = 1,
export_bands: list[str] = (
lambda: [
"probabilities",
"binarized",
"polygonized",
"extent",
"thumbnail",
]
)(),
write_model_outputs: bool = False,
overwrite: bool = False,
offline: bool = False,
debug_data: bool = False,
scene_ids: list[str] | None = None,
scene_id_file: pathlib.Path | None = None,
tile_ids: list[str] | None = None,
aoi_file: pathlib.Path | None = None,
start_date: str | None = None,
end_date: str | None = None,
max_cloud_cover: int | None = 10,
max_snow_cover: int | None = 10,
months: list[int] | None = None,
years: list[int] | None = None,
prep_data_scene_id_file: pathlib.Path | None = None,
sentinel2_grid_dir: pathlib.Path | None = None,
raw_data_store: pathlib.Path | None = None,
no_raw_data_store: bool = False,
raw_data_source: typing.Literal["gee", "cdse"] = "cdse",
)
Bases: darts.pipelines.sequential_v2._BasePipeline
Pipeline for processing Sentinel-2 data.
Processes Sentinel-2 Surface Reflectance (SR) imagery from either CDSE or Google Earth Engine. Supports multiple scene selection methods and flexible filtering options.
Source Selection
The data source is specified via the raw_data_source parameter:
- "cdse": Copernicus Data Space Ecosystem (CDSE)
- "gee": Google Earth Engine (GEE)
Both sources require accounts and proper credential setup on the system.
Scene Selection
Scenes can be selected using one of four mutually exclusive methods (priority order):
1. scene_ids: Direct list of Sentinel-2 scene IDs
2. scene_id_file: JSON file containing scene IDs
3. tile_ids: List of Sentinel-2 tile IDs (e.g., "33UVP") with optional filters
4. aoi_file: Shapefile defining area of interest with optional filters
Offline Processing
Use cli_prepare_data to download data for offline use.
The prep_data_scene_id_file stores scene IDs from queries for offline reuse.
Parameters:
-
scene_ids(list[str] | None, default:None) –Direct list of Sentinel-2 scene IDs to process. Defaults to None.
-
scene_id_file(pathlib.Path | None, default:None) –JSON file containing scene IDs to process. Defaults to None.
-
tile_ids(list[str] | None, default:None) –List of Sentinel-2 tile IDs (requires filtering params). Defaults to None.
-
aoi_file(pathlib.Path | None, default:None) –Shapefile with area of interest (requires filtering params). Defaults to None.
-
start_date(str | None, default:None) –Start date for filtering (YYYY-MM-DD format). Defaults to None.
-
end_date(str | None, default:None) –End date for filtering (YYYY-MM-DD format). Defaults to None.
-
max_cloud_cover(int | None, default:10) –Maximum cloud cover percentage (0-100). Defaults to 10.
-
max_snow_cover(int | None, default:10) –Maximum snow cover percentage (0-100). Defaults to 10.
-
months(list[int] | None, default:None) –Filter by months (1-12). Defaults to None.
-
years(list[int] | None, default:None) –Filter by years. Defaults to None.
-
prep_data_scene_id_file(pathlib.Path | None, default:None) –File to store/load scene IDs for offline processing. Written during prepare_data, read during offline run. Defaults to None.
-
sentinel2_grid_dir(pathlib.Path | None, default:None) –Directory for Sentinel-2 grid shapefiles. Used only in prepare_data with tile_ids. If None, uses default path. Defaults to None.
-
raw_data_store(pathlib.Path | None, default:None) –Directory for storing raw Sentinel-2 data locally. If None, uses default path based on raw_data_source. Defaults to None.
-
no_raw_data_store(bool, default:False) –If True, processes data in-memory without local storage. Overrides raw_data_store. Defaults to False.
-
raw_data_source(typing.Literal['gee', 'cdse'], default:'cdse') –Data source to use. Defaults to "cdse".
-
model_files(pathlib.Path | list[pathlib.Path] | None, default:None) –Path(s) to model file(s) for segmentation. Single Path implies write_model_outputs=False. If None, searches default model directory for all .pt files. Defaults to None.
-
output_data_dir(pathlib.Path | None, default:None) –Output directory for results. If None, uses {default_out}/sentinel2-{raw_data_source}. Defaults to None.
-
arcticdem_dir(pathlib.Path | None, default:None) –Directory for ArcticDEM datacube. Will be created/downloaded if needed. If None, uses default path. Defaults to None.
-
tcvis_dir(pathlib.Path | None, default:None) –Directory for TCVis data. If None, uses default path. Defaults to None.
-
device(typing.Literal['cuda', 'cpu', 'auto'] | int | None, default:None) –Computation device. "cuda" uses GPU 0, int specifies GPU index, "auto" selects free GPU. Defaults to None.
-
ee_project(str | None, default:None) –Earth Engine project ID. May be omitted if defined in persistent credentials. Defaults to None.
-
ee_use_highvolume(bool, default:True) –Whether to use EE high-volume server. Defaults to True.
-
tpi_outer_radius(int, default:100) –Outer radius (m) for TPI calculation. Defaults to 100.
-
tpi_inner_radius(int, default:0) –Inner radius (m) for TPI calculation. Defaults to 0.
-
patch_size(int, default:1024) –Patch size for inference. Defaults to 1024.
-
overlap(int, default:256) –Overlap between patches. Defaults to 256.
-
batch_size(int, default:8) –Batch size for inference. Defaults to 8.
-
reflection(int, default:0) –Reflection padding for inference. Defaults to 0.
-
binarization_threshold(float, default:0.5) –Threshold for binarizing probabilities. Defaults to 0.5.
-
mask_erosion_size(int, default:10) –Disk size for mask erosion and inner edge cropping. Defaults to 10.
-
edge_erosion_size(int | None, default:None) –Size for outer edge cropping. If None, uses mask_erosion_size. Defaults to None.
-
min_object_size(int, default:32) –Minimum object size (pixels) to keep. Defaults to 32.
-
quality_level(int | typing.Literal['high_quality', 'low_quality', 'none'], default:1) –Quality filtering level. 0="none", 1="low_quality", 2="high_quality". Defaults to 1.
-
export_bands(list[str], default:(lambda: ['probabilities', 'binarized', 'polygonized', 'extent', 'thumbnail'])()) –Bands to export. Can include "probabilities", "binarized", "polygonized", "extent", "thumbnail", "optical", "dem", "tcvis", "metadata", or specific band names. Defaults to ["probabilities", "binarized", "polygonized", "extent", "thumbnail"].
-
write_model_outputs(bool, default:False) –Save individual model outputs (not just ensemble). Defaults to False.
-
overwrite(bool, default:False) –Overwrite existing output files. Defaults to False.
-
offline(bool, default:False) –Skip downloading missing data. Requires pre-downloaded data. Defaults to False.
-
debug_data(bool, default:False) –Write intermediate debugging data to output directory. Defaults to False.
default_dirs
class-attribute
instance-attribute
default_dirs: darts_utils.paths.DefaultPaths = dataclasses.field(
default_factory=lambda: darts_utils.paths.DefaultPaths()
)
device
class-attribute
instance-attribute
export_bands
class-attribute
instance-attribute
export_bands: list[str] = dataclasses.field(
default_factory=lambda: [
"probabilities",
"binarized",
"polygonized",
"extent",
"thumbnail",
]
)
prep_data_scene_id_file
class-attribute
instance-attribute
quality_level
class-attribute
instance-attribute
raw_data_source
class-attribute
instance-attribute
sentinel2_grid_dir
class-attribute
instance-attribute
__post_init__
Source code in darts/src/darts/pipelines/sequential_v2.py
cli
staticmethod
cli(
*,
pipeline: darts.pipelines.sequential_v2.Sentinel2Pipeline,
)
Run the sequential pipeline for Sentinel-2 data.
Parameters:
-
pipeline(darts.pipelines.sequential_v2.Sentinel2Pipeline) –Configured Sentinel2Pipeline instance.
Source code in darts/src/darts/pipelines/sequential_v2.py
cli_prepare_data
staticmethod
cli_prepare_data(
*,
pipeline: darts.pipelines.sequential_v2.Sentinel2Pipeline,
optical: bool = False,
aux: bool = False,
force: bool = False,
)
Download all necessary data for offline processing.
Queries the data source (CDSE or GEE) for scene IDs and downloads optical and/or auxiliary data.
Stores scene IDs in prep_data_scene_id_file if specified for later offline use.
Parameters:
-
pipeline(darts.pipelines.sequential_v2.Sentinel2Pipeline) –Configured Sentinel2Pipeline instance.
-
optical(bool, default:False) –If True, downloads optical (Sentinel-2) imagery. Defaults to False.
-
aux(bool, default:False) –If True, downloads auxiliary data (ArcticDEM, TCVis). Defaults to False.
-
force(bool, default:False) –If True, downloads all possible data, independent of optical and aux flags or model needs. Defaults to False.
Source code in darts/src/darts/pipelines/sequential_v2.py
prepare_data
Download and prepare data for offline processing.
Validates configuration, determines data requirements from models, and downloads requested data (optical imagery and/or auxiliary data).
Parameters:
-
optical(bool, default:False) –If True, downloads optical imagery. Defaults to False.
-
aux(bool, default:False) –If True, downloads auxiliary data (ArcticDEM, TCVis) as needed. Defaults to False.
-
force(bool, default:False) –If True, downloads all possible data, independent of optical and aux flags or model needs. Defaults to False.
Raises:
-
KeyboardInterrupt–If user interrupts execution.
-
SystemExit–If the process is terminated.
-
SystemError–If a system error occurs.
Source code in darts/src/darts/pipelines/sequential_v2.py
run
Run the complete segmentation pipeline.
Executes the full pipeline including:
1. Configuration validation and dumping
2. Loading ensemble models
3. Creating/loading auxiliary datacubes
4. Processing each tile:
   - Loading optical data
   - Loading auxiliary data (ArcticDEM, TCVis) as needed
   - Preprocessing
   - Segmentation
   - Postprocessing
   - Exporting results
5. Saving results and timing information
Results are saved to the output directory with timestamped configuration, results parquet file, and timing information.
Raises:
-
KeyboardInterrupt–If user interrupts execution.
Source code in darts/src/darts/pipelines/sequential_v2.py
Sentinel2RayPipeline
dataclass
Sentinel2RayPipeline(
model_files: list[pathlib.Path] = None,
output_data_dir: pathlib.Path = pathlib.Path(
"data/output"
),
arcticdem_dir: pathlib.Path = pathlib.Path(
"data/download/arcticdem"
),
tcvis_dir: pathlib.Path = pathlib.Path(
"data/download/tcvis"
),
num_cpus: int = 1,
devices: list[int] | None = None,
ee_project: str | None = None,
ee_use_highvolume: bool = True,
tpi_outer_radius: int = 100,
tpi_inner_radius: int = 0,
patch_size: int = 1024,
overlap: int = 256,
batch_size: int = 8,
reflection: int = 0,
binarization_threshold: float = 0.5,
mask_erosion_size: int = 10,
min_object_size: int = 32,
quality_level: int
| typing.Literal[
"high_quality", "low_quality", "none"
] = 1,
export_bands: list[str] = (
lambda: [
"probabilities",
"binarized",
"polygonized",
"extent",
"thumbnail",
]
)(),
write_model_outputs: bool = False,
overwrite: bool = False,
aoi_shapefile: pathlib.Path = None,
start_date: str = None,
end_date: str = None,
max_cloud_cover: int = 10,
input_cache: pathlib.Path = pathlib.Path(
"data/cache/input"
),
)
Bases: darts.pipelines.ray_v2._BaseRayPipeline
Pipeline for Sentinel-2 data based on an area of interest.
Parameters:
-
aoi_shapefile(pathlib.Path, default:None) –The shapefile containing the area of interest.
-
start_date(str, default:None) –The start date of the time series in YYYY-MM-DD format.
-
end_date(str, default:None) –The end date of the time series in YYYY-MM-DD format.
-
max_cloud_cover(int, default:10) –The maximum cloud cover percentage to use for filtering the Sentinel 2 scenes. Defaults to 10.
-
input_cache(pathlib.Path, default:pathlib.Path('data/cache/input')) –The directory to use for caching the input data. Defaults to Path("data/cache/input").
-
model_files(pathlib.Path | list[pathlib.Path], default:None) –The path to the models to use for segmentation. Can also be a single Path to only use one model, which implies write_model_outputs=False. If a list is provided, will use an ensemble of the models.
-
output_data_dir(pathlib.Path, default:pathlib.Path('data/output')) –The "output" directory. Defaults to Path("data/output").
-
arcticdem_dir(pathlib.Path, default:pathlib.Path('data/download/arcticdem')) –The directory containing the ArcticDEM data (the datacube and the extent files). Will be created and downloaded if it does not exist. Defaults to Path("data/download/arcticdem").
-
tcvis_dir(pathlib.Path, default:pathlib.Path('data/download/tcvis')) –The directory containing the TCVis data. Defaults to Path("data/download/tcvis").
-
device(typing.Literal['cuda', 'cpu'] | int) –The device to run the model on. If "cuda" take the first device (0), if int take the specified device. If "auto" try to automatically select a free GPU (<50% memory usage). Defaults to "cuda" if available, else "cpu".
-
ee_project(str, default:None) –The Earth Engine project ID or number to use. May be omitted if project is defined within persistent API credentials obtained via earthengine authenticate.
-
ee_use_highvolume(bool, default:True) –Whether to use the high volume server (https://earthengine-highvolume.googleapis.com).
-
tpi_outer_radius(int, default:100) –The outer radius of the annulus kernel for the tpi calculation in m. Defaults to 100m.
-
tpi_inner_radius(int, default:0) –The inner radius of the annulus kernel for the tpi calculation in m. Defaults to 0.
-
patch_size(int, default:1024) –The patch size to use for inference. Defaults to 1024.
-
overlap(int, default:256) –The overlap to use for inference. Defaults to 256.
-
batch_size(int, default:8) –The batch size to use for inference. Defaults to 8.
-
reflection(int, default:0) –The reflection padding to use for inference. Defaults to 0.
-
binarization_threshold(float, default:0.5) –The threshold to binarize the probabilities. Defaults to 0.5.
-
mask_erosion_size(int, default:10) –The size of the disk to use for mask erosion and the edge-cropping. Defaults to 10.
-
min_object_size(int, default:32) –The minimum object size to keep in pixel. Defaults to 32.
-
quality_level(int | typing.Literal['high_quality', 'low_quality', 'none'], default:1) –The quality level to use for the segmentation. Can also be an int. In this case 0="none" 1="low_quality" 2="high_quality". Defaults to 1.
-
export_bands(list[str], default:(lambda: ['probabilities', 'binarized', 'polygonized', 'extent', 'thumbnail'])()) –The bands to export. Can be a list of "probabilities", "binarized", "polygonized", "extent", "thumbnail", "optical", "dem", "tcvis" or concrete band-names. Defaults to ["probabilities", "binarized", "polygonized", "extent", "thumbnail"].
-
write_model_outputs(bool, default:False) –Also save the model outputs, not only the ensemble result. Defaults to False.
-
overwrite(bool, default:False) –Whether to overwrite existing files. Defaults to False.
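The integer and string forms of quality_level described above can be related with a small lookup. This is a minimal sketch, not the pipeline's own code: normalize_quality_level is a hypothetical helper name, and only the 0/1/2 mapping is taken from the parameter description.

```python
from typing import Union

# Mapping taken from the parameter description:
# 0="none", 1="low_quality", 2="high_quality".
_QUALITY_BY_INT = {0: "none", 1: "low_quality", 2: "high_quality"}


def normalize_quality_level(quality_level: Union[int, str]) -> str:
    """Hypothetical helper: return the string name of a quality level."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(quality_level, bool):
        raise TypeError("quality_level must be an int or a str, not a bool")
    if isinstance(quality_level, int):
        if quality_level not in _QUALITY_BY_INT:
            raise ValueError(f"unknown quality level: {quality_level}")
        return _QUALITY_BY_INT[quality_level]
    if quality_level not in _QUALITY_BY_INT.values():
        raise ValueError(f"unknown quality level: {quality_level!r}")
    return quality_level
```

With the documented default of 1, this sketch resolves to "low_quality".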
arcticdem_dir
class-attribute
instance-attribute
¶
export_bands
class-attribute
instance-attribute
¶
export_bands: list[str] = dataclasses.field(
default_factory=lambda: [
"probabilities",
"binarized",
"polygonized",
"extent",
"thumbnail",
]
)
input_cache
class-attribute
instance-attribute
¶
output_data_dir
class-attribute
instance-attribute
¶
quality_level
class-attribute
instance-attribute
¶
tcvis_dir
class-attribute
instance-attribute
¶
cli
staticmethod
¶
cli(
*, pipeline: darts.pipelines.ray_v2.Sentinel2RayPipeline
)
run
¶
Source code in darts/src/darts/pipelines/ray_v2.py
VerbosityLevel
¶
Enum for verbosity levels.
from_cli
classmethod
¶
Get the verbosity level from CLI flags.
Parameters:
-
verbose(bool) –Whether the verbose flag is set.
-
very_verbose(bool) –Whether the very verbose flag is set.
-
debug(bool) –Whether the debug flag is set.
Returns:
-
VerbosityLevel(darts.utils.logging.VerbosityLevel) –The corresponding verbosity level.
Source code in darts/src/darts/utils/logging.py
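As a rough illustration of how three boolean CLI flags could collapse into one level, consider the sketch below. The member names and the precedence (the most detailed flag wins) are assumptions made for this example; the real enum in darts.utils.logging may define different members and rules.

```python
from enum import IntEnum


class SketchVerbosity(IntEnum):
    """Hypothetical stand-in for VerbosityLevel; member names are assumed."""

    NORMAL = 0
    VERBOSE = 1
    VERY_VERBOSE = 2
    DEBUG = 3

    @classmethod
    def from_cli(
        cls, verbose: bool, very_verbose: bool, debug: bool
    ) -> "SketchVerbosity":
        # Assumed precedence: the most detailed flag that is set wins.
        if debug:
            return cls.DEBUG
        if very_verbose:
            return cls.VERY_VERBOSE
        if verbose:
            return cls.VERBOSE
        return cls.NORMAL
```

Using an IntEnum keeps the levels ordered, so a caller could compare them with `>=` when deciding how much to log.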
benchviz
¶
Visualize a benchmark based on a Stopuhr data file produced by a pipeline run.
Note
This function changes the seaborn theme to "whitegrid" for better visualization.
Parameters:
-
stopuhr_data(pathlib.Path) –Path to the Stopuhr data file.
-
viz_dir(pathlib.Path | None, default:None) –Path to the directory where the visualization will be saved. If None, defaults to the parent directory of the Stopuhr data file. Defaults to None.
Returns:
-
–
plt.Figure: A matplotlib figure containing the benchmark visualization.
Source code in darts/src/darts/utils/bench.py
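The viz_dir fallback described in the parameters can be pictured as follows. This is a sketch of the documented behavior only; resolve_viz_dir is a hypothetical name and the file name in the usage note is made up.

```python
from pathlib import Path
from typing import Optional


def resolve_viz_dir(stopuhr_data: Path, viz_dir: Optional[Path] = None) -> Path:
    """Hypothetical helper: pick the directory the visualization is saved to."""
    # Documented fallback: use the parent directory of the Stopuhr data file.
    return stopuhr_data.parent if viz_dir is None else viz_dir
```

For a data file at `runs/a/stopuhr.parquet` (an illustrative path) and no viz_dir, the sketch returns `runs/a`.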
debug_default_paths
¶
debug_default_paths(
default_paths: darts_utils.paths.DefaultPaths = darts_utils.paths.DefaultPaths(),
pipeline_paths: darts.pipelines.sequential_v2.PipelineV2Paths = darts.pipelines.sequential_v2.PipelineV2Paths(),
)
Debug and print the current DARTS paths.
Parameters:
-
default_paths(darts_utils.paths.DefaultPaths, default:darts_utils.paths.DefaultPaths()) –Default paths to set before logging. Defaults to DefaultPaths().
-
pipeline_paths(darts.pipelines.sequential_v2.PipelineV2Paths, default:darts.pipelines.sequential_v2.PipelineV2Paths()) –Pipeline paths to log. Defaults to PipelineV2Paths().
Source code in darts/src/darts/cli.py
env_info
¶
hello
¶
Say hello to someone.
Parameters:
-
name(str) –The name of the person to say hello to
-
n(int, default:1) –The number of times to say hello. Defaults to 1.
Raises:
-
ValueError–If n is 3.
Source code in darts/src/darts/cli.py
help
¶
launcher
¶
launcher(
*tokens: str,
log_dir: pathlib.Path = pathlib.Path("logs"),
config_file: pathlib.Path = pathlib.Path("config.toml"),
verbose: bool = False,
very_verbose: bool = False,
debug: bool = False,
log_plain: bool = False,
)
Source code in darts/src/darts/cli.py
preprocess_planet_train_data
¶
preprocess_planet_train_data(
*,
data_dir: pathlib.Path,
labels_dir: pathlib.Path,
default_dirs: darts_utils.paths.DefaultPaths = darts_utils.paths.DefaultPaths(),
train_data_dir: pathlib.Path | None = None,
arcticdem_dir: pathlib.Path | None = None,
tcvis_dir: pathlib.Path | None = None,
admin_dir: pathlib.Path | None = None,
preprocess_cache: pathlib.Path | None = None,
force_preprocess: bool = False,
append: bool = True,
device: typing.Literal["cuda", "cpu", "auto"]
| int
| None = None,
ee_project: str | None = None,
ee_use_highvolume: bool = True,
tpi_outer_radius: int = 100,
tpi_inner_radius: int = 0,
patch_size: int = 1024,
overlap: int = 16,
exclude_nopositive: bool = False,
exclude_nan: bool = True,
)
Preprocess Planet data for training.
This function preprocesses Planet scenes into a training-ready format by creating fixed-size patches and storing them in a zarr array for efficient random access during training. All data is stored in a single zarr group with associated metadata.
The preprocessing creates patches of the specified size from each Planet scene and stores them as:
- A zarr group containing 'x' (input data) and 'y' (labels) arrays
- A geopandas dataframe with metadata including region, position, and label statistics
- A configuration file with preprocessing parameters
The x dataarray contains the input data with shape (n_patches, n_bands, patch_size, patch_size). The y dataarray contains the labels with shape (n_patches, patch_size, patch_size). Both dataarrays are chunked along the n_patches dimension with chunk size 1, resulting in each patch being stored in a separate file for super fast random access.
The metadata dataframe contains information about each patch including:
- sample_id: Identifier for the source Planet scene
- region: Administrative region name
- geometry: Spatial extent of the patch
- empty: Whether the patch contains positive labeled pixels
- Additional metadata as specified
Through exclude_nopositive and exclude_nan, respective patches can be excluded from the final data.
A config.toml file is saved in the train_data_dir containing the configuration used for the preprocessing. Additionally, a timestamp-based CLI configuration file is saved for reproducibility.
The final directory structure of train_data_dir will look like this:
train_data_dir/
├── config.toml
├── data.zarr/
│ ├── x/ # Input patches [n_patches, n_bands, patch_size, patch_size]
│ └── y/ # Label patches [n_patches, patch_size, patch_size]
├── metadata.parquet
└── {timestamp}.cli.toml
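A quick way to sanity-check an output directory against the tree above is to test for the fixed entries. The sketch below uses only pathlib; missing_entries is a hypothetical helper, and the timestamped CLI file is deliberately not checked because its name varies per run.

```python
from pathlib import Path

# Fixed entries taken from the directory tree above.
REQUIRED_ENTRIES = (
    "config.toml",
    "data.zarr/x",
    "data.zarr/y",
    "metadata.parquet",
)


def missing_entries(train_data_dir: Path) -> list:
    """Hypothetical helper: list the expected entries that do not exist."""
    return [
        entry
        for entry in REQUIRED_ENTRIES
        if not (train_data_dir / entry).exists()
    ]
```

An empty return value means every fixed entry is present; it says nothing about whether the arrays inside data.zarr are valid.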
Parameters:
-
data_dir(pathlib.Path) –The directory containing the Planet scenes and orthotiles.
-
labels_dir(pathlib.Path) –The directory containing the labels and footprints / extents.
-
default_dirs(darts_utils.paths.DefaultPaths, default:darts_utils.paths.DefaultPaths()) –The default directories for DARTS. Defaults to a config filled with None.
-
train_data_dir(pathlib.Path | None, default:None) –The "output" directory where the tensors are written to. If None, will use the default training data directory based on the DARTS paths. Defaults to None.
-
arcticdem_dir(pathlib.Path | None, default:None) –The directory containing the ArcticDEM data (the datacube and the extent files). Will be created and downloaded if it does not exist. If None, will use the default auxiliary directory based on the DARTS paths. Defaults to None.
-
tcvis_dir(pathlib.Path | None, default:None) –The directory containing the TCVis data. If None, will use the default TCVis directory based on the DARTS paths. Defaults to None.
-
admin_dir(pathlib.Path | None, default:None) –The directory containing the admin files. If None, will use the default auxiliary directory based on the DARTS paths. Defaults to None.
-
preprocess_cache(pathlib.Path | None, default:None) –The directory to store the preprocessed data. If None, will neither use nor store preprocessed data. Defaults to None.
-
force_preprocess(bool, default:False) –Whether to force the preprocessing of the data. Defaults to False.
-
append(bool, default:True) –Whether to append the data to the existing data. Defaults to True.
-
device(typing.Literal['cuda', 'cpu', 'auto'] | int | None, default:None) –The device to run the model on. If "cuda", takes the first device (0); if an int, takes the specified device; if "auto", tries to automatically select a free GPU (<50% memory usage). Defaults to "cuda" if available, else "cpu".
-
ee_project(str, default:None) –The Earth Engine project ID or number to use. May be omitted if project is defined within persistent API credentials obtained via earthengine authenticate.
-
ee_use_highvolume(bool, default:True) –Whether to use the high volume server (https://earthengine-highvolume.googleapis.com).
-
tpi_outer_radius(int, default:100) –The outer radius of the annulus kernel for the tpi calculation in m. Defaults to 100m.
-
tpi_inner_radius(int, default:0) –The inner radius of the annulus kernel for the tpi calculation in m. Defaults to 0.
-
patch_size(int, default:1024) –The patch size to use for inference. Defaults to 1024.
-
overlap(int, default:16) –The overlap to use for inference. Defaults to 16.
-
exclude_nopositive(bool, default:False) –Whether to exclude patches where the labels do not contain positives. Defaults to False.
-
exclude_nan(bool, default:True) –Whether to exclude patches where the input data has nan values. Defaults to True.
Source code in darts/src/darts/training/preprocess_planet_v2.py
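To get a feel for how patch_size and overlap interact, the following sketch computes patch start offsets along one image axis with plain sliding-window arithmetic. It assumes a stride of patch_size - overlap and no padding at the border; the actual patching in DARTS may treat edges differently, and patch_offsets is a hypothetical name.

```python
def patch_offsets(length: int, patch_size: int = 1024, overlap: int = 16) -> list:
    """Hypothetical helper: start offsets of patches along one axis."""
    if not 0 <= overlap < patch_size:
        raise ValueError("overlap must satisfy 0 <= overlap < patch_size")
    # Assumption: consecutive patches are shifted by patch_size - overlap.
    stride = patch_size - overlap
    # Only patches that fit completely inside the axis are kept (no padding).
    return list(range(0, length - patch_size + 1, stride))
```

Under these assumptions, an axis of 4096 pixels with the defaults gives the offsets 0, 1008, 2016 and 3024, and the last 48 pixels are not covered by any patch.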
preprocess_planet_train_data_pingo
¶
preprocess_planet_train_data_pingo(
*,
data_dir: pathlib.Path,
labels_dir: pathlib.Path,
default_dirs: darts_utils.paths.DefaultPaths = darts_utils.paths.DefaultPaths(),
train_data_dir: pathlib.Path | None = None,
arcticdem_dir: pathlib.Path | None = None,
tcvis_dir: pathlib.Path | None = None,
admin_dir: pathlib.Path | None = None,
preprocess_cache: pathlib.Path | None = None,
force_preprocess: bool = False,
device: typing.Literal["cuda", "cpu", "auto"]
| int
| None = None,
ee_project: str | None = None,
ee_use_highvolume: bool = True,
tpi_outer_radius: int = 100,
tpi_inner_radius: int = 0,
patch_size: int = 1024,
overlap: int = 16,
exclude_nopositive: bool = False,
exclude_nan: bool = True,
)
Preprocess Planet data for training (Pingo version).
This function preprocesses Planet scenes into a training-ready format by creating fixed-size patches and storing them in a zarr array for efficient random access during training. All data is stored in a single zarr group with associated metadata.
The preprocessing creates patches of the specified size from each Planet scene and stores them as:
- A zarr group containing 'x' (input data) and 'y' (labels) arrays
- A geopandas dataframe with metadata including region, position, and label statistics
- A configuration file with preprocessing parameters
The x dataarray contains the input data with shape (n_patches, n_bands, patch_size, patch_size). The y dataarray contains the labels with shape (n_patches, patch_size, patch_size). Both dataarrays are chunked along the n_patches dimension with chunk size 1, resulting in each patch being stored in a separate file for super fast random access.
The metadata dataframe contains information about each patch including:
- sample_id: Identifier for the source Planet scene
- region: Administrative region name
- geometry: Spatial extent of the patch
- empty: Whether the patch contains positive labeled pixels
- Additional metadata as specified
Through exclude_nopositive and exclude_nan, respective patches can be excluded from the final data.
A config.toml file is saved in the train_data_dir containing the configuration used for the preprocessing. Additionally, a timestamp-based CLI configuration file is saved for reproducibility.
The final directory structure of train_data_dir will look like this:
train_data_dir/
├── config.toml
├── data.zarr/
│ ├── x/ # Input patches [n_patches, n_bands, patch_size, patch_size]
│ └── y/ # Label patches [n_patches, patch_size, patch_size]
├── metadata.parquet
└── {timestamp}.cli.json
Parameters:
-
data_dir(pathlib.Path) –The directory containing the Planet scenes and orthotiles.
-
labels_dir(pathlib.Path) –The directory containing the labels and footprints / extents.
-
default_dirs(darts_utils.paths.DefaultPaths, default:darts_utils.paths.DefaultPaths()) –The default directories for DARTS. Defaults to a config filled with None.
-
train_data_dir(pathlib.Path | None, default:None) –The "output" directory where the tensors are written to. If None, will use the default training data directory based on the DARTS paths. Defaults to None.
-
arcticdem_dir(pathlib.Path | None, default:None) –The directory containing the ArcticDEM data (the datacube and the extent files). Will be created and downloaded if it does not exist. If None, will use the default auxiliary directory based on the DARTS paths. Defaults to None.
-
tcvis_dir(pathlib.Path | None, default:None) –The directory containing the TCVis data. If None, will use the default TCVis directory based on the DARTS paths. Defaults to None.
-
admin_dir(pathlib.Path | None, default:None) –The directory containing the admin files. If None, will use the default auxiliary directory based on the DARTS paths. Defaults to None.
-
preprocess_cache(pathlib.Path | None, default:None) –The directory to store the preprocessed data. If None, will neither use nor store preprocessed data. Defaults to None.
-
force_preprocess(bool, default:False) –Whether to force the preprocessing of the data. Defaults to False.
-
device(typing.Literal['cuda', 'cpu', 'auto'] | int | None, default:None) –The device to run the model on. If "cuda", takes the first device (0); if an int, takes the specified device; if "auto", tries to automatically select a free GPU (<50% memory usage). Defaults to "cuda" if available, else "cpu".
-
ee_project(str, default:None) –The Earth Engine project ID or number to use. May be omitted if project is defined within persistent API credentials obtained via earthengine authenticate.
-
ee_use_highvolume(bool, default:True) –Whether to use the high volume server (https://earthengine-highvolume.googleapis.com).
-
tpi_outer_radius(int, default:100) –The outer radius of the annulus kernel for the tpi calculation in m. Defaults to 100m.
-
tpi_inner_radius(int, default:0) –The inner radius of the annulus kernel for the tpi calculation in m. Defaults to 0.
-
patch_size(int, default:1024) –The patch size to use for inference. Defaults to 1024.
-
overlap(int, default:16) –The overlap to use for inference. Defaults to 16.
-
exclude_nopositive(bool, default:False) –Whether to exclude patches where the labels do not contain positives. Defaults to False.
-
exclude_nan(bool, default:True) –Whether to exclude patches where the input data has nan values. Defaults to True.
Source code in darts/src/darts/training/preprocess_planet_v2_pingo.py
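The effect of exclude_nopositive and exclude_nan can be sketched as a per-patch predicate. This example works on flat sequences of numbers rather than arrays and assumes that "positive" means a label value greater than zero; keep_patch is a hypothetical name and not part of the DARTS API.

```python
import math
from collections.abc import Iterable


def keep_patch(
    x_values: Iterable,
    y_values: Iterable,
    exclude_nopositive: bool = False,
    exclude_nan: bool = True,
) -> bool:
    """Hypothetical helper: decide whether a patch stays in the final data."""
    # Drop patches whose input data contains any NaN value.
    if exclude_nan and any(math.isnan(v) for v in x_values):
        return False
    # Drop patches whose labels contain no positive pixel (assumed: label > 0).
    if exclude_nopositive and not any(v > 0 for v in y_values):
        return False
    return True
```

With the documented defaults, only NaN-containing patches are dropped, so patches without any positive label are still kept.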
preprocess_s2_train_data
¶
preprocess_s2_train_data(
*,
labels_dir: pathlib.Path,
default_dirs: darts_utils.paths.DefaultPaths = darts_utils.paths.DefaultPaths(),
train_data_dir: pathlib.Path | None = None,
arcticdem_dir: pathlib.Path | None = None,
tcvis_dir: pathlib.Path | None = None,
admin_dir: pathlib.Path | None = None,
planet_data_dir: pathlib.Path | None = None,
raw_data_store: pathlib.Path | None = None,
no_raw_data_store: bool = False,
preprocess_cache: pathlib.Path | None = None,
matching_cache: pathlib.Path | None = None,
no_matching_cache: bool = False,
force_preprocess: bool = False,
append: bool = True,
device: typing.Literal["cuda", "cpu", "auto"]
| int
| None = None,
ee_project: str | None = None,
ee_use_highvolume: bool = True,
matching_day_range: int = 7,
matching_max_cloud_cover: int = 10,
matching_min_intersects: float = 0.7,
tpi_outer_radius: int = 100,
tpi_inner_radius: int = 0,
patch_size: int = 1024,
overlap: int = 16,
exclude_nopositive: bool = False,
exclude_nan: bool = True,
save_matching_scores: bool = False,
)
Preprocess Sentinel-2 data for training.
This function preprocesses Sentinel-2 scenes matched to Planet footprints into a training-ready format by creating fixed-size patches and storing them in a zarr array for efficient random access during training. All data is stored in a single zarr group with associated metadata.
The preprocessing matches Sentinel-2 scenes to Planet footprints based on temporal and spatial criteria, optionally aligns them spatially to Planet data, and creates patches of the specified size. The data is stored as:
- A zarr group containing 'x' (input data) and 'y' (labels) arrays
- A geopandas dataframe with metadata including region, position, and label statistics
- A configuration file with preprocessing parameters
The x dataarray contains the input data with shape (n_patches, n_bands, patch_size, patch_size). The y dataarray contains the labels with shape (n_patches, patch_size, patch_size). Both dataarrays are chunked along the n_patches dimension with chunk size 1, resulting in each patch being stored in a separate file for super fast random access.
The metadata dataframe contains information about each patch including:
- sample_id: Combined identifier for the S2 scene and Planet footprint
- region: Administrative region name
- geometry: Spatial extent of the patch
- empty: Whether the patch contains positive labeled pixels
- planet_id: Original Planet scene identifier
- s2_id: Sentinel-2 scene identifier
- Additional alignment and matching metadata
Through exclude_nopositive and exclude_nan, respective patches can be excluded from the final data.
A config.toml file is saved in the train_data_dir containing the configuration used for the preprocessing. Additionally, a timestamp-based CLI configuration file is saved for reproducibility.
The final directory structure of train_data_dir will look like this:
train_data_dir/
├── config.toml
├── data.zarr/
│ ├── x/ # Input patches [n_patches, n_bands, patch_size, patch_size]
│ └── y/ # Label patches [n_patches, patch_size, patch_size]
├── metadata.parquet
├── matching-cache.json # Optional matching cache
├── matching-scores.parquet # Optional matching scores
└── {timestamp}.cli.toml
Parameters:
-
labels_dir(pathlib.Path) –The directory containing the labels and footprints / extents.
-
default_dirs(darts_utils.paths.DefaultPaths, default:darts_utils.paths.DefaultPaths()) –The default directories for DARTS. Defaults to a config filled with None.
-
train_data_dir(pathlib.Path | None, default:None) –The "output" directory where the tensors are written to. If None, will use the default training data directory based on the DARTS paths. Defaults to None.
-
arcticdem_dir(pathlib.Path | None, default:None) –The directory containing the ArcticDEM data (the datacube and the extent files). Will be created and downloaded if it does not exist. If None, will use the default auxiliary directory based on the DARTS paths. Defaults to None.
-
tcvis_dir(pathlib.Path | None, default:None) –The directory containing the TCVis data. If None, will use the default TCVis directory based on the DARTS paths. Defaults to None.
-
admin_dir(pathlib.Path | None, default:None) –The directory containing the admin files. If None, will use the default auxiliary directory based on the DARTS paths. Defaults to None.
-
planet_data_dir(pathlib.Path | None, default:None) –The directory containing the Planet scenes and orthotiles. The Planet data is used to spatially align the Sentinel-2 data to the Planet data. Can be set to None if no alignment is desired. Defaults to None.
-
raw_data_store(pathlib.Path | None, default:None) –The directory to use for storing the raw Sentinel 2 data locally. If None, will use the default raw data directory based on the DARTS paths. Defaults to None.
-
no_raw_data_store(bool, default:False) –If True, will not store any raw data locally. This overrides the raw_data_store parameter. Defaults to False.
-
preprocess_cache(pathlib.Path | None, default:None) –The directory to store the preprocessed data. If None, will neither use nor store preprocessed data. Defaults to None.
-
matching_cache(pathlib.Path | None, default:None) –The path to a file where the matchings are stored. Note: this is different from the matching scores. If None, will query the sentinel 2 STAC and calculate the best match based on the criteria. Defaults to None.
-
no_matching_cache(bool, default:False) –If True, will not use or store any matching cache. This overrides the matching_cache parameter. Defaults to False.
-
force_preprocess(bool, default:False) –Whether to force the preprocessing of the data. Defaults to False.
-
append(bool, default:True) –Whether to append the data to the existing data. Defaults to True.
-
device(typing.Literal['cuda', 'cpu', 'auto'] | int | None, default:None) –The device to run the model on. If "cuda", takes the first device (0); if an int, takes the specified device; if "auto", tries to automatically select a free GPU (<50% memory usage). Defaults to "cuda" if available, else "cpu".
-
ee_project(str, default:None) –The Earth Engine project ID or number to use. May be omitted if project is defined within persistent API credentials obtained via earthengine authenticate.
-
ee_use_highvolume(bool, default:True) –Whether to use the high volume server (https://earthengine-highvolume.googleapis.com). Defaults to True.
-
matching_day_range(int, default:7) –The day range to use for matching S2 scenes to Planet footprints. Defaults to 7.
-
matching_max_cloud_cover(int, default:10) –The maximum cloud cover percentage to use for matching S2 scenes to Planet footprints. Defaults to 10.
-
matching_min_intersects(float, default:0.7) –The minimum intersection percentage to use for matching S2 scenes to Planet footprints. Defaults to 0.7.
-
tpi_outer_radius(int, default:100) –The outer radius of the annulus kernel for the tpi calculation in m. Defaults to 100m.
-
tpi_inner_radius(int, default:0) –The inner radius of the annulus kernel for the tpi calculation in m. Defaults to 0.
-
patch_size(int, default:1024) –The patch size to use for inference. Defaults to 1024.
-
overlap(int, default:16) –The overlap to use for inference. Defaults to 16.
-
exclude_nopositive(bool, default:False) –Whether to exclude patches where the labels do not contain positives. Defaults to False.
-
exclude_nan(bool, default:True) –Whether to exclude patches where the input data has nan values. Defaults to True.
-
save_matching_scores(bool, default:False) –Whether to save the matching scores. Defaults to False.
Source code in darts/src/darts/training/preprocess_sentinel2_v2.py
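The three matching_* parameters act as filters on candidate Sentinel-2 scenes for a Planet footprint. The sketch below shows one plausible reading of them. It assumes the day range is symmetric and inclusive, that cloud cover is given in percent, and that the intersection is a fraction between 0 and 1. Candidate and filter_candidates are hypothetical names, and how the best match is picked among the survivors is not covered here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate S2 scene for one Planet footprint."""

    s2_id: str
    acquired: date
    cloud_cover: float  # percent, 0-100 (assumed unit)
    intersection: float  # fraction of the footprint covered, 0-1 (assumed)


def filter_candidates(
    planet_date: date,
    candidates: list,
    matching_day_range: int = 7,
    matching_max_cloud_cover: int = 10,
    matching_min_intersects: float = 0.7,
) -> list:
    """Keep candidates that satisfy all three matching criteria."""
    return [
        c
        for c in candidates
        # Assumption: the day range applies before and after the Planet date.
        if abs((c.acquired - planet_date).days) <= matching_day_range
        and c.cloud_cover <= matching_max_cloud_cover
        and c.intersection >= matching_min_intersects
    ]
```

Tightening matching_day_range or raising matching_min_intersects reduces the number of candidates, which can leave some Planet footprints without any match.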
shell
¶
start_app
¶
Wrapper to start the app.