Pipeline v2

This document describes the pipeline that led to the DARTS v1 dataset and may lead to the DARTS v2 dataset. The original v1 dataset was not created with this repository. However, this repository implements a newer, faster version of the pipeline that uses exactly the same pipeline steps, so it should be possible to re-create the DARTS v1 dataset with it. The pipeline implemented here could also be used for future iterations and releases of the DARTS dataset.

In addition to the PLANET version of the DARTS dataset, the pipeline also supports Sentinel-2 imagery as optical input, which results in a lower spatial resolution (10 m instead of 3 m).

Note

The v1 / v2 pipeline is also referred to as the legacy pipeline in some parts of the code.

As of right now, four basic realizations of the v2 pipeline are implemented:

darts inference sentinel2-sequential
darts inference planet-sequential
darts inference sentinel2-ray
darts inference planet-ray

The sequential variants run without any parallelization framework, while the ray variants use Ray for distributed computing.

The pipeline currently consists of the following steps:

Load the optical and auxiliary data. This step depends on the realization of the pipeline: the optical data is loaded with darts_acquisition.load_planet_scene, darts_acquisition.load_gee_s2_sr_scene or darts_acquisition.load_cdse_s2_sr_scene. For PLANET, the masks are loaded separately with darts_acquisition.load_planet_masks; for the GEE and CDSE versions, the masks are already included in the scene. The auxiliary data is loaded with darts_acquisition.load_arcticdem and darts_acquisition.load_tcvis.
Preprocess the optical data: darts_preprocessing.preprocess_v2.
Segment the optical data: darts_ensemble.EnsembleV1.segment_tile.
Postprocess the segmentation and make it ready for export: darts_postprocessing.prepare_export.
Export the data: darts_export.export_tile.

A very simplified version of this implementation looks like this:

from math import ceil, sqrt

from darts_acquisition import load_arcticdem, load_tcvis
from darts_acquisition.s2 import load_gee_s2_sr_scene
from darts_ensemble import EnsembleV1
from darts_export import export_tile
from darts_postprocessing import prepare_export
from darts_preprocessing import preprocess_v2

s2id = "20230701T194909_20230701T195350_T11XNA"
arcticdem_dir = "/path/to/arcticdem"
tcvis_dir = "/path/to/tcvis"
model_files = {
    "model1": "/path/to/model1.pt",
    "model2": "/path/to/model2.pt",
}
outpath = "/path/to/output"

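# Create the ensemble from the model files
ensemble = EnsembleV1(model_files)

# Step 1: load the optical data (GEE scenes already include the masks)
tile = load_gee_s2_sr_scene(s2id)

# Step 1: load the auxiliary data on the grid of the optical tile
arcticdem = load_arcticdem(
    tile.odc.geobox,
    arcticdem_dir,
    resolution=10,
    # 100 m TPI outer radius at 10 m resolution, scaled by sqrt(2)
    buffer=ceil(100 / 10 * sqrt(2)),
)

tcvis = load_tcvis(tile.odc.geobox, tcvis_dir)

# Step 2: preprocess the optical data
tile = preprocess_v2(tile, arcticdem, tcvis, tpi_outer_radius=100, tpi_inner_radius=0)

# Step 3: segment the optical data
tile = ensemble.segment_tile(tile, patch_size=1024, overlap=256, batch_size=8)

# Step 4: postprocess the segmentation and make it ready for export
tile = prepare_export(tile, bin_threshold=0.5, mask_erosion_size=10, min_object_size=32)

# Step 5: export the data
export_tile(tile, outpath)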
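
The buffer value passed to load_arcticdem in the example can be worked out by hand. The sketch below repeats that arithmetic with named variables. The variable names are illustrative, and reading the sqrt(2) factor as a margin for the diagonal of the TPI kernel is an assumption, not something the pipeline documentation states.

```python
from math import ceil, sqrt

tpi_outer_radius_m = 100  # same value as tpi_outer_radius in the example
resolution_m = 10         # same value as resolution in the example

# Outer radius expressed in pixels: 100 m / 10 m = 10 px
radius_px = tpi_outer_radius_m / resolution_m

# Scale by sqrt(2) (assumed: diagonal margin) and round up to whole pixels
buffer_px = ceil(radius_px * sqrt(2))

print(buffer_px)  # 15
```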

Minimal configuration example

For Sentinel-2 processing:

[darts]
ee-project = "ee-tobias-hoelzer"
model-files = ["./models/s2-tcvis-final-large_2025-02-12.ckpt"]
aoi-file = "./data/myaoi.gpkg"
start-date = "2024-07-01"
end-date = "2024-09-30"

Or using scene IDs directly:

[darts.inference.sentinel2-sequential]
model-files = ["./models/model1.pt", "./models/model2.pt"]
scene-ids = ["20230701T194909_20230701T195350_T11XNA", "20230704T195909_20230704T200350_T11XNA"]
output-data-dir = "./output"

Full configuration explanation

The pipeline can be configured via command-line arguments or a TOML configuration file. All parameters from the CLI help output are available as configuration options.

Core Parameters

model-files: List of model file paths for ensemble segmentation. If a single model is provided, ensemble features are disabled.
output-data-dir: Directory where processed outputs will be saved.
arcticdem-dir: Directory containing ArcticDEM datacube (auto-downloaded if missing).
tcvis-dir: Directory containing TCVis data.
device: Computing device ("cuda", "cpu", "auto", or specific GPU index).

Scene Selection (Sentinel-2)

Four mutually exclusive methods:

scene-ids: Direct list of Sentinel-2 scene IDs
scene-id-file: Path to file containing scene IDs (one per line)
tile-ids: List of Sentinel-2 tile IDs + filtering parameters
aoi-file: Vector file with the area of interest (e.g. a shapefile or a GeoPackage, as in the examples) + filtering parameters
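
Because the four methods are mutually exclusive, exactly one of them should be set per run. A minimal sketch of such a check is shown below. The function pick_scene_selection is hypothetical and not part of the DARTS code base; how the pipeline itself validates these options is not covered here.

```python
def pick_scene_selection(scene_ids=None, scene_id_file=None, tile_ids=None, aoi_file=None):
    """Return the name of the single scene selection method that was provided.

    Hypothetical helper for illustration only.
    """
    options = {
        "scene-ids": scene_ids,
        "scene-id-file": scene_id_file,
        "tile-ids": tile_ids,
        "aoi-file": aoi_file,
    }
    # Collect every option that was actually set
    chosen = [name for name, value in options.items() if value is not None]
    if len(chosen) != 1:
        raise ValueError(f"Expected exactly one scene selection method, got {chosen or 'none'}")
    return chosen[0]


method = pick_scene_selection(aoi_file="./data/myaoi.gpkg")
print(method)  # aoi-file
```

Passing two options at once, for example scene_ids and aoi_file, raises a ValueError in this sketch.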

Filtering Parameters

start-date / end-date: Date range in YYYY-MM-DD format
max-cloud-cover: Maximum cloud cover percentage (default: 10)
max-snow-cover: Maximum snow cover percentage (default: 10)
months: List of months (1-12) for filtering
years: List of years for filtering
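
To make the effect of these parameters concrete, the following sketch applies them to a small list of scene records. It is an illustration under assumptions: the SceneInfo class and keep_scene function are hypothetical, the cover values are made up, all filters are combined with a logical AND, and the thresholds are treated as inclusive. The actual filtering happens in the data source queries and may behave differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SceneInfo:
    """Hypothetical scene metadata record, for illustration only."""
    scene_id: str
    acquired: date
    cloud_cover: float  # percent
    snow_cover: float   # percent


def keep_scene(scene, start_date=None, end_date=None,
               max_cloud_cover=10, max_snow_cover=10, months=None, years=None):
    """Return True if the scene passes every filter that is set."""
    if start_date is not None and scene.acquired < start_date:
        return False
    if end_date is not None and scene.acquired > end_date:
        return False
    if months is not None and scene.acquired.month not in months:
        return False
    if years is not None and scene.acquired.year not in years:
        return False
    # Assumption: thresholds are inclusive
    return scene.cloud_cover <= max_cloud_cover and scene.snow_cover <= max_snow_cover


scenes = [
    SceneInfo("20230701T194909_20230701T195350_T11XNA", date(2023, 7, 1), 4.0, 1.0),
    SceneInfo("20230704T195909_20230704T200350_T11XNA", date(2023, 7, 4), 35.0, 0.0),
]
selected = [
    s.scene_id
    for s in scenes
    if keep_scene(s, start_date=date(2023, 7, 1), end_date=date(2023, 9, 30))
]
print(selected)  # only the first scene passes the default cloud cover limit
```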

Processing Parameters

tpi-outer-radius: Outer radius for TPI calculation in meters (default: 100)
tpi-inner-radius: Inner radius for TPI calculation in meters (default: 0)
patch-size: Size of the patches used for inference (default: 1024)
overlap: Overlap between patches (default: 256)
batch-size: Batch size for inference (default: 8)
reflection: Reflection padding for inference (default: 0)
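
The interaction of patch-size and overlap can be sketched along one image axis. The function patch_offsets below is hypothetical: it assumes a stride of patch_size - overlap and places the last patch flush with the image edge. The real inference code may tile differently, for example by padding the image instead (see reflection).

```python
def patch_offsets(size, patch_size=1024, overlap=256):
    """Start offsets of overlapping patches along one axis of length `size`.

    Hypothetical helper for illustration only.
    """
    if size <= patch_size:
        # A single patch already covers the whole axis
        return [0]
    stride = patch_size - overlap
    # Regular grid of patches that fit completely inside the axis
    offsets = list(range(0, size - patch_size, stride))
    # Final patch ends exactly at the edge, so nothing is left uncovered
    offsets.append(size - patch_size)
    return offsets


print(patch_offsets(2048))  # [0, 768, 1024]
```

With the defaults, neighbouring patches on the regular grid share 256 pixels; the final patch may overlap its neighbour by more than that.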

Postprocessing Parameters

binarization-threshold: Threshold for converting probabilities to binary (default: 0.5)
mask-erosion-size: Size of disk for mask erosion (default: 10)
edge-erosion-size: Size for edge cropping, defaults to mask-erosion-size
min-object-size: Minimum object size in pixels (default: 32)
quality-level: Quality mask level: "high_quality", "low_quality", "none", or int 0-2 (default: 1)
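
The roles of binarization-threshold and min-object-size can be illustrated on a tiny probability grid. The sketch below is pure Python and not the implementation used by darts_postprocessing.prepare_export: the strict > comparison and the 4-connectivity are assumptions, and mask erosion and quality masking are left out.

```python
from collections import deque


def binarize(probabilities, threshold=0.5):
    """Turn a 2D grid of probabilities into a 2D grid of 0/1 values."""
    return [[1 if p > threshold else 0 for p in row] for row in probabilities]


def remove_small_objects(mask, min_object_size=32):
    """Zero out 4-connected foreground regions smaller than min_object_size pixels."""
    height, width = len(mask), len(mask[0])
    result = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search collects one connected region
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop the region if it is too small
            if len(region) < min_object_size:
                for y, x in region:
                    result[y][x] = 0
    return result


probs = [
    [0.9, 0.8, 0.1, 0.1],
    [0.7, 0.6, 0.1, 0.9],
    [0.1, 0.1, 0.1, 0.1],
]
binary = binarize(probs, threshold=0.5)
cleaned = remove_small_objects(binary, min_object_size=2)
print(cleaned)  # the isolated pixel in the second row is removed
```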

Export Parameters

export-bands: List of bands to export (default: ["probabilities", "binarized", "polygonized", "extent", "thumbnail"]). Available: "probabilities", "binarized", "polygonized", "extent", "thumbnail", "optical", "dem", "tcvis", "metadata", or specific band names
write-model-outputs: Save individual model outputs in addition to ensemble (default: False)

Data Source Parameters (Sentinel-2)

raw-data-source: Source for S2 data: "cdse" or "gee" (default: "cdse")
raw-data-store: Directory for storing raw S2 data locally
no-raw-data-store: Disable local storage of raw data (default: False)
ee-project: Earth Engine project ID (required for GEE source)
ee-use-highvolume: Use EE high-volume server (default: True)

Operational Flags

overwrite: Overwrite existing outputs (default: False)
offline: Run without downloading data (default: False)
debug-data: Write intermediate debug data (default: False)

Usage Examples

Command Line

# Using an AOI file with date filtering
darts inference sentinel2-sequential \
    --aoi-file ./data/myaoi.gpkg \
    --start-date 2024-07-01 \
    --end-date 2024-09-30 \
    --model-files ./models/model1.pt ./models/model2.pt \
    --output-data-dir ./output

# Using specific scene IDs
darts inference sentinel2-sequential \
    --scene-ids 20230701T194909_20230701T195350_T11XNA \
    --model-files ./models/model.pt

# Planet pipeline
darts inference planet-sequential \
    --orthotiles-dir ./data/planet/orthotiles \
    --scenes-dir ./data/planet/scenes \
    --model-files ./models/planet_model.pt

Configuration File

Create a config.toml:

[darts.inference.sentinel2-sequential]
aoi-file = "./data/myaoi.gpkg"
start-date = "2024-07-01"
end-date = "2024-09-30"
max-cloud-cover = 15
max-snow-cover = 5
model-files = ["./models/model1.pt", "./models/model2.pt"]
output-data-dir = "./output"
patch-size = 1024
overlap = 256
batch-size = 8
export-bands = ["probabilities", "binarized", "polygonized", "thumbnail"]
overwrite = false

Run with:

darts --config-file config.toml inference sentinel2-sequential

Offline Processing

First, prepare data:

darts inference prep-data sentinel2 \
    --aoi-file ./data/myaoi.gpkg \
    --start-date 2024-07-01 \
    --end-date 2024-09-30 \
    --raw-data-store ./raw_data \
    --sentinel2-grid-dir ./aux_data/s2_grid

Then run offline:

darts inference sentinel2-sequential \
    --offline \
    --raw-data-store ./raw_data \
    --prep-data-scene-id-file ./raw_data/scene_ids.txt \
    --model-files ./models/model.pt