darts_postprocessing
¶
Postprocessing steps for the DARTS dataset.
binarize
¶
binarize(
probs: xarray.DataArray,
threshold: float,
min_object_size: int,
mask: xarray.DataArray,
device: typing.Literal["cuda", "cpu"] | int,
) -> xarray.DataArray
Binarize segmentation probabilities with quality-based filtering.
This function converts continuous probability predictions to binary segmentation masks by applying thresholding, removing edge artifacts, and filtering small objects.
Processing steps
- Threshold probabilities (prob > threshold → 1, else → 0)
- Identify and remove objects touching invalid regions or tile edges
- Remove objects smaller than min_object_size
Parameters:
-
probs(xarray.DataArray) –Segmentation probabilities (float32, range [0-1]). NaN values are treated as 0 (no detection).
-
threshold(float) –Probability threshold for binarization. Typical values: 0.3-0.7.
-
min_object_size(int) –Minimum object size in pixels. Objects with fewer pixels are removed.
-
mask(xarray.DataArray) –Quality mask (uint8, 1=valid, 0=invalid). Objects overlapping invalid regions are removed to avoid artifacts at data boundaries.
-
device(typing.Literal['cuda', 'cpu'] | int) –Device for processing. GPU acceleration recommended for large tiles.
Returns:
Note
Edge and boundary handling: - Objects touching tile edges or invalid data regions (mask==0) are completely removed - This prevents partial objects at boundaries from being misclassified - Uses connected component analysis (8-connectivity) to identify touching objects
GPU acceleration: - Object removal operations are significantly faster on GPU - Automatically handles dask arrays by persisting before GPU operations
Example
Binarize with standard parameters:
from darts_postprocessing import binarize, erode_mask
# Erode quality mask first
eroded_mask = erode_mask(
tile["quality_data_mask"] == 2, # High quality only
size=10,
device="cuda"
)
# Binarize predictions
binary_mask = binarize(
probs=tile["probabilities"],
threshold=0.5,
min_object_size=32, # Remove objects < 32 pixels
mask=eroded_mask,
device="cuda"
)
Source code in darts-postprocessing/src/darts_postprocessing/postprocess.py
127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 | |
erode_mask
¶
erode_mask(
mask: xarray.DataArray,
size: int,
device: typing.Literal["cuda", "cpu"] | int,
edge_size: int | None = None,
) -> xarray.DataArray
Erode a binary mask and invalidate edge regions.
This function applies morphological erosion to shrink valid regions in a mask and additionally sets a border region around the entire mask to invalid. This is useful for removing unreliable predictions near tile edges and data boundaries.
Parameters:
-
mask(xarray.DataArray) –Binary mask to erode (1=valid, 0=invalid). Will be converted to uint8.
-
size(int) –Radius of the disk structuring element for erosion in pixels. Also used as the width of the edge region to invalidate (unless edge_size is specified).
-
device(typing.Literal['cuda', 'cpu'] | int) –Device for processing. Use "cuda" for GPU acceleration, "cpu" for CPU processing, or an integer to specify a GPU device number.
-
edge_size(int | None, default:None) –Width of the edge region to set to invalid in pixels. If None, uses the
sizeparameter. Defaults to None.
Returns:
Note
GPU acceleration (requires cucim): - Significantly faster for large masks - Automatically falls back to CPU if cucim is not available - Handles both in-memory and dask arrays
The erosion operation shrinks valid regions by removing pixels within size distance
from invalid regions. Edge invalidation then sets the outermost edge_size pixels
on all sides to 0.
Example
Erode mask to remove edge effects:
Source code in darts-postprocessing/src/darts_postprocessing/postprocess.py
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 | |
prepare_export
¶
prepare_export(
tile: xarray.Dataset,
bin_threshold: float = 0.5,
mask_erosion_size: int = 10,
min_object_size: int = 32,
quality_level: int
| typing.Literal[
"high_quality", "low_quality", "none"
] = 0,
ensemble_subsets: list[str] = [],
device: typing.Literal["cuda", "cpu"]
| int = darts_postprocessing.postprocess.DEFAULT_DEVICE,
edge_erosion_size: int | None = None,
) -> xarray.Dataset
Prepare segmentation results for export by applying quality filtering and binarization.
This is a wrapper function that orchestrates the complete postprocessing pipeline: mask erosion, probability masking, and binarization. It processes both ensemble-averaged predictions and individual model outputs if present.
Parameters:
-
tile(xarray.Dataset) –Input tile from inference containing: - probabilities (float32): Segmentation probabilities [0-1] - quality_data_mask (uint8): Quality mask (0=invalid, 1=low quality, 2=high quality) - probabilities-{subset} (float32): Optional individual model predictions
-
bin_threshold(float, default:0.5) –Probability threshold for binarization. Defaults to 0.5.
-
mask_erosion_size(int, default:10) –Erosion radius for quality mask in pixels. Also used for edge invalidation unless edge_erosion_size is specified. Defaults to 10.
-
min_object_size(int, default:32) –Minimum object size in pixels. Smaller objects are removed. Defaults to 32.
-
quality_level(int | typing.Literal['high_quality', 'low_quality', 'none'], default:0) –Quality threshold for masking. Maps to quality_data_mask values: - "high_quality" or 2: Only use high quality pixels - "low_quality" or 1: Use low and high quality pixels - "none" or 0: No quality masking Defaults to 0 (no masking).
-
ensemble_subsets(list[str], default:[]) –Names of individual models in the ensemble to process (e.g., ["with_tcvis", "without_tcvis"]). Defaults to [].
-
device(typing.Literal['cuda', 'cpu'] | int, default:darts_postprocessing.postprocess.DEFAULT_DEVICE) –Device for processing. Defaults to GPU if available.
-
edge_erosion_size(int | None, default:None) –Separate erosion width for tile edges in pixels. If None, uses mask_erosion_size. Defaults to None.
Returns:
-
xarray.Dataset–xr.Dataset: Input tile augmented with:
-
xarray.Dataset–Added data variables: - extent (uint8): Valid extent mask after erosion (1=valid, 0=invalid). Attributes: long_name="Extent of the segmentation" - binarized_segmentation (bool): Binary segmentation mask (True=object). Attributes: long_name="Binarized Segmentation" - binarized_segmentation-{subset} (bool): Binary masks for each ensemble subset (only if ensemble_subsets provided)
-
xarray.Dataset–Modified data variables: - probabilities (float32): Now masked to valid extent (NaN outside) - probabilities-{subset} (float32): Masked individual model predictions
Note
Processing pipeline: 1. Filter quality mask to specified quality_level 2. Erode quality mask to remove boundary artifacts 3. Create extent mask from eroded quality mask 4. Mask probabilities to valid extent (NaN invalid regions) 5. Binarize masked probabilities with threshold and min object size filter 6. Repeat steps 4-5 for each ensemble subset if provided
The extent variable defines the reliable prediction region and should be used to clip exported results.
Example
Complete postprocessing workflow:
from darts_postprocessing import prepare_export
# After ensemble inference
processed_tile = prepare_export(
tile=ensemble_result,
bin_threshold=0.5,
mask_erosion_size=10,
min_object_size=32,
quality_level="high_quality", # Only high quality pixels
ensemble_subsets=["with_tcvis", "without_tcvis"],
device="cuda"
)
# Now ready for export with:
# - processed_tile["binarized_segmentation"]: Main binary result
# - processed_tile["extent"]: Valid data extent
# - processed_tile["probabilities"]: Masked probabilities
Source code in darts-postprocessing/src/darts_postprocessing/postprocess.py
237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 | |