darts_segmentation.training.cv ¶

Cross-validation implementation for binary segmentation.

available_devices `module-attribute` ¶

available_devices = multiprocessing.Queue()

logger `module-attribute` ¶

logger = logging.getLogger(
    __name__.replace("darts_", "darts.")
)

CrossValidationConfig `dataclass` ¶

CrossValidationConfig(
    n_folds: int | None = None,
    n_randoms: int = 3,
    scoring_metric: list[str] = dataclasses.field(
        default_factory=lambda: [
            "val/JaccardIndex",
            "val/Recall",
        ]
    ),
    multi_score_strategy: typing.Literal[
        "harmonic", "arithmetic", "geometric", "min"
    ] = "harmonic",
)

Configuration for cross-validation.

This is used to configure the cross-validation process. It is used by the cross_validation_smp function.

Attributes:

n_folds (int | None) –

Number of folds to perform in cross-validation. If None, all folds (total_folds) will be used. Defaults to None.
n_randoms (int) –

Number of random seeds to use in cross-validation. The first three seeds are always 42, 21, and 69; further seeds are generated deterministically. Defaults to 3.
scoring_metric (list[str]) –

Metric(s) to use for scoring. Defaults to ["val/JaccardIndex", "val/Recall"].
multi_score_strategy (typing.Literal['harmonic', 'arithmetic', 'geometric', 'min']) –

Strategy for combining multiple metrics. Defaults to "harmonic".
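
The four strategy names correspond to standard ways of collapsing several metric values into one number. The following is a minimal sketch of what each name usually means, not the library's scoring code; `combine_scores` is a hypothetical helper introduced only for illustration.

```python
import statistics


def combine_scores(values, strategy="harmonic"):
    """Combine several metric values into one score (hypothetical helper)."""
    if strategy == "harmonic":
        return statistics.harmonic_mean(values)
    if strategy == "arithmetic":
        return statistics.fmean(values)
    if strategy == "geometric":
        return statistics.geometric_mean(values)
    if strategy == "min":
        return min(values)
    raise ValueError(f"Unknown strategy: {strategy}")


# Example values standing in for val/JaccardIndex and val/Recall of one run.
scores = [0.5, 1.0]
for name in ("harmonic", "arithmetic", "geometric", "min"):
    print(name, round(combine_scores(scores, name), 4))
```

The harmonic mean and the minimum both penalise a run in which one metric is much weaker than the other, whereas the arithmetic mean lets a strong metric compensate for a weak one.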

multi_score_strategy `class-attribute` `instance-attribute` ¶

multi_score_strategy: typing.Literal[
    "harmonic", "arithmetic", "geometric", "min"
] = "harmonic"

n_folds `class-attribute` `instance-attribute` ¶

n_folds: int | None = None

n_randoms `class-attribute` `instance-attribute` ¶

n_randoms: int = 3

rng_seeds `property` ¶

rng_seeds: list[int]

Generate a list of seeds for cross-validation.

Returns:

list[int] –

A list of seeds for cross-validation. The first three seeds are always 42, 21, and 69; further seeds are deterministically generated.
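
One way such a seed list could be produced is sketched below. Only the three fixed seeds come from the documentation above; the scheme for the additional seeds (a `random.Random` instance with a fixed seed) and the name `make_seeds` are assumptions made for illustration, and the actual property may generate different values.

```python
import random

FIXED_SEEDS = [42, 21, 69]  # stated in the documentation above


def make_seeds(n_randoms):
    """Return n_randoms seeds; the scheme for the extra seeds is an assumption."""
    seeds = FIXED_SEEDS[:n_randoms]
    n_extra = n_randoms - len(seeds)
    if n_extra > 0:
        # Hypothetical: a fixed-seed generator keeps the extra seeds reproducible.
        rng = random.Random(42)
        seeds += [rng.randint(0, 2**31 - 1) for _ in range(n_extra)]
    return seeds


print(make_seeds(2))
print(make_seeds(5)[:3])
```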

scoring_metric `class-attribute` `instance-attribute` ¶

scoring_metric: list[str] = dataclasses.field(
    default_factory=lambda: [
        "val/JaccardIndex",
        "val/Recall",
    ]
)

DataConfig `dataclass` ¶

DataConfig(
    train_data_dir: pathlib.Path = pathlib.Path("train"),
    data_split_method: typing.Literal[
        "random", "region", "sample"
    ]
    | None = None,
    data_split_by: list[str | float] | None = None,
    fold_method: typing.Literal[
        "kfold",
        "shuffle",
        "stratified",
        "region",
        "region-stratified",
    ] = "kfold",
    total_folds: int = 5,
    subsample: int | None = None,
)

Data related parameters for training.

Defines the script inputs for the training script and can be propagated by the cross-validation and tuning scripts.

Attributes:

train_data_dir (pathlib.Path) –

The path (top-level) to the data to be used for training. Expects a directory containing:

1. a zarr group called "data.zarr" containing an "x" and a "y" array
2. a geoparquet file called "metadata.parquet" containing the metadata for the data. This metadata should contain at least the following columns:
    - "sample_id": The id of the sample
    - "region": The region the sample belongs to
    - "empty": Whether the image is empty

The index should refer to the index of the sample in the zarr data. This directory should be created by a preprocessing script. Defaults to "train".
batch_size (int) –

Batch size for training and validation.
data_split_method (typing.Literal['random', 'region', 'sample'] | None) –

The method to use for splitting the data into a train and a test set.

- "random" will split the data randomly. The seed is always 42, and the test size can be specified by providing a list with a single float between 0 and 1 to data_split_by. This is the fraction of the data to be used for testing, e.g. [0.2] will use 20% of the data for testing.
- "region" will split the data by one or multiple regions, which can be specified by providing a str or list of str to data_split_by.
- "sample" will split the data by sample ids, which are specified in the same way as for "region".
- If None, no split is done and the complete dataset is used for both training and testing.

The train split will further be split in the cross-validation process. Defaults to None.
data_split_by (list[str | float] | None) –

Select by which regions/samples to split or the size of test set. Defaults to None.
fold_method (typing.Literal['kfold', 'shuffle', 'stratified', 'region', 'region-stratified']) –

Method for cross-validation split. Defaults to "kfold".
total_folds (int) –

Total number of folds in cross-validation. Defaults to 5.
subsample (int | None) –

If set, will subsample the dataset to this number of samples. This is useful for debugging and testing. Defaults to None.
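
The interaction between data_split_method and data_split_by can be sketched on a plain list of metadata records. This is an illustration of the behaviour described above, not the library's data module; `split_train_test` and the toy records are hypothetical, and the real implementation works on the geoparquet metadata.

```python
import random


def split_train_test(samples, method=None, by=None):
    """Split metadata records into (train, test); hypothetical helper."""
    if method is None:
        # No split: the complete dataset is used for both training and testing.
        return list(samples), list(samples)
    if method == "random":
        fraction = float(by[0])  # e.g. [0.2] means 20% of the data for testing
        shuffled = list(samples)
        random.Random(42).shuffle(shuffled)  # fixed seed, as described above
        n_test = round(len(shuffled) * fraction)
        return shuffled[n_test:], shuffled[:n_test]
    if method in ("region", "sample"):
        key = "region" if method == "region" else "sample_id"
        test = [s for s in samples if s[key] in by]
        train = [s for s in samples if s[key] not in by]
        return train, test
    raise ValueError(f"Unknown split method: {method}")


# Toy metadata: ten samples, six in one region and four in another.
samples = [
    {"sample_id": f"s{i}", "region": "north" if i < 6 else "south"}
    for i in range(10)
]
train, test = split_train_test(samples, "region", ["south"])
print(len(train), len(test))
```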

data_split_by `class-attribute` `instance-attribute` ¶

data_split_by: list[str | float] | None = None

data_split_method `class-attribute` `instance-attribute` ¶

data_split_method: (
    typing.Literal["random", "region", "sample"] | None
) = None

fold_method `class-attribute` `instance-attribute` ¶

fold_method: typing.Literal[
    "kfold",
    "shuffle",
    "stratified",
    "region",
    "region-stratified",
] = "kfold"

subsample `class-attribute` `instance-attribute` ¶

subsample: int | None = None

total_folds `class-attribute` `instance-attribute` ¶

total_folds: int = 5

train_data_dir `class-attribute` `instance-attribute` ¶

train_data_dir: pathlib.Path = pathlib.Path('train')

DeviceConfig `dataclass` ¶

DeviceConfig(
    accelerator: typing.Literal[
        "auto", "cpu", "gpu", "mps", "tpu"
    ] = "auto",
    strategy: typing.Literal[
        "auto",
        "ddp",
        "ddp_fork",
        "ddp_notebook",
        "fsdp",
        "cv-parallel",
        "tune-parallel",
    ] = "auto",
    devices: list[int | str] = dataclasses.field(
        default_factory=lambda: ["auto"]
    ),
    num_nodes: int = 1,
)

Device and Distributed Strategy related parameters.

Attributes:

accelerator (typing.Literal['auto', 'cpu', 'gpu', 'mps', 'tpu']) –

Accelerator to use. Defaults to "auto".
strategy (typing.Literal['auto', 'ddp', 'ddp_fork', 'ddp_notebook', 'fsdp', 'cv-parallel', 'tune-parallel']) –

Distributed strategy to use. Defaults to "auto".
devices (list[int | str]) –

List of devices to use. Defaults to ["auto"].
num_nodes (int) –

Number of nodes to use for distributed training. Defaults to 1.

accelerator `class-attribute` `instance-attribute` ¶

accelerator: typing.Literal[
    "auto", "cpu", "gpu", "mps", "tpu"
] = "auto"

devices `class-attribute` `instance-attribute` ¶

devices: list[int | str] = dataclasses.field(
    default_factory=lambda: ["auto"]
)

lightning_strategy `property` ¶

lightning_strategy: str

Get the Lightning strategy for the current configuration.

Returns:

str ( str ) –

The Lightning strategy to use.

num_nodes `class-attribute` `instance-attribute` ¶

num_nodes: int = 1

strategy `class-attribute` `instance-attribute` ¶

strategy: typing.Literal[
    "auto",
    "ddp",
    "ddp_fork",
    "ddp_notebook",
    "fsdp",
    "cv-parallel",
    "tune-parallel",
] = "auto"

in_parallel ¶

in_parallel(
    device: int | str | None = None,
) -> darts_segmentation.training.train.DeviceConfig

Turn the current configuration into a suitable configuration for parallel training.

Parameters:

device (int | str | None, default: None ) –

The device to use for parallel training. If None, assumes non-multiprocessing parallel training and propagates all devices. Defaults to None.

Returns:

DeviceConfig ( darts_segmentation.training.train.DeviceConfig ) –

A new DeviceConfig instance that is suitable for parallel training.

Source code in darts-segmentation/src/darts_segmentation/training/train.py

def in_parallel(self, device: int | str | None = None) -> "DeviceConfig":
    """Turn the current configuration into a suitable configuration for parallel training.

    Args:
        device (int | str | None, optional): The device to use for parallel training.
            If None, assumes non-multiprocessing parallel training and propagate all devices.
            Defaults to None.

    Returns:
        DeviceConfig: A new DeviceConfig instance that is suitable for parallel training.

    """
    # In case of parallel training via multiprocessing, only few strategies are allowed.
    if self.strategy in ["ddp", "ddp_fork", "ddp_notebook", "fsdp"]:
        logger.warning("Using 'ddp_fork' instead of 'ddp' for multiprocessing.")
        return DeviceConfig(
            accelerator=self.accelerator,
            strategy="ddp_fork",  # Fork is the only supported strategy for multiprocessing
            devices=self.devices,
            num_nodes=self.num_nodes,
        )
    elif device is not None:
        return DeviceConfig(
            accelerator=self.accelerator,
            strategy=self.strategy,
            # If a device is specified, we assume that we want to run on a single device
            devices=[device],
            num_nodes=1,
        )
    else:
        return self

Hyperparameters `dataclass` ¶

Hyperparameters(
    model_arch: str = "Unet",
    model_encoder: str = "dpn107",
    model_encoder_weights: str | None = None,
    augment: list[
        darts_segmentation.training.augmentations.Augmentation
    ]
    | None = None,
    learning_rate: float = 0.001,
    gamma: float = 0.9,
    focal_loss_alpha: float | None = None,
    focal_loss_gamma: float = 2.0,
    batch_size: int = 8,
    bands: list[str] | None = None,
)

Hyperparameters for Cyclopts CLI.

Attributes:

model_arch (str) –

Architecture of the model to use.
model_encoder (str) –

Encoder type for the model.
model_encoder_weights (str | None) –

Weights for the encoder, if any.
augment (list[darts_segmentation.training.augmentations.Augmentation] | None) –

List of augmentations to apply.
learning_rate (float) –

Learning rate for training.
gamma (float) –

Decay factor for learning rate.
focal_loss_alpha (float | None) –

Alpha parameter for focal loss, if using.
focal_loss_gamma (float) –

Gamma parameter for focal loss.
batch_size (int) –

Batch size for training.
bands (list[str] | None) –

List of bands to use. Defaults to None.

augment `class-attribute` `instance-attribute` ¶

augment: (
    list[
        darts_segmentation.training.augmentations.Augmentation
    ]
    | None
) = None

bands `class-attribute` `instance-attribute` ¶

bands: list[str] | None = None

batch_size `class-attribute` `instance-attribute` ¶

batch_size: int = 8

focal_loss_alpha `class-attribute` `instance-attribute` ¶

focal_loss_alpha: float | None = None

focal_loss_gamma `class-attribute` `instance-attribute` ¶

focal_loss_gamma: float = 2.0

gamma `class-attribute` `instance-attribute` ¶

gamma: float = 0.9

learning_rate `class-attribute` `instance-attribute` ¶

learning_rate: float = 0.001

model_arch `class-attribute` `instance-attribute` ¶

model_arch: str = 'Unet'

model_encoder `class-attribute` `instance-attribute` ¶

model_encoder: str = 'dpn107'

model_encoder_weights `class-attribute` `instance-attribute` ¶

model_encoder_weights: str | None = None

LoggingConfig `dataclass` ¶

LoggingConfig(
    artifact_dir: pathlib.Path = pathlib.Path("artifacts"),
    log_every_n_steps: int = 10,
    check_val_every_n_epoch: int = 3,
    plot_every_n_val_epochs: int = 5,
    wandb_entity: str | None = None,
    wandb_project: str | None = None,
)

Logging related parameters for training.

Defines the script inputs for the training script and can be propagated by the cross-validation and tuning scripts.

Attributes:

artifact_dir (pathlib.Path) –

Top-level path to the training output directory. Will contain checkpoints and metrics. Defaults to Path("artifacts").
log_every_n_steps (int) –

Log every n steps. Defaults to 10.
check_val_every_n_epoch (int) –

Check validation every n epochs. Defaults to 3.
plot_every_n_val_epochs (int) –

Plot validation samples every n epochs. Defaults to 5.
wandb_entity (str | None) –

Weights and Biases Entity. Defaults to None.
wandb_project (str | None) –

Weights and Biases Project. Defaults to None.

artifact_dir `class-attribute` `instance-attribute` ¶

artifact_dir: pathlib.Path = pathlib.Path('artifacts')

check_val_every_n_epoch `class-attribute` `instance-attribute` ¶

check_val_every_n_epoch: int = 3

log_every_n_steps `class-attribute` `instance-attribute` ¶

log_every_n_steps: int = 10

plot_every_n_val_epochs `class-attribute` `instance-attribute` ¶

plot_every_n_val_epochs: int = 5

wandb_entity `class-attribute` `instance-attribute` ¶

wandb_entity: str | None = None

wandb_project `class-attribute` `instance-attribute` ¶

wandb_project: str | None = None

artifact_dir_at_cv ¶

artifact_dir_at_cv(tune_name: str | None) -> pathlib.Path

Nest the artifact directory for cross-validation runs.

Similar to parse_artifact_dir_for_run, but meant to be used by the cross-validation script.

Also creates the directory if it does not exist.

Parameters:

tune_name (str | None) –

Name of the tuning, if applicable.

Returns:

Path ( pathlib.Path ) –

The nested artifact directory path for cross-validation runs.

Source code in darts-segmentation/src/darts_segmentation/training/train.py

def artifact_dir_at_cv(self, tune_name: str | None) -> Path:
    """Nest the artifact directory for cross-validation runs.

    Similar to `parse_artifact_dir_for_run`, but meant to be used by the cross-validation script.

    Also creates the directory if it does not exist.

    Args:
        tune_name (str | None): Name of the tuning, if applicable.

    Returns:
        Path: The nested artifact directory path for cross-validation runs.

    """
    artifact_dir = self.artifact_dir / tune_name if tune_name else self.artifact_dir / "_cross_validations"
    artifact_dir.mkdir(parents=True, exist_ok=True)
    return artifact_dir

artifact_dir_at_run ¶

artifact_dir_at_run(
    cv_name: str | None, tune_name: str | None
) -> pathlib.Path

Nest the artifact directory to avoid cluttering the root directory.

For cross-validation, the cv function is expected to have already nested the artifact directory. This means that for cv, the artifact_dir of this function should be either {artifact_dir}/_cross_validations/{cv_name} or {artifact_dir}/{tune_name}/{cv_name}.

Also creates the directory if it does not exist.

Parameters:

cv_name (str | None) –

Name of the cross-validation.
tune_name (str | None) –

Name of the tuning.

Raises:

ValueError –

If tune_name is specified, but cv_name is not, which is invalid.

Returns:

Path ( pathlib.Path ) –

The nested artifact directory path.

Source code in darts-segmentation/src/darts_segmentation/training/train.py

def artifact_dir_at_run(self, cv_name: str | None, tune_name: str | None) -> Path:
    """Nest the artifact directory to avoid cluttering the root directory.

    For cv it is expected that the cv function already nests the artifact directory
    Meaning for cv the artifact_dir of this function should be either
    {artifact_dir}/_cross_validations/{cv_name} or {artifact_dir}/{tune_name}/{cv_name}

    Also creates the directory if it does not exist.

    Args:
        cv_name (str | None): Name of the cross-validation.
        tune_name (str | None): Name of the tuning.

    Raises:
        ValueError: If tune_name is specified, but cv_name is not, which is invalid.

    Returns:
        Path: The nested artifact directory path.

    """
    # Run only
    if cv_name is None and tune_name is None:
        artifact_dir = self.artifact_dir / "_runs"
    # Cross-validation only
    elif cv_name is not None and tune_name is None:
        artifact_dir = self.artifact_dir / "_cross_validations" / cv_name
    # Cross-validation and tuning
    elif cv_name is not None and tune_name is not None:
        artifact_dir = self.artifact_dir / tune_name / cv_name
    # Tuning only (invalid)
    else:
        raise ValueError(
            "Cannot parse artifact directory for cross-validation and tuning. "
            "Please specify either cv_name or tune_name, but not both."
        )
    artifact_dir.mkdir(parents=True, exist_ok=True)
    return artifact_dir
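
The resulting directory layout can be summarised with a small standalone function that mirrors the branching above without touching the file system. `nested_artifact_dir` is a hypothetical name used only for this sketch; unlike the method, it does not create the directory.

```python
from pathlib import Path
from typing import Optional


def nested_artifact_dir(
    root: Path, cv_name: Optional[str], tune_name: Optional[str]
) -> Path:
    """Return the nested path for a run without creating it (illustrative)."""
    if cv_name is None and tune_name is None:
        return root / "_runs"  # plain training run
    if tune_name is None:
        return root / "_cross_validations" / cv_name  # cross-validation only
    if cv_name is not None:
        return root / tune_name / cv_name  # cross-validation inside a tune
    # tune_name without cv_name is the invalid combination
    raise ValueError("tune_name was given without cv_name")


root = Path("artifacts")
print(nested_artifact_dir(root, None, None).as_posix())
print(nested_artifact_dir(root, "cv-a", None).as_posix())
print(nested_artifact_dir(root, "cv-a", "tune-a").as_posix())
```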

TrainRunConfig `dataclass` ¶

TrainRunConfig(
    name: str | None = None,
    cv_name: str | None = None,
    tune_name: str | None = None,
    fold: int = 0,
    random_seed: int = 42,
)

Run related parameters for training.

Defines the script inputs for the training script. Must be built by the cross-validation and tuning scripts.

Attributes:

name (str | None) –

Name of the run. If None, a name is generated automatically. Defaults to None.
cv_name (str | None) –

Name of the cross-validation. Should only be specified by a cross-validation script. Defaults to None.
tune_name (str | None) –

Name of the tuning. Should only be specified by a tuning script. Defaults to None.
fold (int) –

Index of the current fold. Defaults to 0.
random_seed (int) –

Random seed for deterministic training. Defaults to 42.

cv_name `class-attribute` `instance-attribute` ¶

cv_name: str | None = None

fold `class-attribute` `instance-attribute` ¶

fold: int = 0

name `class-attribute` `instance-attribute` ¶

name: str | None = None

random_seed `class-attribute` `instance-attribute` ¶

random_seed: int = 42

tune_name `class-attribute` `instance-attribute` ¶

tune_name: str | None = None

TrainingConfig `dataclass` ¶

TrainingConfig(
    continue_from_checkpoint: pathlib.Path | None = None,
    max_epochs: int = 100,
    early_stopping_patience: int = 5,
    num_workers: int = 0,
)

Training related parameters for training.

Defines the script inputs for the training script and can be propagated by the cross-validation and tuning scripts.

Attributes:

continue_from_checkpoint (pathlib.Path | None) –

Path to a checkpoint to continue training from. Defaults to None.
max_epochs (int) –

Maximum number of epochs to train. Defaults to 100.
early_stopping_patience (int) –

Number of epochs to wait for improvement before stopping. Defaults to 5.
num_workers (int) –

Number of Dataloader workers. Defaults to 0.

continue_from_checkpoint `class-attribute` `instance-attribute` ¶

continue_from_checkpoint: pathlib.Path | None = None

early_stopping_patience `class-attribute` `instance-attribute` ¶

early_stopping_patience: int = 5

max_epochs `class-attribute` `instance-attribute` ¶

max_epochs: int = 100

num_workers `class-attribute` `instance-attribute` ¶

num_workers: int = 0

_ProcessInputs `dataclass` ¶

_ProcessInputs(
    current: int,
    total: int,
    seed: int,
    fold: int,
    cv: darts_segmentation.training.cv.CrossValidationConfig,
    run: darts_segmentation.training.train.TrainRunConfig,
    training_config: darts_segmentation.training.train.TrainingConfig,
    logging_config: darts_segmentation.training.train.LoggingConfig,
    data_config: darts_segmentation.training.train.DataConfig,
    device_config: darts_segmentation.training.train.DeviceConfig,
    hparams: darts_segmentation.training.hparams.Hyperparameters,
)

current `instance-attribute` ¶

current: int

cv `instance-attribute` ¶

cv: darts_segmentation.training.cv.CrossValidationConfig

data_config `instance-attribute` ¶

data_config: darts_segmentation.training.train.DataConfig

device_config `instance-attribute` ¶

device_config: (
    darts_segmentation.training.train.DeviceConfig
)

fold `instance-attribute` ¶

fold: int

hparams `instance-attribute` ¶

hparams: darts_segmentation.training.hparams.Hyperparameters

logging_config `instance-attribute` ¶

logging_config: (
    darts_segmentation.training.train.LoggingConfig
)

run `instance-attribute` ¶

run: darts_segmentation.training.train.TrainRunConfig

seed `instance-attribute` ¶

seed: int

total `instance-attribute` ¶

total: int

training_config `instance-attribute` ¶

training_config: (
    darts_segmentation.training.train.TrainingConfig
)

_ProcessOutputs `dataclass` ¶

_ProcessOutputs(run_info: dict)

run_info `instance-attribute` ¶

run_info: dict

_run_training ¶

_run_training(
    inp: darts_segmentation.training.cv._ProcessInputs,
)

Source code in darts-segmentation/src/darts_segmentation/training/cv.py

def _run_training(inp: _ProcessInputs):
    # Wrapper function for handling parallel multiprocessing training runs.
    import torch

    from darts_segmentation.training.scoring import check_score_is_unstable
    from darts_segmentation.training.train import train_smp

    # Setup device configuration: If strategy is "cv-parallel" expect a mp scenario:
    # Wait for a device to become available.
    # Otherwise, expect a serial scenario, where the devices and strategy are set by the user.
    is_parallel = inp.device_config.strategy == "cv-parallel"
    if is_parallel:
        device = available_devices.get()
        device_config = inp.device_config.in_parallel(device)
        logger.debug(f"Starting run {inp.run.name} ({inp.current + 1}/{inp.total}) on device {device}.")
    else:
        device = None
        device_config = inp.device_config.in_parallel()
        logger.debug(f"Starting run {inp.run.name} ({inp.current + 1}/{inp.total}).")

    try:
        tick_rstart = time.time()
        trainer = train_smp(
            run=inp.run,
            training_config=inp.training_config,
            data_config=inp.data_config,
            device_config=device_config,
            hparams=inp.hparams,
            logging_config=inp.logging_config,
        )
        tick_rend = time.time()

        run_info = {
            "run_name": inp.run.name,
            "run_id": trainer.lightning_module.hparams["run_id"],
            "seed": inp.seed,
            "fold": inp.fold,
            "duration": tick_rend - tick_rstart,
        }
        for metric, value in trainer.logged_metrics.items():
            run_info[metric] = value.item() if isinstance(value, torch.Tensor) else value
        if trainer.checkpoint_callback:
            run_info["checkpoint"] = trainer.checkpoint_callback.best_model_path
        run_info["is_unstable"] = check_score_is_unstable(run_info, inp.cv.scoring_metric)

        logger.debug(f"{run_info=}")
        output = _ProcessOutputs(run_info=run_info)
    finally:
        # If we are in parallel mode, we need to return the device to the queue.
        if is_parallel:
            logger.debug(f"Free device {device} for cv {inp.run.name}")
            available_devices.put(device)
    return output
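
The device handling in the wrapper above follows a lease pattern: a worker takes a device id from a shared queue, trains, and always returns the id in a `finally` block so that a failed run cannot leak a device. The sketch below reproduces only that pattern under simplifying assumptions: it uses threads and `queue.Queue` instead of processes and `multiprocessing.Queue`, and `run_on_free_device` is a hypothetical stand-in for the training call.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Shared pool of device ids (two devices in this toy setup).
device_pool = queue.Queue()
for device_id in (0, 1):
    device_pool.put(device_id)


def run_on_free_device(task_id):
    """Lease a device, do the work, and always hand the device back."""
    device = device_pool.get()  # blocks until a device is free
    try:
        return task_id, device  # placeholder for the actual training run
    finally:
        device_pool.put(device)  # returned even if the work raised


# More workers than devices: the surplus workers wait inside get().
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_on_free_device, range(6)))

print(len(results), device_pool.qsize())
```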

cross_validation_smp ¶

cross_validation_smp(
    *,
    name: str | None = None,
    tune_name: str | None = None,
    cv: darts_segmentation.training.cv.CrossValidationConfig = darts_segmentation.training.cv.CrossValidationConfig(),
    training_config: darts_segmentation.training.train.TrainingConfig = darts_segmentation.training.train.TrainingConfig(),
    data_config: darts_segmentation.training.train.DataConfig = darts_segmentation.training.train.DataConfig(),
    device_config: darts_segmentation.training.train.DeviceConfig = darts_segmentation.training.train.DeviceConfig(),
    hparams: darts_segmentation.training.hparams.Hyperparameters = darts_segmentation.training.hparams.Hyperparameters(),
    logging_config: darts_segmentation.training.train.LoggingConfig = darts_segmentation.training.train.LoggingConfig(),
)

Perform cross-validation for a model with given hyperparameters.

Please see https://smp.readthedocs.io/en/latest/index.html for model configurations of architecture and encoder.

Please also consider reading our training guide (docs/guides/training.md).

This cross-validation function is designed to evaluate the performance of a single model configuration. It can be used by a tuning script to tune hyperparameters. It calls the training function, hence most functionality is the same as the training function. In essence, it performs the following:

for seed in seeds:
    for fold in folds:
        train_model(seed=seed, fold=fold, ...)

and calculates a score from the results.
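
With the default settings, the nested loop above yields n_randoms * n_folds runs. A small sketch that enumerates them, assuming the default seeds and five folds (the variable names are illustrative):

```python
from itertools import product

seeds = [42, 21, 69]  # default n_randoms = 3
folds = range(5)      # default total_folds = 5

# One training run per (seed, fold) pair, in the same order as the nested loop.
runs = list(product(seeds, folds))
print(len(runs))
print(runs[0], runs[-1])
```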

To specify on which metric(s) the score is calculated, set the scoring_metric parameter. Each metric can be suffixed with ":higher" or ":lower" to indicate its direction. This allows multiple metrics to be combined correctly by taking 1/metric before the calculation if a metric is ":lower". If no direction is provided, ":higher" is assumed. The direction has no real effect on the single-score calculation, since only the mean is calculated there.
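
The direction suffix can be handled with a few lines of string parsing. The following sketch illustrates the convention described above; `parse_metric` and `oriented` are hypothetical helpers, not functions of the library.

```python
def parse_metric(spec):
    """Split 'name:direction' into (name, direction); default is 'higher'."""
    name, sep, direction = spec.rpartition(":")
    if sep and direction in ("higher", "lower"):
        return name, direction
    return spec, "higher"


def oriented(value, direction):
    """Invert a ':lower' metric so that larger always means better."""
    return 1 / value if direction == "lower" else value


print(parse_metric("val/loss:lower"))
print(parse_metric("val/Recall"))
print(oriented(4.0, "lower"))
```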

In a multi-score setting, the score is calculated with a combine-then-reduce approach: first, the metrics of each fold are combined using the specified strategy, and then the results are reduced via the mean. Please refer to the documentation to understand the different multi-score strategies.
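
A worked example of combine-then-reduce with the "harmonic" strategy, using made-up metric values for two runs. This is a sketch of the idea only and does not call the library's scoring function.

```python
import statistics

# Made-up results of two runs.
runs = [
    {"val/JaccardIndex": 0.5, "val/Recall": 1.0},
    {"val/JaccardIndex": 0.6, "val/Recall": 0.6},
]
metrics = ["val/JaccardIndex", "val/Recall"]

# Combine: one harmonic mean per run over the selected metrics.
per_run = [statistics.harmonic_mean([run[m] for m in metrics]) for run in runs]

# Reduce: arithmetic mean over the per-run values.
score = statistics.fmean(per_run)
print([round(v, 4) for v in per_run], round(score, 4))
```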

If any metric of any run is NaN, Inf, -Inf, or 0, the score is reported as "unstable".
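
The instability rule can be expressed as a short predicate. The sketch below is an assumption about how such a check might look; `is_unstable` is a hypothetical name, and the library's own check may differ in detail.

```python
import math


def is_unstable(run_info, metrics):
    """True if any selected metric is NaN, +/-Inf, or exactly 0 (illustrative)."""
    return any(
        not math.isfinite(run_info[m]) or run_info[m] == 0 for m in metrics
    )


metrics = ["val/JaccardIndex", "val/Recall"]
print(is_unstable({"val/JaccardIndex": 0.7, "val/Recall": 0.9}, metrics))
print(is_unstable({"val/JaccardIndex": float("nan"), "val/Recall": 0.9}, metrics))
print(is_unstable({"val/JaccardIndex": 0.7, "val/Recall": 0.0}, metrics))
```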

Artifacts are stored under {artifact_dir}/{tune_name} for tunes (i.e. if tune_name is not None), otherwise under {artifact_dir}/_cross_validations.

You can specify how often logs are written and how often validation is performed:

- log_every_n_steps specifies how often train-logs will be written. This does not affect validation.
- check_val_every_n_epoch specifies how often validation will be performed. This will also affect early stopping.
- early_stopping_patience specifies how many validation checks to wait for improvement before stopping. In epochs, this would be check_val_every_n_epoch * early_stopping_patience.
- plot_every_n_val_epochs specifies how often validation samples will be plotted. Since plotting is quite costly, you can reduce the frequency. This works similarly to early stopping: in epochs, this would be check_val_every_n_epoch * plot_every_n_val_epochs.

Example: There are 400 training samples and the batch size is 2, resulting in 200 training steps per epoch. If log_every_n_steps is set to 50, then the training logs and metrics will be logged 4 times per epoch. If check_val_every_n_epoch is set to 5, then validation will be performed every 5 epochs. If plot_every_n_val_epochs is set to 2, then validation samples will be plotted every 10 epochs. If early_stopping_patience is set to 3, then early stopping will be triggered after 15 epochs without improvement.
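
The arithmetic of this example can be checked with a few lines of Python. The parameter names are taken from the text above; the derived variable names are illustrative.

```python
import math

n_samples, batch_size = 400, 2
log_every_n_steps = 50
check_val_every_n_epoch = 5
plot_every_n_val_epochs = 2
early_stopping_patience = 3

steps_per_epoch = math.ceil(n_samples / batch_size)
logs_per_epoch = steps_per_epoch // log_every_n_steps
# Plotting and early stopping are counted in validation checks, not epochs.
plot_interval_epochs = check_val_every_n_epoch * plot_every_n_val_epochs
stop_after_epochs = check_val_every_n_epoch * early_stopping_patience

print(steps_per_epoch, logs_per_epoch, plot_interval_epochs, stop_after_epochs)
```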

The data structure of the training data expects the "preprocessing" step to be done beforehand, which results in the following data structure:

preprocessed-data/ # the top-level directory
├── config.toml
├── data.zarr/ # this zarr group contains the dataarrays x and y
├── metadata.parquet # this contains information necessary to split the data into train, val, and test sets.
└── labels.geojson

Parameters:

name (str | None, default: None ) –

Name of the cross-validation. If None, a name is generated automatically. Defaults to None.
tune_name (str | None, default: None ) –

Name of the tuning. Should only be specified by a tuning script. Defaults to None.
cv (darts_segmentation.training.cv.CrossValidationConfig, default: darts_segmentation.training.cv.CrossValidationConfig() ) –

Configuration for cross-validation.
training_config (darts_segmentation.training.train.TrainingConfig, default: darts_segmentation.training.train.TrainingConfig() ) –

Configuration for the training.
data_config (darts_segmentation.training.train.DataConfig, default: darts_segmentation.training.train.DataConfig() ) –

Configuration for the data.
device_config (darts_segmentation.training.train.DeviceConfig, default: darts_segmentation.training.train.DeviceConfig() ) –

Configuration for the devices to use.
hparams (darts_segmentation.training.hparams.Hyperparameters, default: darts_segmentation.training.hparams.Hyperparameters() ) –

Hyperparameters for the training.
logging_config (darts_segmentation.training.train.LoggingConfig, default: darts_segmentation.training.train.LoggingConfig() ) –

Logging configuration.

Returns:

tuple[float, bool, pd.DataFrame] –

A single score, a boolean indicating if the score is unstable, and a DataFrame containing run info (seed, fold, metrics, duration, checkpoint).

Raises:

ValueError –

If no runs were performed, meaning the configuration is invalid or no data was found.

Source code in darts-segmentation/src/darts_segmentation/training/cv.py

def cross_validation_smp(
    *,
    name: str | None = None,
    tune_name: str | None = None,
    cv: CrossValidationConfig = CrossValidationConfig(),
    training_config: TrainingConfig = TrainingConfig(),
    data_config: DataConfig = DataConfig(),
    device_config: DeviceConfig = DeviceConfig(),
    hparams: Hyperparameters = Hyperparameters(),
    logging_config: LoggingConfig = LoggingConfig(),
):
    """Perform cross-validation for a model with given hyperparameters.

    Please see https://smp.readthedocs.io/en/latest/index.html for model configurations of architecture and encoder.

    Please also consider reading our training guide (docs/guides/training.md).

    This cross-validation function is designed to evaluate the performance of a single model configuration.
    It can be used by a tuning script to tune hyperparameters.
    It calls the training function, hence most functionality is the same as the training function.
    In general, it does perform this:

    ```py
    for seed in seeds:
        for fold in folds:
            train_model(seed=seed, fold=fold, ...)
    ```

    and calculates a score from the results.

    To specify on which metric(s) the score is calculated, the `scoring_metric` parameter can be specified.
    Each score can be provided by either ":higher" or ":lower" to indicate the direction of the metrics.
    This allows to correctly combine multiple metrics by doing 1/metric before calculation if a metric is ":lower".
    If no direction is provided, it is assumed to be ":higher".
    Has no real effect on the single score calculation, since only the mean is calculated there.

    In a multi-score setting, the score is calculated by combine-then-reduce the metrics.
    Meaning that first for each fold the metrics are combined using the specified strategy,
    and then the results are reduced via mean.
    Please refer to the documentation to understand the different multi-score strategies.

    If one of the metrics of any of the runs contains NaN, Inf, -Inf or is 0 the score is reported to be "unstable".

    Artifacts are stored under `{artifact_dir}/{tune_name}` for tunes (meaning if `tune_name` is not None)
    else `{artifact_dir}/_cross_validation`.

    You can specify the frequency on how often logs will be written and validation will be performed.
        - `log_every_n_steps` specifies how often train-logs will be written. This does not affect validation.
        - `check_val_every_n_epoch` specifies how often validation will be performed.
            This will also affect early stopping.
        - `early_stopping_patience` specifies how many epochs to wait for improvement before stopping.
            In epochs, this would be `check_val_every_n_epoch * early_stopping_patience`.
        - `plot_every_n_val_epochs` specifies how often validation samples will be plotted.
            Since plotting is quite costly, you can reduce the frequency. Works similar like early stopping.
            In epochs, this would be `check_val_every_n_epoch * plot_every_n_val_epochs`.
    Example: There are 400 training samples and the batch size is 2, resulting in 200 training steps per epoch.
    If `log_every_n_steps` is set to 50 then the training logs and metrics will be logged 4 times per epoch.
    If `check_val_every_n_epoch` is set to 5 then validation will be performed every 5 epochs.
    If `plot_every_n_val_epochs` is set to 2 then validation samples will be plotted every 10 epochs.
    If `early_stopping_patience` is set to 3 then early stopping will be performed after 15 epochs without improvement.

    The data structure of the training data expects the "preprocessing" step to be done beforehand,
    which results in the following data structure:

    ```sh
    preprocessed-data/ # the top-level directory
    ├── config.toml
    ├── data.zarr/ # this zarr group contains the dataarrays x and y
    ├── metadata.parquet # this contains information necessary to split the data into train, val, and test sets.
    └── labels.geojson
    ```

    Args:
        name (str | None, optional): Name of the cross-validation. If None, a name is generated automatically.
            Defaults to None.
        tune_name (str | None, optional): Name of the tuning. Should only be specified by a tuning script.
            Defaults to None.
        cv (CrossValidationConfig): Configuration for cross-validation.
        training_config (TrainingConfig): Configuration for the training.
        data_config (DataConfig): Configuration for the data.
        device_config (DeviceConfig): Configuration for the devices to use.
        hparams (Hyperparameters): Hyperparameters for the training.
        logging_config (LoggingConfig): Logging configuration.

    Returns:
        tuple[float, bool, pd.DataFrame]: A single score, a boolean indicating if the score is unstable,
            and a DataFrame containing run info (seed, fold, metrics, duration, checkpoint)

    Raises:
        ValueError: If no runs were performed, meaning the configuration is invalid or no data was found.

    """
    import pandas as pd
    from darts_utils.namegen import generate_counted_name

    from darts_segmentation.training.adp import _adp
    from darts_segmentation.training.scoring import score_from_runs

    tick_fstart = time.perf_counter()

    artifact_dir = logging_config.artifact_dir_at_cv(tune_name)
    cv_name = name or generate_counted_name(artifact_dir)
    artifact_dir = artifact_dir / cv_name
    artifact_dir.mkdir(parents=True, exist_ok=True)

    n_folds = cv.n_folds or data_config.total_folds

    logger.info(
        f"Starting cross-validation '{cv_name}' with data from {data_config.train_data_dir.resolve()}."
        f" Artifacts will be saved to {artifact_dir.resolve()}."
        f" Will run n_randoms*n_folds = {cv.n_randoms}*{n_folds} = {cv.n_randoms * n_folds} experiments."
    )

    seeds = cv.rng_seeds
    logger.debug(f"Using seeds: {seeds}")

    # Plan which runs to perform. These are later consumed based on the parallelization strategy.
    process_inputs: list[_ProcessInputs] = []
    for i, seed in enumerate(seeds):
        for fold in range(n_folds):
            # Running index over all (seed, fold) pairs; the inner loop has n_folds iterations.
            current = i * n_folds + fold
            total = n_folds * len(seeds)
            run = TrainRunConfig(
                name=f"{cv_name}-run-f{fold}s{seed}",
                cv_name=cv_name,
                tune_name=tune_name,
                fold=fold,
                random_seed=seed,
            )
            process_inputs.append(
                _ProcessInputs(
                    current=current,
                    total=total,
                    seed=seed,
                    fold=fold,
                    cv=cv,
                    run=run,
                    training_config=training_config,
                    logging_config=logging_config,
                    data_config=data_config,
                    device_config=device_config,
                    hparams=hparams,
                )
            )

    run_infos = []
    # This function abstracts away common logic for running multiprocessing
    for inp, output in _adp(
        process_inputs=process_inputs,
        is_parallel=device_config.strategy == "cv-parallel",
        devices=device_config.devices,
        available_devices=available_devices,
        _run=_run_training,
    ):
        run_infos.append(output.run_info)

    if len(run_infos) == 0:
        raise ValueError(
            "No runs were performed. Please check your configuration and data."
            " If you are using a tuning script, make sure to specify the correct parameters."
        )

    logger.debug(f"{run_infos=}")
    score = score_from_runs(run_infos, cv.scoring_metric, cv.multi_score_strategy)

    run_infos = pd.DataFrame(run_infos)
    run_infos["score"] = score
    is_unstable = run_infos["is_unstable"].any()
    run_infos["score_is_unstable"] = is_unstable
    if is_unstable:
        logger.warning("Score is unstable, meaning at least one of the metrics is NaN, Inf, -Inf or 0.")
    run_infos.to_parquet(artifact_dir / "run_infos.parquet")
    logger.debug(f"Saved run infos to {artifact_dir / 'run_infos.parquet'}")

    tick_fend = time.perf_counter()
    logger.info(
        f"Finished cross-validation '{cv_name}' in {tick_fend - tick_fstart:.2f}s"
        f" with {score=:.4f} ({'stable' if not is_unstable else 'unstable'})."
    )

    return score, is_unstable, run_infos
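
The docstring describes the score as a per-run combination of the selected metrics, followed by a mean over all runs. The following is a minimal sketch of that idea, assuming the standard definitions of the four means. The helpers `combine`, `cv_score`, and `is_unstable` are hypothetical names used for illustration only; they are not the library's `score_from_runs`, whose exact behavior may differ.

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean


def combine(values, strategy="harmonic"):
    """Combine the metric values of a single run into one number."""
    if strategy == "harmonic":
        return harmonic_mean(values)
    if strategy == "arithmetic":
        return fmean(values)
    if strategy == "geometric":
        return geometric_mean(values)
    if strategy == "min":
        return min(values)
    raise ValueError(f"Unknown strategy: {strategy}")


def cv_score(runs, metrics, strategy="harmonic"):
    """First combine the metrics within each run, then average over runs."""
    per_run = [combine([run[m] for m in metrics], strategy) for run in runs]
    return fmean(per_run)


def is_unstable(runs, metrics):
    """Flag the score if any metric of any run is NaN, Inf, -Inf or 0."""
    return any(
        not math.isfinite(run[m]) or run[m] == 0
        for run in runs
        for m in metrics
    )


# Made-up metric values for two runs, for illustration only.
runs = [
    {"val/JaccardIndex": 0.5, "val/Recall": 1.0},
    {"val/JaccardIndex": 0.4, "val/Recall": 0.4},
]
metrics = ["val/JaccardIndex", "val/Recall"]
score = cv_score(runs, metrics, "harmonic")
unstable = is_unstable(runs, metrics)
```

With the harmonic strategy, a run scores well only if both metrics are reasonably high, whereas the arithmetic mean lets one strong metric compensate for a weak one.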

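The frequency example in the docstring (400 samples, batch size 2) reduces to a few multiplications. The sketch below reproduces that arithmetic. The function `schedule` is a hypothetical helper, not part of the library, and it assumes that the number of samples divides evenly by the batch size.

```python
def schedule(
    n_samples,
    batch_size,
    log_every_n_steps,
    check_val_every_n_epoch,
    plot_every_n_val_epochs,
    early_stopping_patience,
):
    """Translate the logging and validation settings into per-epoch numbers."""
    # Assumption: n_samples is a multiple of batch_size.
    steps_per_epoch = n_samples // batch_size
    return {
        "steps_per_epoch": steps_per_epoch,
        "train_logs_per_epoch": steps_per_epoch // log_every_n_steps,
        "validate_every_epochs": check_val_every_n_epoch,
        # Plotting and early stopping count validation rounds, not epochs.
        "plot_every_epochs": check_val_every_n_epoch * plot_every_n_val_epochs,
        "early_stop_after_epochs": check_val_every_n_epoch * early_stopping_patience,
    }


example = schedule(
    n_samples=400,
    batch_size=2,
    log_every_n_steps=50,
    check_val_every_n_epoch=5,
    plot_every_n_val_epochs=2,
    early_stopping_patience=3,
)
# example == {"steps_per_epoch": 200, "train_logs_per_epoch": 4,
#             "validate_every_epochs": 5, "plot_every_epochs": 10,
#             "early_stop_after_epochs": 15}
```
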
