prepare_training
darts_segmentation.training.prepare_training
Functions to prepare the data for training the segmentation model.
Bands
Bases: collections.UserList[darts_segmentation.utils.Band]
Wrapper for the list of bands.
factors
property
names
property
offsets
property
__reduce__
Source code in darts-segmentation/src/darts_segmentation/utils.py
filter
filter(
band_names: list[str],
) -> darts_segmentation.utils.Bands
Filter the bands by name.
Parameters:
-
band_names
(list[str]
) –The names of the bands to keep.
Returns:
-
Bands
(darts_segmentation.utils.Bands
) –The filtered Bands object.
Source code in darts-segmentation/src/darts_segmentation/utils.py
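To make the filtering behaviour concrete, here is a minimal sketch built on stand-in classes rather than the library's own code. The `Band` fields (`name`, `factor`, `offset`), the example band names, and the choice to keep the original band order and ignore unknown names are all assumptions made for illustration.

```python
from collections import UserList
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """Hypothetical stand-in for darts_segmentation.utils.Band."""

    name: str
    factor: float = 1.0
    offset: float = 0.0


class Bands(UserList):
    """Hypothetical stand-in that mirrors the documented properties."""

    @property
    def names(self) -> list:
        return [band.name for band in self]

    @property
    def factors(self) -> list:
        return [band.factor for band in self]

    @property
    def offsets(self) -> list:
        return [band.offset for band in self]

    def filter(self, band_names: list) -> "Bands":
        # Assumption: keep the original order and silently skip unknown names.
        wanted = set(band_names)
        return Bands([band for band in self if band.name in wanted])


# Made-up band names and values, used only for this example.
bands = Bands([Band("red", 0.5, 0.0), Band("green", 0.5, 0.0), Band("ndvi", 2.0, 1.0)])
subset = bands.filter(["ndvi", "red"])
print(subset.names)  # ['red', 'ndvi']
```

The real `filter` may order its result differently or raise on unknown names; check the source file listed above before relying on either behaviour.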
from_config
classmethod
from_config(
config: dict[
typing.Literal[
"bands", "band_factors", "band_offsets"
],
list,
]
| dict[str, tuple[float, float]],
) -> darts_segmentation.utils.Bands
Create a Bands object from a config dictionary.
Parameters:
-
config
(dict
) –The config dictionary containing the band information. Expects a dictionary with the keys "bands", "band_factors" and "band_offsets", whose values are lists of the same length.
Returns:
-
Bands
(darts_segmentation.utils.Bands
) –The Bands object.
Source code in darts-segmentation/src/darts_segmentation/utils.py
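The expected config shape can be illustrated with a small, library-independent sketch. `bands_from_config` is a hypothetical helper, not part of `darts_segmentation`, and raising a `ValueError` on mismatched lengths is an assumption about how such a check could look.

```python
def bands_from_config(config: dict) -> list:
    """Hypothetical helper: pair up the three config lists per band."""
    names = config["bands"]
    factors = config["band_factors"]
    offsets = config["band_offsets"]
    # The documentation requires the three lists to have the same length.
    if not (len(names) == len(factors) == len(offsets)):
        raise ValueError("bands, band_factors and band_offsets must have the same length")
    # One (name, factor, offset) triple per band.
    return list(zip(names, factors, offsets))


config = {
    "bands": ["band1", "band2"],
    "band_factors": [1.0, 2.0],
    "band_offsets": [0.0, 1.0],
}
triples = bands_from_config(config)
print(triples)  # [('band1', 1.0, 0.0), ('band2', 2.0, 1.0)]
```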
from_dict
classmethod
Create a Bands object from a dictionary.
Parameters:
-
config
(dict[str, tuple[float, float]]
) –The dictionary containing the band information. Expects the keys to be the band names and the values to be tuples of (factor, offset). Example: {"band1": (1.0, 0.0), "band2": (2.0, 1.0)}
Returns:
-
Bands
(darts_segmentation.utils.Bands
) –The Bands object.
Source code in darts-segmentation/src/darts_segmentation/utils.py
to_config
Convert the Bands object to a config dictionary.
Returns:
-
dict
(dict[typing.Literal['bands', 'band_factors', 'band_offsets'], list]
) –The config dictionary containing the band information.
Source code in darts-segmentation/src/darts_segmentation/utils.py
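The two dictionary layouts described for `from_dict` and `to_config` carry the same information. The sketch below converts between them in plain Python; `dict_to_config` and `config_to_dict` are hypothetical helpers written for illustration and are not the library's methods.

```python
def dict_to_config(band_dict: dict) -> dict:
    """Hypothetical helper: {name: (factor, offset)} -> config-style dict."""
    return {
        "bands": list(band_dict),
        "band_factors": [factor for factor, _ in band_dict.values()],
        "band_offsets": [offset for _, offset in band_dict.values()],
    }


def config_to_dict(config: dict) -> dict:
    """Hypothetical helper: config-style dict -> {name: (factor, offset)}."""
    return {
        name: (factor, offset)
        for name, factor, offset in zip(
            config["bands"], config["band_factors"], config["band_offsets"]
        )
    }


band_dict = {"band1": (1.0, 0.0), "band2": (2.0, 1.0)}
as_config = dict_to_config(band_dict)
print(as_config)
# {'bands': ['band1', 'band2'], 'band_factors': [1.0, 2.0], 'band_offsets': [0.0, 1.0]}

# Converting back recovers the original mapping.
assert config_to_dict(as_config) == band_dict
```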
PatchCoords
dataclass
Wrapper which stores the coordinate information of a patch in the original image.
from_tensor
classmethod
from_tensor(
coords: torch.Tensor, patch_size: int
) -> (
darts_segmentation.training.prepare_training.PatchCoords
)
Create a PatchCoords object from the coordinate tensor returned by create_patches.
Parameters:
-
coords
(torch.Tensor
) –The coordinates of the patch in the original image, as returned by create_patches.
-
patch_size
(int
) –The size of the patch.
Returns:
-
PatchCoords
(darts_segmentation.training.prepare_training.PatchCoords
) –The coordinates of the patch in the original image.
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
TrainDatasetBuilder
dataclass
TrainDatasetBuilder(
train_data_dir: pathlib.Path,
patch_size: int,
overlap: int,
bands: darts_segmentation.utils.Bands,
exclude_nopositive: bool,
exclude_nan: bool,
mask_erosion_size: int,
device: typing.Literal["cuda", "cpu"] | int,
append: bool = False,
)
Helper class to create all necessary files for a DARTS training dataset.
__post_init__
Initialize the TrainDatasetBuilder class based on provided dataclass params.
This sets up everything needed to add patches to the dataset:
- Create the train_data_dir if it does not exist
- Create an empty zarr store
- Initialize the metadata list
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
add_tile
add_tile(
tile: xarray.Dataset,
labels: geopandas.GeoDataFrame,
region: str,
sample_id: str,
metadata: dict[str, str] | None = None,
)
Add a tile to the dataset.
Parameters:
-
tile
(xarray.Dataset
) –The input tile, containing preprocessed, harmonized data.
-
labels
(geopandas.GeoDataFrame
) –The labels to be used for training.
-
region
(str
) –The region of the tile.
-
sample_id
(str
) –The sample id of the tile.
-
metadata
(dict[str, str]
, default:None
) –Any metadata to be added to the metadata file. It is not used for training, but can help with debugging or reproducibility.
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
finalize
Finalize the dataset by saving the metadata and the config file.
Parameters:
-
data_config
(dict[str, str]
, default:None
) –The data config to be saved in the config file. This should contain all the information needed to recreate the dataset. It will be saved as a toml file, along with the configuration provided in this dataclass.
Raises:
-
ValueError
–If no patches were found in the dataset.
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
create_patches
create_patches(
tensor_tiles: torch.Tensor,
patch_size: int,
overlap: int,
return_coords: bool = False,
) -> torch.Tensor
Create patches from a tensor.
Parameters:
-
tensor_tiles
(torch.Tensor
) –The input tensor. Shape: (BS, C, H, W).
-
patch_size
(int
) –The size of the patches.
-
overlap
(int
) –The size of the overlap.
-
return_coords
(bool
, default:False
) –Whether to also return the coordinates of the patches, which can be useful for debugging.
Returns:
Source code in darts-segmentation/src/darts_segmentation/utils.py
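How `patch_size` and `overlap` interact can be sketched with a plain sliding-window grid. This is not the library's implementation: `patch_origins` is a hypothetical helper, and both the stride of `patch_size - overlap` and the dropping of windows that would cross the image border are assumptions.

```python
def patch_origins(height: int, width: int, patch_size: int, overlap: int) -> list:
    """Hypothetical helper: top-left (y, x) corners of a sliding-window grid.

    Assumptions (not taken from the library): stride = patch_size - overlap,
    and windows that would run past the image border are dropped.
    """
    if not 0 <= overlap < patch_size:
        raise ValueError("overlap must satisfy 0 <= overlap < patch_size")
    stride = patch_size - overlap
    ys = range(0, height - patch_size + 1, stride)
    xs = range(0, width - patch_size + 1, stride)
    return [(y, x) for y in ys for x in xs]


origins = patch_origins(height=10, width=10, patch_size=4, overlap=2)
print(len(origins))  # 16: four positions per axis (0, 2, 4, 6)
print(origins[:3])   # [(0, 0), (0, 2), (0, 4)]
```

Under these assumptions, a larger overlap shrinks the stride and so yields more patches from the same tile.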
create_training_patches
create_training_patches(
tile: xarray.Dataset,
labels: geopandas.GeoDataFrame,
bands: darts_segmentation.utils.Bands,
patch_size: int,
overlap: int,
exclude_nopositive: bool,
exclude_nan: bool,
device: typing.Literal["cuda", "cpu"] | int,
mask_erosion_size: int,
) -> tuple[
torch.tensor,
torch.tensor,
list[
darts_segmentation.training.prepare_training.PatchCoords
],
]
Create training patches from a tile and labels.
Parameters:
-
tile
(xarray.Dataset
) –The input tile, containing preprocessed, harmonized data.
-
labels
(geopandas.GeoDataFrame
) –The labels to be used for training.
-
bands
(darts_segmentation.utils.Bands
) –The bands to be used for training.
-
patch_size
(int
) –The size of the patches.
-
overlap
(int
) –The size of the overlap.
-
exclude_nopositive
(bool
) –Whether to exclude patches where the labels do not contain positives.
-
exclude_nan
(bool
) –Whether to exclude patches where the input data has nan values.
-
device
(typing.Literal['cuda', 'cpu'] | int
) –The device to use for the erosion.
-
mask_erosion_size
(int
) –The size of the disk to use for erosion.
Returns:
-
tuple[torch.tensor, torch.tensor, list[darts_segmentation.training.prepare_training.PatchCoords]]
–A tuple containing the input, the labels and the coords. The input has the format (C, H, W), the labels (H, W).
Raises:
-
ValueError
–If a band is not found in the preprocessed data.
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
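The effect of the `exclude_nopositive` and `exclude_nan` flags can be sketched as a per-patch decision. The snippet uses nested Python lists instead of tensors so it runs without extra dependencies. `keep_patch` is a hypothetical helper, and treating any label value greater than zero as a positive is an assumption about the label encoding.

```python
import math


def keep_patch(inputs, labels, exclude_nopositive: bool, exclude_nan: bool) -> bool:
    """Hypothetical helper: decide whether a single patch is kept.

    inputs: nested list shaped (C, H, W); labels: nested list shaped (H, W).
    Assumption: a label value greater than zero counts as a positive.
    """
    if exclude_nopositive and not any(value > 0 for row in labels for value in row):
        return False
    if exclude_nan and any(
        math.isnan(value) for band in inputs for row in band for value in row
    ):
        return False
    return True


clean = [[[0.1, 0.2], [0.3, 0.4]]]               # one band, 2 x 2, no NaN
with_nan = [[[float("nan"), 0.2], [0.3, 0.4]]]   # one band, 2 x 2, one NaN
positive = [[0, 1], [0, 0]]
empty = [[0, 0], [0, 0]]

print(keep_patch(clean, positive, True, True))     # True
print(keep_patch(clean, empty, True, True))        # False
print(keep_patch(with_nan, positive, True, True))  # False
```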