prepare_training
darts_segmentation.training.prepare_training
Functions to prepare the training data for the segmentation model training.
PatchCoords
dataclass
Wrapper which stores the coordinate information of a patch in the original image.
from_tensor
classmethod
from_tensor(
coords: torch.Tensor, patch_size: int
) -> (
darts_segmentation.training.prepare_training.PatchCoords
)
Create a PatchCoords object from the coordinate tensor returned by create_patches.
Parameters:
- coords (torch.Tensor) – The coordinates of the patch in the original image, as returned by create_patches.
- patch_size (int) – The size of the patch.
Returns:
- PatchCoords (darts_segmentation.training.prepare_training.PatchCoords) – The coordinates of the patch in the original image.
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
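To illustrate the idea of a coordinate wrapper with an alternative constructor, here is a minimal sketch. The class name `PatchCoordsSketch`, its fields, the method names, and the assumed `(y, x)` layout of the coordinates are all hypothetical; the actual PatchCoords fields and the layout of the tensor returned by create_patches may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PatchCoordsSketch:
    """Hypothetical container for the location of one patch in the original image."""

    y: int
    x: int
    patch_size: int

    @classmethod
    def from_offsets(cls, coords: Sequence[int], patch_size: int) -> "PatchCoordsSketch":
        # Assumption: coords holds the (y, x) offset of the top-left corner.
        y, x = (int(c) for c in coords)
        return cls(y=y, x=x, patch_size=patch_size)

    def slices(self) -> tuple[slice, slice]:
        # Row and column slices that cut this patch out of an (H, W) array.
        return (
            slice(self.y, self.y + self.patch_size),
            slice(self.x, self.x + self.patch_size),
        )


pc = PatchCoordsSketch.from_offsets([3, 6], patch_size=4)
```

Storing the offsets together with the patch size lets downstream code recover the exact image region of a patch, which is useful for debugging.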
TrainDatasetBuilder
dataclass
TrainDatasetBuilder(
train_data_dir: pathlib.Path,
patch_size: int,
overlap: int,
bands: list[str],
exclude_nopositive: bool,
exclude_nan: bool,
device: typing.Literal["cuda", "cpu"] | int,
append: bool = False,
)
Helper class to create all necessary files for a DARTS training dataset.
__len__
__post_init__
Initialize the TrainDatasetBuilder class based on provided dataclass params.
This sets up everything needed to add patches to the dataset:
- Create the train_data_dir if it does not exist
- Create an empty zarr store
- Initialize the metadata list
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
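The setup steps listed above follow a common dataclass pattern, sketched below using only the standard library. The class name `SketchBuilder`, the `metadata` field, the meaning of `__len__`, and the plain directory used in place of the zarr store are assumptions for illustration; this is not the TrainDatasetBuilder implementation.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchBuilder:
    """Hypothetical stand-in for a dataset builder."""

    train_data_dir: Path
    patch_size: int
    # Not a constructor argument: filled in by __post_init__.
    metadata: list = field(init=False)

    def __post_init__(self) -> None:
        # Create the output directory if it does not exist yet.
        self.train_data_dir.mkdir(parents=True, exist_ok=True)
        # Placeholder for the empty store (a plain directory in this sketch).
        (self.train_data_dir / "store").mkdir(exist_ok=True)
        # Start with an empty metadata list.
        self.metadata = []

    def __len__(self) -> int:
        # Assumption: the length is the number of metadata entries recorded so far.
        return len(self.metadata)


builder = SketchBuilder(Path(tempfile.mkdtemp()) / "train", patch_size=512)
```

Doing the setup in `__post_init__` keeps the constructor signature limited to configuration values while derived state is created automatically.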
add_tile
add_tile(
tile: xarray.Dataset,
labels: geopandas.GeoDataFrame,
region: str,
sample_id: str,
extent: geopandas.GeoDataFrame | None = None,
metadata: dict[str, str] | None = None,
)
Add a tile to the dataset.
Parameters:
- tile (xarray.Dataset) – The input tile, containing preprocessed, harmonized data.
- labels (geopandas.GeoDataFrame) – The labels to be used for training.
- region (str) – The region of the tile.
- sample_id (str) – The sample id of the tile.
- extent (geopandas.GeoDataFrame | None, default: None) – The extent of the labels. The tile will be cropped to this extent. If None, the tile will not be cropped.
- metadata (dict[str, str] | None, default: None) – Any metadata to be added to the metadata file. It is not used for training, but can help with debugging and reproducibility.
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
finalize
Finalize the dataset by saving the metadata and the config file.
Parameters:
- data_config (dict[str, str], default: None) – The data config to be saved in the config file. This should contain all the information needed to recreate the dataset. It will be saved as a toml file, along with the configuration provided in this dataclass.
Raises:
- ValueError – If no patches were found in the dataset.
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
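The behaviour described above (reject an empty dataset, then persist metadata and config) can be sketched as follows. The function name `finalize_sketch`, the file names, and the use of JSON are assumptions: JSON is swapped in only because the Python standard library has no TOML writer, whereas the documented method saves a toml file.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def finalize_sketch(
    out_dir: Path,
    metadata: list,
    data_config: Optional[dict] = None,
) -> None:
    """Hypothetical finalize step: refuse empty datasets, then write two files."""
    if len(metadata) == 0:
        raise ValueError("No patches were found in the dataset.")
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: hypothetical file names, JSON instead of TOML.
    (out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    (out_dir / "config.json").write_text(json.dumps(data_config or {}, indent=2))


out = Path(tempfile.mkdtemp())
finalize_sketch(out, [{"sample_id": "a", "region": "demo"}], {"source": "demo"})
```

Checking for an empty dataset before writing anything avoids leaving behind a config file that describes a dataset with no patches.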
create_labels
create_labels(
tile: xarray.Dataset,
labels: geopandas.GeoDataFrame,
extent: geopandas.GeoDataFrame | None = None,
)
Create rasterized labels for a tile from the label geometries.
Parameters:
- tile (xarray.Dataset) – The input tile, containing preprocessed, harmonized data.
- labels (geopandas.GeoDataFrame) – The labels to be used for training.
- extent (geopandas.GeoDataFrame | None, default: None) – The extent of the labels. The tile will be cropped to this extent. If None, the tile will not be cropped.
Returns:
- xr.DataArray – The rasterized labels.
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
create_patches
create_patches(
tensor_tiles: torch.Tensor,
patch_size: int,
overlap: int,
return_coords: bool = False,
) -> torch.Tensor
Create patches from a tensor.
Parameters:
- tensor_tiles (torch.Tensor) – The input tensor. Shape: (BS, C, H, W).
- patch_size (int) – The size of the patches.
- overlap (int) – The size of the overlap.
- return_coords (bool, default: False) – Whether to return the coordinates of the patches. Can be used for debugging.
Returns:
Source code in darts-segmentation/src/darts_segmentation/inference.py
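How `patch_size` and `overlap` determine patch positions can be illustrated with a small sketch that computes top-left offsets on a sliding-window grid. The helper names `patch_offsets` and `patch_grid` are hypothetical, and both the stride of `patch_size - overlap` and the border handling (shifting a final patch back so it ends at the border) are assumptions; the actual create_patches function may treat borders differently.

```python
def patch_offsets(length: int, patch_size: int, overlap: int) -> list[int]:
    """Start offsets of patches along one axis (hypothetical helper)."""
    if not 0 <= overlap < patch_size:
        raise ValueError("overlap must be non-negative and smaller than patch_size")
    if length < patch_size:
        raise ValueError("axis is shorter than one patch")
    # Neighbouring patches share `overlap` pixels, so the stride is smaller
    # than the patch size by exactly that amount.
    step = patch_size - overlap
    offsets = list(range(0, length - patch_size + 1, step))
    # Assumption: if the regular grid does not reach the border, add a final
    # patch shifted back so that it ends exactly at the border.
    if offsets[-1] + patch_size < length:
        offsets.append(length - patch_size)
    return offsets


def patch_grid(height: int, width: int, patch_size: int, overlap: int) -> list[tuple[int, int]]:
    """All (y, x) top-left corners for an image of the given size."""
    ys = patch_offsets(height, patch_size, overlap)
    xs = patch_offsets(width, patch_size, overlap)
    return [(y, x) for y in ys for x in xs]
```

For example, an axis of length 10 with `patch_size=4` and `overlap=1` gives the offsets 0, 3 and 6, so the last patch ends exactly at the border.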
create_training_patches
create_training_patches(
tile: xarray.Dataset,
labels: geopandas.GeoDataFrame,
extent: geopandas.GeoDataFrame | None,
bands: list[str],
patch_size: int,
overlap: int,
exclude_nopositive: bool,
exclude_nan: bool,
device: typing.Literal["cuda", "cpu"] | int,
) -> tuple[
torch.tensor,
torch.tensor,
list[
darts_segmentation.training.prepare_training.PatchCoords
],
]
Create training patches from a tile and labels.
Parameters:
- tile (xarray.Dataset) – The input tile, containing preprocessed, harmonized data.
- labels (geopandas.GeoDataFrame) – The labels to be used for training.
- extent (geopandas.GeoDataFrame | None) – The extent of the labels. The tile will be cropped to this extent. If None, the tile will not be cropped.
- bands (list[str]) – The bands to be used for training.
- patch_size (int) – The size of the patches.
- overlap (int) – The size of the overlap.
- exclude_nopositive (bool) – Whether to exclude patches where the labels do not contain positives.
- exclude_nan (bool) – Whether to exclude patches where the input data has nan values.
- device (typing.Literal['cuda', 'cpu'] | int) – The device to use.
Returns:
- tuple[torch.tensor, torch.tensor, list[darts_segmentation.training.prepare_training.PatchCoords]] – A tuple containing the input, the labels and the coords. The input has the format (C, H, W), the labels (H, W).
Source code in darts-segmentation/src/darts_segmentation/training/prepare_training.py
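The effect of the `exclude_nopositive` and `exclude_nan` flags can be sketched with a small filter over a single patch. The helper name `keep_patch`, the nested-list representation, and the interpretation of "positive" as a label value greater than zero are assumptions made for illustration; the library itself operates on torch tensors.

```python
import math

Patch = list[list[float]]  # one (H, W) band as nested lists


def keep_patch(
    inputs: list[Patch],
    labels: Patch,
    exclude_nopositive: bool,
    exclude_nan: bool,
) -> bool:
    """Decide whether a patch passes both filters (hypothetical helper)."""
    # Drop patches whose labels contain no positive pixel, if requested.
    if exclude_nopositive and not any(v > 0 for row in labels for v in row):
        return False
    # Drop patches with at least one NaN in any input band, if requested.
    if exclude_nan and any(
        math.isnan(v) for band in inputs for row in band for v in row
    ):
        return False
    return True


clean = [[[1.0, 2.0], [0.0, 2.0]]]
with_nan = [[[1.0, float("nan")], [0.0, 2.0]]]
positive = [[0, 1], [0, 0]]
empty = [[0, 0], [0, 0]]
```

With both flags enabled, only patches that contain at least one positive label and no NaN input values are kept.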