Skip to content

darts_acquisition

Acquisition of data from various sources for the DARTS dataset.

Functions:

  • download_admin_files

    Download the admin files for the regions.

  • load_arcticdem

    Load the ArcticDEM for the given geobox, fetch new data from the STAC server if necessary.

  • load_planet_masks

    Load the valid and quality data masks from a Planet scene.

  • load_planet_scene

    Load a PlanetScope satellite GeoTIFF file and return it as an xarray datset.

  • load_s2_from_gee

    Load a Sentinel 2 scene from Google Earth Engine and return it as an xarray dataset.

  • load_s2_from_stac

    Load a Sentinel 2 scene from the Copernicus STAC API and return it as an xarray dataset.

  • load_s2_masks

    Load the valid and quality data masks from a Sentinel 2 scene.

  • load_s2_scene

    Load a Sentinel 2 satellite GeoTIFF file and return it as an xarray datset.

  • load_tcvis

    Load the TCVIS for the given geobox, fetch new data from GEE if necessary.

  • parse_planet_type

    Parse the type of Planet data from the directory path.

  • parse_s2_tile_id

    Parse the Sentinel 2 tile ID from a file path.

Attributes:

__version__ module-attribute

__version__ = importlib.metadata.version('darts-nextgen')

download_admin_files

download_admin_files(admin_dir: pathlib.Path)

Download the admin files for the regions.

Files will be stored under [admin_dir]/adm1.shp and [admin_dir]/adm2.shp.

Parameters:

  • admin_dir (pathlib.Path) –

    The path to the admin files.

Source code in darts-acquisition/src/darts_acquisition/admin.py
@stopuhr.funkuhr("Downloading admin files", printer=logger.debug, print_kwargs=True)
def download_admin_files(admin_dir: Path):
    """Download the admin files for the regions.

    Files will be stored under [admin_dir]/adm1.shp and [admin_dir]/adm2.shp.

    Args:
        admin_dir (Path): The path to the admin files.

    """
    # Download the admin files
    admin_1_url = "https://github.com/wmgeolab/geoBoundaries/raw/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM1.zip"
    admin_2_url = "https://github.com/wmgeolab/geoBoundaries/raw/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM2.zip"

    admin_dir.mkdir(exist_ok=True, parents=True)

    logger.debug(f"Downloading {admin_1_url} to {admin_dir.resolve()}")
    _download_zip(admin_1_url, admin_dir)

    logger.debug(f"Downloading {admin_2_url} to {admin_dir.resolve()}")
    _download_zip(admin_2_url, admin_dir)

load_arcticdem

load_arcticdem(
    geobox: odc.geo.geobox.GeoBox,
    data_dir: pathlib.Path | str,
    resolution: darts_acquisition.arcticdem.RESOLUTIONS,
    buffer: int = 0,
    persist: bool = True,
) -> xarray.Dataset

Load the ArcticDEM for the given geobox, fetch new data from the STAC server if necessary.

Parameters:

  • geobox (odc.geo.geobox.GeoBox) –

    The geobox for which the tile should be loaded.

  • data_dir (pathlib.Path | str) –

    The directory where the ArcticDEM data is stored.

  • resolution (typing.Literal[2, 10, 32]) –

    The resolution of the ArcticDEM data in m.

  • buffer (int, default: 0 ) –

    The buffer around the projected (epsg:3413) geobox in pixels. Defaults to 0.

  • persist (bool, default: True ) –

    If the data should be persisted in memory. If not, this will return a Dask backed Dataset. Defaults to True.

Returns:

  • xarray.Dataset

    xr.Dataset: The ArcticDEM tile, with a buffer applied. Note: The buffer is applied in the arcticdem dataset's CRS, hence the orientation might be different. Final dataset is NOT matched to the reference CRS and resolution.

Warning

Geobox must be in a meter based CRS.

Usage

Since the API of the load_arcticdem is based on GeoBox, one can load a specific ROI based on an existing Xarray DataArray:

import xarray as xr
import odc.geo.xr

from darts_aquisition import load_arcticdem

# Assume "optical" is an already loaded s2 based dataarray

arcticdem = load_arcticdem(
    optical.odc.geobox,
    "/path/to/arcticdem-parent-directory",
    resolution=2,
    buffer=ceil(self.tpi_outer_radius / 2 * sqrt(2))
)

# Now we can for example match the resolution and extent of the optical data:
arcticdem = arcticdem.odc.reproject(optical.odc.geobox, resampling="cubic")

The buffer parameter is used to extend the region of interest by a certain amount of pixels. This comes handy when calculating e.g. the Topographic Position Index (TPI), which requires a buffer around the region of interest to remove edge effects.

Raises:

  • ValueError

    If the resolution is not supported.

Source code in darts-acquisition/src/darts_acquisition/arcticdem.py
@stopuhr.funkuhr("Loading ArcticDEM", printer=logger.debug, print_kwargs=True)
def load_arcticdem(
    geobox: GeoBox, data_dir: Path | str, resolution: RESOLUTIONS, buffer: int = 0, persist: bool = True
) -> xr.Dataset:
    """Load the ArcticDEM for the given geobox, fetch new data from the STAC server if necessary.

    Args:
        geobox (GeoBox): The geobox for which the tile should be loaded.
        data_dir (Path | str): The directory where the ArcticDEM data is stored.
        resolution (Literal[2, 10, 32]): The resolution of the ArcticDEM data in m.
        buffer (int, optional): The buffer around the projected (epsg:3413) geobox in pixels. Defaults to 0.
        persist (bool, optional): If the data should be persisted in memory.
            If not, this will return a Dask backed Dataset. Defaults to True.

    Returns:
        xr.Dataset: The ArcticDEM tile, with a buffer applied.
            Note: The buffer is applied in the arcticdem dataset's CRS, hence the orientation might be different.
            Final dataset is NOT matched to the reference CRS and resolution.

    Warning:
        Geobox must be in a meter based CRS.

    Usage:
        Since the API of the `load_arcticdem` is based on GeoBox, one can load a specific ROI based on an existing Xarray DataArray:

        ```python
        import xarray as xr
        import odc.geo.xr

        from darts_aquisition import load_arcticdem

        # Assume "optical" is an already loaded s2 based dataarray

        arcticdem = load_arcticdem(
            optical.odc.geobox,
            "/path/to/arcticdem-parent-directory",
            resolution=2,
            buffer=ceil(self.tpi_outer_radius / 2 * sqrt(2))
        )

        # Now we can for example match the resolution and extent of the optical data:
        arcticdem = arcticdem.odc.reproject(optical.odc.geobox, resampling="cubic")
        ```

        The `buffer` parameter is used to extend the region of interest by a certain amount of pixels.
        This comes handy when calculating e.g. the Topographic Position Index (TPI), which requires a buffer around the region of interest to remove edge effects.

    Raises:
        ValueError: If the resolution is not supported.

    """  # noqa: E501
    odc.stac.configure_rio(cloud_defaults=True, aws={"aws_unsigned": True})

    match resolution:
        case 2:
            accessor = smart_geocubes.ArcticDEM2m(data_dir)
        case 10:
            accessor = smart_geocubes.ArcticDEM10m(data_dir)
        case 32:
            accessor = smart_geocubes.ArcticDEM32m(data_dir)
        case _:
            raise ValueError(f"Resolution {resolution} not supported, only 2m, 10m and 32m are supported")

    accessor.assert_created()

    arcticdem = accessor.load(geobox, buffer=buffer, persist=persist)

    # Change dtype of the datamask to uint8 for later reproject_match
    arcticdem["datamask"] = arcticdem.datamask.astype("uint8")

    return arcticdem

load_planet_masks

load_planet_masks(
    fpath: str | pathlib.Path,
) -> xarray.Dataset

Load the valid and quality data masks from a Planet scene.

Parameters:

  • fpath (str | pathlib.Path) –

    The file path to the Planet scene from which to derive the masks.

Raises:

  • FileNotFoundError

    If no matching UDM-2 TIFF file is found in the specified path.

Returns:

  • xarray.Dataset

    xr.Dataset: A merged xarray Dataset containing two data masks: - 'valid_data_mask': A mask indicating valid (1) and no data (0). - 'quality_data_mask': A mask indicating high quality (1) and low quality (0).

Source code in darts-acquisition/src/darts_acquisition/planet.py
@stopuhr.funkuhr("Loading Planet masks", printer=logger.debug, print_kwargs=True)
def load_planet_masks(fpath: str | Path) -> xr.Dataset:
    """Load the valid and quality data masks from a Planet scene.

    Args:
        fpath (str | Path): The file path to the Planet scene from which to derive the masks.

    Raises:
        FileNotFoundError: If no matching UDM-2 TIFF file is found in the specified path.

    Returns:
        xr.Dataset: A merged xarray Dataset containing two data masks:
            - 'valid_data_mask': A mask indicating valid (1) and no data (0).
            - 'quality_data_mask': A mask indicating high quality (1) and low quality (0).

    """
    # Convert to Path object if a string is provided
    fpath = fpath if isinstance(fpath, Path) else Path(fpath)

    logger.debug(f"Loading data masks from {fpath.resolve()}")

    # Get imagepath
    udm_path = next(fpath.glob("*_udm2.tif"), None)
    if not udm_path:
        udm_path = next(fpath.glob("*_udm2_clip.tif"), None)
    if not udm_path:
        raise FileNotFoundError(f"No matching UDM-2 TIFF files found in {fpath.resolve()} (.glob('*_udm2.tif'))")

    # See udm classes here: https://developers.planet.com/docs/data/udm-2/
    da_udm = xr.open_dataarray(udm_path)

    invalids = da_udm.sel(band=8).fillna(0) != 0
    low_quality = da_udm.sel(band=[2, 3, 4, 5, 6]).max(axis=0) == 1
    high_quality = ~low_quality & ~invalids
    qa_ds = xr.Dataset(coords={c: da_udm.coords[c] for c in da_udm.coords})
    qa_ds["quality_data_mask"] = (
        xr.zeros_like(da_udm.sel(band=8)).where(invalids, 0).where(low_quality, 1).where(high_quality, 2)
    )
    qa_ds["quality_data_mask"].attrs = {
        "data_source": "planet",
        "long_name": "Quality data mask",
        "description": "0 = Invalid, 1 = Low Quality, 2 = High Quality",
    }
    return qa_ds

load_planet_scene

load_planet_scene(
    fpath: str | pathlib.Path,
) -> xarray.Dataset

Load a PlanetScope satellite GeoTIFF file and return it as an xarray datset.

Parameters:

  • fpath (str | pathlib.Path) –

    The path to the directory containing the TIFF files or a specific path to the TIFF file.

Returns:

Raises:

Source code in darts-acquisition/src/darts_acquisition/planet.py
@stopuhr.funkuhr("Loading Planet scene", printer=logger.debug, print_kwargs=True)
def load_planet_scene(fpath: str | Path) -> xr.Dataset:
    """Load a PlanetScope satellite GeoTIFF file and return it as an xarray datset.

    Args:
        fpath (str | Path): The path to the directory containing the TIFF files or a specific path to the TIFF file.

    Returns:
        xr.Dataset: The loaded dataset

    Raises:
        FileNotFoundError: If no matching TIFF file is found in the specified path.

    """
    # Convert to Path object if a string is provided
    fpath = fpath if isinstance(fpath, Path) else Path(fpath)

    # Check if the directory contains a PSOrthoTile or PSScene
    planet_type = parse_planet_type(fpath)
    logger.debug(f"Loading Planet PS {planet_type.capitalize()} from {fpath.resolve()}")

    # Get imagepath
    ps_image = next(fpath.glob("*_SR.tif"), None)
    if not ps_image:
        ps_image = next(fpath.glob("*_SR_clip.tif"), None)
    if not ps_image:
        raise FileNotFoundError(f"No matching TIFF files found in {fpath.resolve()} (.glob('*_SR.tif'))")

    # Define band names and corresponding indices
    planet_da = xr.open_dataarray(ps_image)

    # Create a dataset with the bands
    bands = ["blue", "green", "red", "nir"]
    ds_planet = (
        planet_da.fillna(0).rio.write_nodata(0).astype("uint16").assign_coords({"band": bands}).to_dataset(dim="band")
    )
    for var in ds_planet.variables:
        ds_planet[var].assign_attrs(
            {
                "long_name": f"PLANET {var.capitalize()}",
                "data_source": "planet",
                "planet_type": planet_type,
                "units": "Reflectance",
            }
        )
    ds_planet.attrs = {"tile_id": fpath.parent.stem if planet_type == "orthotile" else fpath.stem}
    return ds_planet

load_s2_from_gee

load_s2_from_gee(
    img: str | ee.Image,
    bands_mapping: dict = {
        "B2": "blue",
        "B3": "green",
        "B4": "red",
        "B8": "nir",
    },
    scale_and_offset: bool | tuple[float, float] = True,
    cache: pathlib.Path | None = None,
) -> xarray.Dataset

Load a Sentinel 2 scene from Google Earth Engine and return it as an xarray dataset.

Parameters:

  • img (str | ee.Image) –

    The Sentinel 2 image ID or the ee image object.

  • bands_mapping (dict[str, str], default: {'B2': 'blue', 'B3': 'green', 'B4': 'red', 'B8': 'nir'} ) –

    A mapping from bands to obtain. Will be renamed to the corresponding band names. Defaults to {"B2": "blue", "B3": "green", "B4": "red", "B8": "nir"}.

  • scale_and_offset (bool | tuple[float, float], default: True ) –

    Whether to apply the scale and offset to the bands. If a tuple is provided, it will be used as the (scale, offset) values with band * scale + offset. If True, use the default values of scale = 0.0001 and offset = 0, taken from ee_extra. Defaults to True.

  • cache (pathlib.Path | None, default: None ) –

    The path to the cache directory. If None, no caching will be done. Defaults to None.

Returns:

Source code in darts-acquisition/src/darts_acquisition/s2.py
@stopuhr.funkuhr("Loading Sentinel 2 scene from GEE", printer=logger.debug, print_kwargs=["img"])
def load_s2_from_gee(
    img: str | ee.Image,
    bands_mapping: dict = {"B2": "blue", "B3": "green", "B4": "red", "B8": "nir"},
    scale_and_offset: bool | tuple[float, float] = True,
    cache: Path | None = None,
) -> xr.Dataset:
    """Load a Sentinel 2 scene from Google Earth Engine and return it as an xarray dataset.

    Args:
        img (str | ee.Image): The Sentinel 2 image ID or the ee image object.
        bands_mapping (dict[str, str], optional): A mapping from bands to obtain.
            Will be renamed to the corresponding band names.
            Defaults to {"B2": "blue", "B3": "green", "B4": "red", "B8": "nir"}.
        scale_and_offset (bool | tuple[float, float], optional): Whether to apply the scale and offset to the bands.
            If a tuple is provided, it will be used as the (`scale`, `offset`) values with `band * scale + offset`.
            If True, use the default values of `scale` = 0.0001 and `offset` = 0, taken from ee_extra.
            Defaults to True.
        cache (Path | None, optional): The path to the cache directory. If None, no caching will be done.
            Defaults to None.

    Returns:
        xr.Dataset: The loaded dataset

    """
    if isinstance(img, str):
        s2id = img
        img = ee.Image(f"COPERNICUS/S2_SR_HARMONIZED/{s2id}")
    else:
        s2id = img.id().getInfo().split("/")[-1]
    logger.debug(f"Loading Sentinel 2 tile {s2id=} from GEE")

    if "SCL" not in bands_mapping.keys():
        bands_mapping["SCL"] = "scl"

    cache_file = None if cache is None else cache / f"gee-s2srh-{s2id}-{''.join(bands_mapping.keys())}.nc"
    if cache_file is not None and cache_file.exists():
        ds_s2 = xr.open_dataset(cache_file, engine="h5netcdf").set_coords("spatial_ref")
        ds_s2.load()
        logger.debug(f"Loaded {s2id=} from cache.")
    else:
        img = img.select(list(bands_mapping.keys()))
        ds_s2 = xr.open_dataset(
            img,
            engine="ee",
            geometry=img.geometry(),
            crs=img.select(0).projection().crs().getInfo(),
            scale=10,
        )
        ds_s2.attrs["time"] = str(ds_s2.time.values[0])
        ds_s2 = ds_s2.isel(time=0).drop_vars("time").rename({"X": "x", "Y": "y"}).transpose("y", "x")
        ds_s2 = ds_s2.odc.assign_crs(ds_s2.attrs["crs"])
        logger.debug(
            f"Found dataset with shape {ds_s2.sizes} for tile {s2id=}."
            "Start downloading data from GEE. This may take a while."
        )

        with stopuhr.stopuhr(f"Downloading data from GEE for {s2id=}", printer=logger.debug):
            ds_s2.load()
            if cache_file is not None:
                ds_s2.to_netcdf(cache_file, engine="h5netcdf")

    ds_s2 = ds_s2.rename_vars(bands_mapping)

    for var in ds_s2.data_vars:
        ds_s2[var].attrs["data_source"] = "s2-gee"
        ds_s2[var].attrs["long_name"] = f"Sentinel 2 {var.capitalize()}"
        ds_s2[var].attrs["units"] = "Reflectance"

    ds_s2 = convert_masks(ds_s2)

    # For some reason, there are some spatially random nan values in the data, not only at the borders
    # To workaround this, set all nan values to 0 and add this information to the quality_data_mask
    # This workaround is quite computational expensive, but it works for now
    # TODO: Find other solutions for this problem!
    with stopuhr.stopuhr(f"Fixing nan values in {s2id=}", printer=logger.debug):
        for band in set(bands_mapping.values()) - {"scl"}:
            ds_s2["quality_data_mask"] = xr.where(ds_s2[band].isnull(), 0, ds_s2["quality_data_mask"])
            ds_s2[band] = ds_s2[band].fillna(0)
            # Turn real nan values (scl is nan) into invalid data
            ds_s2[band] = ds_s2[band].where(~ds_s2["scl"].isnull())

    if scale_and_offset:
        if isinstance(scale_and_offset, tuple):
            scale, offset = scale_and_offset
        else:
            scale, offset = 0.0001, 0
        logger.debug(f"Applying {scale=} and {offset=} to {s2id=} optical data")
        for band in set(bands_mapping.values()) - {"scl"}:
            ds_s2[band] = ds_s2[band] * scale + offset

    ds_s2.attrs["s2_tile_id"] = s2id
    ds_s2.attrs["tile_id"] = s2id

    return ds_s2

load_s2_from_stac

load_s2_from_stac(
    s2id: str,
    bands_mapping: dict = {
        "B02_10m": "blue",
        "B03_10m": "green",
        "B04_10m": "red",
        "B08_10m": "nir",
    },
    scale_and_offset: bool | tuple[float, float] = True,
    cache: pathlib.Path | None = None,
) -> xarray.Dataset

Load a Sentinel 2 scene from the Copernicus STAC API and return it as an xarray dataset.

Parameters:

  • s2id (str) –

    The Sentinel 2 image ID.

  • bands_mapping (dict[str, str], default: {'B02_10m': 'blue', 'B03_10m': 'green', 'B04_10m': 'red', 'B08_10m': 'nir'} ) –

    A mapping from bands to obtain. Will be renamed to the corresponding band names. Defaults to {"B2": "blue", "B3": "green", "B4": "red", "B8": "nir"}.

  • scale_and_offset (bool | tuple[float, float], default: True ) –

    Whether to apply the scale and offset to the bands. If a tuple is provided, it will be used as the (scale, offset) values with band * scale + offset. If True, use the default values of scale = 0.0001 and offset = 0, taken from ee_extra. Defaults to True.

  • cache (pathlib.Path | None, default: None ) –

    The path to the cache directory. If None, no caching will be done. Defaults to None.

Returns:

Source code in darts-acquisition/src/darts_acquisition/s2.py
@stopuhr.funkuhr("Loading Sentinel 2 scene from STAC", printer=logger.debug, print_kwargs=["s2id"])
def load_s2_from_stac(
    s2id: str,
    bands_mapping: dict = {"B02_10m": "blue", "B03_10m": "green", "B04_10m": "red", "B08_10m": "nir"},
    scale_and_offset: bool | tuple[float, float] = True,
    cache: Path | None = None,
) -> xr.Dataset:
    """Load a Sentinel 2 scene from the Copernicus STAC API and return it as an xarray dataset.

    Args:
        s2id (str): The Sentinel 2 image ID.
        bands_mapping (dict[str, str], optional): A mapping from bands to obtain.
            Will be renamed to the corresponding band names.
            Defaults to {"B2": "blue", "B3": "green", "B4": "red", "B8": "nir"}.
        scale_and_offset (bool | tuple[float, float], optional): Whether to apply the scale and offset to the bands.
            If a tuple is provided, it will be used as the (`scale`, `offset`) values with `band * scale + offset`.
            If True, use the default values of `scale` = 0.0001 and `offset` = 0, taken from ee_extra.
            Defaults to True.
        cache (Path | None, optional): The path to the cache directory. If None, no caching will be done.
            Defaults to None.

    Returns:
        xr.Dataset: The loaded dataset

    """
    if "SCL_20m" not in bands_mapping.keys():
        bands_mapping["SCL_20m"] = "scl"

    catalog = Client.open("https://stac.dataspace.copernicus.eu/v1/")
    search = catalog.search(
        collections=["sentinel-2-l2a"],
        ids=[s2id],
    )

    cache_file = None if cache is None else cache / f"gee-s2srh-{s2id}-{''.join(bands_mapping.keys())}.nc"
    if cache_file is not None and cache_file.exists():
        ds_s2 = xr.open_dataset(cache_file, engine="h5netcdf").set_coords("spatial_ref")
        ds_s2.load()
        logger.debug(f"Loaded {s2id=} from cache.")
    else:
        ds_s2 = xr.open_dataset(
            search,
            engine="stac",
            backend_kwargs={"crs": "utm", "resolution": 10, "bands": list(bands_mapping.keys())},
        )
        ds_s2.attrs["time"] = str(ds_s2.time.values[0])
        ds_s2 = ds_s2.isel(time=0).drop_vars("time")
        logger.debug(
            f"Found a dataset with shape {ds_s2.sizes} for tile {s2id=}."
            "Start downloading data from STAC. This may take a while."
        )

        with stopuhr.stopuhr(f"Downloading data from STAC for {s2id=}", printer=logger.debug):
            # Need double loading since the first load transforms lazy-stac to dask and second actually downloads
            ds_s2.load().load()
            if cache_file is not None:
                ds_s2.to_netcdf(cache_file, engine="h5netcdf")

    ds_s2 = ds_s2.rename_vars(bands_mapping)
    for var in ds_s2.data_vars:
        ds_s2[var].attrs["data_source"] = "s2-stac"
        ds_s2[var].attrs["long_name"] = f"Sentinel 2 {var.capitalize()}"
        ds_s2[var].attrs["units"] = "Reflectance"

    ds_s2 = convert_masks(ds_s2)

    if scale_and_offset:
        if isinstance(scale_and_offset, tuple):
            scale, offset = scale_and_offset
        else:
            scale, offset = 0.0001, 0
        logger.debug(f"Applying {scale=} and {offset=} to {s2id=} optical data")
        for band in set(bands_mapping.values()) - {"scl"}:
            ds_s2[band] = ds_s2[band] * scale + offset

    ds_s2.attrs["s2_tile_id"] = s2id
    ds_s2.attrs["tile_id"] = s2id

    return ds_s2

load_s2_masks

load_s2_masks(
    fpath: str | pathlib.Path,
    reference_geobox: odc.geo.geobox.GeoBox,
) -> xarray.Dataset

Load the valid and quality data masks from a Sentinel 2 scene.

Parameters:

  • fpath (str | pathlib.Path) –

    The path to the directory containing the TIFF files.

  • reference_geobox (odc.geo.geobox.GeoBox) –

    The reference geobox to reproject, resample and crop the masks data to.

Returns:

  • xarray.Dataset

    xr.Dataset: A merged xarray Dataset containing two data masks: - 'valid_data_mask': A mask indicating valid (1) and no data (0). - 'quality_data_mask': A mask indicating high quality (1) and low quality (0).

Source code in darts-acquisition/src/darts_acquisition/s2.py
@stopuhr.funkuhr("Loading Sentinel 2 masks", printer=logger.debug, print_kwargs=["fpath"])
def load_s2_masks(fpath: str | Path, reference_geobox: GeoBox) -> xr.Dataset:
    """Load the valid and quality data masks from a Sentinel 2 scene.

    Args:
        fpath (str | Path): The path to the directory containing the TIFF files.
        reference_geobox (GeoBox): The reference geobox to reproject, resample and crop the masks data to.


    Returns:
        xr.Dataset: A merged xarray Dataset containing two data masks:
            - 'valid_data_mask': A mask indicating valid (1) and no data (0).
            - 'quality_data_mask': A mask indicating high quality (1) and low quality (0).

    """
    # Convert to Path object if a string is provided
    fpath = fpath if isinstance(fpath, Path) else Path(fpath)

    logger.debug(f"Loading data masks from {fpath.resolve()}")

    # TODO: SCL band in SR file
    try:
        scl_path = next(fpath.glob("*_SCL*.tif"))
    except StopIteration:
        logger.warning("Found no data quality mask (SCL). No masking will occur.")
        valid_data_mask = (odc.geo.xr.xr_zeros(reference_geobox, dtype="uint8") + 1).to_dataset(name="valid_data_mask")
        valid_data_mask.attrs = {"data_source": "s2", "long_name": "Valid Data Mask"}
        quality_data_mask = odc.geo.xr.xr_zeros(reference_geobox, dtype="uint8").to_dataset(name="quality_data_mask")
        quality_data_mask.attrs = {"data_source": "s2", "long_name": "Quality Data Mask"}
        qa_ds = xr.merge([valid_data_mask, quality_data_mask])
        return qa_ds

    # See scene classes here: https://custom-scripts.sentinel-hub.com/custom-scripts/sentinel-2/scene-classification/
    da_scl = xr.open_dataarray(scl_path)

    da_scl = da_scl.odc.reproject(reference_geobox, sampling="nearest")

    # Match crs
    da_scl = da_scl.rio.write_crs(reference_geobox.crs)

    da_scl = xr.Dataset({"scl": da_scl.sel(band=1).fillna(0).drop_vars("band").astype("uint8")})
    da_scl = convert_masks(da_scl)

    return da_scl

load_s2_scene

load_s2_scene(fpath: str | pathlib.Path) -> xarray.Dataset

Load a Sentinel 2 satellite GeoTIFF file and return it as an xarray datset.

Parameters:

  • fpath (str | pathlib.Path) –

    The path to the directory containing the TIFF files.

Returns:

Raises:

Source code in darts-acquisition/src/darts_acquisition/s2.py
@stopuhr.funkuhr("Loading Sentinel 2 scene from file", printer=logger.debug, print_kwargs=True)
def load_s2_scene(fpath: str | Path) -> xr.Dataset:
    """Load a Sentinel 2 satellite GeoTIFF file and return it as an xarray datset.

    Args:
        fpath (str | Path): The path to the directory containing the TIFF files.

    Returns:
        xr.Dataset: The loaded dataset

    Raises:
        FileNotFoundError: If no matching TIFF file is found in the specified path.

    """
    # Convert to Path object if a string is provided
    fpath = fpath if isinstance(fpath, Path) else Path(fpath)

    logger.debug(f"Loading Sentinel 2 scene from {fpath.resolve()}")

    # Get imagepath
    try:
        s2_image = next(fpath.glob("*_SR*.tif"))
    except StopIteration:
        raise FileNotFoundError(f"No matching TIFF files found in {fpath.resolve()} (.glob('*_SR*.tif'))")

    # Define band names and corresponding indices
    s2_da = xr.open_dataarray(s2_image)

    # Create a dataset with the bands
    bands = ["blue", "green", "red", "nir"]
    ds_s2 = s2_da.fillna(0).rio.write_nodata(0).astype("uint16").assign_coords({"band": bands}).to_dataset(dim="band")

    for var in ds_s2.data_vars:
        ds_s2[var].attrs["data_source"] = "s2"
        ds_s2[var].attrs["long_name"] = f"Sentinel 2 {var.capitalize()}"
        ds_s2[var].attrs["units"] = "Reflectance"

    planet_crop_id, s2_tile_id, tile_id = parse_s2_tile_id(fpath)
    ds_s2.attrs["planet_crop_id"] = planet_crop_id
    ds_s2.attrs["s2_tile_id"] = s2_tile_id
    ds_s2.attrs["tile_id"] = tile_id
    return ds_s2

load_tcvis

load_tcvis(
    geobox: odc.geo.geobox.GeoBox,
    data_dir: pathlib.Path | str,
    buffer: int = 0,
    persist: bool = True,
) -> xarray.Dataset

Load the TCVIS for the given geobox, fetch new data from GEE if necessary.

Parameters:

  • geobox (odc.geo.geobox.GeoBox) –

    The geobox to load the data for.

  • data_dir (pathlib.Path | str) –

    The directory to store the downloaded data for faster access for consecutive calls.

  • buffer (int, default: 0 ) –

    The buffer around the geobox in pixels. Defaults to 0.

  • persist (bool, default: True ) –

    If the data should be persisted in memory. If not, this will return a Dask backed Dataset. Defaults to True.

Returns:

Usage

Since the API of the load_tcvis is based on GeoBox, one can load a specific ROI based on an existing Xarray DataArray:

import xarray as xr
import odc.geo.xr

from darts_aquisition import load_tcvis

# Assume "optical" is an already loaded s2 based dataarray

tcvis = load_tcvis(
    optical.odc.geobox,
    "/path/to/tcvis-parent-directory",
)

# Now we can for example match the resolution and extent of the optical data:
tcvis = tcvis.odc.reproject(optical.odc.geobox, resampling="cubic")
Source code in darts-acquisition/src/darts_acquisition/tcvis.py
@stopuhr.funkuhr("Loading TCVIS", printer=logger.debug, print_kwargs=True)
def load_tcvis(
    geobox: GeoBox,
    data_dir: Path | str,
    buffer: int = 0,
    persist: bool = True,
) -> xr.Dataset:
    """Load the TCVIS for the given geobox, fetch new data from GEE if necessary.

    Args:
        geobox (GeoBox): The geobox to load the data for.
        data_dir (Path | str): The directory to store the downloaded data for faster access for consecutive calls.
        buffer (int, optional): The buffer around the geobox in pixels. Defaults to 0.
        persist (bool, optional): If the data should be persisted in memory.
            If not, this will return a Dask backed Dataset. Defaults to True.

    Returns:
        xr.Dataset: The TCVIS dataset.

    Usage:
        Since the API of the `load_tcvis` is based on GeoBox, one can load a specific ROI based on an existing Xarray DataArray:

        ```python
        import xarray as xr
        import odc.geo.xr

        from darts_aquisition import load_tcvis

        # Assume "optical" is an already loaded s2 based dataarray

        tcvis = load_tcvis(
            optical.odc.geobox,
            "/path/to/tcvis-parent-directory",
        )

        # Now we can for example match the resolution and extent of the optical data:
        tcvis = tcvis.odc.reproject(optical.odc.geobox, resampling="cubic")
        ```

    """  # noqa: E501
    accessor = smart_geocubes.TCTrend(data_dir, create_icechunk_storage=False)

    # We want to assume that the datacube is already created to be save in a multi-process environment
    accessor.assert_created()

    tcvis = accessor.load(geobox, buffer=buffer, persist=persist)

    # Rename to follow our conventions
    tcvis = tcvis.rename_vars(
        {
            "TCB_slope": "tc_brightness",
            "TCG_slope": "tc_greenness",
            "TCW_slope": "tc_wetness",
        }
    )

    return tcvis

parse_planet_type

parse_planet_type(
    fpath: pathlib.Path,
) -> typing.Literal["orthotile", "scene"]

Parse the type of Planet data from the directory path.

Parameters:

  • fpath (pathlib.Path) –

    The directory path to the Planet data.

Returns:

  • typing.Literal['orthotile', 'scene']

    Literal["orthotile", "scene"]: The type of Planet data.

Raises:

  • ValueError

    If the Planet data type cannot be parsed from the file path.

Source code in darts-acquisition/src/darts_acquisition/planet.py
def parse_planet_type(fpath: Path) -> Literal["orthotile", "scene"]:
    """Parse the type of Planet data from the directory path.

    Args:
        fpath (Path): The directory path to the Planet data.

    Returns:
        Literal["orthotile", "scene"]: The type of Planet data.

    Raises:
        ValueError: If the Planet data type cannot be parsed from the file path.

    """
    # Cases for Scenes:
    # - YYYYMMDD_HHMMSS_NN_XXXX
    # - YYYYMMDD_HHMMSS_XXXX

    # Cases for Orthotiles:
    # NNNNNNN/NNNNNNN_NNNNNNN_YYYY-MM-DD_XXXX
    # NNNNNNN_NNNNNNN_YYYY-MM-DD_XXXX

    assert fpath.is_dir(), "fpath must be the parent directory!"

    ps_name_parts = fpath.stem.split("_")

    if len(ps_name_parts) == 3:
        # Must be scene or invalid
        date, time, ident = ps_name_parts
        if _is_valid_date(date, "%Y%m%d") and _is_valid_date(time, "%H%M%S") and len(ident) == 4:
            return "scene"

    if len(ps_name_parts) == 4:
        # Assume scene
        date, time, n, ident = ps_name_parts
        if _is_valid_date(date, "%Y%m%d") and _is_valid_date(time, "%H%M%S") and n.isdigit() and len(ident) == 4:
            return "scene"
        # Is not scene, assume orthotile
        chunkid, tileid, date, ident = ps_name_parts
        if chunkid.isdigit() and tileid.isdigit() and _is_valid_date(date, "%Y-%m-%d") and len(ident) == 4:
            return "orthotile"

    raise ValueError(
        f"Could not parse Planet data type from {fpath}."
        f"Expected a format of YYYYMMDD_HHMMSS_NN_XXXX or YYYYMMDD_HHMMSS_XXXX for scene, "
        "or NNNNNNN/NNNNNNN_NNNNNNN_YYYY-MM-DD_XXXX or NNNNNNN_NNNNNNN_YYYY-MM-DD_XXXX for orthotile."
        f"Got {fpath.stem} instead."
        "Please ensure that the parent directory of the file is used, instead of the file itself."
    )

parse_s2_tile_id

parse_s2_tile_id(
    fpath: str | pathlib.Path,
) -> tuple[str, str, str]

Parse the Sentinel 2 tile ID from a file path.

Parameters:

  • fpath (str | pathlib.Path) –

    The path to the directory containing the TIFF files.

Returns:

  • tuple[str, str, str]

    tuple[str, str, str]: A tuple containing the Planet crop ID, the Sentinel 2 tile ID and the combined tile ID.

Raises:

Source code in darts-acquisition/src/darts_acquisition/s2.py
def parse_s2_tile_id(fpath: str | Path) -> tuple[str, str, str]:
    """Parse the Sentinel 2 tile ID from a file path.

    Args:
        fpath (str | Path): The path to the directory containing the TIFF files.

    Returns:
        tuple[str, str, str]: A tuple containing the Planet crop ID, the Sentinel 2 tile ID and the combined tile ID.

    Raises:
        FileNotFoundError: If no matching TIFF file is found in the specified path.

    """
    fpath = fpath if isinstance(fpath, Path) else Path(fpath)
    try:
        s2_image = next(fpath.glob("*_SR*.tif"))
    except StopIteration:
        raise FileNotFoundError(f"No matching TIFF files found in {fpath.resolve()} (.glob('*_SR*.tif'))")
    planet_crop_id = fpath.stem
    s2_tile_id = "_".join(s2_image.stem.split("_")[:3])
    tile_id = f"{planet_crop_id}_{s2_tile_id}"
    return planet_crop_id, s2_tile_id, tile_id