DataSplit Reference¶

DataSplit¶

A DataSplit is a collection of multiple DataSets, with each DataSet assigned to a specific role, e.g. training data, validation data, or testing data.

class dacapo.experiments.datasplits.DataSplit¶

class dacapo.experiments.datasplits.TrainValidateDataSplit(datasplit_config)¶: Configured with dacapo.experiments.datasplits.TrainValidateDataSplitConfig

DataSet¶

A DataSet defines a spatial region containing the data needed for training, provided as multiple Arrays. This can be as much as raw data, ground truth, and a mask, or just raw data in the case of self-supervised models.

ABC:

class dacapo.experiments.datasplits.datasets.Dataset¶

Implementations:

class dacapo.experiments.datasplits.datasets.RawGTDataset(dataset_config)¶: Configured with dacapo.experiments.datasplits.datasets.RawGTDatasetConfig
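The relationship between a DataSplit, its roles, and its DataSets can be pictured with a minimal sketch. The classes `SketchDataset` and `SketchTrainValidateSplit` below are hypothetical stand-ins written for illustration; they are not the dacapo classes, and the strings stand in for real Arrays.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SketchDataset:
    """Hypothetical stand-in for a DataSet: a named region and its arrays."""

    name: str
    raw: Any                    # always required
    gt: Optional[Any] = None    # absent for self-supervised training
    mask: Optional[Any] = None  # absent when no mask is provided


@dataclass
class SketchTrainValidateSplit:
    """Hypothetical stand-in for a DataSplit with two roles."""

    train: List[SketchDataset] = field(default_factory=list)
    validate: List[SketchDataset] = field(default_factory=list)


split = SketchTrainValidateSplit(
    train=[
        SketchDataset("crop_a", raw="raw_a", gt="gt_a", mask="mask_a"),
        SketchDataset("crop_b", raw="raw_b"),  # raw data only
    ],
    validate=[SketchDataset("crop_c", raw="raw_c", gt="gt_c")],
)

# Datasets in the training role that carry ground truth.
supervised = [d.name for d in split.train if d.gt is not None]
print(supervised)
```

The point of the sketch is only that the role lives on the split, while the set of available arrays lives on each dataset.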

Arrays¶

Arrays define the interface for a contiguous spatial region of data. This data can be raw, ground truth, a mask, or any other spatial data. An Array can be a direct interface to some storage, e.g. a zarr/n5 container or a tiff stack, or a wrapper that modifies another array. Wrappers might perform operations such as normalizing intensities for raw data, binarizing labels to generate a mask, or upsampling and downsampling. Providing these operations as wrappers around other arrays allows us to lazily fetch and transform the data we need, consistently across contexts such as training and validation.
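The lazy wrapper idea can be sketched in a few lines. `CountingSource` and `ScaledWrapper` are hypothetical names used only for this illustration, and plain Python lists stand in for numpy arrays so the sketch has no dependencies.

```python
class CountingSource:
    """Hypothetical storage-backed array that counts how often it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    @property
    def data(self):
        self.reads += 1
        return list(self._values)


class ScaledWrapper:
    """Hypothetical wrapper that divides the source values on access."""

    def __init__(self, source, divisor):
        self._source = source
        self._divisor = divisor

    @property
    def data(self):
        # Nothing is fetched or computed until the data is requested.
        return [v / self._divisor for v in self._source.data]


source = CountingSource([0, 51, 255])
wrapped = ScaledWrapper(source, 255)
reads_before = source.reads  # wrapping alone triggers no read
values = wrapped.data        # the read and the transform happen here
reads_after = source.reads
print(reads_before, reads_after, values)
```

Because the transform runs on access, the same wrapped array can be handed to training and validation code, and both see identically transformed data.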

ABC:

class dacapo.experiments.datasplits.datasets.arrays.Array¶

abstract property attrs: Dict[str, Any]¶: Return a dictionary of metadata attributes stored on this array.

abstract property axes: List[str]¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

abstract property data: ndarray¶: Get a numpy-like readable and writable view into this array.

abstract property dims: int¶: Returns the number of spatial dimensions.

abstract property dtype: Any¶: The dtype of this array, as a numpy dtype.

abstract property num_channels: int | None¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

abstract property roi: Roi¶: The total ROI of this array, in world units.

abstract property voxel_size: Coordinate¶: The size of a voxel in physical units.

abstract property writable: bool¶: Can we write to this Array?
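How `axes`, `dims`, and `num_channels` relate can be shown with a reduced sketch of such an interface. `SketchArray` and `InMemoryArray` are hypothetical, the `shape` property is an addition made for this example, and the axis letters `c`, `z`, `y`, `x` are assumed here because the list of permitted characters is not reproduced above. In the real ABC, `dims` and `num_channels` are abstract; the sketch derives them from `axes` only to make the relationship explicit.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class SketchArray(ABC):
    """Hypothetical, reduced version of an array interface."""

    @property
    @abstractmethod
    def axes(self) -> List[str]:
        ...

    @property
    @abstractmethod
    def shape(self) -> List[int]:
        ...

    @property
    def dims(self) -> int:
        # Spatial dimensions only: the channel axis is not counted.
        return len([a for a in self.axes if a != "c"])

    @property
    def num_channels(self) -> Optional[int]:
        # None signals that there is no channel dimension at all.
        if "c" not in self.axes:
            return None
        return self.shape[self.axes.index("c")]


class InMemoryArray(SketchArray):
    """Concrete implementation that only stores axes and shape."""

    def __init__(self, axes, shape):
        self._axes = list(axes)
        self._shape = list(shape)

    @property
    def axes(self):
        return self._axes

    @property
    def shape(self):
        return self._shape


raw = InMemoryArray(["z", "y", "x"], [10, 20, 30])
multi = InMemoryArray(["c", "z", "y", "x"], [3, 10, 20, 30])
print(raw.dims, raw.num_channels)      # 3 None
print(multi.dims, multi.num_channels)  # 3 3
```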

Implementations:

class dacapo.experiments.datasplits.datasets.arrays.ZarrArray(array_config)¶

This is a zarr array

Configured with dacapo.experiments.datasplits.datasets.arrays.ZarrArrayConfig

property attrs¶: Return a dictionary of metadata attributes stored on this array.

property axes¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

classmethod create_from_array_identifier(array_identifier, axes, roi, num_channels, voxel_size, dtype, write_size=None, name=None)¶: Create a new ZarrArray given an array identifier. It is assumed that this array_identifier points to a dataset that does not yet exist.

property data: Any¶: Get a numpy-like readable and writable view into this array.

property dims: int¶: Returns the number of spatial dimensions.

property dtype: Any¶: The dtype of this array, as a numpy dtype.

property num_channels: int | None¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

property roi¶: The total ROI of this array, in world units.

property voxel_size¶: The size of a voxel in physical units.

property writable: bool¶: Can we write to this Array?

class dacapo.experiments.datasplits.datasets.arrays.BinarizeArray(array_config)¶

This is a wrapper around a ZarrArray containing uint annotations. Because we often want to predict classes that are a combination of a set of labels, we wrap a ZarrArray with the BinarizeArray and provide something like groupings=[("mito", [3, 4, 5])], where 4 corresponds to mito_membrane, 5 is mito_ribos, and 3 is everything else that is part of a mitochondrion. The BinarizeArray will simply combine labels 3, 4, and 5 into a single binary channel for the class "mito". We use a single channel per class because some classes may overlap. For example, with groupings=[("mito", [3, 4, 5]), ("membrane", [4, 8, 1])], where 4 is mito_membrane, 8 is er_membrane, and 1 is plasma_membrane, you get a binary membrane channel that in some places overlaps with the mitochondria channel, since both include the mito membrane.

Configured with dacapo.experiments.datasplits.datasets.arrays.BinarizeArrayConfig
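The grouping logic described for BinarizeArray can be sketched as follows. The function `binarize` is a hypothetical helper, not the class's implementation, and a flat Python list stands in for the multi-dimensional label array; the per-voxel logic is the same.

```python
def binarize(labels, groupings):
    """Return one binary channel per (name, ids) group."""
    channels = []
    for _name, ids in groupings:
        id_set = set(ids)
        channels.append([1 if label in id_set else 0 for label in labels])
    return channels


labels = [0, 3, 4, 5, 8, 1]
groupings = [("mito", [3, 4, 5]), ("membrane", [4, 8, 1])]
mito, membrane = binarize(labels, groupings)
print(mito)      # [0, 1, 1, 1, 0, 0]
print(membrane)  # [0, 0, 1, 0, 1, 1]
```

The voxel labeled 4 (mito_membrane) is set in both channels, which is why each class gets its own channel instead of sharing a single label map.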

property attrs¶: Return a dictionary of metadata attributes stored on this array.

property axes¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

property data¶: Get a numpy-like readable and writable view into this array.

property dims: int¶: Returns the number of spatial dimensions.

property dtype¶: The dtype of this array, as a numpy dtype.

property num_channels: int¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

property roi: Roi¶: The total ROI of this array, in world units.

property voxel_size: Coordinate¶: The size of a voxel in physical units.

property writable: bool¶: Can we write to this Array?

class dacapo.experiments.datasplits.datasets.arrays.ResampledArray(array_config)¶

This is a wrapper around another array that resamples it by upsampling or downsampling.

Configured with dacapo.experiments.datasplits.datasets.arrays.ResampledArrayConfig
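As a rough illustration of what resampling means for an array and its voxel size, here is a one-dimensional nearest-neighbour sketch. The helper functions are hypothetical, and the interpolation actually used by ResampledArray is not specified here; the sketch only shows that the physical extent stays the same while the voxel count and voxel size change.

```python
def downsample_nearest(values, factor):
    """Keep every `factor`-th sample (nearest-neighbour downsampling)."""
    return values[::factor]


def upsample_nearest(values, factor):
    """Repeat each sample `factor` times (nearest-neighbour upsampling)."""
    return [v for v in values for _ in range(factor)]


source = [10, 11, 12, 13, 14, 15]
source_voxel_size = 4  # physical units per voxel along this axis

down = downsample_nearest(source, 2)
down_voxel_size = source_voxel_size * 2  # fewer, larger voxels

up = upsample_nearest(source, 2)
up_voxel_size = source_voxel_size / 2    # more, smaller voxels

# The physical extent (voxel count times voxel size) is unchanged.
print(len(source) * source_voxel_size, len(down) * down_voxel_size, len(up) * up_voxel_size)
```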

property attrs¶: Return a dictionary of metadata attributes stored on this array.

property axes¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

property data¶: Get a numpy-like readable and writable view into this array.

property dims: int¶: Returns the number of spatial dimensions.

property dtype¶: The dtype of this array, as a numpy dtype.

property num_channels: int¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

property roi: Roi¶: The total ROI of this array, in world units.

property voxel_size: Coordinate¶: The size of a voxel in physical units.

property writable: bool¶: Can we write to this Array?

class dacapo.experiments.datasplits.datasets.arrays.IntensitiesArray(array_config)¶

This is a wrapper around another array that normalizes intensities to the range (0, 1) and converts them to float32. Use this if your intensities are stored as uint8 or similar and you want your model to receive floats as input.

Configured with dacapo.experiments.datasplits.datasets.arrays.IntensitiesArrayConfig
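The normalization performed by IntensitiesArray can be approximated with a linear rescale. The function and its `min_value` and `max_value` parameters are hypothetical names chosen for this sketch, and Python floats are 64-bit, so the float32 conversion is not reproduced.

```python
def normalize_intensities(values, min_value, max_value):
    """Linearly map [min_value, max_value] onto [0, 1]."""
    span = max_value - min_value
    return [(v - min_value) / span for v in values]


raw_uint8 = [0, 51, 255]
normalized = normalize_intensities(raw_uint8, min_value=0, max_value=255)
print(normalized)  # [0.0, 0.2, 1.0]
```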

property attrs¶: Return a dictionary of metadata attributes stored on this array.

property axes¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

property data¶: Get a numpy-like readable and writable view into this array.

property dims: int¶: Returns the number of spatial dimensions.

property dtype¶: The dtype of this array, as a numpy dtype.

property num_channels: int¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

property roi: Roi¶: The total ROI of this array, in world units.

property voxel_size: Coordinate¶: The size of a voxel in physical units.

property writable: bool¶: Can we write to this Array?

class dacapo.experiments.datasplits.datasets.arrays.MissingAnnotationsMask(array_config)¶

This is a wrapper around a ZarrArray containing uint annotations. It is complementary to the BinarizeArray class, where we convert labels into individual channels for training: we may find crops where a specific label is present but not annotated. In that case you might want to avoid training specific channels for specific training volumes. See the fibsem_tools package (https://github.com/janelia-cosem/fibsem-tools) for the appropriate metadata format for indicating the presence of labels in your ground truth.

Configured with dacapo.experiments.datasplits.datasets.arrays.MissingAnnotationsMaskConfig
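One way to picture the resulting mask is a per-channel flag that switches off any channel whose group contains a label that is present but not annotated in a given crop. This is a sketch under that assumption: `channel_mask` and the `present_not_annotated` argument are hypothetical, and the fibsem_tools metadata format is not reproduced here.

```python
def channel_mask(groupings, present_not_annotated):
    """Return 1 for channels that can be trained on, 0 otherwise."""
    unreliable = set(present_not_annotated)
    return [0 if unreliable.intersection(ids) else 1 for _name, ids in groupings]


groupings = [("mito", [3, 4, 5]), ("membrane", [4, 8, 1])]

# Crop A: every label that is present has been annotated.
mask_a = channel_mask(groupings, present_not_annotated=[])
# Crop B: label 8 (er_membrane) is present but was never annotated.
mask_b = channel_mask(groupings, present_not_annotated=[8])

print(mask_a)  # [1, 1]
print(mask_b)  # [1, 0]
```

In crop B the membrane channel is masked out, while the mito channel is still usable.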

property attrs¶: Return a dictionary of metadata attributes stored on this array.

property axes¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

property data¶: Get a numpy-like readable and writable view into this array.

property dims: int¶: Returns the number of spatial dimensions.

property dtype¶: The dtype of this array, as a numpy dtype.

property num_channels: int¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

property roi: Roi¶: The total ROI of this array, in world units.

property voxel_size: Coordinate¶: The size of a voxel in physical units.

property writable: bool¶: Can we write to this Array?

class dacapo.experiments.datasplits.datasets.arrays.OnesArray(array_config)¶

This is a wrapper around another source_array that simply provides ones with the same metadata as the source_array.

Configured with dacapo.experiments.datasplits.datasets.arrays.OnesArrayConfig

property attrs¶: Return a dictionary of metadata attributes stored on this array.

property axes¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

property data¶: Get a numpy-like readable and writable view into this array.

property dims¶: Returns the number of spatial dimensions.

property dtype¶: The dtype of this array, as a numpy dtype.

property num_channels¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

property roi¶: The total ROI of this array, in world units.

property voxel_size¶: The size of a voxel in physical units.

property writable: bool¶: Can we write to this Array?

class dacapo.experiments.datasplits.datasets.arrays.ConcatArray(array_config)¶

This is a wrapper around other source_arrays that concatenates them along the channel dimension.

Configured with dacapo.experiments.datasplits.datasets.arrays.ConcatArrayConfig
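Concatenation along the channel dimension can be sketched with nested lists, where each source is a list of channels and each channel is a list of voxels. `concat_channels` is a hypothetical helper for illustration, and the check that all sources cover the same voxels is an assumption of this sketch.

```python
def concat_channels(sources):
    """Concatenate sources along the leading channel axis."""
    voxel_counts = {len(channel) for source in sources for channel in source}
    if len(voxel_counts) != 1:
        raise ValueError("all sources must cover the same voxels")
    return [channel for source in sources for channel in source]


raw_a = [[1, 2, 3]]             # 1 channel, 3 voxels
raw_b = [[4, 5, 6], [7, 8, 9]]  # 2 channels, 3 voxels
combined = concat_channels([raw_a, raw_b])
print(len(combined))  # 3
```

The number of channels of the result is the sum of the channels of the sources, while the spatial extent is unchanged.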

property attrs¶: Return a dictionary of metadata attributes stored on this array.

property axes¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

property data¶: Get a numpy-like readable and writable view into this array.

property dims¶: Returns the number of spatial dimensions.

property dtype¶: The dtype of this array, as a numpy dtype.

property num_channels¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

property roi¶: The total ROI of this array, in world units.

property voxel_size¶: The size of a voxel in physical units.

property writable: bool¶: Can we write to this Array?

class dacapo.experiments.datasplits.datasets.arrays.LogicalOrArray(array_config)¶

Configured with dacapo.experiments.datasplits.datasets.arrays.LogicalOrArrayConfig

property attrs¶: Return a dictionary of metadata attributes stored on this array.

property axes¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

property data¶: Get a numpy-like readable and writable view into this array.

property dims: int¶: Returns the number of spatial dimensions.

property dtype¶: The dtype of this array, as a numpy dtype.

property num_channels¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

property roi: Roi¶: The total ROI of this array, in world units.

property voxel_size: Coordinate¶: The size of a voxel in physical units.

property writable: bool¶: Can we write to this Array?

class dacapo.experiments.datasplits.datasets.arrays.CropArray(array_config)¶

Used to crop a larger array to a smaller array.

Configured with dacapo.experiments.datasplits.datasets.arrays.CropArrayConfig
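Since ROIs are expressed in world units while data is indexed in voxels, cropping involves a conversion through the voxel size. The one-dimensional sketch below illustrates that conversion; `crop` and its parameters are hypothetical, and the alignment and bounds checks are assumptions of this example.

```python
def crop(values, voxel_size, array_offset, roi_offset, roi_shape):
    """Crop a 1D array to a ROI given in world units."""
    relative_offset = roi_offset - array_offset
    if relative_offset % voxel_size != 0 or roi_shape % voxel_size != 0:
        raise ValueError("ROI must align with the voxel grid")
    # Convert world units to voxel indices.
    start = relative_offset // voxel_size
    stop = start + roi_shape // voxel_size
    if start < 0 or stop > len(values):
        raise ValueError("ROI must lie inside the source array")
    return values[start:stop]


values = list(range(10))  # 10 voxels covering world units [8, 48)
cropped = crop(values, voxel_size=4, array_offset=8, roi_offset=16, roi_shape=12)
print(cropped)  # [2, 3, 4]
```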

property attrs¶: Return a dictionary of metadata attributes stored on this array.

property axes¶

Returns the axes of this dataset as a string of characters, as they are indexed. Permitted characters are:

property data¶: Get a numpy-like readable and writable view into this array.

property dims: int¶: Returns the number of spatial dimensions.

property dtype¶: The dtype of this array, as a numpy dtype.

property num_channels: int¶: The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.

property roi: Roi¶: The total ROI of this array, in world units.

property voxel_size: Coordinate¶: The size of a voxel in physical units.

property writable: bool¶: Can we write to this Array?