DataSplit Reference¶
DataSplit¶
DataSplits are collections of multiple DataSets, with each DataSet assigned to a specific role, e.g. training data, validation data, or testing data.
- class dacapo.experiments.datasplits.DataSplit¶
- class dacapo.experiments.datasplits.TrainValidateDataSplit(datasplit_config)¶
Configured with
dacapo.datasplits.datasplits.TrainValidateDataSplitConfig
DataSet¶
DataSets define a spatial region containing the data needed for training, provided as multiple Arrays. This can be as much as raw, ground_truth, and a mask, or as little as raw data alone in the case of self-supervised models.
ABC:
- class dacapo.experiments.datasplits.datasets.Dataset¶
Implementations:
- class dacapo.experiments.datasplits.datasets.RawGTDataset(dataset_config)¶
Configured with
dacapo.experiments.datasplits.datasets.RawGTDatasetConfig
Arrays¶
Arrays define the interface for a contiguous spatial region of data. This data can be raw, ground truth, a mask, or any other spatial data. Arrays can be a direct interface to some storage, e.g. a zarr/n5 container, tiff stack, or other data storage, or can be a wrapper modifying another array. Wrappers might perform operations such as normalizing intensities for raw data, binarizing labels to generate a mask, or upsampling and downsampling. Providing these operations as wrappers around other arrays allows us to lazily fetch and transform the data we need, consistently across contexts such as training and validation.
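The wrapper idea described above can be sketched in a few lines. This is a hypothetical illustration, not DaCapo's implementation: `SourceArray`, `NormalizedArray`, and the `read_count` counter are names invented here to show that a wrapper defers all work until its `data` is requested.

```python
class SourceArray:
    """Hypothetical stand-in for an array backed by storage."""

    def __init__(self, values):
        self._values = list(values)
        self.read_count = 0  # counts how often the "storage" is accessed

    @property
    def data(self):
        self.read_count += 1
        return list(self._values)


class NormalizedArray:
    """Hypothetical wrapper: rescales the source values to [0, 1] on access."""

    def __init__(self, source, min_value, max_value):
        self._source = source
        self._min = min_value
        self._max = max_value

    @property
    def data(self):
        # The source is only read here, when the data is actually requested.
        scale = self._max - self._min
        return [(v - self._min) / scale for v in self._source.data]


raw = SourceArray([0, 51, 255])
normalized = NormalizedArray(raw, min_value=0, max_value=255)
reads_before_access = raw.read_count  # wrapping alone reads nothing
values = normalized.data              # triggers one read plus the transform
```

Because the transform lives in the wrapper, the same source can be wrapped differently for training and for validation without copying the stored data.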
ABC:
- class dacapo.experiments.datasplits.datasets.arrays.Array¶
- abstract property attrs: Dict[str, Any]¶
Return a dictionary of metadata attributes stored on this array.
- abstract property axes: List[str]¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- abstract property data: ndarray¶
Get a numpy-like readable and writable view into this array.
- abstract property dims: int¶
Returns the number of spatial dimensions.
- abstract property dtype: Any¶
The dtype of this array, in numpy dtypes
- abstract property num_channels: int | None¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- abstract property roi: Roi¶
The total ROI of this array, in world units.
- abstract property voxel_size: Coordinate¶
The size of a voxel in physical units.
- abstract property writable: bool¶
Can we write to this Array?
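The relationship between `axes`, `dims`, and `num_channels` can be illustrated with a small helper. This is a sketch under the assumption that `dims` counts the spatial characters (`z`, `y`, `x`) in `axes` and that a channel dimension exists only when `c` is present; `count_spatial_dims` and `has_channel_axis` are hypothetical helpers, not part of DaCapo.

```python
SPATIAL_AXES = set("zyx")


def count_spatial_dims(axes):
    """Count the axes that are spatial (z, y, or x)."""
    return sum(1 for axis in axes if axis in SPATIAL_AXES)


def has_channel_axis(axes):
    """Report whether a channel axis ("c") is present."""
    return "c" in axes


axes = ["c", "z", "y", "x"]
spatial_dims = count_spatial_dims(axes)
with_channels = has_channel_axis(axes)
```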
Implementations:
- class dacapo.experiments.datasplits.datasets.arrays.ZarrArray(array_config)¶
This is a zarr array
Configured with
dacapo.experiments.datasplits.datasets.arrays.ZarrArrayConfig
- property attrs¶
Return a dictionary of metadata attributes stored on this array.
- property axes¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- classmethod create_from_array_identifier(array_identifier, axes, roi, num_channels, voxel_size, dtype, write_size=None, name=None)¶
Create a new ZarrArray given an array identifier. It is assumed that this array_identifier points to a dataset that does not yet exist.
- property data: Any¶
Get a numpy-like readable and writable view into this array.
- property dims: int¶
Returns the number of spatial dimensions.
- property dtype: Any¶
The dtype of this array, in numpy dtypes
- property num_channels: int | None¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- property roi¶
The total ROI of this array, in world units.
- property voxel_size¶
The size of a voxel in physical units.
- property writable: bool¶
Can we write to this Array?
- class dacapo.experiments.datasplits.datasets.arrays.BinarizeArray(array_config)¶
This is a wrapper around a ZarrArray containing uint annotations. Because we often want to predict classes that are a combination of a set of labels, we wrap a ZarrArray with the BinarizeArray and provide something like groupings=[("mito", [3, 4, 5])], where 4 corresponds to mito_membrane, 5 is mito_ribos, and 3 is everything else that is part of a mitochondrion. The BinarizeArray simply combines labels 3, 4, and 5 into a single binary channel for the class "mito". We use a single channel per class because some classes may overlap. For example, with groupings=[("mito", [3, 4, 5]), ("membrane", [4, 8, 1])], where 4 is mito_membrane, 8 is er_membrane, and 1 is plasma_membrane, you get a binary membrane-or-not channel that in some places overlaps with the mitochondria channel, since the latter includes the mito membrane.
Configured with
dacapo.experiments.datasplits.datasets.arrays.BinarizeArrayConfig
- property attrs¶
Return a dictionary of metadata attributes stored on this array.
- property axes¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- property data¶
Get a numpy-like readable and writable view into this array.
- property dims: int¶
Returns the number of spatial dimensions.
- property dtype¶
The dtype of this array, in numpy dtypes
- property num_channels: int¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- property roi: Roi¶
The total ROI of this array, in world units.
- property voxel_size: Coordinate¶
The size of a voxel in physical units.
- property writable: bool¶
Can we write to this Array?
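The grouping behaviour described for BinarizeArray can be sketched with NumPy. This is a minimal illustration, not the class's actual code: the `binarize` function, the channel-first output layout, and the `uint8` output dtype are assumptions made for the example.

```python
import numpy as np


def binarize(labels, groupings):
    """Return one binary channel per (name, label_ids) grouping."""
    channels = [np.isin(labels, ids) for _, ids in groupings]
    # Stack the per-class masks along a new leading channel axis.
    return np.stack(channels, axis=0).astype(np.uint8)


# A tiny 2D label image using the label ids from the description above.
labels = np.array([[3, 4, 8],
                   [1, 5, 0]], dtype=np.uint64)
groupings = [("mito", [3, 4, 5]), ("membrane", [4, 8, 1])]
binary = binarize(labels, groupings)
```

Label 4 (mito_membrane) is set in both channels, which is why each class gets its own channel rather than sharing a single label image.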
- class dacapo.experiments.datasplits.datasets.arrays.ResampledArray(array_config)¶
This is a wrapper around another array that resamples it, i.e. upsamples or downsamples the data.
Configured with
dacapo.experiments.datasplits.datasets.arrays.ResampledArrayConfig
- property attrs¶
Return a dictionary of metadata attributes stored on this array.
- property axes¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- property data¶
Get a numpy-like readable and writable view into this array.
- property dims: int¶
Returns the number of spatial dimensions.
- property dtype¶
The dtype of this array, in numpy dtypes
- property num_channels: int¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- property roi: Roi¶
The total ROI of this array, in world units.
- property voxel_size: Coordinate¶
The size of a voxel in physical units.
- property writable: bool¶
Can we write to this Array?
- class dacapo.experiments.datasplits.datasets.arrays.IntensitiesArray(array_config)¶
This is a wrapper around another array that normalizes intensities to the range (0, 1) and converts them to float32. Use this if your intensities are stored as uint8 or similar and you want your model to take floats as input.
Configured with
dacapo.experiments.datasplits.datasets.arrays.IntensitiesArrayConfig
- property attrs¶
Return a dictionary of metadata attributes stored on this array.
- property axes¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- property data¶
Get a numpy-like readable and writable view into this array.
- property dims: int¶
Returns the number of spatial dimensions.
- property dtype¶
The dtype of this array, in numpy dtypes
- property num_channels: int¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- property roi: Roi¶
The total ROI of this array, in world units.
- property voxel_size: Coordinate¶
The size of a voxel in physical units.
- property writable: bool¶
Can we write to this Array?
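The normalization described for IntensitiesArray amounts to a linear rescale followed by a cast. The sketch below assumes the input range is given explicitly; `normalize_intensities` and its `min_value` and `max_value` parameters are hypothetical names chosen for the example and may not match the configuration fields DaCapo uses.

```python
import numpy as np


def normalize_intensities(data, min_value, max_value):
    """Rescale data linearly so [min_value, max_value] maps to [0, 1]."""
    scaled = (data.astype(np.float32) - min_value) / (max_value - min_value)
    return scaled.astype(np.float32)


raw = np.array([0, 128, 255], dtype=np.uint8)
normalized = normalize_intensities(raw, min_value=0, max_value=255)
```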
- class dacapo.experiments.datasplits.datasets.arrays.MissingAnnotationsMask(array_config)¶
This is a wrapper around a ZarrArray containing uint annotations. It complements the BinarizeArray class, which converts labels into individual channels for training: some crops may contain a specific label that is present but not annotated. In that case you might want to avoid training specific channels for specific training volumes. See the fibsem_tools package (https://github.com/janelia-cosem/fibsem-tools) for the appropriate metadata format for indicating the presence of labels in your ground truth.
Configured with
dacapo.experiments.datasplits.datasets.arrays.MissingAnnotationsMaskConfig
- property attrs¶
Return a dictionary of metadata attributes stored on this array.
- property axes¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- property data¶
Get a numpy-like readable and writable view into this array.
- property dims: int¶
Returns the number of spatial dimensions.
- property dtype¶
The dtype of this array, in numpy dtypes
- property num_channels: int¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- property roi: Roi¶
The total ROI of this array, in world units.
- property voxel_size: Coordinate¶
The size of a voxel in physical units.
- property writable: bool¶
Can we write to this Array?
- class dacapo.experiments.datasplits.datasets.arrays.OnesArray(array_config)¶
This is a wrapper around another source_array that simply provides ones with the same metadata as the source_array.
Configured with
dacapo.experiments.datasplits.datasets.arrays.OnesArrayConfig
- property attrs¶
Return a dictionary of metadata attributes stored on this array.
- property axes¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- property data¶
Get a numpy-like readable and writable view into this array.
- property dims¶
Returns the number of spatial dimensions.
- property dtype¶
The dtype of this array, in numpy dtypes
- property num_channels¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- property roi¶
The total ROI of this array, in world units.
- property voxel_size¶
The size of a voxel in physical units.
- property writable: bool¶
Can we write to this Array?
- class dacapo.experiments.datasplits.datasets.arrays.ConcatArray(array_config)¶
This is a wrapper around other source_arrays that concatenates them along the channel dimension.
Configured with
dacapo.experiments.datasplits.datasets.arrays.ConcatArrayConfig
- property attrs¶
Return a dictionary of metadata attributes stored on this array.
- property axes¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- property data¶
Get a numpy-like readable and writable view into this array.
- property dims¶
Returns the number of spatial dimensions.
- property dtype¶
The dtype of this array, in numpy dtypes
- property num_channels¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- property roi¶
The total ROI of this array, in world units.
- property voxel_size¶
The size of a voxel in physical units.
- property writable: bool¶
Can we write to this Array?
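Concatenation along the channel dimension, as described for ConcatArray, can be pictured with plain NumPy arrays. The example assumes a channel-first `(c, y, x)` layout; the array names and shapes are invented for illustration and say nothing about how ConcatArray handles sources internally.

```python
import numpy as np

# Two hypothetical source arrays with axes (c, y, x).
raw_a = np.zeros((1, 4, 4), dtype=np.float32)
raw_b = np.ones((2, 4, 4), dtype=np.float32)

# Concatenate along the channel axis (axis 0 in this layout); the spatial
# shape must match, while the channel counts add up.
combined = np.concatenate([raw_a, raw_b], axis=0)
```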
- class dacapo.experiments.datasplits.datasets.arrays.LogicalOrArray(array_config)¶
Configured with
dacapo.experiments.datasplits.datasets.arrays.LogicalOrArrayConfig
- property attrs¶
Return a dictionary of metadata attributes stored on this array.
- property axes¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- property data¶
Get a numpy-like readable and writable view into this array.
- property dims: int¶
Returns the number of spatial dimensions.
- property dtype¶
The dtype of this array, in numpy dtypes
- property num_channels¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- property roi: Roi¶
The total ROI of this array, in world units.
- property voxel_size: Coordinate¶
The size of a voxel in physical units.
- property writable: bool¶
Can we write to this Array?
- class dacapo.experiments.datasplits.datasets.arrays.CropArray(array_config)¶
Used to crop a larger array to a smaller array.
Configured with
dacapo.experiments.datasplits.datasets.arrays.CropArrayConfig
- property attrs¶
Return a dictionary of metadata attributes stored on this array.
- property axes¶
Returns the axes of this dataset as a string of characters, in the order they are indexed. Permitted characters are:
zyx for spatial dimensions
c for channels
s for samples
- property data¶
Get a numpy-like readable and writable view into this array.
- property dims: int¶
Returns the number of spatial dimensions.
- property dtype¶
The dtype of this array, in numpy dtypes
- property num_channels: int¶
The number of channels provided by this dataset. Should return None if the channel dimension doesn’t exist.
- property roi: Roi¶
The total ROI of this array, in world units.
- property voxel_size: Coordinate¶
The size of a voxel in physical units.
- property writable: bool¶
Can we write to this Array?
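Since `roi` is expressed in world units and `voxel_size` gives the physical size of a voxel, cropping involves converting a world-unit region into voxel indices. The sketch below shows that arithmetic under the assumption that all offsets and shapes are exact multiples of the voxel size; `roi_to_slices` is a hypothetical helper, not a DaCapo or funlib function.

```python
def roi_to_slices(crop_offset, crop_shape, array_offset, voxel_size):
    """Convert a crop given in world units into per-axis voxel slices.

    All arguments are per-axis tuples; offsets and shapes are in world units.
    """
    slices = []
    for c_off, c_shape, a_off, vs in zip(
        crop_offset, crop_shape, array_offset, voxel_size
    ):
        start = (c_off - a_off) // vs   # first voxel inside the crop
        stop = start + c_shape // vs    # one past the last voxel
        slices.append(slice(start, stop))
    return tuple(slices)


slices = roi_to_slices(
    crop_offset=(80, 160, 160),
    crop_shape=(40, 80, 80),
    array_offset=(0, 0, 0),
    voxel_size=(8, 4, 4),
)
```

The resulting tuple can be used to index a NumPy-like array directly, e.g. `data[slices]` for an array with axes `zyx`.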