Download Data
In this notebook, we will download data and convert it to a zarr dataset.
This tutorial was written by Henry Westmacott and Manan Lalit.
For demonstration, we will use a subset of images of Fluo-N2DL-HeLa available on the Cell Tracking Challenge webpage.
First, the raw tif images are downloaded to a directory indicated by data_dir.
In [1]:
from pathlib import Path
In [2]:
import numpy as np
import tifffile
import zarr
from cellulus.utils.misc import extract_data
from csbdeep.utils import normalize
from tqdm import tqdm
In [3]:
name = "2d-data-demo"
data_dir = "./data"
extract_data(
    zip_url="https://github.com/funkelab/cellulus/releases/download/v0.0.1-tag/2d-data-demo.zip",
    data_dir=data_dir,
    project_name=name,
)
Created new directory ./data
Downloaded and unzipped data to the location ./data
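Before going further, it can help to confirm that the archive unpacked where the later cells expect it. The sketch below is only an illustration: `list_tifs` is a hypothetical helper (not part of cellulus), and the file names in the demo are made up so that the snippet runs without the real download. The `<data_dir>/<project_name>/images` layout is taken from the cell that builds `image_filenames` in this notebook.

```python
import tempfile
from pathlib import Path


def list_tifs(data_dir, project_name):
    """Return the sorted .tif paths under <data_dir>/<project_name>/images.

    Hypothetical helper for illustration; it is not part of cellulus.
    """
    image_dir = Path(data_dir) / project_name / "images"
    if not image_dir.is_dir():
        raise FileNotFoundError(f"Expected image directory not found: {image_dir}")
    return sorted(image_dir.glob("*.tif"))


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    image_dir = Path(tmp) / "2d-data-demo" / "images"
    image_dir.mkdir(parents=True)
    for stem in ("t002", "t000", "t001"):
        (image_dir / f"{stem}.tif").touch()
    (image_dir / "notes.txt").touch()  # non-tif files are ignored by the glob
    found = [p.name for p in list_tifs(tmp, "2d-data-demo")]

print(found)  # ['t000.tif', 't001.tif', 't002.tif']
```

With the real data, calling `list_tifs(data_dir, name)` should return the same files that the notebook later collects into `image_filenames`.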
Next, each image is percentile-normalized, given a leading channel dimension, and appended to a list. The list is then stacked into a single array.
In [4]:
container_path = zarr.open(name + ".zarr")
dataset_name = "train/raw"
image_filenames = sorted((Path(data_dir) / name / "images").glob("*.tif"))
print(f"Number of raw images is {len(image_filenames)}")
image_list = []
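for image_filename in tqdm(image_filenames):
    # Percentile-normalize each image (1st to 99.8th percentile) over y and x.
    im = normalize(
        tifffile.imread(image_filename).astype(np.float32), 1, 99.8, axis=(0, 1)
    )
    # Prepend a channel axis: (y, x) -> (1, y, x).
    image_list.append(im[np.newaxis, ...])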
image_list = np.asarray(image_list)
Number of raw images is 11
100%|██████████| 11/11 [00:00<00:00, 57.27it/s]
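To make the normalization and reshaping step more concrete, here is a NumPy-only sketch of what it does to the array shapes. It is an approximation for illustration, not csbdeep's implementation: `percentile_normalize` is a hypothetical stand-in for `normalize`, and the random 64×64 images replace the real tif files.

```python
import numpy as np


def percentile_normalize(image, pmin=1.0, pmax=99.8, eps=1e-20):
    """Rescale so the pmin-th percentile maps to ~0 and the pmax-th to ~1.

    Hypothetical stand-in for csbdeep's `normalize`, for illustration only.
    """
    image = image.astype(np.float32)
    lo = np.percentile(image, pmin)
    hi = np.percentile(image, pmax)
    return ((image - lo) / (hi - lo + eps)).astype(np.float32)


# Three made-up 2D images standing in for the downloaded tif files.
rng = np.random.default_rng(0)
fake_images = [rng.integers(0, 255, size=(64, 64)).astype(np.uint8) for _ in range(3)]

# Add a channel axis to each image, then stack along a new sample axis.
stack = np.asarray([percentile_normalize(im)[np.newaxis, ...] for im in fake_images])

print(stack.shape)  # (3, 1, 64, 64), i.e. (sample, channel, y, x)
```

Values are not clipped here, so pixels below the lower percentile come out slightly negative and pixels above the upper percentile come out slightly above 1.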
Lastly, the zarr dataset is populated, and the axis names and resolution are specified.
In [5]:
container_path[dataset_name] = image_list
container_path[dataset_name].attrs["resolution"] = (1, 1)
container_path[dataset_name].attrs["axis_names"] = ("s", "c", "y", "x")