Data preparation

Supported data formats

We support square, 2D images. They can be either grayscale or RGB. The images must have one of the following file extensions: ["png", "jpg", "jpeg", "JPG", "tiff", "tif"]. All of your images should be of the same type.

Data splits

You must split your data into a train and a validation set with different images. The validation dataset is used to determine the best models to use. We also recommend a separate test dataset, which can be used to compute final metrics. Make sure that there is no overlap between the images in these three datasets, otherwise this may invalidate your results.

We recommend you to have one root directory for all of your data, and sub-directories for train,validation and test.

data/
├── train/
│   └── ...
├── validation/
│   └── ...
└── test/
    └── ...

File organization

Within the train, validation and test directories, the sub-directories define class membership. The images must be organized in directories named after the class they belong to. Only the first level of the directory structure is considered as the class.

Here is an example of a valid organization of the train directory in a case with three classes class_A, class_B and class_C.

train/
├── class_A/
│   ├── xxx.png
│   ├── xxy.png
│   ├── abc.png
│   ├── arbitrary_subdirectory/
│      ├── aaa.png
│      └── bcd.png
│   └── ...
├── class_B/
│   ├── 123.png
│   ├── nsdf3.png
│   └── ...
└── class_C/
    ├── asd.png
    ├── fgh.png
    └── hjk.png

Every image in the class_A directory will get the label for class_A, including if they are in sub-directories.

Attention

Make sure to use the same names for your class directories in the train, validation, and test directories. The names are used to determine the labels, and you will get incorrect results if they do not match, or if you have any additional class directories.

Example data

If you just want to try QuAC as a learning experience, you can use one of the datasets in this collection, and the pre-trained classifiers we provide.