In [ ]:
# Define the class labels and the location of the raw data for each class:
data = {
'Class_1': '/raw/Class_1/',
'Class_2': '/raw/Class_2/',
'Class_3': '/raw/Class_3/',
}
# Define the file type used in this raw data location:
filetype = 'cif'
# Select which channels to include in the digested data:
channels = [3,6]
image_size = 48
# Desired location to save the digested data:
directory = '/digested/'
split = {
'Training' : 0.8,
'Validation' : 0.1,
'Testing' : 0.1
}
In [ ]:
import digest
In [ ]:
digest.parse(filetype, directory, data, channels, image_size)
In [ ]:
digest.split(directory, split)
In [ ]:
digest.class_weights(directory, data)
Split ratio for different validation methods:
Split the whole collection of data into Training / Validation / Testing:
For example:
split = {
"Training" : 0.8,
"Validation" : 0.1,
"Testing" : 0.1
}
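Before digesting, it can help to confirm that the three ratios are well formed. The sketch below is a minimal check, assuming the ratios are meant to be non-negative and to sum to 1; `validate_split` is a hypothetical helper and not part of `digest`.

```python
import math


def validate_split(split):
    """Check a split dictionary (hypothetical helper, not part of digest)."""
    expected = {"Training", "Validation", "Testing"}
    if set(split) != expected:
        raise ValueError(f"split must have exactly the keys {sorted(expected)}")
    if any(ratio < 0 for ratio in split.values()):
        raise ValueError("split ratios must be non-negative")
    total = sum(split.values())
    # Compare with a tolerance, since 0.8 + 0.1 + 0.1 is not exactly 1.0 in floating point.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"split ratios must sum to 1, got {total}")
    return split


split = validate_split({"Training": 0.8, "Validation": 0.1, "Testing": 0.1})
```

The validated dictionary can then be passed on as usual, for example to `digest.split(directory, split)`.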
Split the collection of data into Training / Validation, and use a separate dataset for Testing:
For example:
- First, set raw data location, output directory and split ratio for Training / Validation:
data = {
"Class_1": "/raw/Class_1/",
"Class_2": "/raw/Class_2/",
"Class_3": "/raw/Class_3/",
}
directory = '/digested_TRAIN/'
split = {
"Training" : 0.8,
"Validation" : 0.2,
"Testing" : 0
}
- Perform data digestion with this split:
digest.parse(filetype, directory, data, channels, image_size)
digest.class_weights(directory, data)
digest.split(directory, split)
- Then, set a NEW raw data location (replace the paths below with those of the separate Testing dataset), a NEW output directory and a NEW split ratio for Testing:
data = {
"Class_1": "/raw/Class_1/",
"Class_2": "/raw/Class_2/",
"Class_3": "/raw/Class_3/",
}
directory = '/digested_TEST/'
split = {
"Training" : 0,
"Validation" : 0,
"Testing" : 1
}
- Repeat data digestion with NEW inputs:
digest.parse(filetype, directory, data, channels, image_size)
digest.class_weights(directory, data)
digest.split(directory, split)
k-fold cross validation:
For example, for 5-fold cross validation, each fold holds out 1/5 = 0.2 of the data for Validation:
split = {
"Training" : 0.8,
"Validation" : 0.2,
"Testing" : 0
}
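The ratios for other values of k follow the same pattern: Validation gets 1/k and Training gets the rest. A minimal sketch, assuming no data is reserved for Testing; `kfold_split` is a hypothetical helper and not part of `digest`, and how the folds themselves are rotated is not covered here.

```python
def kfold_split(k):
    """Return split ratios for one fold of k-fold cross validation (hypothetical helper)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    validation = 1 / k  # each fold holds out 1/k of the data
    return {"Training": 1 - validation, "Validation": validation, "Testing": 0}


# For k = 5 this gives roughly 0.8 / 0.2 / 0, matching the example above.
split = kfold_split(5)
```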
If you intend to use the built-in CNN, any number of channels is supported.
If you intend to use pre-trained networks from Keras.applications (VGG, ResNet50, Inception), be aware that these networks are built for 3-channel RGB images. Select at most 3 channels that provide sufficient information for prediction, and omit channels that may introduce noise.
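The channel limit can be checked before digestion. This is a minimal sketch, assuming the only constraint is the 3-channel maximum for pre-trained networks; `check_channels` and `MAX_PRETRAINED_CHANNELS` are hypothetical names, not part of `digest` or Keras.

```python
MAX_PRETRAINED_CHANNELS = 3  # pre-trained VGG, ResNet50 and Inception expect RGB input


def check_channels(channels, pretrained):
    """Validate the channel selection (hypothetical helper, not part of digest)."""
    if len(set(channels)) != len(channels):
        raise ValueError("channels must not contain duplicates")
    if pretrained and len(channels) > MAX_PRETRAINED_CHANNELS:
        raise ValueError(
            f"pre-trained networks accept at most {MAX_PRETRAINED_CHANNELS} "
            f"channels, got {len(channels)}"
        )
    return list(channels)


# Two channels are within the limit for a pre-trained network.
channels = check_channels([3, 6], pretrained=True)
```

With `pretrained=False` (the built-in CNN), the same call accepts any number of distinct channels.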