ImageNet Data Loading
-
class deepobs.imagenet.imagenet_input.data_loading(batch_size)
Class providing the data loading functionality for the ImageNet data set.
Parameters: batch_size (int) -- Batch size of the input-output pairs. No default value is given.
-
batch_size
Batch size of the input-output pairs.
Type: int
-
train_eval_size
Number of data points to evaluate during the train eval phase. Currently set to 50000, the size of the test set.
Type: int
-
D_train
The training data set.
Type: tf.data.Dataset
-
D_train_eval
The training evaluation data set. It contains the same data as D_train, but we iterate over it separately.
Type: tf.data.Dataset
-
D_test
The test data set.
Type: tf.data.Dataset
-
phase
Variable describing which phase we are currently in. Can be "train", "train_eval" or "test". The phase variable can determine the behaviour of the network, for example by deactivating dropout during evaluation.
Type: tf.Variable
-
iterator
A single iterator for all three data sets. We use the initialization operators (see below) to switch this iterator between the data sets.
Type: tf.data.Iterator
-
X
Tensor holding the ImageNet images. It has dimension batch_size x 224 (image size) x 224 (image size) x 3 (RGB).
Type: tf.Tensor
-
y
Labels of the ImageNet images. It has dimension batch_size x 10 (number of classes).
Type: tf.Tensor
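The batch_size x number-of-classes layout of y corresponds to one-hot encoded labels. The following is a minimal pure-Python sketch of that encoding, not the library's TensorFlow code; the helper name `one_hot` and the toy class count of 4 are assumptions chosen for illustration.

```python
def one_hot(labels, num_classes):
    """Encode integer labels as rows with a single 1.0 at the label index.

    Hypothetical helper for illustration only.
    """
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


# A toy batch of three labels with four classes gives a 3 x 4 array.
batch = one_hot([2, 0, 3], num_classes=4)
for row in batch:
    print(row)
```

Each row sums to one, and the position of the single non-zero entry recovers the original integer label.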
-
train_init_op
A TensorFlow operation to be performed before starting every training epoch. It sets the phase variable to "train" and initializes the iterator to the training data set.
Type: tf.Operation
-
train_eval_init_op
A TensorFlow operation to be performed before starting every training eval phase. It sets the phase variable to "train_eval" and initializes the iterator to the training eval data set.
Type: tf.Operation
-
test_init_op
A TensorFlow operation to be performed before starting every test evaluation phase. It sets the phase variable to "test" and initializes the iterator to the test data set.
Type: tf.Operation
-
aspect_preserving_resize(image, target_smaller_side)
Resize image such that the smaller side has size target_smaller_side while preserving the aspect ratio.
Parameters: - image (tf.Tensor) -- Tensor containing the image to resize.
- target_smaller_side (int) -- Target size for the smaller side in pixels.
Returns: The resized image, with the same aspect ratio as the input.
Return type: tf.Tensor
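The target dimensions follow from the aspect ratio alone. The following is a minimal pure-Python sketch of that shape arithmetic, not the library's TensorFlow implementation; the helper name `resized_shape` and the rounding to the nearest pixel are assumptions made for illustration.

```python
def resized_shape(height, width, target_smaller_side):
    """Return (new_height, new_width) with the smaller side scaled to the target.

    Hypothetical helper: it mirrors only the shape arithmetic, not the
    pixel interpolation performed on the image tensor.
    """
    if height <= 0 or width <= 0 or target_smaller_side <= 0:
        raise ValueError("all sizes must be positive")
    # A single scale factor for both axes keeps the aspect ratio unchanged.
    scale = target_smaller_side / min(height, width)
    # Rounding to the nearest pixel is an assumption of this sketch.
    return int(round(height * scale)), int(round(width * scale))


# A 462 x 581 image (height x width) resized so the smaller side is 256.
print(resized_shape(462, 581, 256))  # (256, 322)
```

Because both sides share one scale factor, the larger side ends up at least as long as target_smaller_side, which leaves room for a subsequent square crop.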
-
color_distortion(image, scope=None)
Distort the color of the image.
Parameters: - image (tf.Tensor) -- Tensor containing single image.
- scope (str) -- Optional scope for name_scope.
Returns: The color-distorted image.
Return type: tf.Tensor
-
decode_jpeg(image_buffer, scope=None)
Decode a JPEG string into one 3-D float image Tensor.
Parameters: - image_buffer (tf.string) -- scalar string Tensor.
- scope (str) -- Optional scope for name_scope.
Returns: 3-D float Tensor with values in the range [0, 1).
Return type: tf.Tensor
-
load()
Returns the data (X (images) and y (labels)) and the phase variable.
Returns: Tuple consisting of the images (X), the labels (y) and the phase variable (phase).
Return type: tuple
-
make_dataset(filenames, batch_size, per_image_standardization=True, crop_size=224, random_crop=False, random_flip_left_right=False, distort_color=False, shuffle=True, shuffle_buffer_size=15000, one_hot=True, num_prefetched_batches=8, num_preprocessing_threads=16, data_set_size=-1)
Creates a data set from the filenames of the image and label files.
Parameters: - filenames (str) -- (List of) paths to the .bin files containing the images and labels.
- batch_size (int) -- Batch size of the input-output pairs.
- crop_size (int) -- Crop size of each image. Defaults to 224.
- per_image_standardization (bool) -- Switch to standardize each image to have zero mean and unit norm. Defaults to True.
- random_crop (bool) -- Switch to use random crops. Defaults to False.
- random_flip_left_right (bool) -- Switch to randomly flip the images horizontally. Defaults to False.
- distort_color (bool) -- Switch to apply random brightness, saturation, hue and contrast changes to each image. Defaults to False.
- shuffle (bool) -- Switch to turn shuffling of the data set on or off. Defaults to True.
- shuffle_buffer_size (int) -- Size of the shuffle buffer. Defaults to 15000.
- one_hot (bool) -- Switch to turn one-hot encoding of the labels on or off. Defaults to True.
- num_prefetched_batches (int) -- Number of prefetched batches. Defaults to 8.
- num_preprocessing_threads (int) -- Number of elements to process in parallel while applying the image transformations. Defaults to 16.
- data_set_size (int) -- Size of the data set to extract from the image and label files. Defaults to -1, meaning that the full data set is used.
Returns: Data set object created from the image and label files.
Return type: tf.data.Dataset
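The effect of the per_image_standardization switch can be pictured with a few lines of plain Python. This is a minimal sketch, assuming that "unit norm" means unit standard deviation over the pixels of one image; the helper name `standardize` and the small floor on the standard deviation are assumptions for illustration, not the library's implementation.

```python
import math


def standardize(pixels):
    """Shift and scale a flat list of pixel values to zero mean and unit std.

    Hypothetical helper operating on one image flattened into a list.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Assumed guard: a tiny floor avoids division by zero for constant images.
    std = max(math.sqrt(variance), 1e-12)
    return [(p - mean) / std for p in pixels]


# Toy "image" with values in [0, 1], as produced by a float JPEG decode.
values = standardize([0.0, 0.25, 0.5, 0.75, 1.0])
print(values)
```

The statistics are computed per image rather than over the whole data set, so every image is normalized independently of the others.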
-
parse_example_proto(example_serialized)
Parses an Example proto containing a training example of an image. The output of the build_image_data.py image preprocessing script is a dataset containing serialized Example protocol buffers. Each Example proto contains the following fields:
    image/height: 462
    image/width: 581
    image/colorspace: 'RGB'
    image/channels: 3
    image/class/label: 615
    image/class/synset: 'n03623198'
    image/class/text: 'knee pad'
    image/format: 'JPEG'
    image/filename: 'ILSVRC2012_val_00041207.JPEG'
    image/encoded: <JPEG encoded string>
Parameters: example_serialized (tf.string) -- Scalar Tensor tf.string containing a serialized Example protocol buffer.
Returns: Tuple of image_buffer (tf.string) containing the contents of a JPEG file, label (tf.int32) containing the label, and text (tf.string) containing the human-readable label.
Return type: tuple
-
test_dataset(batch_size)
Creates the test data set.
Parameters: batch_size (int) -- Batch size of the input-output pairs.
Returns: The test data set.
Return type: tf.data.Dataset
-
train_dataset(batch_size, data_augmentation=True)
Creates the training data set.
Parameters: - batch_size (int) -- Batch size of the input-output pairs.
- data_augmentation (bool) -- Switch to turn basic data augmentation on or off while training. Defaults to True.
Returns: The training data set.
Return type: tf.data.Dataset
-
train_eval_dataset(batch_size, data_augmentation=True)
Creates the train eval data set.
Parameters: - batch_size (int) -- Batch size of the input-output pairs.
- data_augmentation (bool) -- Switch to turn basic data augmentation on or off while evaluating the training data set. Defaults to True.
Returns: The train eval data set.
Return type: tf.data.Dataset
-