Tolstoi Data Loading
-
class deepobs.tolstoi.tolstoi_input.data_loading(batch_size, seq_length)
Class providing the data loading functionality for the Tolstoi data set.
Parameters: - batch_size (int) -- Batch size of the input-output pairs. No default value is given.
- seq_length (int) -- Sequence length to be modeled in each step. No default value is given.
-
batch_size -- Batch size of the input-output pairs.
Type: int
-
seq_length -- Sequence length to be modeled in each step.
Type: int
-
train_eval_size -- Number of data points to evaluate during the train eval phase. Currently set to 658725, the size of the test set.
Type: int
-
D_train -- The training data set.
Type: tf.data.Dataset
-
D_train_eval -- The training evaluation data set. It contains the same data as D_train, but it is iterated over separately.
Type: tf.data.Dataset
-
D_test -- The test data set.
Type: tf.data.Dataset
-
phase -- Variable describing the current phase. Can be "train", "train_eval" or "test". The phase variable can determine the behaviour of the network, for example to deactivate dropout during evaluation.
Type: tf.Variable
-
iterator -- A single iterator shared by all three data sets. We use the initialization operations (see below) to switch this iterator between the data sets.
Type: tf.data.Iterator
-
X -- Tensor holding the input text of the Tolstoi data set for character prediction. It has dimension batch_size x seq_length.
Type: tf.Tensor
-
y -- Tensor holding the target text of the Tolstoi data set for character prediction, i.e. the input text shifted by a single character. It has dimension batch_size x seq_length.
Type: tf.Tensor
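
The relationship between X and y can be illustrated without TensorFlow. The sketch below is only an assumption about how such pairs might be built from an encoded character sequence; the helper `make_pairs` and the toy text are hypothetical and not part of DeepOBS, and plain Python lists stand in for the tensors.

```python
def make_pairs(encoded, batch_size, seq_length):
    """Split an encoded character sequence into (X, y) batches.

    Hypothetical helper for illustration only; not part of DeepOBS.
    Each X and y is a list of batch_size rows with seq_length entries.
    """
    chunk = batch_size * seq_length
    # One extra character is needed so that the last target exists.
    num_batches = (len(encoded) - 1) // chunk
    batches = []
    for b in range(num_batches):
        start = b * chunk
        X = [encoded[start + r * seq_length: start + (r + 1) * seq_length]
             for r in range(batch_size)]
        # The targets are the same windows shifted by one character.
        y = [encoded[start + r * seq_length + 1: start + (r + 1) * seq_length + 1]
             for r in range(batch_size)]
        batches.append((X, y))
    return batches


# Toy text standing in for the preprocessed data set.
text = "war and peace"
vocab = sorted(set(text))
char_to_id = {c: i for i, c in enumerate(vocab)}
encoded = [char_to_id[c] for c in text]

batches = make_pairs(encoded, batch_size=2, seq_length=3)
```

Every row of y equals the corresponding row of X moved one position forward in the text, which is what "shifted by a single character" means above.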
-
train_init_op -- A TensorFlow operation to be performed before starting every training epoch. It sets the phase variable to "train" and initializes the iterator to the training data set.
Type: tf.Operation
-
train_eval_init_op -- A TensorFlow operation to be performed before starting every training eval phase. It sets the phase variable to "train_eval" and initializes the iterator to the training eval data set.
Type: tf.Operation
-
test_init_op -- A TensorFlow operation to be performed before starting every test evaluation phase. It sets the phase variable to "test" and initializes the iterator to the test data set.
Type: tf.Operation
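
The pattern of one shared iterator that is re-pointed before each phase can be mimicked in plain Python. The following is a TensorFlow-free analogy, not the DeepOBS implementation: the class `PhasedLoader`, the function `run_phase`, and the toy batches are all hypothetical. In TensorFlow 1.x the end of a data set is signalled by `tf.errors.OutOfRangeError`; here `StopIteration` plays that role.

```python
class PhasedLoader:
    """Toy stand-in for a loader with one iterator shared by several data sets.

    Hypothetical class for illustration; not part of DeepOBS or TensorFlow.
    """

    def __init__(self, datasets):
        self.datasets = datasets   # maps phase name -> list of batches
        self.phase = None          # plays the role of the phase variable
        self._iterator = iter(())  # empty until init() is called

    def init(self, phase):
        """Analogue of train_init_op, train_eval_init_op and test_init_op."""
        self.phase = phase
        self._iterator = iter(self.datasets[phase])

    def next_batch(self):
        """Return the next batch; raises StopIteration when exhausted."""
        return next(self._iterator)


def run_phase(loader, phase):
    """Initialize the shared iterator for `phase` and consume it once."""
    loader.init(phase)
    seen = []
    while True:
        try:
            seen.append(loader.next_batch())
        except StopIteration:
            break
    return seen


loader = PhasedLoader({
    "train": ["t0", "t1", "t2"],
    "train_eval": ["t0", "t1", "t2"],  # same data, iterated separately
    "test": ["s0", "s1"],
})
train_batches = run_phase(loader, "train")
eval_batches = run_phase(loader, "train_eval")
test_batches = run_phase(loader, "test")
```

Because the phase is set in the same step that re-points the iterator, the two cannot get out of sync, which is presumably why each init operation does both.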
-
load()
Returns the data (X (input text) and y (output text)) and the phase variable.
Returns: Tuple consisting of the input text (X), the output text (y) and the phase variable (phase).
Return type: tuple
-
make_text_dataset(filepath, batch_size, seq_length, num_prefetched_batches=10, data_set_size=-1)
Produce a TensorFlow dataset from the preprocessed data set stored at filepath.
Parameters: - filepath (str) -- Path to the .npy file containing the data set.
- batch_size (int) -- Batch size of the input-output pairs.
- seq_length (int) -- Sequence length to be modeled in each step.
- num_prefetched_batches (int) -- Number of prefetched batches. Defaults to 10.
- data_set_size (int) -- Size of the data set to extract from the file. Defaults to -1, meaning that the full data set is used.
Returns: Data set object containing the input and output pair.
Return type: tf.data.Dataset