TensorFlow Record (TFRecord) is useful for storing serialized strings efficiently. In this section, we introduce some basic use cases of TFRecord.
The tf.train.Example message provides an efficient way to construct a serialized data structure for a custom dataset. The pipeline is as follows: for each instance, create a dictionary that maps every feature name to a tf.train.Feature, and then wrap this dictionary in a tf.train.Features object. Thereafter, create a tf.train.Example from the tf.train.Features object, and finally write the serialized string of the tf.train.Example to a TFRecord.
For instance, suppose we have a dataset consisting of 100 instances, each with 3 features and 1 target.
tf.train.Feature(float_list=tf.train.FloatList(value=data))
We first need to construct a tf.train.Feature object for every feature of each instance. Here, we focus on float features since they are the most commonly used data type. However, tf.train.Feature also supports other formats through its bytes_list and int64_list fields.
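import numpy as np
import tensorflow as tf

# Synthetic dataset: 100 instances, 3 features, 1 target.
np.random.seed(0)
X = np.random.normal(0, 1, 300).reshape(100, 3)
y = np.random.normal(0, 1, 100)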
dict_of_features_for_sample_0 = {
    'feature A': tf.train.Feature(float_list=tf.train.FloatList(value=[X[0, 0]])),
    'feature B': tf.train.Feature(float_list=tf.train.FloatList(value=[X[0, 1]])),
    'feature C': tf.train.Feature(float_list=tf.train.FloatList(value=[X[0, 2]])),
    'target': tf.train.Feature(float_list=tf.train.FloatList(value=[y[0]])),
}
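The other two formats can be sketched in the same way. The snippet below is only an illustration: the values 3, b"cat" and 0.5 are made-up placeholders and are not part of the dataset above.

```python
import tensorflow as tf

# Hypothetical values, chosen only to illustrate the three list types.
int_feature = tf.train.Feature(int64_list=tf.train.Int64List(value=[3]))
bytes_feature = tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"cat"]))
float_feature = tf.train.Feature(float_list=tf.train.FloatList(value=[0.5]))

# A Feature is a protobuf "oneof" named "kind": exactly one list is set.
for feature in (int_feature, bytes_feature, float_feature):
    print(feature.WhichOneof("kind"))
```

Strings need to be encoded to bytes before they are placed in a BytesList, and boolean or integer labels would usually go into an Int64List.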
tf.train.Features(feature=dict)
Next, we construct a tf.train.Features object from the dictionary created in the previous step.
features_for_sample_0 = tf.train.Features(feature=dict_of_features_for_sample_0)
tf.train.Example(features=features)
Lastly, we construct a tf.train.Example from the tf.train.Features object.
example_for_sample_0 = tf.train.Example(features=features_for_sample_0)
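The three steps above handle only the first instance. One possible way to repeat them for all 100 instances is sketched below. The helper names _float_feature and make_example are hypothetical and introduced only for this illustration.

```python
import numpy as np
import tensorflow as tf

# Same synthetic dataset as in the walkthrough.
np.random.seed(0)
X = np.random.normal(0, 1, 300).reshape(100, 3)
y = np.random.normal(0, 1, 100)


def _float_feature(value):
    """Wrap a single float in a tf.train.Feature (hypothetical helper)."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[float(value)]))


def make_example(x_row, target):
    """Build a tf.train.Example from one row of X and its target (hypothetical helper)."""
    feature = {
        'feature A': _float_feature(x_row[0]),
        'feature B': _float_feature(x_row[1]),
        'feature C': _float_feature(x_row[2]),
        'target': _float_feature(target),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# One Example per instance.
examples = [make_example(X[i], y[i]) for i in range(len(X))]

# Values can be read back from the message. FloatList stores float32,
# so the result matches X[0, 0] only up to float32 precision.
print(examples[0].features.feature['feature A'].float_list.value[0])
```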
example.SerializeToString()
example_for_sample_0.SerializeToString()
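The serialized string is what gets written to a TFRecord file. The following is a minimal sketch of that last step, together with reading the records back. It assumes TensorFlow 2.x in eager mode, and the file name data.tfrecord, the temporary directory, and the variable names are illustrative choices rather than part of the walkthrough above.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Same synthetic dataset as in the walkthrough.
np.random.seed(0)
X = np.random.normal(0, 1, 300).reshape(100, 3)
y = np.random.normal(0, 1, 100)
names = ['feature A', 'feature B', 'feature C']

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'data.tfrecord')

# Write one serialized tf.train.Example per instance.
with tf.io.TFRecordWriter(path) as writer:
    for i in range(len(X)):
        feature = {
            name: tf.train.Feature(float_list=tf.train.FloatList(value=[X[i, j]]))
            for j, name in enumerate(names)
        }
        feature['target'] = tf.train.Feature(
            float_list=tf.train.FloatList(value=[y[i]]))
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())

# Reading back requires a schema that describes every feature.
schema = {name: tf.io.FixedLenFeature([], tf.float32)
          for name in names + ['target']}
dataset = tf.data.TFRecordDataset(path).map(
    lambda serialized: tf.io.parse_single_example(serialized, schema))

records = list(dataset)
first = records[0]
print(len(records), float(first['feature A']))
```

Because FloatList stores 32-bit floats, the parsed values agree with the original float64 arrays only up to float32 precision.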