
The TensorFlow Record (TFRecord) format is useful for storing serialized strings efficiently. In this section, we introduce some basic use cases of TFRecord.

Construct from Example

The tf.train.Example message provides an efficient way to build a serialized data structure for a custom dataset. The pipeline is as follows: for each instance, create a dictionary that maps each feature name to a tf.train.Feature, then wrap this dictionary in a tf.train.Features object. Next, create a tf.train.Example from the tf.train.Features object, and finally write the serialized string of the tf.train.Example to a TFRecord.

For instance, suppose we have a dataset of 100 instances, each with 3 features and 1 target.

Create Feature object

tf.train.Feature(float_list=tf.train.FloatList(value=data))

We first need to construct a tf.train.Feature object for each feature of each instance. Here we focus on float features, since they are the most commonly used data type. However, tf.train.Feature also supports other data types through its bytes_list and int64_list fields.

import numpy as np
import tensorflow as tf

# Synthetic dataset: 100 instances, 3 features, 1 target.
np.random.seed(0)
X = np.random.normal(0, 1, 300).reshape(100, 3)
y = np.random.normal(0, 1, 100)

# Map each feature name to a tf.train.Feature for the first instance.
dict_of_features_for_sample_0 = {
    'feature A': tf.train.Feature(float_list=tf.train.FloatList(value=[X[0, 0]])),
    'feature B': tf.train.Feature(float_list=tf.train.FloatList(value=[X[0, 1]])),
    'feature C': tf.train.Feature(float_list=tf.train.FloatList(value=[X[0, 2]])),
    'target': tf.train.Feature(float_list=tf.train.FloatList(value=[y[0]])),
}
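For the other data types that tf.train.Feature supports, a minimal sketch might look like the following. The helper names (float_feature, int64_feature, bytes_feature) are hypothetical and chosen only for illustration; the underlying tf.train.FloatList, tf.train.Int64List, and tf.train.BytesList messages are the ones TensorFlow provides.

```python
import tensorflow as tf

# Hypothetical helpers (names are illustrative, not part of TensorFlow).
def float_feature(value):
    """Wrap a single float in a tf.train.Feature."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[float(value)]))

def int64_feature(value):
    """Wrap a single integer (or bool) in a tf.train.Feature."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))

def bytes_feature(value):
    """Wrap a single bytes object in a tf.train.Feature."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

f = float_feature(0.5)
i = int64_feature(3)
b = bytes_feature(b"cat")  # strings must be encoded to bytes first
```

Each tf.train.Feature holds exactly one of the three list types, so the choice of helper fixes the type stored for that feature.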

Create Features object

tf.train.Features(feature=dict)

Next, we construct a Features object from the dictionary created in the previous step.

features_for_sample_0 = tf.train.Features(feature=dict_of_features_for_sample_0)

Create Example object

tf.train.Example(features=features)

Lastly, we construct a tf.train.Example from the tf.train.Features object.

example_for_sample_0 = tf.train.Example(features=features_for_sample_0)
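The steps so far handle only the first instance. One way to extend them to all 100 instances is sketched below; make_example is a hypothetical helper name, and the feature names simply follow the ones used above.

```python
import numpy as np
import tensorflow as tf

# Same synthetic dataset as above.
np.random.seed(0)
X = np.random.normal(0, 1, 300).reshape(100, 3)
y = np.random.normal(0, 1, 100)

def make_example(x_row, target):
    """Hypothetical helper: build one tf.train.Example from a row and its target."""
    feature = {
        'feature A': tf.train.Feature(float_list=tf.train.FloatList(value=[x_row[0]])),
        'feature B': tf.train.Feature(float_list=tf.train.FloatList(value=[x_row[1]])),
        'feature C': tf.train.Feature(float_list=tf.train.FloatList(value=[x_row[2]])),
        'target': tf.train.Feature(float_list=tf.train.FloatList(value=[target])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# One Example per instance.
examples = [make_example(X[i], y[i]) for i in range(len(X))]
```

Note that FloatList stores 32-bit floats, so the float64 values produced by NumPy are rounded when they are placed in the message.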

Construct serialized string

example.SerializeToString()

example_for_sample_0.SerializeToString()
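The serialized strings can then be written to a TFRecord file. The following is a minimal sketch assuming TensorFlow 2.x with eager execution; the file name data.tfrecord, the temporary directory, and the helper serialize_row are assumptions made for illustration. It writes all 100 instances with tf.io.TFRecordWriter and reads them back with tf.data.TFRecordDataset and tf.io.parse_single_example to check the round trip.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

np.random.seed(0)
X = np.random.normal(0, 1, 300).reshape(100, 3)
y = np.random.normal(0, 1, 100)

def serialize_row(x_row, target):
    """Hypothetical helper: serialize one instance to a bytes string."""
    feature = {
        'feature A': tf.train.Feature(float_list=tf.train.FloatList(value=[x_row[0]])),
        'feature B': tf.train.Feature(float_list=tf.train.FloatList(value=[x_row[1]])),
        'feature C': tf.train.Feature(float_list=tf.train.FloatList(value=[x_row[2]])),
        'target': tf.train.Feature(float_list=tf.train.FloatList(value=[target])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Illustrative output location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.tfrecord")

# Write one serialized Example per record.
with tf.io.TFRecordWriter(path) as writer:
    for i in range(len(X)):
        writer.write(serialize_row(X[i], y[i]))

# Describe the expected features so the records can be parsed back.
feature_spec = {
    'feature A': tf.io.FixedLenFeature([], tf.float32),
    'feature B': tf.io.FixedLenFeature([], tf.float32),
    'feature C': tf.io.FixedLenFeature([], tf.float32),
    'target': tf.io.FixedLenFeature([], tf.float32),
}

dataset = tf.data.TFRecordDataset(path)
parsed = [tf.io.parse_single_example(record, feature_spec) for record in dataset]
```

Each element of parsed is a dictionary of scalar float32 tensors keyed by feature name, so parsed[0]['target'] should match y[0] up to float32 rounding.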