1-2 Example: Modeling Procedure for Images

1. Data Preparation

The cifar2 dataset is a subset of cifar10 that contains only two classes: airplane and automobile.

Each class contains 5000 images for training and 1000 images for testing.

The goal of this task is to train a model that classifies images as airplane or automobile.

The files of cifar2 are organized as below:

Figure 1: File organization of the cifar2 dataset

There are two ways to prepare image data in TensorFlow.

The first is to build an image data generator with ImageDataGenerator from tf.keras.

The second is to build a data pipeline with tf.data.Dataset together with functions from tf.image.

The former is simpler and is demonstrated in this article (in Chinese).

The latter is TensorFlow's native approach; it is more flexible and, when used properly, can deliver better performance.

The second method is used below.

  import tensorflow as tf
  from tensorflow.keras import datasets,layers,models

  BATCH_SIZE = 100

  def load_image(img_path,size = (32,32)):
      # Label 1 for paths containing "automobile", 0 otherwise (airplane)
      label = tf.constant(1,tf.int8) if tf.strings.regex_full_match(img_path,".*automobile.*") \
          else tf.constant(0,tf.int8)
      img = tf.io.read_file(img_path)
      img = tf.image.decode_jpeg(img) # In jpeg format
      img = tf.image.resize(img,size)/255.0 # Resize and scale pixel values to [0, 1]
      return(img,label)
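The label logic in load_image relies only on the file path: a path that fully matches the regular expression `.*automobile.*` gets label 1, and everything else (the airplane folder) gets 0. The pure-Python sketch below reproduces that rule with `re.fullmatch` so it can be checked without TensorFlow. The helper `label_from_path` and the example file names are hypothetical, assuming the class folders are named `airplane` and `automobile`.

```python
import re

def label_from_path(img_path: str) -> int:
    """Hypothetical helper: 1 for automobile images, 0 otherwise (airplane)."""
    # re.fullmatch must match the entire path, as tf.strings.regex_full_match does
    return 1 if re.fullmatch(r".*automobile.*", img_path) else 0

# Example paths assume class folders named "airplane" and "automobile"
print(label_from_path("../data/cifar2/train/automobile/0.jpg"))  # 1
print(label_from_path("../data/cifar2/train/airplane/0.jpg"))    # 0
```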

  # Preprocess in parallel with num_parallel_calls; prefetch overlaps data preparation with training
  ds_train = tf.data.Dataset.list_files("../data/cifar2/train/*/*.jpg") \
             .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
             .shuffle(buffer_size = 1000).batch(BATCH_SIZE) \
             .prefetch(tf.data.experimental.AUTOTUNE)

  ds_test = tf.data.Dataset.list_files("../data/cifar2/test/*/*.jpg") \
            .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
            .batch(BATCH_SIZE) \
            .prefetch(tf.data.experimental.AUTOTUNE)
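`shuffle(buffer_size=1000)` does not permute the whole dataset at once: tf.data keeps a buffer of `buffer_size` elements, emits a random element from it, and refills the buffer from the input. `batch` then groups consecutive elements and, by default (`drop_remainder=False`), keeps a smaller final batch. The sketch below is a simplified pure-Python model of these semantics, not TensorFlow's implementation; `buffer_shuffle` and `batch` are hypothetical helpers. (For the training set, `list_files` also shuffles file names by default, so the buffer is not the only source of randomness.)

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def buffer_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items in an order randomized only within a sliding buffer."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Emit a random element from the full buffer; the next item refills it
            yield buffer.pop(rng.randrange(len(buffer)))
    # Drain whatever is left once the input is exhausted
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))

def batch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive items; the last batch may be smaller."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current

batches = list(batch(buffer_shuffle(range(10), buffer_size=4), batch_size=3))
print([len(b) for b in batches])  # [3, 3, 3, 1]
```

With `buffer_size=1` the input order is unchanged; a buffer at least as large as the dataset gives a full uniform shuffle.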

  %matplotlib inline
  %config InlineBackend.figure_format = 'svg'

  # Checking part of the samples
  from matplotlib import pyplot as plt

  plt.figure(figsize=(8,8))
  for i,(img,label) in enumerate(ds_train.unbatch().take(9)):
      ax=plt.subplot(3,3,i+1)
      ax.imshow(img.numpy())
      ax.set_title("label = %d"%label)
      ax.set_xticks([])
      ax.set_yticks([])
  plt.show()

Figure 2: Nine training samples with their labels

  for x,y in ds_train.take(1):
      print(x.shape,y.shape)

  (100, 32, 32, 3) (100,)
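Each element of ds_train is therefore a batch of `BATCH_SIZE` = 100 images of shape 32 × 32 × 3. A quick sketch of the arithmetic: with 2 × 5000 training images and 2 × 1000 test images, one epoch takes 100 training steps and 20 validation steps, which matches the step counts Keras reports during training. The helper `steps_per_epoch` is hypothetical and assumes the final partial batch is kept, as `batch` does by default.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)

BATCH_SIZE = 100
num_train = 2 * 5000  # two classes x 5000 training images each
num_test = 2 * 1000   # two classes x 1000 test images each
print(steps_per_epoch(num_train, BATCH_SIZE), steps_per_epoch(num_test, BATCH_SIZE))  # 100 20
```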

2. Model Definition

There are usually three ways to build models with the Keras APIs: sequential models with Sequential(), arbitrary architectures with the functional API, and custom models by subclassing the base class Model.

Here we use the functional API.

  tf.keras.backend.clear_session() # Clearing the session

  inputs = layers.Input(shape=(32,32,3))
  x = layers.Conv2D(32,kernel_size=(3,3))(inputs)
  x = layers.MaxPool2D()(x)
  x = layers.Conv2D(64,kernel_size=(5,5))(x)
  x = layers.MaxPool2D()(x)
  x = layers.Dropout(rate=0.1)(x)
  x = layers.Flatten()(x)
  x = layers.Dense(32,activation='relu')(x)
  outputs = layers.Dense(1,activation = 'sigmoid')(x)

  model = models.Model(inputs = inputs,outputs = outputs)
  model.summary()

  Model: "model"
  _________________________________________________________________
  Layer (type)                 Output Shape              Param #
  =================================================================
  input_1 (InputLayer)         [(None, 32, 32, 3)]       0
  _________________________________________________________________
  conv2d (Conv2D)              (None, 30, 30, 32)        896
  _________________________________________________________________
  max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0
  _________________________________________________________________
  conv2d_1 (Conv2D)            (None, 11, 11, 64)        51264
  _________________________________________________________________
  max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0
  _________________________________________________________________
  dropout (Dropout)            (None, 5, 5, 64)          0
  _________________________________________________________________
  flatten (Flatten)            (None, 1600)              0
  _________________________________________________________________
  dense (Dense)                (None, 32)                51232
  _________________________________________________________________
  dense_1 (Dense)              (None, 1)                 33
  =================================================================
  Total params: 103,425
  Trainable params: 103,425
  Non-trainable params: 0
  _________________________________________________________________
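The output shapes and parameter counts in the summary can be checked by hand. A Conv2D layer with a k × k kernel has k·k·C_in·C_out weights plus C_out biases; with the default `'valid'` padding and stride 1 it shrinks each spatial side from n to n - k + 1, and MaxPool2D with its default 2 × 2 pool halves it, rounding down. A Dense layer has n_in·n_out weights plus n_out biases. The sketch below (helper names are hypothetical) reproduces the numbers above.

```python
def conv2d_params(kernel: int, in_ch: int, out_ch: int) -> int:
    # One kernel x kernel x in_ch filter per output channel, plus one bias each
    return kernel * kernel * in_ch * out_ch + out_ch

def valid_out(size: int, kernel: int) -> int:
    # 'valid' padding (Keras default), stride 1
    return size - kernel + 1

def pool_out(size: int, pool: int = 2) -> int:
    # MaxPool2D defaults: pool_size=2, strides=pool_size, 'valid' padding
    return size // pool

def dense_params(in_units: int, out_units: int) -> int:
    return in_units * out_units + out_units

s = valid_out(32, 3)   # 30
s = pool_out(s)        # 15
s = valid_out(s, 5)    # 11
s = pool_out(s)        # 5
flat = s * s * 64      # 1600
params = [conv2d_params(3, 3, 32), conv2d_params(5, 32, 64),
          dense_params(flat, 32), dense_params(32, 1)]
print(params, sum(params))  # [896, 51264, 51232, 33] 103425
```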

3. Model Training

There are three common ways to train a model: the built-in fit method, the built-in train_on_batch method, and a custom training loop. Here we use the simplest one, fit.

  import datetime
  import os

  stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
  # Write logs under ../data/keras_model, the directory TensorBoard is started on in the evaluation step
  logdir = os.path.join('..', 'data', 'keras_model', stamp)

  ## We recommend using pathlib under Python 3
  # from pathlib import Path
  # stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
  # logdir = str(Path('../data/keras_model/' + stamp))

  tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

  model.compile(
      optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
      loss=tf.keras.losses.binary_crossentropy,
      metrics=["accuracy"]
  )

  history = model.fit(ds_train,epochs= 10,validation_data=ds_test,
                      callbacks = [tensorboard_callback],workers = 4)

  Train for 100 steps, validate for 20 steps
  Epoch 1/10
  100/100 [==============================] - 16s 156ms/step - loss: 0.4830 - accuracy: 0.7697 - val_loss: 0.3396 - val_accuracy: 0.8475
  Epoch 2/10
  100/100 [==============================] - 14s 142ms/step - loss: 0.3437 - accuracy: 0.8469 - val_loss: 0.2997 - val_accuracy: 0.8680
  Epoch 3/10
  100/100 [==============================] - 13s 131ms/step - loss: 0.2871 - accuracy: 0.8777 - val_loss: 0.2390 - val_accuracy: 0.9015
  Epoch 4/10
  100/100 [==============================] - 12s 117ms/step - loss: 0.2410 - accuracy: 0.9040 - val_loss: 0.2005 - val_accuracy: 0.9195
  Epoch 5/10
  100/100 [==============================] - 13s 130ms/step - loss: 0.1992 - accuracy: 0.9213 - val_loss: 0.1949 - val_accuracy: 0.9180
  Epoch 6/10
  100/100 [==============================] - 14s 136ms/step - loss: 0.1737 - accuracy: 0.9323 - val_loss: 0.1723 - val_accuracy: 0.9275
  Epoch 7/10
  100/100 [==============================] - 14s 139ms/step - loss: 0.1531 - accuracy: 0.9412 - val_loss: 0.1670 - val_accuracy: 0.9310
  Epoch 8/10
  100/100 [==============================] - 13s 134ms/step - loss: 0.1299 - accuracy: 0.9525 - val_loss: 0.1553 - val_accuracy: 0.9340
  Epoch 9/10
  100/100 [==============================] - 14s 137ms/step - loss: 0.1158 - accuracy: 0.9556 - val_loss: 0.1581 - val_accuracy: 0.9340
  Epoch 10/10
  100/100 [==============================] - 14s 142ms/step - loss: 0.1006 - accuracy: 0.9617 - val_loss: 0.1614 - val_accuracy: 0.9345
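The `loss` values in the log are binary cross-entropy averaged over each batch: for a label y in {0, 1} and a predicted probability p from the sigmoid output, the per-sample loss is -(y·log p + (1 - y)·log(1 - p)). The `accuracy` metric counts a prediction as correct when (p > 0.5) agrees with y. Below is a minimal pure-Python sketch of both computations with hypothetical helper names and toy inputs; Keras computes them on tensors and also clips p by a small epsilon (assumed here to be 1e-7) to avoid log(0).

```python
import math
from typing import Sequence

def binary_crossentropy(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over a batch of sigmoid outputs."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def binary_accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int((p > threshold) == bool(y)) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

# Toy batch: the third sample (label 1, p = 0.4) is misclassified
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]
print(binary_crossentropy(y_true, y_prob), binary_accuracy(y_true, y_prob))
```

The misclassified sample dominates the loss (-log 0.4 is about 0.92, versus about 0.11 for a confident correct prediction), which is why the loss can keep falling after accuracy plateaus.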

4. Model Evaluation

  %load_ext tensorboard
  #%tensorboard --logdir ../data/keras_model

  from tensorboard import notebook
  notebook.list()

  # Checking model in tensorboard
  notebook.start("--logdir ../data/keras_model")

Figure 3: Training logs viewed in TensorBoard

  import pandas as pd

  dfhistory = pd.DataFrame(history.history)
  dfhistory.index = range(1,len(dfhistory) + 1)
  dfhistory.index.name = 'epoch'
  dfhistory

Figure 4: dfhistory, the per-epoch training history as a DataFrame
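`history.history` is a plain dict that maps each metric name to a list with one value per epoch, which is why it converts directly to a DataFrame. It can also be queried without pandas, for example to find the epoch with the lowest validation loss. In the sketch below the values are copied from the training log above rather than read from `history`, and `best_epoch` is a hypothetical helper. For this run, validation loss is lowest at epoch 8, while validation accuracy peaks at epoch 10.

```python
# Per-epoch validation metrics, copied from the training log above
history_dict = {
    "val_loss": [0.3396, 0.2997, 0.2390, 0.2005, 0.1949,
                 0.1723, 0.1670, 0.1553, 0.1581, 0.1614],
    "val_accuracy": [0.8475, 0.8680, 0.9015, 0.9195, 0.9180,
                     0.9275, 0.9310, 0.9340, 0.9340, 0.9345],
}

def best_epoch(values, mode="min"):
    """Return (1-based epoch, value) for the lowest or highest value."""
    pick = min if mode == "min" else max
    best_index = pick(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

print(best_epoch(history_dict["val_loss"]))                   # (8, 0.1553)
print(best_epoch(history_dict["val_accuracy"], mode="max"))  # (10, 0.9345)
```

In the notebook, `history.history["val_loss"]` can be passed in place of the copied list.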

  %matplotlib inline
  %config InlineBackend.figure_format = 'svg'
  import matplotlib.pyplot as plt

  def plot_metric(history, metric):
      train_metrics = history.history[metric]
      val_metrics = history.history['val_'+metric]
      epochs = range(1, len(train_metrics) + 1)
      plt.plot(epochs, train_metrics, 'bo--')
      plt.plot(epochs, val_metrics, 'ro-')
      plt.title('Training and validation '+ metric)
      plt.xlabel("Epochs")
      plt.ylabel(metric)
      plt.legend(["train_"+metric, 'val_'+metric])
      plt.show()

  plot_metric(history,"loss")

Figure 5: Training and validation loss per epoch

  plot_metric(history,"accuracy")

Figure 6: Training and validation accuracy per epoch

  # Evaluate the model on the test set with model.evaluate
  val_loss,val_accuracy = model.evaluate(ds_test,workers=4)
  print(val_loss,val_accuracy)

  0.16139143370091916 0.9345

5. Model Application

We can use model.predict(ds_test) to make predictions on the whole test set.

We can also use model.predict_on_batch(x_test) to make predictions on a single batch of data.

  model.predict(ds_test)

  array([[9.9996173e-01],
         [9.5104784e-01],
         [2.8648047e-04],
         ...,
         [1.1484033e-03],
         [3.5589080e-02],
         [9.8537153e-01]], dtype=float32)

  for x,y in ds_test.take(1):
      print(model.predict_on_batch(x[0:20]))

  tf.Tensor(
  [[3.8065155e-05]
   [8.8236779e-01]
   [9.1433197e-01]
   [9.9921846e-01]
   [6.4052093e-01]
   [4.9970779e-03]
   [2.6735585e-04]
   [9.9842811e-01]
   [7.9198682e-01]
   [7.4823302e-01]
   [8.7208226e-03]
   [9.3951421e-03]
   [9.9790359e-01]
   [9.9998581e-01]
   [2.1642199e-05]
   [1.7915063e-02]
   [2.5839690e-02]
   [9.7538447e-01]
   [9.7393811e-01]
   [9.7333014e-01]], shape=(20, 1), dtype=float32)
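Because the output layer is a single sigmoid unit, each prediction is the model's estimated probability that the image belongs to class 1, which load_image assigns to automobile. Turning it into a class decision needs a threshold; 0.5 is the conventional choice and the one the accuracy metric uses. A minimal sketch, applied to the first five probabilities printed above (the mapping `CLASS_NAMES` and the helper `to_class` are hypothetical):

```python
# Label encoding from load_image: 1 = automobile, 0 = airplane
CLASS_NAMES = {0: "airplane", 1: "automobile"}

def to_class(prob: float, threshold: float = 0.5) -> str:
    """Map a sigmoid output P(automobile) to a class name."""
    return CLASS_NAMES[int(prob > threshold)]

# First five probabilities from the predict_on_batch output above
probs = [3.8065155e-05, 8.8236779e-01, 9.1433197e-01, 9.9921846e-01, 6.4052093e-01]
print([to_class(p) for p in probs])
# ['airplane', 'automobile', 'automobile', 'automobile', 'automobile']
```

A prediction such as 0.64 is classified as automobile but with much less confidence than 0.999; raising the threshold trades automobile recall for precision.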

6. Model Saving

We recommend saving models in TensorFlow's native format.

  # Save only the weight tensors, without the model structure
  model.save_weights('../data/tf_model_weights.ckpt',save_format = "tf")

  # Save the model structure and parameters together (SavedModel format), which allows cross-platform deployment
  model.save('../data/tf_model_savedmodel', save_format="tf")
  print('export saved model.')

  model_loaded = tf.keras.models.load_model('../data/tf_model_savedmodel')
  model_loaded.evaluate(ds_test)

  [0.16139124035835267, 0.9345]

If you would like to discuss the content with the author, please leave a comment in the WeChat official account “Python与算法之美” (Elegance of Python and Algorithms). The author will do their best to reply, time permitting.

You are also welcome to join the readers' group chat by replying 加群 (join group) in the WeChat official account.
