6-6 Model Deploying Using tensorflow-serving

6-6 Model Deploying Using tensorflow-serving

There are multiple ways to deploy and run the trained models which saved with the original tensorflow format.

For example:

We can load and run the model in the web browser using javascript through tensorflow-js.

We can load and run the TensorFlow model on mobile and embeded devices through tensorflow-lite.

We can use tensorflow-serving to load the model that providing network interface API service and to acquire the prediction results from the model through sending network requests in arbitrary programming languages.

We can predict using the TensorFlow model in Java or spark (scala) through the TensorFlow for Java port.

This section introduces model deploying by tensorflow serving and using spark (scala) to implement the TensorFlow models.

0. Introduction to model deploying by tensorflow serving

The necessary steps of model deploying using tensorflow serving are:

(1) Prepare the protobuf model file.
(2) Install the tensorflow serving.
(3) Start the tensorflow serving service.
(4) Send the request to the API service to obtain the prediction.

You may use the following link for testing (tf_serving, in Chinese) https://colab.research.google.com/drive/1vS5LAYJTEn-H0GDb1irzIuyRB8E3eWc8

%tensorflow_version 2.x
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras import *

1. Prepare the protobuf Model File

Here we train a simple linear regression model with tf.keras and save it as protobuf file.

import tensorflow as tf
from tensorflow.keras import models,layers,optimizers
## Number of samples
n = 800
## Generating testing dataset
X = tf.random.uniform([n,2],minval=-10,maxval=10) 
w0 = tf.constant([[2.0],[-1.0]])
b0 = tf.constant(3.0)
Y = X@w0 + b0 + tf.random.normal([n,1],
    mean = 0.0,stddev= 2.0) # @ is matrix multiplication; adding Gaussian noise
## Modeling
tf.keras.backend.clear_session()
inputs = layers.Input(shape = (2,),name ="inputs") # Set the input name as "inputs"
outputs = layers.Dense(1, name = "outputs")(inputs) # Set the output name as "outputs"
linear = models.Model(inputs = inputs,outputs = outputs)
linear.summary()
## Training with fit method
linear.compile(optimizer="rmsprop",loss="mse",metrics=["mae"])
linear.fit(X,Y,batch_size = 8,epochs = 100)  
tf.print("w = ",linear.layers[1].kernel)
tf.print("b = ",linear.layers[1].bias)
## Save the model as pb format
export_path = "../data/linear_model/"
version = "1"       # Version could be used for management of further updates
linear.save(export_path+version, save_format="tf")

# Check the saved model file
!ls {export_path+version}

assets    saved_model.pb    variables

# Check the info of the model file
!saved_model_cli show --dir {export_path+str(version)} --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: serving_default_inputs:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['outputs'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict
WARNING:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Defined Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 2), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
    Option #2
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 2), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
  Function Name: '_default_save_signature'
    Option #1
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 2), dtype=tf.float32, name='inputs')
  Function Name: 'call_and_return_all_conditional_losses'
    Option #1
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 2), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #2
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 2), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None

2. Installing tensorflow serving

Two methods for installing tensorflow serving: Using Docker images, or using apt.

Docker image is the simplest way of installation and we recommend it.

Docker is a container that provides independent environment for various programs.

The companies that are using TensorFlow usually use Docker to install tensorflow serving by operation experts, so the algorithm engineers don’t have to worry about the installation.

The installation of Docker on different OS are shown below (in Chinese).

Windows: https://www.runoob.com/docker/windows-docker-install.html

MacOs: https://www.runoob.com/docker/macos-docker-install.html

CentOS: https://www.runoob.com/docker/centos-docker-install.html

After successful installation of Docker, run the following command to load the tensorflow/serving image.

docker pull tensorflow/serving

3. Starting tensorflow serving Service

!docker run -t --rm -p 8501:8501 \
    -v "/Users/.../data/linear_model/" \
    -e MODEL_NAME=linear_model \
    tensorflow/serving & >server.log 2>&1

4. Sending request to the API service

The request could be sent through http function in any kind of the programming languages. We demonstrate request sending using the curl command in Linux and the requests library in Python.

!curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    -X POST http://localhost:8501/v1/models/linear_model:predict

{
    "predictions": [[3.06546211], [5.01313448]
    ]
}

import json,requests
data = json.dumps({"signature_name": "serving_default", "instances": [[1.0, 2.0], [5.0,7.0]]})
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/linear_model:predict', 
        data=data, headers=headers)
predictions = json.loads(json_response.text)["predictions"]
print(predictions)

[[3.06546211], [6.02843142]]

Please leave comments in the WeChat official account “Python与算法之美” (Elegance of Python and Algorithms) if you want to communicate with the author about the content. The author will try best to reply given the limited time available.

You are also welcomed to join the group chat with the other readers through replying 加群 (join group) in the WeChat official account.