TensorFlow Serving

Serving TensorFlow models

Serving a model

To deploy a model, we create the following resources, as illustrated below:

  • A Deployment to deploy the model using TF Serving
  • A K8s Service to create an endpoint for the deployment
  • An Istio VirtualService to route traffic to the model and expose it through the Istio gateway
  • An Istio DestinationRule for traffic splitting
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mnist
  name: mnist-service
  namespace: kubeflow
spec:
  ports:
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
  selector:
    app: mnist
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: mnist
  name: mnist-v1
  namespace: kubeflow
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
      labels:
        app: mnist
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path=YOUR_MODEL
        command:
        - /usr/bin/tensorflow_model_server
        image: tensorflow/serving:1.11.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: mnist
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
        - mountPath: /var/config/
          name: config-volume
      volumes:
      - configMap:
          name: mnist-v1-config
        name: config-volume
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  labels:
  name: mnist-service
  namespace: kubeflow
spec:
  host: mnist-service
  subsets:
  - labels:
      version: v1
    name: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  labels:
  name: mnist-service
  namespace: kubeflow
spec:
  gateways:
  - kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - method:
        exact: POST
      uri:
        prefix: /tfserving/models/mnist
    rewrite:
      uri: /v1/models/mnist:predict
    route:
    - destination:
        host: mnist-service
        port:
          number: 8500
      subset: v1
      weight: 100
```

Referring to the above example, you can customize your deployment by changing the following configurations in the YAML file:

  • In the Deployment resource, the model_base_path argument points to the model. Change the value to your own model.

  • The example contains three configurations for Google Cloud Storage (GCS) access: volumes (secret user-gcp-sa), volumeMounts, and env (GOOGLE_APPLICATION_CREDENTIALS). If your model is not on GCS (e.g. it uses S3 from AWS), see the section below on how to set up access.

  • GPU. If you want to use a GPU, add nvidia.com/gpu: 1 to the container resources, and use a GPU image, for example tensorflow/serving:1.11.1-gpu:

```yaml
resources:
  limits:
    cpu: "4"
    memory: 4Gi
    nvidia.com/gpu: 1
```
  • The VirtualService and DestinationRule resources are for routing. With the example above, the model is accessible at HOSTNAME/tfserving/models/mnist (where HOSTNAME is your Kubeflow deployment hostname). To change the path, edit the http.match.uri of the VirtualService.
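For example, assuming the VirtualService above is applied and your gateway is reachable at HOSTNAME (a placeholder), a prediction request sent to the external path is rewritten to the TF Serving REST path before it reaches the service:

```shell
# The gateway matches POST requests with the /tfserving/models/mnist prefix
# and rewrites the URI to /v1/models/mnist:predict on mnist-service:8500.
# input.json is a placeholder for your request body.
curl -X POST -d @input.json http://HOSTNAME/tfserving/models/mnist
```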

Pointing to the model

Set the parameters according to where your model file is located.

Google cloud

Change the deployment spec as follows:

```yaml
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
      labels:
        app: mnist
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path=gs://kubeflow-examples-data/mnist
        command:
        - /usr/bin/tensorflow_model_server
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secret/gcp-credentials/user-gcp-sa.json
        image: tensorflow/serving:1.11.1-gpu
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: mnist
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
            nvidia.com/gpu: 1
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
        - mountPath: /var/config/
          name: config-volume
        - mountPath: /secret/gcp-credentials
          name: gcp-credentials
      volumes:
      - configMap:
          name: mnist-v1-config
        name: config-volume
      - name: gcp-credentials
        secret:
          secretName: user-gcp-sa
```

The changes are:

  • environment variable GOOGLE_APPLICATION_CREDENTIALS
  • volume gcp-credentials
  • volumeMount gcp-credentials

We need a service account that can access the model. If you are using Kubeflow's click-to-deploy app, there should already be a secret, user-gcp-sa, in the cluster.

The model at gs://kubeflow-examples-data/mnist is publicly accessible. However, if your environment doesn't have Google Cloud credentials set up, TF Serving will not be able to read the model. See this issue for an example. To set up the Google Cloud credentials, either have the environment variable GOOGLE_APPLICATION_CREDENTIALS point to the credentials file, or run gcloud auth login. See the documentation for more detail.

S3

To use S3, first create a secret that will contain the access credentials. Use base64 to encode your credentials, and see the Kubernetes guide to creating a secret manually for details:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secretname
data:
  AWS_ACCESS_KEY_ID: bmljZSB0cnk6KQ==
  AWS_SECRET_ACCESS_KEY: YnV0IHlvdSBkaWRuJ3QgZ2V0IG15IHNlY3JldCE=
```
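The base64 values can be produced from the shell; the credential strings below are placeholders, not real keys:

```shell
# Kubernetes Secret `data:` fields must be base64-encoded.
# "my-access-key-id" and "my-secret-access-key" are placeholder values.
AWS_ACCESS_KEY_ID_B64=$(printf '%s' "my-access-key-id" | base64)
AWS_SECRET_ACCESS_KEY_B64=$(printf '%s' "my-secret-access-key" | base64)
echo "$AWS_ACCESS_KEY_ID_B64"
```

Alternatively, `kubectl create secret generic secretname --from-literal=AWS_ACCESS_KEY_ID=... --from-literal=AWS_SECRET_ACCESS_KEY=...` performs the encoding for you.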

Then use the following manifest as an example:

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: s3
  name: s3
  namespace: kubeflow
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: null
      labels:
        app: s3
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=s3
        - --model_base_path=s3://abc
        - --monitoring_config_file=/var/config/monitoring_config.txt
        command:
        - /usr/bin/tensorflow_model_server
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: secretname
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: secretname
        - name: AWS_REGION
          value: us-west-1
        - name: S3_USE_HTTPS
          value: "true"
        - name: S3_VERIFY_SSL
          value: "true"
        - name: S3_ENDPOINT
          value: s3.us-west-1.amazonaws.com
        image: tensorflow/serving:1.11.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: s3
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
        - mountPath: /var/config/
          name: config-volume
      volumes:
      - configMap:
          name: s3-config
        name: config-volume
```

Sending prediction request directly

If the service type is LoadBalancer, it will have its own accessible external IP. Get the external IP with:

```shell
kubectl get svc mnist-service
```
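To capture the address for scripting, a jsonpath query can be used (a sketch; some providers report a hostname instead of an ip in the load-balancer status):

```shell
# Read the load balancer's external IP from the service status.
EXTERNAL_IP=$(kubectl get svc mnist-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "$EXTERNAL_IP"
```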

And then send the request:

```shell
curl -X POST -d @input.json http://EXTERNAL_IP:8500/v1/models/mnist:predict
```
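The request body follows TF Serving's REST "instances" format. A minimal sketch for generating an input.json, assuming the model takes a flat vector of 784 floats per image (the real shape depends on the exported model's signature):

```shell
# Write a placeholder request body: one all-zero 28x28 MNIST image,
# flattened to 784 floats. Adjust the shape to match your model signature.
python3 - <<'EOF'
import json
payload = {"instances": [[0.0] * 784]}  # one entry per example
with open("input.json", "w") as f:
    json.dump(payload, f)
EOF
```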

Sending prediction request through ingress and IAP

If the service type is ClusterIP, you can access it through the ingress. The endpoint is protected, and only a caller with the right credentials can access it. Below is how to programmatically authenticate a service account to access IAP.

  • Save the client ID that you used to deploy Kubeflow as IAP_CLIENT_ID.
  • Create a service account:
    gcloud iam service-accounts create --project=$PROJECT $SERVICE_ACCOUNT
  • Grant the service account access to IAP-enabled resources:
    gcloud projects add-iam-policy-binding $PROJECT --role roles/iap.httpsResourceAccessor --member serviceAccount:$SERVICE_ACCOUNT
  • Download the service account key:
    gcloud iam service-accounts keys create ${KEY_FILE} --iam-account ${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
  • Export the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the key file of the service account.

Finally, you can send the request with this python script:

```shell
python iap_request.py https://YOUR_HOST/tfserving/models/mnist IAP_CLIENT_ID --input=YOUR_INPUT_FILE
```

Telemetry and Rolling out model using Istio

Please look at the Istio guide.

Logs and metrics with Stackdriver

See the guide to logging and monitoring for instructions on getting logs and metrics using Stackdriver.