Questions
What is TensorFlow Serving and how do you use it to deploy a model?
The Scenario
You are an MLOps engineer at a fintech company. Your team has just finished training a new fraud detection model. The model has been tested and approved for deployment to production.
The production environment has the following requirements:
- High availability: The model must be available 24/7.
- Low latency: The model must respond to requests in under 50ms.
- Scalability: The model must be able to handle a large number of requests per second.
- Easy to manage: The model must be easy to deploy, update, and monitor.
The Challenge
Explain your strategy for deploying this model to production using TensorFlow Serving. What are the key features of TensorFlow Serving that you would use to meet the requirements of the production environment?
A junior engineer might suggest deploying the model as a simple Flask app. That is quick to stand up, but a single-process Python server gives you no built-in model versioning, request batching, or zero-downtime updates, and it makes strict latency and availability targets hard to meet. They might simply not be aware of the benefits of a dedicated serving system like TensorFlow Serving.
A senior engineer would recognize that a dedicated model server such as TensorFlow Serving is the right tool for this job. They would be able to explain how its features address high availability, low latency, scalability, and manageability, and they would have a concrete plan for configuring it in a production environment.
Step 1: Why TensorFlow Serving?
Before we dive into the code, let’s compare TensorFlow Serving with a custom Flask app.
| Feature | TensorFlow Serving | Custom Flask App |
|---|---|---|
| Performance | C++ serving core with an optimized request path and optional request batching. | Python web framework wrapping `model.predict`; per-request overhead makes tight latency targets harder to hit. |
| Scalability | Built to serve many models and versions; scales horizontally behind a load balancer. | Batching, concurrency, and horizontal scaling all have to be built and tuned by hand. |
| Reliability | Designed for production serving; loads and swaps model versions without downtime. | Reliability depends entirely on custom code and how well it is tested. |
| Management | Configuration-driven model loading, versioning, and monitoring. | Deploys, updates, and rollbacks must each be implemented manually, which gets harder with many models. |
For our use case, TensorFlow Serving is the clear choice: it meets the latency, availability, and scalability requirements out of the box and keeps deployment and updates configuration-driven rather than hand-rolled.
Step 2: Export the Model
The first step is to export the model in the SavedModel format.
```python
model.save("fraud_detection_model/1")
```
Note that we have added a version number (`1`) to the path: TensorFlow Serving expects each version of a model to live in its own numbered subdirectory under the model's base path, which is what makes versioned rollouts possible.
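If new versions are exported regularly, it can help to compute the next version directory automatically. Here is a minimal sketch, assuming a TensorFlow 2.x Keras model where `model.save()` on a plain directory path writes a SavedModel; the `export_model` helper and base directory name are illustrative, not part of the original setup:

```python
import os

import tensorflow as tf

MODEL_BASE_DIR = "fraud_detection_model"  # illustrative base directory

def export_model(model: tf.keras.Model, base_dir: str = MODEL_BASE_DIR) -> str:
    """Save `model` into the next numbered version subdirectory under `base_dir`."""
    versions = [int(d) for d in os.listdir(base_dir) if d.isdigit()] if os.path.isdir(base_dir) else []
    next_version = max(versions, default=0) + 1
    export_path = os.path.join(base_dir, str(next_version))
    model.save(export_path)  # writes a SavedModel that TensorFlow Serving can load
    return export_path
```

Each retraining run then just adds a new numbered directory next to the old ones, which is exactly the layout TensorFlow Serving watches.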
Step 3: Configure TensorFlow Serving
The next step is to configure TensorFlow Serving. We can do this by creating a models.config file:
```
model_config_list: {
  config: {
    name: "fraud_detection_model",
    base_path: "/models/fraud_detection_model",
    model_platform: "tensorflow",
    model_version_policy: {
      specific: {
        versions: 1
      }
    }
  }
}
```
This file tells TensorFlow Serving where to find the model and which version to serve. Note that `base_path` is the path as seen from inside the serving container; we mount the exported model there in the next step.
Step 4: Deploy with Docker
The easiest way to deploy TensorFlow Serving is to use Docker.
```bash
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/models.config,target=/models/models.config \
  --mount type=bind,source=/path/to/fraud_detection_model,target=/models/fraud_detection_model \
  -t tensorflow/serving --model_config_file=/models/models.config
```
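Once the container is up, the model is reachable over TensorFlow Serving's REST API on port 8501. Below is a minimal client sketch using only the Python standard library; the feature values are made up for illustration and must match your model's real input shape:

```python
import json
import urllib.request

# REST predict endpoint for the model name declared in models.config.
URL = "http://localhost:8501/v1/models/fraud_detection_model:predict"

# A single transaction's feature vector (illustrative values only).
payload = {"instances": [[0.3, 1200.0, 0.0, 1.0]]}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())

print(result)  # e.g. {"predictions": [[0.02]]} for a low fraud score
```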
Step 5: Advanced Features
Here are some advanced features of TensorFlow Serving that we can use to meet the requirements of our production environment:
- Model Versioning: We can roll out new model versions without downtime, and clients can pin requests to a specific version, as shown in the sketch after this list.
- Batching: We can group incoming requests and run them through the model as a single batch, which improves hardware utilization and throughput at a small cost in per-request latency.
- Monitoring: TensorFlow Serving can export request, latency, and model-load metrics (for example to Prometheus via a monitoring configuration) so we can track the health and performance of the served model.
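As a concrete illustration of version-aware serving, the REST API also exposes version-specific endpoints, so a client can pin each request to a particular version. The sketch below splits traffic 90/10 between versions 1 and 2; it assumes the `model_version_policy` in `models.config` has been updated so that both versions are loaded, and the split ratio is purely illustrative:

```python
import json
import random
import urllib.request

def predict(features, version: int) -> dict:
    """Send one request to a specific model version served by TensorFlow Serving."""
    url = (
        "http://localhost:8501/v1/models/fraud_detection_model"
        f"/versions/{version}:predict"
    )
    data = json.dumps({"instances": [features]}).encode("utf-8")
    request = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# Route roughly 10% of traffic to the candidate version, the rest to the baseline.
version = 2 if random.random() < 0.1 else 1
result = predict([0.3, 1200.0, 0.0, 1.0], version)
```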
Practice Question
You want to be able to A/B test two different versions of your model in production. Which feature of TensorFlow Serving would you use?