Amazon SageMaker Serverless Inference is an inference option that enables you to easily deploy machine learning (ML) models for inference without having to configure or manage the underlying infrastructure. SageMaker Serverless Inference is ideal for applications with intermittent or unpredictable traffic. In this post, you’ll see how to use SageMaker Serverless Inference to reduce cost when you deploy an ML model as part of the testing phase of your MLOps pipeline.
Let’s start by using the scenario described in the SageMaker Project template called “MLOps template for model building, training, and deployment”. In this scenario, our MLOps pipeline goes through two main phases, model building and training (Figure 1), followed by model testing and deployment (Figure 2).
Figure 1 : First half of the MLOps pipeline, covering model building and training.
Figure 2 : Second half of the MLOps pipeline, covering model testing and deployment.