Amazon SageMaker is a fully managed service that lets developers and data scientists quickly build, train, and deploy machine learning (ML) models. With SageMaker, you can deploy your ML models on hosted endpoints and get inference results in real time. You can view performance metrics for your endpoints in Amazon CloudWatch, automatically scale endpoints based on traffic, and update your models in production without losing availability. SageMaker offers several options for deploying ML models for inference, depending on your use case:

For synchronous predictions that need to be served on the order of milliseconds, use SageMaker real-time inference
For workloads that have idle periods between traffic spurts and can tolerate cold starts, use Serverless Inference
For requests with large payload sizes up to 1 GB, long processing times (up to 15

