AWS SageMaker has simplified the deployment of machine learning models at scale. Configuring effective autoscaling policies is crucial for balancing performance and cost. This article demonstrates how to set up request-, memory-, and CPU-based autoscaling policies for different ML model types using the AWS CDK in TypeScript.
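Before diving into the individual policies, here is a minimal sketch of the general CDK pattern the rest of the article builds on: registering a SageMaker endpoint's production variant as an Application Auto Scaling target and attaching a target-tracking policy. The endpoint name "my-endpoint", variant name "AllTraffic", and the capacity and target values are placeholder assumptions, not values prescribed by SageMaker.

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as appscaling from 'aws-cdk-lib/aws-applicationautoscaling';
import { Construct } from 'constructs';

export class SageMakerAutoscalingStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Register the endpoint's production variant with Application Auto Scaling.
    // "my-endpoint" and "AllTraffic" are placeholder names for illustration.
    const scalableTarget = new appscaling.ScalableTarget(this, 'VariantScalableTarget', {
      serviceNamespace: appscaling.ServiceNamespace.SAGEMAKER,
      resourceId: 'endpoint/my-endpoint/variant/AllTraffic',
      scalableDimension: 'sagemaker:variant:DesiredInstanceCount',
      minCapacity: 1,
      maxCapacity: 4,
    });

    // Request-based target tracking: keep invocations per instance near 70.
    scalableTarget.scaleToTrackMetric('InvocationsPerInstanceTracking', {
      predefinedMetric: appscaling.PredefinedMetric.SAGEMAKER_VARIANT_INVOCATIONS_PER_INSTANCE,
      targetValue: 70,
    });
  }
}
```

The sections that follow vary this pattern, swapping the tracked metric (requests, memory, or CPU) depending on the model's invocation profile.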
Model Types Based on Invocation Patterns
At a high level, model deployment in SageMaker can be broken into three main categories based on invocation patterns: