Until recently, customers who wanted to use a deep learning (DL) framework with Amazon SageMaker Processing faced more complexity than those using scikit-learn or Apache Spark. This post shows you how SageMaker Processing has simplified running machine learning (ML) preprocessing and postprocessing tasks with popular frameworks such as PyTorch, TensorFlow, Hugging Face, MXNet, and XGBoost.
Benefits of SageMaker Processing
Training an ML model takes many steps. One of them, data preparation, is paramount to creating an accurate ML model. A typical preprocessing step includes operations such as the following:
Converting the dataset to the input format expected by the ML algorithm that you’re using
Transforming existing features to a more expressive representation, such as one-hot encoding categorical features
Rescaling or normalizing numerical features
Engineering high-level features; for example, replacing mailing addresses with GPS coordinates
Cleaning and tokenizing text for natural language processing (NLP)
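To make a few of these operations concrete, here is a minimal plain-Python sketch of one-hot encoding, min-max rescaling, and simple text tokenization. It does not use SageMaker or any framework API. The helper names `one_hot`, `min_max_scale`, and `tokenize` are hypothetical and chosen only for illustration. In a real preprocessing script you would more likely reach for a library such as scikit-learn or a framework-specific tokenizer.

```python
import re
from typing import List, Sequence


def one_hot(values: Sequence[str]) -> List[List[int]]:
    """Hypothetical helper: one-hot encode a categorical column.

    Categories are sorted so the column order is stable across runs.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the column for this value's category
        encoded.append(row)
    return encoded


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Hypothetical helper: rescale numbers linearly into [0, 1].

    A constant column has zero range, so it is mapped to all zeros
    instead of dividing by zero.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def tokenize(text: str) -> List[str]:
    """Hypothetical helper: lowercase, replace punctuation with spaces, split on whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return cleaned.split()


if __name__ == "__main__":
    # Columns sorted as blue, green, red
    print(one_hot(["red", "green", "red", "blue"]))
    # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(min_max_scale([10.0, 20.0, 30.0]))
    # [0.0, 0.5, 1.0]
    print(tokenize("Hello, SageMaker Processing!"))
    # ['hello', 'sagemaker', 'processing']
```

Sorting the categories gives each one a fixed column position. This matters when the same encoding has to be applied again later, for example at inference time. In practice the category list and the min/max values would be computed once on the training data and saved, not recomputed on each new batch.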