Amazon SageMaker Data Wrangler is a new capability of Amazon SageMaker that helps data scientists and data engineers quickly and easily prepare data for machine learning (ML) applications through a visual interface. It contains over 300 built-in data transformations, so you can normalize, transform, and combine features without writing any code.
Today, we’re excited to announce the new Data Quality and Insights Report feature within Data Wrangler. This report automatically verifies data quality and detects abnormalities in your data. Data scientists and data engineers can use it to apply their domain knowledge efficiently when processing datasets for ML model training.
The report includes the following sections:
Summary statistics – This section reports the number of rows and features, the percentages of missing and valid values, duplicate rows, and a breakdown of features by type (for example, numeric vs. text).
Data Quality Warnings – This