Amazon SageMaker Data Wrangler is a UI-based data preparation tool for data analysis, preprocessing, and visualization, with features that help you clean, transform, and prepare data faster. Its pre-built flow templates speed up data preparation for data scientists and machine learning (ML) practitioners by demonstrating best practice patterns for data flows on common datasets.
You can use Data Wrangler flows to perform the following tasks:
Data visualization – Examining statistical properties for each column in the dataset, building histograms, studying outliers
Data cleaning – Removing duplicates, dropping or filling entries with missing values, removing outliers
Data enrichment and feature engineering – Processing columns to build more expressive features, selecting a subset of features for training
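To make these three task categories concrete, the following is a minimal pandas sketch of what each one looks like when written by hand. It is not Data Wrangler's API or the code a flow generates; Data Wrangler performs these steps through its UI. The toy dataset, the column names, the median fill, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy dataset; the column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 32, None, 41, 29, 120],
    "income": [40000, 52000, 52000, 61000, 58000, 45000, 50000],
})

# Data visualization: per-column summary statistics and histogram bin counts.
summary = df.describe()
age_hist = pd.cut(df["age"].dropna(), bins=4).value_counts().sort_index()

# Data cleaning: remove duplicate rows, fill missing values, remove outliers.
clean = df.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# Assumed outlier rule: keep values within 1.5 * IQR of the quartiles.
q1, q3 = clean["age"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = clean[clean["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)].copy()

# Feature engineering: derive a new feature, then select a training subset.
clean["income_per_year_of_age"] = clean["income"] / clean["age"]
features = clean[["age", "income_per_year_of_age"]]

print(summary)
print(age_hist)
print(features)
```

In a Data Wrangler flow, each of these steps would typically be a separate analysis or transform node rather than a single script, which makes the order of operations visible and easy to rearrange.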
This post will help you understand Data Wrangler using the following sample pre-built flows on GitHub. The repository showcases tabular data transformation, time series