Amazon SageMaker Data Wrangler reduces the time to aggregate and prepare data for machine learning (ML) from weeks to minutes. With Data Wrangler, you can select and query data with just a few clicks, quickly transform data with over 300 built-in data transformations, and understand your data with built-in visualizations without writing any code.
Additionally, you can create custom transforms unique to your requirements. Custom transforms allow you to write custom transformations using either PySpark, Pandas, or SQL.
Data Wrangler now supports a custom Pandas user-defined function (UDF) transform that can process large datasets efficiently. You can choose from two custom Pandas UDF modes: Pandas and Python. Both modes provide an efficient solution to process datasets, and the mode you choose depends on your preference.
In this post, we demonstrate how to use the new Pandas UDF transform in either mode.
Solution overview
At the time of this writing, you