Starting today, you can use Amazon SageMaker Autopilot to tackle regression and classification tasks on large datasets up to 100 GB. Additionally, you can now provide your datasets in either CSV or Apache Parquet content types.
Businesses are generating more data than ever. A corresponding demand is growing for generating insights from these large datasets to shape business decisions. However, successfully training state-of-the-art machine learning (ML) algorithms on these large datasets can be challenging. Autopilot automates this process and provides a seamless experience for running automated machine learning (AutoML) on large datasets up to 100 GB.
Autopilot subsamples your large datasets automatically to fit the maximum supported limit while preserving the rare class in case of class imbalance. Class imbalance is an important problem to be aware of in ML, especially when dealing with large datasets. Consider a fraud detection dataset where only a small fraction of transactions is expected

Continue reading



At FusionWeb, we aim to look at the future through the lenses of imagination, creativity, expertise and simplicity in the most cost effective ways. All we want to make something that brings smile to our clients face. Let’s try us to believe us.