As machine learning (ML) models have improved, data scientists, ML engineers, and researchers have shifted more of their attention to defining and improving data quality. This has led to the emergence of a data-centric approach to ML, along with various techniques that improve model performance by focusing on data requirements. Applying these techniques allows ML practitioners to reduce the amount of data required to train an ML model.
As part of this approach, advanced data subset selection techniques have emerged to speed up training by reducing the quantity of input data. These techniques automatically select a given number of points that approximate the distribution of a larger dataset and use them for training. Applying this type of technique reduces the amount of time required to train an ML model.
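To make the idea of subset selection concrete, here is a minimal sketch in plain Python. It uses greedy k-center (farthest-point) selection as a simple stand-in: it repeatedly picks the point farthest from everything chosen so far, so the subset spreads across the region the full dataset occupies. This is an illustrative assumption, not necessarily the technique this post implements, and the function name `k_center_greedy` and its parameters are hypothetical.

```python
import math
import random


def k_center_greedy(points, k, seed=0):
    """Return indices of k distinct points chosen by farthest-point selection.

    Hypothetical helper for illustration only; `points` is a list of
    equal-length numeric tuples.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    rng = random.Random(seed)
    first = rng.randrange(len(points))
    selected = [first]

    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[first]) for p in points]
    min_dist[first] = -1.0  # mark as taken so it is never picked again

    while len(selected) < k:
        # Pick the point that is farthest from the current subset.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
        min_dist[nxt] = -1.0

    return selected


# Example: keep 20 of 200 synthetic 2-D points.
_rng = random.Random(42)
data = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(200)]
subset_idx = k_center_greedy(data, k=20)
subset = [data[i] for i in subset_idx]
```

Farthest-point selection favors coverage over density, so it tends to keep outliers. Methods that weight points by their contribution to training may match the original distribution more closely, at higher cost.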
In this post, we describe applying data-centric AI principles with Amazon SageMaker Ground Truth, how to implement data subset selection techniques
