Prepare data from Amazon EMR for machine learning using Amazon SageMaker Data Wrangler

Data preparation is a principal component of machine learning (ML) pipelines. In fact, it is estimated that data professionals spend about 80 percent of their time on data preparation. In this intensive competitive market, teams want to analyze data and extract more meaningful insights quickly. Customers are adopting more efficient and visual ways to build data processing systems.
Amazon SageMaker Data Wrangler simplifies the data preparation and feature engineering process, reducing the time it takes from weeks to minutes by providing a single visual interface for data scientists to select, clean data, create features, and automate data preparation in ML workflows without writing any code. You can import data from multiple data sources, such as Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, and Snowflake. You can now also use Amazon EMR as a data source in Data Wrangler to easily prepare data for ML.
Analyzing, transforming, and

Continue reading

Exafunction supports AWS Inferentia to unlock best price performance for machine learning inference

Across all industries, machine learning (ML) models are getting deeper, workflows are getting more complex, and workloads are operating at larger scales. Significant effort and resources are put into making these models more accurate since this investment directly results in better products and experiences. On the other hand, making these models run efficiently in production is a non-trivial undertaking that’s often overlooked, despite being key to achieving performance and budget goals. In this post we cover how Exafunction and AWS Inferentia work together to unlock easy and cost-efficient deployment for ML models in production.
Exafunction is a start-up focused on enabling companies to perform ML at scale as efficiently as possible. One of their products is ExaDeploy, an easy-to-use SaaS solution to serve ML workloads at scale. ExaDeploy efficiently orchestrates your ML workloads across mixed resources (CPU and hardware accelerators) to maximize resource utilization. It also takes care of auto

Continue reading

Damage assessment using Amazon SageMaker geospatial capabilities and custom SageMaker models

In this post, we show how to train, deploy, and predict natural disaster damage with Amazon SageMaker with geospatial capabilities. We use the new SageMaker geospatial capabilities to generate new inference data to test the model. Many government and humanitarian organizations need quick and accurate situational awareness when a disaster strikes. Knowing the severity, cause, and location of damage can assist in the first responder’s response strategy and decision-making. The lack of accurate and timely information can contribute to an incomplete or misdirected relief effort.
As the frequency and severity of natural disasters increases, it’s important that we equip decision-makers and first responders with fast and accurate damage assessment. In this example, we use geospatial imagery to predict natural disaster damage. Geospatial data can be used in the immediate aftermath of a natural disaster for rapidly identifying damage to buildings, roads, or other critical infrastructure. In this post, we show

Continue reading

Deploy Amazon SageMaker Autopilot models to serverless inference endpoints

Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning (ML) models based on your data, while allowing you to maintain full control and visibility. Autopilot can also deploy trained models to real-time inference endpoints automatically.
If you have workloads with spiky or unpredictable traffic patterns that can tolerate cold starts, then deploying the model to a serverless inference endpoint would be more cost efficient.
Amazon SageMaker Serverless Inference is a purpose-built inference option ideal for workloads with unpredictable traffic patterns and that can tolerate cold starts. Unlike a real-time inference endpoint, which is backed by a long-running compute instance, serverless endpoints provision resources on demand with built-in auto scaling. Serverless endpoints scale automatically based on the number of incoming requests and scale down resources to zero when there are no incoming requests, helping you minimize your costs.
In this post, we show how to deploy Autopilot trained

Continue reading

Improve scalability for Amazon Rekognition stateless APIs using multiple regions

In previous blog post, we described an end-to-end identity verification solution in a single AWS Region. The solution uses the Amazon Rekognition APIs DetectFaces for face detection and CompareFaces for face comparison. We think of those APIs as stateless APIs because they don’t depend on an Amazon Rekognition face collection. They’re also idempotent, meaning repeated calls with the same parameters will return the same result. They provide flexible options on passing images, either through an Amazon Simple Storage Service (Amazon S3) location or raw bytes.
In this post, we focus on Amazon Rekognition Image stateless APIs, and discuss two options of passing images and when to choose one over the other from a system architecture point of view. Then we discuss how to scale the stateless APIs to overcome some Regional limitations. When talking about scalability, we often refer to the maximum transactions per second (TPS) the solution can handle.

Continue reading

Use your own training scripts and automatically select the best model using hyperparameter optimization in Amazon SageMaker

The success of any machine learning (ML) pipeline depends not just on the quality of model used, but also the ability to train and iterate upon this model. One of the key ways to improve an ML model is by choosing better tunable parameters, known as hyperparameters. This is known as hyperparameter optimization (HPO). However, doing this tuning manually can often be cumbersome due to the size of the search space, sometimes involving thousands of training iterations.
This post shows how Amazon SageMaker enables you to not only bring your own model algorithm using script mode, but also use the built-in HPO algorithm. You will learn how to easily output the evaluation metric of choice to Amazon CloudWatch, from which you can extract this metric to guide the automatic HPO algorithm. You can then create an HPO tuning job that orchestrates several training jobs and associated compute resources. Upon completion,

Continue reading

Build a robust text-based toxicity predictor

With the growth and popularity of online social platforms, people can stay more connected than ever through tools like instant messaging. However, this raises an additional concern about toxic speech, as well as cyber bullying, verbal harassment, or humiliation. Content moderation is crucial for promoting healthy online discussions and creating healthy online environments. To detect toxic language content, researchers have been developing deep learning-based natural language processing (NLP) approaches. Most recent methods employ transformer-based pre-trained language models and achieve high toxicity detection accuracy.
In real-world toxicity detection applications, toxicity filtering is mostly used in security-relevant industries like gaming platforms, where models are constantly being challenged by social engineering and adversarial attacks. As a result, directly deploying text-based NLP toxicity detection models could be problematic, and preventive measures are necessary.
Research has shown that deep neural network models don’t make accurate predictions when faced with adversarial examples. There has been a

Continue reading

Metrics for evaluating an identity verification solution

Globally, there has been an accelerated shift toward frictionless digital user experiences. Whether it’s registering at a website, transacting online, or simply logging in to your bank account, organizations are actively trying to reduce the friction their customers experience while at the same time enhance their security, compliance, and fraud prevention measures. The shift toward frictionless user experiences has given rise to face-based biometric identity verification solutions aimed at answering the question “How do you verify a person in the digital world?”
There are two key advantages of facial biometrics when it comes to questions of identification and authentication. First, it’s a convenient technology for users: there is no need to remember a password, deal with multi-factor challenges, click verification links, or solve CAPTCHA puzzles. Secondly, a high level of security is achieved: identification and authentication on the basis of facial-biometrics is secure and less susceptible to fraud and attacks.

Continue reading

Introducing one-step classification and entity recognition with Amazon Comprehend for intelligent document processing

“Intelligent document processing (IDP) solutions extract data to support automation of high-volume, repetitive document processing tasks and for analysis and insight. IDP uses natural language technologies and computer vision to extract data from structured and unstructured content, especially from documents, to support automation and augmentation.”  – Gartner
The goal of Amazon’s intelligent document processing (IDP) is to automate the processing of large amounts of documents using machine learning (ML) in order to increase productivity, reduce costs associated with human labor, and provide a seamless user experience. Customers spend a significant amount of time and effort identifying documents and extracting critical information from them for various use cases. Today, Amazon Comprehend supports classification for plain text documents, which requires you to preprocess documents in semi-structured formats (scanned, digital PDF or images such as PNG, JPG, TIFF) and then use the plain text output to run inference with your custom classification model.

Continue reading

Illustrative notebooks in Amazon SageMaker JumpStart

Amazon SageMaker JumpStart is the Machine Learning (ML) hub of SageMaker providing pre-trained, publicly available models for a wide range of problem types to help you get started with machine learning.
JumpStart also offers example notebooks that use Amazon SageMaker features like spot instance training and experiments over a large variety of model types and use cases. These example notebooks contain code that shows how to apply ML solutions by using SageMaker and JumpStart. They can be adapted to match to your own needs and can thus speed up application development.
Recently, we added 10 new notebooks to JumpStart in Amazon SageMaker Studio. This post focuses on these new notebooks. As of this writing, JumpStart offers 56 notebooks, ranging from using state-of-the-art natural language processing (NLP) models to fixing bias in datasets when training models.
The 10 new notebooks can help you in the following ways:

They offer

Continue reading

Interactive data prep widget for notebooks powered by Amazon SageMaker Data Wrangler

According to a 2020 survey of data scientists conducted by Anaconda, data preparation is one of the critical steps in machine learning (ML) and data analytics workflows, and often very time consuming for data scientists. Data scientists spend about 66% of their time on data preparation and analysis tasks, including loading (19%), cleaning (26%), and visualizing data (21%).
Amazon SageMaker Studio is the first fully integrated development environment (IDE) for ML. With a single click, data scientists and developers can quickly spin up Studio notebooks to explore datasets and build models. If you prefer a GUI-based and interactive interface, you can use Amazon SageMaker Data Wrangler, with over 300 built in visualizations, analyses, and transformations to efficiently process data backed by Spark without writing a single line of code.
Data Wrangler now offers a built-in data preparation capability in Amazon SageMaker Studio Notebooks that allows ML practitioners to visually review

Continue reading

Run notebooks as batch jobs in Amazon SageMaker Studio Lab

Recently, the Amazon SageMaker Studio launched an easy way to run notebooks as batch jobs that can run on a recurring schedule. Amazon SageMaker Studio Lab also supports this feature, enabling you to run notebooks that you develop in SageMaker Studio Lab in your AWS account. This enables you to quickly scale your machine learning (ML) experiments with bigger datasets and more powerful instances, without having to learn anything new or change one line of code.
In this post, we walk you through the one time prerequisite to connect your Studio Lab environment to an AWS account. After that, we walk you through the steps to run notebooks as a batch job from Studio Lab.
Solution overview
Studio Lab incorporated the same extension as Studio, which is based on the Jupyter open-source extension for scheduled notebooks. This extension has additional AWS-specific parameters, like the compute type. In Studio Lab, a

Continue reading

Organize machine learning development using shared spaces in SageMaker Studio for real-time collaboration

Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). It provides a single, web-based visual interface where you can perform all ML development steps, including preparing data and building, training, and deploying models.
Within an Amazon SageMaker Domain, users can provision a personal Amazon SageMaker Studio IDE application, which runs a free JupyterServer with built‑in integrations to examine Amazon SageMaker Experiments, orchestrate Amazon SageMaker Pipelines, and much more. Users only pay for the flexible compute on their notebook kernels. These personal applications automatically mount a respective user’s private Amazon Elastic File System (Amazon EFS) home directory so they can keep code, data, and other files isolated from other users. Amazon SageMaker Studio already supports sharing of notebooks between private applications, but the asynchronous mechanism can slow down the iteration process.
Now with shared spaces in Amazon SageMaker Studio, users can organize collaborative ML endeavors

Continue reading

Minimize the production impact of ML model updates with Amazon SageMaker shadow testing

Amazon SageMaker now allows you to compare the performance of a new version of a model serving stack with the currently deployed version prior to a full production rollout using a deployment safety practice known as shadow testing. Shadow testing can help you identify potential configuration errors and performance issues before they impact end-users. With SageMaker, you don’t need to invest in building your shadow testing infrastructure, allowing you to focus on model development. SageMaker takes care of deploying the new version alongside the current version serving production requests, routing a portion of requests to the shadow version. You can then compare the performance of the two versions using metrics such as latency and error rate. This gives you greater confidence that production rollouts to SageMaker inference endpoints won’t cause performance regressions, and helps you avoid outages due to accidental misconfigurations.
In this post, we demonstrate this new SageMaker capability.

Continue reading

Improve governance of your machine learning models with Amazon SageMaker

As companies are increasingly adopting machine learning (ML) for their mainstream enterprise applications, more of their business decisions are influenced by ML models. As a result of this, having simplified access control and enhanced transparency across all your ML models makes it easier to validate that your models are performing well and take action when they are not.
In this post, we explore how companies can improve visibility into their models with centralized dashboards and detailed documentation of their models using two new features: SageMaker Model Cards and the SageMaker Model Dashboard. Both these features are available at no additional charge to SageMaker customers.
Overview of model governance
Model governance is a framework that gives systematic visibility into model development, validation, and usage. Model governance is applicable across the end-to-end ML workflow, starting from identifying the ML use case to ongoing monitoring of a deployed model through alerts, reports, and

Continue reading

Define customized permissions in minutes with Amazon SageMaker Role Manager

Administrators of machine learning (ML) workloads are focused on ensuring that users are operating in the most secure manner, striving towards a principal of least privilege design. They have a wide variety of personas to account for, each with their own unique sets of needs, and building the right sets of permissions policies to meet those needs can sometimes be an inhibitor to agility. In this post, we look at how to use Amazon SageMaker Role Manager to quickly build out a set of persona-based roles that can be further customized to your specific requirements in minutes, right on the Amazon SageMaker console.
Role Manager offers predefined personas and ML activities combined with a wizard to streamline your permission generation process, allowing your ML practitioners to perform their responsibilities with the minimal necessary permissions. If you require additional customization, SageMaker Role Manager allows you to specify networking and encryption permissions

Continue reading

Build an agronomic data platform with Amazon SageMaker geospatial capabilities

The world is at increasing risk of global food shortage as a consequence of geopolitical conflict, supply chain disruptions, and climate change. Simultaneously, there’s an increase in overall demand from population growth and shifting diets that focus on nutrient- and protein-rich food. To meet the excess demand, farmers need to maximize crop yield and effectively manage operations at scale, using precision farming technology to stay ahead.
Historically, farmers have relied on inherited knowledge, trial and error, and non-prescriptive agronomic advice to make decisions. Key decisions include what crops to plant, how much fertilizer to apply, how to control pests, and when to harvest. However, with an increasing demand for food and the need to maximize harvest yield, farmers need more information in addition to inherited knowledge. Innovative technologies like remote sensing, IoT, and robotics have the potential to help farmers move past legacy decision-making. Data-driven decisions fueled by near-real-time insights

Continue reading

Separate lines of business or teams with multiple Amazon SageMaker domains

Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning (ML) that enables data scientists and developers to perform every step of the ML workflow, from preparing data to building, training, tuning, and deploying models.
To access SageMaker Studio, Amazon SageMaker Canvas, or other Amazon ML environments like RStudio on Amazon SageMaker, you must first provision a SageMaker domain. A SageMaker domain includes an associated Amazon Elastic File System (Amazon EFS) volume; a list of authorized users; and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.
Administrators can now provision multiple SageMaker domains in order to separate different lines of business or teams within a single AWS account. This creates a logical separation between the users, files storage, and configuration settings for various groups in your organization. As an example, your organization may want to separate your financial line of business

Continue reading

Operationalize your Amazon SageMaker Studio notebooks as scheduled notebook jobs

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In addition to the interactive ML experience, data workers also seek solutions to run notebooks as ephemeral jobs without the need to refactor code as Python modules or learn DevOps tools and best practices to automate their deployment infrastructure. Some common use cases for doing this include:

Regularly running model inference to generate reports
Scaling up a feature engineering step after having tested in Studio against a subset of data on a small instance
Retraining and deploying models on some cadence
Analyzing your team’s Amazon SageMaker usage on a regular cadence

Previously, when data scientists wanted to take the code they built interactively on notebooks and run them as batch jobs, they were faced with a steep learning curve using Amazon SageMaker Pipelines,

Continue reading

How xarvio Digital Farming Solutions accelerates its development with Amazon SageMaker geospatial capabilities

This is a guest post co-written by Julian Blau, Data Scientist at xarvio Digital Farming Solutions; BASF Digital Farming GmbH, and Antonio Rodriguez, AI/ML Specialist Solutions Architect at AWS
xarvio Digital Farming Solutions is a brand from BASF Digital Farming GmbH, which is part of BASF Agricultural Solutions division. xarvio Digital Farming Solutions offers precision digital farming products to help farmers optimize crop production. Available globally, xarvio products use machine learning (ML), image recognition technology, and advanced crop and disease models, in combination with data from satellites and weather station devices, to deliver accurate and timely agronomic recommendations to manage the needs of individual fields. xarvio products are tailored to local farming conditions, can monitor growth stages, and recognize diseases and pests. They increase efficiency, save time, reduce risks, and provide higher reliability for planning and decision-making—all while contributing to sustainable agriculture.
We work with different geospatial data, including satellite

Continue reading



At FusionWeb, we aim to look at the future through the lenses of imagination, creativity, expertise and simplicity in the most cost effective ways. All we want to make something that brings smile to our clients face. Let’s try us to believe us.