With Amazon Rekognition Custom Labels, you can have Amazon Rekognition train a custom model for object detection or image classification specific to your business needs. For example, Rekognition Custom Labels can find your logo in social media posts, identify your products on store shelves, classify machine parts in an assembly line, distinguish healthy and infected plants, or detect animated characters in videos.
Developing a Rekognition Custom Labels model to analyze images is a significant undertaking that requires time, expertise, and resources, often taking months to complete. Additionally, it often requires thousands or tens of thousands of hand-labeled images to provide the model with enough data to accurately make decisions. Gathering this data can take months and requires large teams of labelers to prepare it for use in machine learning (ML).
With Rekognition Custom Labels, we take care of the heavy lifting for you. Rekognition Custom Labels builds off
Build a machine learning model to predict student performance using Amazon SageMaker Canvas
There has been a paradigm change in the mindshare of education customers who are now willing to explore new technologies and analytics. Universities and other higher learning institutions have collected massive amounts of data over the years, and now they are exploring options to use that data for deeper insights and better educational outcomes.
You can use machine learning (ML) to generate these insights and build predictive models. Educators can also use ML to identify challenges in learning outcomes, increase success and retention among students, and broaden the reach and impact of online learning content.
However, higher education institutions often lack ML professionals and data scientists. As a result, they are looking for solutions that their existing business analysts can adopt quickly.
Amazon SageMaker Canvas is a low-code/no-code ML service that enables business analysts to perform data preparation and transformation, build ML models, and deploy these models into
Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler
In this post, we show how to configure a new OAuth-based authentication feature for using Snowflake in Amazon SageMaker Data Wrangler. Snowflake is a cloud data platform that provides data solutions from data warehousing to data science. Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics.
Data Wrangler simplifies the data preparation and feature engineering process, reducing the time it takes from weeks to minutes by providing a single visual interface for data scientists to select and clean data, create features, and automate data preparation in ML workflows without writing any code. You can import data from multiple data sources, such as Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, Amazon EMR, and Snowflake. With this new feature, you can use your own identity provider (IdP) such as Okta, Azure AD, or Ping Federate to connect
Remote monitoring of raw material supply chains for sustainability with Amazon SageMaker geospatial capabilities
Deforestation is a major concern in many tropical geographies where local rainforests are at severe risk of destruction. About 17% of the Amazon rainforest has been destroyed over the past 50 years, and some tropical ecosystems are approaching a tipping point beyond which recovery is unlikely.
A key driver for deforestation is raw material extraction and production, for example the production of food and timber or mining operations. Businesses consuming these resources are increasingly recognizing their share of responsibility in tackling the deforestation issue. One way they can do this is by ensuring that their raw material supply is produced and sourced sustainably. For example, if a business uses palm oil in their products, they will want to ensure that natural forests were not burned down and cleared to make way for a new palm oil plantation.
Geospatial analysis of satellite imagery taken of the locations where suppliers operate can
Best practices for viewing and querying Amazon SageMaker service quota usage
Amazon SageMaker customers can view and manage their quota limits through Service Quotas. In addition, they can view near real-time utilization metrics and create Amazon CloudWatch metrics to view and programmatically query SageMaker quotas.
SageMaker helps you build, train, and deploy machine learning (ML) models with ease. To learn more, refer to Getting started with Amazon SageMaker. Service Quotas simplifies limit management by allowing you to view and manage your quotas for SageMaker from a central location.
With Service Quotas, you can view the maximum number of resources, actions, or items in your AWS account or AWS Region. You can also use Service Quotas to request an increase for adjustable quotas.
With the increasing adoption of MLOps practices, and the resulting demand for resources designated for ML model experimentation and retraining, more customers need to run multiple instances, often of the same instance type, at the same time.
Many data
Build custom code libraries for your Amazon SageMaker Data Wrangler Flows using AWS CodeCommit
As organizations grow in size and scale, the complexities of running workloads increase, and the need to develop and operationalize processes and workflows becomes critical. Therefore, organizations have adopted technology best practices, including microservice architecture, MLOps, DevOps, and more, to improve delivery time, reduce defects, and increase employee productivity. This post introduces a best practice for managing custom code within your Amazon SageMaker Data Wrangler workflow.
Data Wrangler is a low-code tool that facilitates data analysis, preprocessing, and visualization. It contains over 300 built-in data transformation steps to aid with feature engineering, normalization, and cleansing to transform your data without having to write any code.
In addition to the built-in transforms, Data Wrangler contains a custom code editor that allows you to implement custom code written in Python, PySpark, or SparkSQL.
When using Data Wrangler custom transform steps to implement your custom functions, you need to implement best practices around
Accelerate Amazon SageMaker inference with C6i Intel-based Amazon EC2 instances
This is a guest post co-written with Antony Vance from Intel.
Customers are always looking for ways to improve the performance and response times of their machine learning (ML) inference workloads without increasing the cost per transaction and without sacrificing the accuracy of the results. Running ML workloads on Amazon SageMaker with Amazon Elastic Compute Cloud (Amazon EC2) C6i instances and Intel’s INT8 inference deployment can help boost the overall performance by up to four times per dollar spent while keeping the loss in inference accuracy less than 1% as compared to FP32 when applied to certain ML workloads. Quantization can also help when running models on embedded devices, where form factor and model size are important.
Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types like 8-bit integer (INT8)
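As a rough illustration of the idea, the following minimal sketch applies symmetric per-tensor quantization to a handful of floating-point weights and then maps them back. It is not Intel's INT8 toolchain or a SageMaker API; the function names and the sample weights are hypothetical, chosen only to show how a single scale factor maps floats onto 8-bit integer codes.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 if all values are zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.42, -1.27, 0.05, 0.9]  # hypothetical FP32 weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per value is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored in 8 bits instead of 32, at the cost of a rounding error bounded by half the scale.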
Intelligently search your organization’s Microsoft Teams data source with the Amazon Kendra connector for Microsoft Teams
Organizations use messaging platforms like Microsoft Teams to bring the right people together to securely communicate with each other and collaborate to get work done. Microsoft Teams captures invaluable organizational knowledge in the form of the information that flows through it as users collaborate. However, making this knowledge easily and securely available to users can be challenging due to the fragmented nature of conversations across groups, channels, and chats within an organization. Additionally, the conversational nature of Microsoft Teams communication makes a traditional keyword-based approach to search ineffective for finding accurate answers to questions from the content. Finding those answers requires intelligent search capabilities that can process natural language queries.
You can now use the Amazon Kendra connector for Microsoft Teams to index Microsoft Teams messages and documents, and search this content using intelligent search in Amazon Kendra, powered by machine learning (ML).
This post shows
Bring legacy machine learning code into Amazon SageMaker using AWS Step Functions
Tens of thousands of AWS customers use AWS machine learning (ML) services to accelerate their ML development with fully managed infrastructure and tools. Customers who have been developing ML models on premises, such as on their local desktop, want to migrate their legacy ML models to the AWS Cloud to take full advantage of the most comprehensive set of ML services, infrastructure, and implementation resources available on AWS.
The term legacy code refers to code that was developed to be manually run on a local desktop, and is not built with cloud-ready SDKs such as the AWS SDK for Python (Boto3) or Amazon SageMaker Python SDK. In other words, this legacy code isn’t optimized for cloud deployment. The best practice for migration is to refactor the legacy code using the Amazon SageMaker API or the SageMaker Python SDK. However, in some cases, organizations with a large number of legacy
Maximize performance and reduce your deep learning training cost with AWS Trainium and Amazon SageMaker
Today, tens of thousands of customers are building, training, and deploying machine learning (ML) models using Amazon SageMaker to power applications that have the potential to reinvent their businesses and customer experiences. These ML models have been increasing in size and complexity over the last few years, which has led to state-of-the-art accuracies across a range of tasks and has also pushed the time to train from days to weeks. As a result, customers must scale their models across hundreds to thousands of accelerators, which makes them more expensive to train.
SageMaker is a fully managed ML service that helps developers and data scientists easily build, train, and deploy ML models. SageMaker already provides the broadest and deepest choice of compute offerings featuring hardware accelerators for ML training, including G5 (Nvidia A10G) instances and P4d (Nvidia A100) instances.
Growing compute requirements call for faster and more cost-effective processing power. To further
How VMware built an MLOps pipeline from scratch using GitLab, Amazon MWAA, and Amazon SageMaker
This post is co-written with Mahima Agarwal, Machine Learning Engineer, and Deepak Mettem, Senior Engineering Manager, at VMware Carbon Black
VMware Carbon Black is a renowned security solution offering protection against the full spectrum of modern cyberattacks. With terabytes of data generated by the product, the security analytics team focuses on building machine learning (ML) solutions to surface critical attacks and spotlight emerging threats from noise.
It is critical for the VMware Carbon Black team to design and build a custom end-to-end MLOps pipeline that orchestrates and automates workflows in the ML lifecycle and enables model training, evaluations, and deployments.
There are two main purposes for building this pipeline: support the data scientists for late-stage model development, and surface model predictions in the product by serving models in high volume and in real-time production traffic. Therefore, VMware Carbon Black and AWS chose to build a custom MLOps pipeline using Amazon
Few-click segmentation mask labeling in Amazon SageMaker Ground Truth Plus
Amazon SageMaker Ground Truth Plus is a managed data labeling service that makes it easy to label data for machine learning (ML) applications. One common use case is semantic segmentation, which is a computer vision ML technique that involves assigning class labels to individual pixels in an image. For example, in video frames captured by a moving vehicle, class labels can include vehicles, pedestrians, roads, traffic signals, buildings, or backgrounds. Semantic segmentation provides a high-precision understanding of the locations of different objects in the image and is often used to build perception systems for autonomous vehicles or robotics. To build an ML model for semantic segmentation, it is first necessary to label a large volume of data at the pixel level. This labeling process is complex. It requires skilled labelers and significant time: some images can take 2 hours or more to label accurately!
In 2019, we released an ML-powered
Accelerate time to insight with Amazon SageMaker Data Wrangler and the power of Apache Hive
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Data Wrangler enables you to access data from a wide variety of popular sources (Amazon S3, Amazon Athena, Amazon Redshift, Amazon EMR and Snowflake) and over 40 other third-party sources. Starting today, you can connect to Amazon EMR Hive as a big data query engine to bring in large datasets for ML.
Aggregating and preparing large amounts of data is a critical part of the ML workflow. Data scientists and data engineers use Apache Spark, Apache Hive, and Presto running on Amazon EMR for large-scale data processing. This blog post shows how data professionals can use SageMaker Data Wrangler’s visual interface to locate and connect to existing Amazon EMR clusters with Hive endpoints. To get ready for modeling or reporting, they can visually analyze the database,
Using Amazon SageMaker with Point Clouds: Part 1- Ground Truth for 3D labeling
In this two-part series, we demonstrate how to label and train models for 3D object detection tasks. In part 1, we discuss the dataset we’re using, as well as any preprocessing steps, to understand and label data. In part 2, we walk through how to train a model on your dataset and deploy it to production.
LiDAR (light detection and ranging) is a method for determining ranges by targeting an object or surface with a laser and measuring the time for the reflected light to return to the receiver. Autonomous vehicle companies typically use LiDAR sensors to generate a 3D understanding of the environment around their vehicles.
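The time-of-flight principle described above reduces to one formula: the pulse travels to the target and back, so the range is the speed of light multiplied by the round-trip time, divided by two. The sketch below is only an illustration of that arithmetic; the function name and the 200 ns example value are hypothetical and not taken from the series.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact value by SI definition


def lidar_range_m(round_trip_time_s):
    """Range to the target in meters, given the measured round-trip time in seconds."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The light covers the distance twice (out and back), so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2


distance = lidar_range_m(200e-9)  # hypothetical 200 ns round trip
print(f"{distance:.2f} m")  # roughly 30 m
```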
As LiDAR sensors become more accessible and cost-effective, customers are increasingly using point cloud data in new spaces like robotics, signal mapping, and augmented reality. Some new mobile devices even include LiDAR sensors. The growing availability of LiDAR sensors has increased interest in
Real-time fraud detection using AWS serverless and machine learning services
Online fraud has a widespread impact on businesses and requires an effective end-to-end strategy to detect and prevent new account fraud and account takeovers, and stop suspicious payment transactions. Detecting fraud closer to the time of fraud occurrence is key to the success of a fraud detection and prevention system. The system should detect fraud as effectively as possible and alert the end-user as quickly as possible. The user can then choose to take action to prevent further abuse.
In this post, we show a serverless approach to detect online transaction fraud in near-real time. We show how you can apply this approach to various data streaming and event-driven architectures, depending on the desired outcome and actions to take to prevent fraud (such as alerting the user about the fraud or flagging the transaction for additional review).
This post implements three architectures:
Streaming data inspection
Architect personalized generative AI SaaS applications on Amazon SageMaker
The AI landscape is being reshaped by the rise of generative models capable of synthesizing high-quality data, such as text, images, music, and videos. The course toward democratization of AI helped to further popularize generative AI following the open-source releases for such foundation model families as BERT, T5, GPT, CLIP and, most recently, Stable Diffusion. Hundreds of software as a service (SaaS) applications are being developed around these pre-trained models, which are either directly served to end-customers, or fine-tuned first on a per-customer basis to generate personal and unique content (such as avatars, stylized photo edits, video game assets, domain-specific text, and more). With the pace of technological innovation and proliferation of novel use cases for generative AI, upcoming AI-native SaaS providers and startups in the B2C segment need to prepare for scale from day one, and aim to shorten their time-to-market by reducing operational overhead as much as possible.
Use a data-centric approach to minimize the amount of data required to train Amazon SageMaker models
As machine learning (ML) models have improved, data scientists, ML engineers and researchers have shifted more of their attention to defining and bettering data quality. This has led to the emergence of a data-centric approach to ML and various techniques to improve model performance by focusing on data requirements. Applying these techniques allows ML practitioners to reduce the amount of data required to train an ML model.
As part of this approach, advanced data subset selection techniques have surfaced to speed up training by reducing input data quantity. This process is based on automatically selecting a given number of points that approximate the distribution of a larger dataset and using it for training. Applying this type of technique reduces the amount of time required to train an ML model.
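To make the idea of picking a fixed number of representative points more concrete, here is a minimal sketch of greedy k-center (farthest-point) selection. This is a simple coverage-based heuristic used purely as a stand-in; it is an assumption for illustration and not necessarily the technique the post applies. The function name and sample points are hypothetical.

```python
import math


def k_center_greedy(points, k):
    """Return indices of k points chosen so every point lies near a selected one."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [0]  # arbitrary starting point
    # Distance from each point to its nearest selected point so far.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(selected) < k:
        # Add the point that is currently worst covered by the selection.
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return selected


# Hypothetical 2-D feature vectors forming three loose groups.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
subset = k_center_greedy(points, 3)
print(subset)
```

With three picks, the selection takes one point from each group, so training on the subset still sees every region of the data.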
In this post, we describe applying data-centric AI principles with Amazon SageMaker Ground Truth, how to implement data subset selection techniques
Use Snowflake as a data source to train ML models with Amazon SageMaker
Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. SageMaker provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don’t have to manage servers. It also provides common ML algorithms that are optimized to run efficiently against extremely large data in a distributed environment.
SageMaker requires that the training data for an ML model be present either in Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS) or Amazon FSx for Lustre (for more information, refer to Access Training Data). In order to train a model using data stored outside of the three supported storage services, the data first needs to be ingested into one of these services (typically Amazon
How Marubeni is optimizing market decisions using AWS machine learning and analytics
This post is co-authored with Hernan Figueroa, Sr. Manager Data Science at Marubeni Power International.
Marubeni Power International Inc (MPII) owns and invests in power business platforms in the Americas. An important vertical for MPII is asset management for renewable energy and energy storage assets, which are critical to reducing the carbon intensity of our power infrastructure. Working with renewable power assets requires predictive and responsive digital solutions, because renewable energy generation and electricity market conditions are continuously changing. MPII is using a machine learning (ML) bid optimization engine to inform upstream decision-making processes in power asset management and trading. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability.
In this post, you will learn how Marubeni is optimizing market decisions by using the broad set of AWS analytics and ML services to build a robust and cost-effective Power Bid Optimization solution.
Solution
Portfolio optimization through multidimensional action optimization using Amazon SageMaker RL
Reinforcement learning (RL) encompasses a class of machine learning (ML) techniques that can be used to solve sequential decision-making problems. RL techniques have found widespread applications in numerous domains, including financial services, autonomous navigation, industrial control, and e-commerce. The objective of an RL problem is to train an agent that, given an observation from its environment, will choose the optimal action that maximizes cumulative reward. Solving a business problem with RL involves specifying the agent’s environment, the space of actions, the structure of observations, and the right reward function for the target business outcome. In policy-based RL methods, the outcome of model training is often a policy, which defines a probability distribution over the actions given an observation. The optimal policy will maximize the cumulative returns obtained by the agent.
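The notion of a policy as a probability distribution over actions given an observation can be sketched in a few lines. The example below assumes a linear scoring function followed by a softmax, which is one common parameterization and not necessarily the one used in the post; the weights, observation, and function names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy(observation, weights):
    """Probability of each action given an observation (linear scores + softmax)."""
    scores = [sum(w * x for w, x in zip(row, observation)) for row in weights]
    return softmax(scores)


def sample_action(probs, rng):
    """Draw one action index according to the policy's probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical: 3 actions, 2 features
observation = [2.0, 0.0]
probs = policy(observation, weights)
action = sample_action(probs, random.Random(0))
print(probs, action)
```

Training a policy-based method amounts to adjusting `weights` so that actions leading to higher cumulative reward receive higher probability.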
In constrained decision-making problems, the agent is tasked with choosing the optimal actions under constraints. A distinct class of such