Amazon’s product search engine indexes billions of products, serves hundreds of millions of customers worldwide, and is one of the most heavily used services in the world. The Amazon Search team develops machine learning (ML) technology that powers the search engine and helps customers search effortlessly. To deliver a great customer experience and operate at the massive scale required by the search engine, this team is always looking for ways to build more cost-effective systems with real-time latency and throughput requirements. The team constantly explores hardware and compilers optimized for deep learning to accelerate model training and inference, while reducing operational costs across the board.
In this post, we describe how Amazon Search uses AWS Inferentia, a high-performance accelerator purpose built by AWS to accelerate deep learning inference workloads. The team runs low-latency ML inference with Transformer-based NLP models on AWS Inferentia-based Amazon Elastic Compute Cloud (Amazon EC2)

Continue reading



At FusionWeb, we aim to look at the future through the lenses of imagination, creativity, expertise and simplicity in the most cost effective ways. All we want to make something that brings smile to our clients face. Let’s try us to believe us.