High-performance, low-cost machine learning infrastructure is accelerating innovation in the cloud


Artificial intelligence and machine learning (AI/ML) are key technologies that help organizations develop new ways to increase sales, reduce costs, streamline business processes, and understand their customers better. AWS helps customers accelerate their AI/ML adoption by delivering powerful compute, high-speed networking, and scalable, on-demand, high-performance storage options for any machine learning project. This lowers the barrier to entry for organizations looking to adopt the cloud to scale their ML applications.

Data scientists and developers are pushing the boundaries of technology and increasingly adopting deep learning, a type of machine learning based on neural network algorithms. These deep learning models are larger and more complex, resulting in rising costs to run the underlying infrastructure to train and deploy them.

To help customers accelerate their AI/ML transformation, AWS is building high-performance, low-cost machine learning chips. AWS Inferentia is the first machine learning chip built from the ground up by AWS for the lowest-cost machine learning inference in the cloud. In fact, Inferentia-powered Amazon EC2 Inf1 instances deliver 2.3x higher throughput and up to 70% lower cost for machine learning inference than current-generation GPU-based EC2 instances. AWS Trainium is the second machine learning chip from AWS, purpose-built for training deep learning models, and will be available in late 2021.

Customers across industries have deployed their machine learning applications in production on Inferentia and have seen significant performance improvements and cost savings. For example, Airbnb’s customer support platform enables intelligent, scalable, and exceptional service experiences for its community of millions of hosts and guests around the world. It used Inferentia-based Amazon EC2 Inf1 instances to deploy the natural language processing (NLP) models that power its chatbots, achieving a 2x improvement in throughput out of the box over GPU-based instances.

With these innovations in silicon, AWS is enabling customers to train and deploy their deep learning models in production easily, with high performance and throughput, at significantly lower cost.

Machine learning is driving a rapid shift to cloud infrastructure

Machine learning is an iterative process that requires teams to build, train, and deploy applications quickly, and to train, retrain, and experiment frequently to improve the prediction accuracy of their models. When deploying trained models into their business applications, organizations also need to scale those applications to serve new users around the globe, handling multiple simultaneous requests with near real-time latency to deliver a superior user experience.

Emerging use cases such as object detection, natural language processing (NLP), image classification, conversational AI, and time-series analysis rely on deep learning technology. Deep learning models are growing exponentially in size and complexity, going from millions of parameters to billions in a matter of a few years.

Training and deploying these complex and sophisticated models translates into significant infrastructure costs. Those costs can quickly become prohibitive as organizations scale their applications to deliver near real-time experiences to their users and customers.

This is where cloud-based machine learning infrastructure services can help. The cloud provides on-demand access to compute, high-performance networking, and large-scale data storage, seamlessly combined with ML operations and higher-level AI services, enabling organizations to get started immediately and scale their AI/ML initiatives.

How AWS helps customers accelerate their AI/ML transformation

AWS Inferentia and AWS Trainium aim to democratize machine learning and make it accessible to developers regardless of experience level or organization size. Inferentia’s design is optimized for high performance, high throughput, and low latency, which makes it ideal for deploying ML inference at scale.

Each AWS Inferentia chip contains four NeuronCores, each of which implements a high-performance systolic-array matrix-multiply engine that massively speeds up typical deep learning operations such as convolutions and transformers. NeuronCores also come equipped with a large on-chip cache, which helps cut down on external memory accesses, reducing latency and increasing throughput.
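To make the memory-reuse idea concrete, here is a minimal NumPy sketch, not Neuron code, of blocked (tiled) matrix multiplication: each small tile of the inputs is loaded once and reused across many multiply-accumulate steps, the same principle that lets an on-chip cache reduce trips to external memory. The function name and tile size are illustrative.

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 64) -> np.ndarray:
    """Multiply a @ b one tile at a time, the way a systolic engine with
    an on-chip cache would: each (tile x tile) block of the inputs is
    loaded once and reused for many multiply-accumulate operations."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # These small blocks stand in for data held on-chip; the
                # accumulation reuses them without touching the full
                # matrices ("external memory") again.
                out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return out

# Quick check against NumPy's reference implementation.
a = np.random.rand(256, 128).astype(np.float32)
b = np.random.rand(128, 192).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-4)
```

On Inferentia this scheduling is handled by the hardware and the Neuron compiler; the sketch only illustrates why keeping tiles resident close to the compute units pays off.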

AWS Neuron, the software development kit for Inferentia, natively supports leading ML frameworks like TensorFlow and PyTorch. Developers can continue using the same frameworks and lifecycle tools they know and love. For many of their trained models, they can compile and deploy them on Inferentia by changing just a single line of code, with no additional application code changes.
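As an illustration of how small that change can be, a compile step with the PyTorch flavor of Neuron might look like the sketch below. The model, input shape, and file name are placeholders, the code assumes an Inf1 instance with the torch-neuron package installed, and the exact API can vary between SDK versions, so treat this as a sketch rather than a definitive recipe.

```python
import torch
import torch_neuron  # AWS Neuron SDK plugin; registers torch.neuron

# Placeholder model; in practice this would be your trained network.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.ReLU(),
)
model.eval()

example = torch.rand(1, 3, 224, 224)  # example input with the right shape

# The single-line change: trace/compile the model for NeuronCores.
model_neuron = torch.neuron.trace(model, example_inputs=[example])

# The compiled model is a TorchScript module; save it for deployment.
model_neuron.save("model_neuron.pt")
```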

The result is a high-performance inference deployment that can scale easily while keeping costs under control.
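Continuing the sketch above, serving the compiled model on an Inf1 instance then looks like ordinary PyTorch inference (again with placeholder names):

```python
import torch
import torch_neuron  # must be imported so the Neuron ops are registered

# Load the artifact produced by the compilation sketch above.
model = torch.jit.load("model_neuron.pt")

batch = torch.rand(1, 3, 224, 224)  # placeholder request payload
with torch.no_grad():
    prediction = model(batch)  # runs on the NeuronCores
print(prediction.shape)
```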

Sprinklr, a software-as-a-service company, has an AI-driven unified customer experience management platform that enables companies to gather real-time customer feedback across multiple channels and translate it into actionable insights. This results in proactive issue resolution, enhanced product development, improved content marketing, and better customer service. Sprinklr used Inferentia to deploy its NLP and some of its computer vision models and saw significant performance improvements.

Several Amazon services also deploy their machine learning models on Inferentia.

Amazon Prime Video uses computer vision ML models to analyze the video quality of live events to ensure an optimal viewing experience for Prime Video members. It deployed its ML image classification models on EC2 Inf1 instances and saw a 4x improvement in performance and up to 40% in cost savings compared with GPU-based instances.

Another example is Amazon Alexa’s AI- and ML-based intelligence, powered by Amazon Web Services, which is available on more than 100 million devices today. Alexa’s promise to customers is that it is always becoming smarter, more conversational, more proactive, and even more delightful. Delivering on that promise requires continuous improvements in response times and machine learning infrastructure costs. By deploying Alexa’s text-to-speech ML models on Inf1 instances, the team was able to lower inference latency by 25% and cost-per-inference by 30%, improving the service experience for the tens of millions of customers who use Alexa each month.

Unlocking new opportunities for machine learning in the cloud

As companies race to future-proof their business by delivering the best digital products and services, no organization can afford to fall behind in deploying sophisticated machine learning models to help innovate for its customers. The past few years have seen an enormous increase in the applicability of machine learning across a variety of use cases, from personalization and churn prediction to fraud detection and supply chain forecasting.

Fortunately, machine learning infrastructure in the cloud is unlocking new capabilities that were previously out of reach, making the technology far more accessible to non-experts. That’s why AWS customers are now using Inferentia-powered Amazon EC2 Inf1 instances to provide the intelligence behind their recommendation engines and chatbots, and to derive actionable insights from customer feedback.

With AWS’s cloud-based machine learning infrastructure options suited to a range of skill levels, it is clear that any organization can accelerate innovation and embrace the full machine learning lifecycle at scale. As machine learning continues to become more pervasive, organizations are now able to fundamentally transform the customer experience, and the way they do business, with cost-effective, high-performance cloud-based machine learning infrastructure.

Learn more about how the AWS machine learning platform can help your company innovate here.

This content was produced by AWS. It was not written by MIT Technology Review’s editorial staff.


