Train ALBERT for natural language processing with TensorFlow on Amazon SageMaker

  • May 28, 2020

At re:Invent 2019, AWS shared the fastest training times on the cloud for two popular machine learning (ML) models: BERT (natural language processing) and Mask R-CNN (object detection). To train BERT in 1 hour, we efficiently scaled out to 2,048 NVIDIA V100 GPUs by improving the underlying infrastructure, network, and ML framework. Today, we’re open-sourcing the optimized training code for ALBERT (A Lite BERT), a powerful BERT-based language model that achieves state-of-the-art performance on industry benchmarks while training 1.7 times faster and cheaper.

This post demonstrates how to train a faster, smaller, higher-quality model called ALBERT on Amazon SageMaker, a fully managed service that makes it easy to build, train, tune, and deploy ML models. Although this isn’t a new model, it’s the first efficient distributed GPU implementation for TensorFlow 2. You can use AWS training scripts to train ALBERT in Amazon SageMaker on p3dn and g4dn instances for both single-node and distributed training.
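As a rough sketch of what launching such a training job can look like with the SageMaker Python SDK, the configuration fragment below is illustrative only: the entry point script name, source directory, S3 path, and hyperparameters are assumptions, not the repository's actual values; consult the GitHub repo for the real training scripts and their arguments.

```python
import sagemaker
from sagemaker.tensorflow import TensorFlow

# Hypothetical launch configuration -- entry_point, source_dir, and
# hyperparameters are placeholders, not the actual repo values.
estimator = TensorFlow(
    entry_point="run_pretraining.py",      # assumed script name
    source_dir="albert",                   # assumed local code directory
    role=sagemaker.get_execution_role(),
    instance_count=1,                      # increase for distributed training
    instance_type="ml.p3dn.24xlarge",      # p3dn and g4dn instances are supported
    framework_version="2.1",
    py_version="py3",
    hyperparameters={"train_batch_size": 32, "learning_rate": 1e-4},
)
estimator.fit({"train": "s3://my-bucket/albert-pretraining-data"})  # assumed S3 path
```

Scaling to multi-node distributed training is then mostly a matter of raising `instance_count` and enabling a distribution strategy in the estimator configuration.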

The scripts use mixed-precision training and accelerated linear algebra (XLA) to complete training in under 24 hours (five times faster than without these optimizations), which allows data scientists to iterate faster and bring their models to production sooner. The implementation uses model architectures from the open-source Hugging Face transformers library. For more information, see the GitHub repo.
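Mixed precision keeps most tensors in float16 while guarding against underflow with loss scaling. A minimal NumPy sketch of why loss scaling matters (the gradient and scale values are illustrative, not taken from the AWS scripts):

```python
import numpy as np

# A small gradient value that underflows to zero when cast to float16.
grad = np.float32(1e-8)
assert np.float16(grad) == 0.0          # underflow: the update would be lost

# Loss scaling: multiply the loss (and therefore its gradients) by a
# large constant before the float16 cast, then divide it back out in
# float32 when applying the update.
scale = np.float32(2.0 ** 14)
scaled_grad = np.float16(grad * scale)  # now representable in float16
recovered = np.float32(scaled_grad) / scale
assert recovered > 0.0                  # gradient information preserved
```

In practice the framework handles this automatically; the sketch only shows the mechanism that makes float16 training numerically safe.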

Source: amazon.com


Related Posts

The Dark Secrets Of BERT

BERT stands for Bidirectional Encoder Representations from Transformers. This model is basically a multi-layer bidirectional Transformer encoder (Devlin, Chang, Lee, & Toutanova, 2019), and there are multiple excellent guides about how it works generally, including the Illustrated Transformer. What we focus on is one specific component of the Transformer architecture known as self-attention.
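The core of self-attention is a scaled dot-product between learned query, key, and value projections of the same token sequence. A minimal single-head NumPy sketch (random weights standing in for trained parameters):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the same sequence into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every token scores every token, scaled by sqrt(d_k) for stability.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)             # each row sums to 1
    # Output: per-token weighted mixture of all value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))               # 4 tokens, hidden size 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

The `attn` matrix is exactly what BERT analysis papers visualize: row i shows how much token i attends to every other token.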

Read More
Top 10 Best FREE Artificial Intelligence Courses

Most Machine Learning, Deep Learning, Computer Vision, and NLP job positions, and in general every Artificial Intelligence (AI) job position, require you to have at least a bachelor’s degree in Computer Science, Electrical Engineering, or a similar field. If your degree comes from one of the world’s best universities, then your chances of beating the competition in a job interview might be higher. But realistically, most people cannot afford to go to the top universities in the world, simply because most of us are not geniuses, don’t have thousands of dollars, or come from a poor country (like we do).

Read More
A Hacker’s Guide to Efficiently Train Deep Learning Models

Three months ago, I participated in a data science challenge that took place at my company. The goal was to help a marine researcher better identify whales based on the appearance of their flukes. More specifically, we were asked to predict, for each image in the test set, the top 20 most similar images from the full database (train + test).
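One common way to produce such a ranking is to embed every image with a trained model and rank by cosine similarity. A minimal NumPy sketch of just the retrieval step (the embedding model itself is assumed and not shown; the array shapes are illustrative):

```python
import numpy as np

def top_k_similar(query_emb, db_embs, k=20):
    # L2-normalize so a plain dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q
    # Indices of the k most similar database images, best first.
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(1)
db = rng.normal(size=(100, 64))           # 100 images, 64-d embeddings (assumed)
query = db[42]                            # a query identical to entry 42
ranking = top_k_similar(query, db, k=20)  # ranking[0] is entry 42 itself
```

For a database of real size, an approximate nearest-neighbor index would replace the brute-force dot product, but the scoring idea is the same.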

Read More