The Dark Secrets Of BERT

  • May 7, 2020

BERT stands for Bidirectional Encoder Representations from Transformers. The model is essentially a multi-layer bidirectional Transformer encoder (Devlin, Chang, Lee, & Toutanova, 2019), and there are multiple excellent guides to how it works in general, including the Illustrated Transformer. Here we focus on one specific component of the Transformer architecture known as self-attention.

In a nutshell, self-attention is a way to weigh the components of the input and output sequences so as to model relations between them, including long-distance dependencies. As a brief example, let’s say we need to create a representation of the sentence “Tom is a black cat”. BERT may choose to pay more attention to “Tom” while encoding the word “cat”, and less attention to the words “is”, “a”, and “black”.

This can be represented as a vector of weights, one for each word in the sentence. One such vector is computed for every word the model encodes, and stacking them yields a square matrix which we refer to as the self-attention map.
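To make the idea of a self-attention map concrete, here is a minimal sketch in plain Python. It is not BERT's actual implementation: the 2-dimensional word vectors are made up for illustration, and the same vectors are used as both queries and keys, whereas a real Transformer derives queries and keys from learned linear projections and uses multiple attention heads. The function names `softmax` and `self_attention_map` are hypothetical helpers, not part of any library.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_map(queries, keys):
    """Return one row of attention weights per query (scaled dot-product)."""
    dim = len(keys[0])
    attention = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        attention.append(softmax(scores))
    return attention


tokens = ["Tom", "is", "a", "black", "cat"]

# Toy vectors, invented for this example (assumption: not real BERT embeddings).
vectors = [
    [1.0, 0.9],  # Tom
    [0.1, 0.0],  # is
    [0.0, 0.1],  # a
    [0.2, 0.3],  # black
    [0.9, 1.0],  # cat
]

# Square matrix: row i holds the weights used while encoding token i.
attention_map = self_attention_map(vectors, vectors)

cat_row = attention_map[tokens.index("cat")]
for token, weight in zip(tokens, cat_row):
    print(f"{token:>5}: {weight:.3f}")
```

With these toy vectors, the row for “cat” assigns more weight to “Tom” than to “is”, “a”, or “black”, mirroring the example above. Each row sums to 1, and the map has one row and one column per token, which is why it is square.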

Source: topbots.com

