The Mechanism of Self-Attention

Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. It has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations.

Key Concept: Query, Key, and Value

In the self-attention layer, for each input word we create three vectors: a Query vector, a Key vector, and a Value vector. These are produced by multiplying the word's embedding by three weight matrices that are learned during training.
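A minimal NumPy sketch of this projection step, with illustrative dimensions (the sizes, variable names, and random stand-ins for the learned matrices are assumptions, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_k = 8, 4                  # illustrative embedding / projection sizes
x = rng.normal(size=(5, d_model))    # embeddings for a 5-word sentence

# In a trained model these matrices are learned; random stand-ins here.
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

# Each word's embedding is multiplied by all three matrices.
Q = x @ W_q   # Query vectors, one per word
K = x @ W_k   # Key vectors
V = x @ W_v   # Value vectors
print(Q.shape, K.shape, V.shape)  # → (5, 4) (5, 4) (5, 4)
```

Note that all words share the same three matrices; only the embeddings differ, so the projections can be computed for the whole sentence in one matrix multiplication each.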

When computing the self-attention for a given word, we score every word of the input sentence against it. The score determines how much focus to place on each part of the input sentence as we encode the word at that position.
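The scoring and weighting described above can be sketched as follows. This assumes scaled dot-product scores followed by a softmax, the common Transformer formulation; the text itself does not fix the scoring function, so treat this as one concrete choice:

```python
import numpy as np

def self_attention(Q, K, V):
    """Score each query against all keys, softmax the scores,
    and return the weighted sum of the value vectors."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # one score per (query word, key word) pair
    # Softmax over the key dimension turns scores into focus weights.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                      # each output row blends all value vectors

rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 4))  # toy queries, keys, values for 5 words
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
out = self_attention(Q, K, V)
print(out.shape)  # → (5, 4)
```

Each row of `out` is the new representation of one word: a mixture of every word's Value vector, weighted by how strongly that word's Key matched the current word's Query.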
