
Large Language Models (LLMs) have several key components that contribute to their functionality and performance. These components work together to process text, understand context, generate responses, and perform various language-related tasks. Here are the key components:
1. Input Representation (Feature Representation)
- LLMs process textual input by converting it into a structured numerical form.
- Tokenization: The input text is split into tokens (subwords, words, or characters) using methods like Byte Pair Encoding (BPE) or WordPiece.
- Embeddings: Each token is mapped to a high-dimensional vector representation using pre-trained word embeddings (e.g., Word2Vec, GloVe) or embeddings learned by the model itself; see the sketch after this list.
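As a concrete illustration, here is a minimal sketch of this step using a toy whitespace tokenizer and a PyTorch embedding table. Real LLMs use subword tokenizers (BPE, WordPiece) and far larger vocabularies and embedding dimensions; the vocabulary, sizes, and variable names below are illustrative assumptions, not taken from any specific model.

```python
import torch
import torch.nn as nn

# Toy "corpus" and vocabulary (illustrative; real LLMs use BPE/WordPiece
# subword vocabularies with tens of thousands of entries).
text = "large language models process text"
vocab = {tok: idx for idx, tok in enumerate(sorted(set(text.split())))}

# "Tokenization": here just whitespace splitting, mapped to integer ids.
token_ids = torch.tensor([vocab[tok] for tok in text.split()])   # shape: (seq_len,)

# Embedding lookup: each token id becomes a dense vector
# (d_model = 8 here versus hundreds or thousands in production models).
d_model = 8
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=d_model)
token_vectors = embedding(token_ids)                              # shape: (seq_len, d_model)

print(token_ids.tolist())    # integer ids, e.g. [1, 0, 2, 3, 4] depending on sort order
print(token_vectors.shape)   # torch.Size([5, 8])
```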
2. Positional Encoding
- Since transformers do not have an inherent notion of sequence order (unlike RNNs), they use positional encodings to provide information about token positions.
- Common methods:
- Fixed sinusoidal encoding (e.g., used in vanilla Transformers; a sketch follows this list)
- Learnable embeddings (e.g., used in BERT, GPT)
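Below is a rough PyTorch sketch of the fixed sinusoidal variant, following the formula from the original Transformer paper; the function name and dimensions are illustrative, not from a specific library.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal encoding:
    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)          # (seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )                                                                            # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe

# Added to the token embeddings so the model can distinguish positions.
pe = sinusoidal_positional_encoding(seq_len=5, d_model=8)
print(pe.shape)   # torch.Size([5, 8])
```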
3. Multi-Head Self-Attention (Attention Maps)
- The self-attention mechanism allows the model to assign different levels of importance to different words in a sequence, regardless of their positions.
- Each token attends to every other token in the input, producing an attention map that represents these relationships.
- Key steps (a minimal PyTorch sketch follows this list):
- Query (Q), Key (K), and Value (V) Projections: The input is linearly transformed into Q, K, and V vectors.
- Scaled Dot-Product Attention: Computes attention scores using:
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V
- Multi-Head Attention: Uses multiple attention heads to capture different aspects of the input.
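The snippet below is a minimal PyTorch sketch of these steps, not the implementation of any particular model: a single linear layer produces the Q/K/V projections, scaled dot-product attention is applied per head, and the per-head outputs are concatenated and projected back to the model dimension. Shapes, dimensions, and class/function names are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # attention map: (..., seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention: project to Q/K/V, split into heads,
    run scaled dot-product attention per head, then recombine."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv_proj = nn.Linear(d_model, 3 * d_model)   # Q, K, V projections in one matrix
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                 # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        qkv = self.qkv_proj(x)                            # (batch, seq_len, 3 * d_model)
        q, k, v = qkv.chunk(3, dim=-1)

        # Reshape to (batch, num_heads, seq_len, d_head) so each head attends independently.
        def split_heads(t):
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        out, attn_map = scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(batch, seq_len, d_model)   # merge heads back
        return self.out_proj(out), attn_map

x = torch.randn(1, 5, 8)                      # batch of 1, 5 tokens, d_model = 8
mha = MultiHeadSelfAttention(d_model=8, num_heads=2)
output, attention_map = mha(x)
print(output.shape)          # torch.Size([1, 5, 8])
print(attention_map.shape)   # torch.Size([1, 2, 5, 5]) -- one seq_len x seq_len map per head
```

The returned attention map is exactly the matrix of softmax weights described above: each row shows how strongly one token attends to every other token, with one such map per head.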