Session 1 — LLM Components Primer + Tiny Transformer + “DeepSeek-style” Optimization 101

18-Week Learning Outline

Here’s a set of text-based diagrams covering the typical components of an LLM system (data → training → serving → app → safety/ops). I’m using a layered “stack” view first, followed by three zoom-ins: the LLM core, RAG, and the agent loop.

1) End-to-end LLM System Map (full stack)

┌──────────────────────────────────────────────────────────────────────────────┐
│                                USER / CHANNELS                               │
│  Web App | Mobile | IDE | API Client | Voice | Agents | Plugins | Devices     │
└──────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                           APPLICATION / PRODUCT LAYER                         │
│  UI/UX • Workflows • Session State • Feature Flags • A/B Tests • Analytics    │
│  Domain Logic • Permissions • Billing • Tenant Mgmt • Rate Plans              │
└──────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATION / AGENT RUNTIME                          │
│  Prompt Orchestrator • Router • Multi-model Selector • Tool Policy            │
│  Planner/Executor (ReAct) • Reflexion • Memory Manager • Task Graph           │
│  Tool Registry (MCP) • Connectors • Retries/Backoff • Timeout Control         │
└──────────────────────────────────────────────────────────────────────────────┘
            │                             │                               │
            │                             │                               │
            ▼                             ▼                               ▼
┌───────────────────────┐     ┌─────────────────────────┐     ┌───────────────────────┐
│  PROMPT / CONTEXT      │     │    SAFETY / GUARDRAILS   │     │  OBSERVABILITY / OPS  │
│  System/Dev/User msgs  │     │  Input/Output filters    │     │  Logs • Traces •      │
│  Templates • Few-shot  │     │  Policy engine           │     │  Metrics • Alerts     │
│  Context window mgmt   │     │  PII redaction           │     │  Cost/Token tracking  │
│  Compression/Summary   │     │  Jailbreak detection     │     │  SLOs • Dashboards     │
└───────────────────────┘     └─────────────────────────┘     └───────────────────────┘
            │                             │                               │
            └──────────────┬──────────────┴──────────────┬────────────────┘
                           │                             │
                           ▼                             ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                         RETRIEVAL / KNOWLEDGE LAYER (RAG)                     │
│  Ingestion: loaders, OCR, parsing, chunking, metadata, dedup, PII scrub       │
│  Indexing: embeddings, vector DB, keyword/BM25, hybrid search, rerankers       │
│  Retrieval: query rewrite, filters, top-k, rerank, citations, grounding        │
│  Knowledge Stores: docs, dbs, wikis, tickets, emails, code, web, files         │
└──────────────────────────────────────────────────────────────────────────────┘
                           │                             │
                           │                             │
                           ▼                             ▼
┌────────────────────────────────┐            ┌─────────────────────────────────┐
│   TOOL / ACTION LAYER (MCP)     │            │   MODEL SERVING / INFERENCE     │
│  Function calling               │            │  API Gateway / Router           │
│  Browsers • DB • Email •        │            │  Auth • Quotas • Rate limits    │
│  Calendar • Payments •          │            │  Prompt cache • KV cache        │
│  Code exec • Search •           │            │  Streaming • Batching           │
│  Workflows (n8n/Airflow)        │            │  Fallbacks • Canary • Rollback  │
└────────────────────────────────┘            └─────────────────────────────────┘
                                                     │
                                                     ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                                   LLM CORE                                   │
│  Tokenizer → Embedding → Transformer Blocks (Attention + MLP) → LM Head       │
│  Decoding: greedy / beam / sampling / temperature / top-p / top-k             │
│  Alignment helpers: system prompts, preference models, safety classifiers      │
└──────────────────────────────────────────────────────────────────────────────┘
                                                     │
                                                     ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                          INFRASTRUCTURE / PLATFORM                            │
│  GPU/CPU Nodes • Kubernetes • Autoscaling • Model Registry • CI/CD             │
│  Secrets • KMS • IAM • VPC • Load Balancers • Storage • Backup/DR              │
└──────────────────────────────────────────────────────────────────────────────┘
                                                     │
                                                     ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                       TRAINING / FINETUNING / EVAL PIPELINES                  │
│  Data: collection, cleaning, labeling, synthetic data, red-teaming             │
│  Train: pretrain, SFT, DPO/RLHF, distillation, quantization, pruning           │
│  Eval: benchmarks, unit tests, regression, safety evals, human eval            │
│  Release: versioning, model cards, governance, approvals                       │
└──────────────────────────────────────────────────────────────────────────────┘
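To make the "Retries/Backoff" and "Fallbacks" boxes in the map concrete, here is a minimal sketch of how an orchestrator might wrap model calls. Everything in it is an assumption for illustration: `call_with_fallback`, `flaky_primary`, and `stable_backup` are hypothetical names, and the models are plain Python callables rather than real API clients.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallback(
    prompt: str,
    models: Sequence[Callable[[str], str]],
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order; retry with exponential backoff before falling back."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(max_retries + 1):
            try:
                return model(prompt)
            except Exception as err:  # a real system would catch narrower error types
                last_error = err
                if attempt < max_retries:
                    # Delay doubles on each retry: base, 2*base, 4*base, ...
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all models failed") from last_error


# Hypothetical stand-ins for a primary model that times out and a backup model.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def stable_backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


delays = []  # record the backoff delays instead of actually sleeping
answer = call_with_fallback("hi", [flaky_primary, stable_backup], sleep=delays.append)
print(answer, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without waiting; a production version would likely also add jitter and a per-call timeout.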

2) Zoom-in: “Inside the LLM Core”

Text → [Tokenizer] → token_ids
          │
          ▼
     [Token Embeddings] + [Positional Encoding]
          │
          ▼
┌─────────────────────────────────────────────────────────────┐
│   Transformer Block × N                                     │
│   ┌───────────────────────────┐   ┌───────────────────────┐ │
│   │ Multi-Head Self-Attention  │ → │  FeedForward / MLP     │ │
│   └───────────────────────────┘   └───────────────────────┘ │
│        ↑ residual + layernorm         ↑ residual + layernorm │
└─────────────────────────────────────────────────────────────┘
          │
          ▼
      [LM Head] → logits over vocab
          │
          ▼
   [Decoding / Sampler] → next_token → (append to input, loop) → output_text
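The last step of the diagram, turning logits into a next token, can be sketched in a few lines. This is a toy illustration under stated assumptions: `sample_next_token` is a hypothetical helper, the logits are a hand-written dict over a three-word vocabulary, and a real model would produce a vector over tens of thousands of token ids.

```python
import math
import random
from typing import Dict, Optional


def sample_next_token(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one token from logits using greedy, temperature, top-k, and/or top-p."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    rng = rng or random.Random()

    # Softmax with temperature (subtract the max for numerical stability).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    if top_k is not None:
        probs = probs[:top_k]  # keep only the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in probs:  # keep the smallest prefix whose mass reaches top_p
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        probs = kept

    tokens = [tok for tok, _ in probs]
    weights = [p for _, p in probs]  # random.choices treats weights as relative
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = {"cat": 2.0, "dog": 1.0, "fish": 0.1}
print(sample_next_token(toy_logits, temperature=0))                      # greedy
print(sample_next_token(toy_logits, temperature=0.8, top_k=2, rng=random.Random(0)))
```

In a full generation loop, the chosen token would be appended to the input and the model run again until a stop token or length limit is reached.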

3) Zoom-in: RAG (Retrieval-Augmented Generation) Components

                         ┌──────────────────────────────┐
User Query ─────────────▶│ Query processing              │
                         │ rewrite • expand • classify   │
                         └──────────────┬───────────────┘
                                        │
                                        ▼
                 ┌────────────────────────────────────────────┐
                 │ Retrieval                                   │
                 │  - Vector search (embeddings)               │
                 │  - Keyword/BM25                             │
                 │  - Hybrid + filters                         │
                 └──────────────┬─────────────────────────────┘
                                │
                                ▼
                 ┌────────────────────────────────────────────┐
                 │ Reranking / Scoring                         │
                 │  - cross-encoder reranker                   │
                 │  - diversity / freshness / authority        │
                 └──────────────┬─────────────────────────────┘
                                │
                                ▼
                 ┌────────────────────────────────────────────┐
                 │ Context packer                              │
                 │  - chunk selection                          │
                 │  - citations                                │
                 │  - summarization/compression                │
                 └──────────────┬─────────────────────────────┘
                                │
                                ▼
                         ┌──────────────┐
                         │ LLM response  │
                         └──────────────┘
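The retrieval and context-packing stages above can be sketched with the standard library alone. This is a deliberately simplified illustration: word-count vectors stand in for learned embeddings, the reranking stage is omitted, and `DOCS`, `embed`, `retrieve`, and `pack_context` are hypothetical names, not part of any library.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical mini knowledge store: doc id -> chunk text.
DOCS: Dict[str, str] = {
    "doc1": "The KV cache stores attention keys and values to speed up decoding.",
    "doc2": "BM25 is a keyword ranking function used in search engines.",
    "doc3": "Rerankers rescore retrieved chunks with a cross-encoder.",
}


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[tok] * b[tok] for tok in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: Dict[str, str], top_k: int = 2) -> List[str]:
    """Score every chunk against the query and keep the top-k with non-zero score."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(text)), doc_id) for doc_id, text in docs.items()),
        reverse=True,
    )
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def pack_context(query: str, doc_ids: List[str], docs: Dict[str, str]) -> str:
    """Build the final prompt: selected chunks tagged with ids so they can be cited."""
    sources = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"{sources}\n\nQuestion: {query}"
    )


query = "What does the KV cache store?"
hits = retrieve(query, DOCS)
prompt = pack_context(query, hits, DOCS)
print(prompt)
```

The packed prompt is what would be sent to the LLM; in a fuller pipeline a cross-encoder reranker would reorder `hits` before packing.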

4) Zoom-in: Agentic LLM (Plan → Act → Reflect) Loop

┌───────────────┐
│ User Goal      │
└──────┬────────┘
       ▼
┌───────────────────────────────┐
│ Planner (decompose into tasks) │
└──────┬────────────────────────┘
       ▼
┌───────────────────────────────┐
│ Tool Selector + Policy         │
│ (what tool, what args, allowed)│
└──────┬────────────────────────┘
       ▼
┌───────────────────────────────┐
│ Execute Tool(s)                │
│ (search/db/code/email/calendar)│
└──────┬────────────────────────┘
       ▼
┌───────────────────────────────┐
│ Memory Update                  │
│ (episodic + semantic + state)  │
└──────┬────────────────────────┘
       ▼
┌───────────────────────────────┐
│ Reflexion / Critic             │
│ (check errors, retry, improve) │
└──────┬────────────────────────┘
       ▼
┌───────────────┐
│ Final Answer   │
└───────────────┘
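The loop in this diagram can be sketched as a small control flow. The version below is a toy under loud assumptions: `plan` is hard-coded where a real agent would call an LLM, `critic` only checks for a non-empty result, and `TOOLS`, `ALLOWED_TOOLS`, and `run_agent` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> callable taking a string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "echo": lambda text: text,
}
ALLOWED_TOOLS = {"calculator"}  # tool policy: which tools this agent may call


def plan(goal: str) -> List[Tuple[str, str]]:
    """Stand-in planner: a real one would ask an LLM to decompose the goal."""
    return [("calculator", goal)]


def critic(result: str) -> bool:
    """Stand-in for the Reflexion/Critic step: accept any non-empty result."""
    return bool(result.strip())


def run_agent(goal: str, max_retries: int = 1) -> Tuple[str, List[str]]:
    memory: List[str] = []  # episodic log of what happened
    answer = ""
    for tool_name, args in plan(goal):
        if tool_name not in ALLOWED_TOOLS:
            memory.append(f"blocked: {tool_name}")
            continue
        for _ in range(max_retries + 1):
            try:
                result = TOOLS[tool_name](args)  # execute the tool
            except Exception as err:
                result = ""
                memory.append(f"error: {tool_name}: {err}")
            else:
                memory.append(f"{tool_name}({args!r}) -> {result!r}")
            if critic(result):  # reflect: accept the result or retry
                answer = result
                break
    return answer, memory


answer, memory = run_agent("2+3")
print(answer, memory)
```

Note that the retry inside `run_agent` is what makes this a loop rather than a straight pipeline: a rejected result sends control back to the execute step.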