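This table lists mainstream large language models in order of release date, covering parameter count, context length, FFN architecture, attention mechanism, normalization, and activation function. It is based on the original screenshot, with corrections to the 2025–2026 model fields that could be confirmed through online verification.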
| No. | Model | Release date | FFN architecture | Architecture variant | Context | Total parameters | Active parameters | Pre-Norm | Post-Norm | Attention-Norm | Positional encoding | Attention1 | Attention2 | Residual | Activation function |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-2 XL | 2019/11/05 | Dense | | 1K | 1.5B | 1.5B | LayerNorm | | | PE | MHA | | RC | GELU |
| 2 | GPT-3 | 2020/05/28 | Dense | | 2K | 175B | 175B | LayerNorm | | | PE | MHA | | RC | GELU |
| 3 | InstructGPT | 2022/03/04 | Dense | | 2K | 175B | 175B | LayerNorm | | | PE | MHA | | RC | GELU |
| 4 | Llama | 2023/02/24 | Dense | | 2K | 7B | 7B | RMSNorm | | | RoPE | MHA | | RC | SiLU |
| 5 | Llama 2 | 2023/07/18 | Dense | | 4K | 70B | 70B | RMSNorm | | | RoPE | GQA | | RC | SiLU |
| 6 | Llama 2 | 2023/07/18 | Dense | | 4K | 7B | 7B | RMSNorm | | | RoPE | MHA | | RC | SiLU |
| 7 | Qwen | 2023/08/03 | Dense | | 32K | 7B | 7B | RMSNorm | | | RoPE | MHA | | RC | SiLU |
| 8 | Llama 3 | 2024/04/18 | Dense | | 8K | 8B | 8B | RMSNorm | | | RoPE | GQA | | RC | SiLU |
| 9 | Llama 3.2 | 2024/09/25 | Dense | | 128K | 1B | 1B | RMSNorm | | | RoPE | GQA | | RC | SiLU |
| 10 | OLMo 2 | 2024/11/25 | Dense | | 4K | 7B | 7B | RMSNorm | | | RoPE | MHA | | RC | SiLU |
| 11 | Phi-4 | 2024/12/12 | Dense | | 16K | 14B | 14B | RMSNorm | | | RoPE | GQA | | RC | SiLU |
| 12 | DeepSeek V3 | 2024/12/26 | Sparse | MoE | 128K | 671B | 37B | RMSNorm | | | RoPE | MLA | | RC | SiLU |
| 13 | DeepSeek R1 | 2025/01/20 | Sparse | MoE | 128K | 671B | 37B | RMSNorm | | | RoPE | MLA | | RC | SiLU |
| 14 | Gemma 3 | 2025/03/11 | Dense | | 128K | 27B | 27B | RMSNorm | | QK-RMSNorm | RoPE | GQA | SWA | RC | GELU |
| 15 | Mistral Small 3.1 | 2025/03/18 | Dense | | 128K | 24B | 24B | RMSNorm | | | RoPE | GQA | | RC | SiLU |
| 16 | Llama 4 Maverick | 2025/04/05 | Sparse | MoE | 1M | 400B | 17B | RMSNorm | | | RoPE | GQA | | RC | SiLU |
| 17 | Qwen3 | 2025/04/28 | Sparse | MoE | 128K | 235B | 22B | RMSNorm | | QK-RMSNorm | RoPE | GQA | | RC | SiLU |
| 18 | Qwen3 | 2025/04/28 | Dense | | 128K | 32B | 32B | RMSNorm | | QK-RMSNorm | RoPE | GQA | | RC | SiLU |
| 19 | Qwen3 | 2025/04/28 | Dense | | 128K | 8B | 8B | RMSNorm | | QK-RMSNorm | RoPE | GQA | | RC | SiLU |
| 20 | Qwen3 | 2025/04/28 | Dense | | 32K | 4B | 4B | RMSNorm | | QK-RMSNorm | RoPE | GQA | | RC | SiLU |
| 21 | SmolLM3 | 2025/06/19 | Dense | | 128K | 3B | 3B | RMSNorm | | | RoPE+NoPE | GQA | | RC | SiLU |
| 22 | Kimi K2 | 2025/07/10 | Sparse | MoE | 128K | 1T | 32B | RMSNorm | | | RoPE | MLA | | RC | SwiGLU |
| 23 | GLM-4.5 | 2025/07/28 | Sparse | MoE | 128K | 355B | 32B | RMSNorm | | QK-RMSNorm | RoPE | GQA | | RC | SiLU |
| 24 | GLM-4.5-Air | 2025/07/28 | Sparse | MoE | 128K | 106B | 12B | RMSNorm | | | RoPE | GQA | | RC | SiLU |
| 25 | Qwen3-Coder-480B-A35B | 2025/07/22 | Sparse | MoE | 256K (extendable to 1M with YaRN) | 480B | 35B | RMSNorm | | QK-RMSNorm | RoPE | GQA | | RC | SiLU |
| 26 | DeepSeek V3.2 | 2025/12/01 | Sparse | DeepSeekMoE | 128K | 671B | 37B | RMSNorm | | | RoPE | MLA | DSA | RC | SiLU |
| 27 | Kimi K2.5 | 2026/02/02 | Sparse | MoE | 256K | 1T | 32B | RMSNorm | | | RoPE | MLA | | RC | SwiGLU |
| 28 | GLM-5 | 2026/02/12 | Sparse | MoE | 200K | 744B | 40B | RMSNorm | | QK-RMSNorm | RoPE | DSA | | RC | SiLU |
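| 29 | Gemini 3.1 Pro | To be verified | Undisclosed | Undisclosed | 1M | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | | |
| 30 | GPT-5.4 | 2026/03/05 | Undisclosed | Undisclosed | 1M (API/Codex; 272K is the higher-price threshold) | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | | |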
| 31 | Mistral Small 4 | 2026/03/16 | Sparse | MoE / Hybrid | 256K | 119B | 6.5B | Undisclosed | | Undisclosed | Undisclosed | Undisclosed | | RC | Undisclosed |
| 32 | Gemma 4 26B-A4B | 2026/03/31 | Sparse | MoE | Undisclosed | 26B | ≈4B | RMSNorm | | Undisclosed | RoPE | Undisclosed | SWA / Global Attention | RC | GELU |
| 33 | Gemma 4 31B | 2026/03/31 | Dense | Dense Transformer | Undisclosed | 31B | 31B | RMSNorm | | Undisclosed | RoPE | Undisclosed | SWA / Global Attention | RC | GELU |
| 34 | GLM-5.1 | 2026/04/07 | Sparse | MoE | 200K | 744B | 40B | RMSNorm | | QK-RMSNorm | RoPE | DSA | | RC | SiLU |
| 35 | Kimi K2.6 | 2026/02/02 | Sparse | MoE | 256K | 1T | 32B | RMSNorm | | | RoPE | MLA | | RC | SwiGLU |
| 36 | DeepSeek V4-Pro | 2026/04/24 | Sparse | MoE | 1M | 1.6T | 49B | RMSNorm | | | RoPE | CSA + HCA | Token-wise Compression | RC | SiLU |
| 37 | DeepSeek V4-Flash | 2026/04/24 | Sparse | MoE | 1M | 284B | 13B | RMSNorm | | | RoPE | CSA + HCA | Token-wise Compression | RC | SiLU |
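| 38 | GPT-5.5 | 2026/04/23 | Undisclosed | Undisclosed | 400K (Codex); 1M (API) | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | | |
| 39 | GPT-5.5 Instant | 2026/05/05 | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | | |
| 40 | Gemini 3.5 Flash | 2026/05/20 | Undisclosed | Undisclosed | 1M | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | Undisclosed | | |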
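
For the sparse rows, the relation between total and active parameters is easier to compare as a ratio. The following is a minimal sketch, not part of the original table: it copies the total and active parameter counts of a few MoE rows above and computes the fraction of parameters active per token. The names `MOE_MODELS`, `active_ratio`, and `ranked_by_sparsity` are hypothetical, and 1T is assumed to mean 1000B.

```python
# Total and active parameter counts in billions, copied from the table above.
# Assumption: "1T" is treated as 1000B.
MOE_MODELS = {
    "DeepSeek V3": (671, 37),
    "Llama 4 Maverick": (400, 17),
    "Qwen3 (235B)": (235, 22),
    "Kimi K2": (1000, 32),
    "GLM-4.5": (355, 32),
    "GLM-4.5-Air": (106, 12),
    "Qwen3-Coder-480B-A35B": (480, 35),
}


def active_ratio(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters that are active per token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


def ranked_by_sparsity(models: dict) -> list:
    """Return (name, ratio) pairs sorted from sparsest to densest."""
    pairs = [(name, active_ratio(t, a)) for name, (t, a) in models.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, ratio in ranked_by_sparsity(MOE_MODELS):
        print(f"{name:<24} {ratio:6.1%}")
```

A dense row has a ratio of 1.0 by definition, so only sparse rows are included here.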