- Large Language Model: Landscape
- Prompt Engineering and Visual Prompts
- Finetuning
- Large Language Model: Challenges and Solutions
- Survey and Reference
- The best NLP papers from 2015 to now
- In 2023: As abilities emerge only at scale, we must unlearn outdated intuitions, scale Transformers via massive distributed matrix multiplications, and discover the inductive bias needed to push ~10,000× beyond GPT-4. 🗣️ / 📺 / ✍️ [6 Oct 2023]
- AI Model Review: Compare 75 AI Models on 200+ Prompts Side By Side.
- Artificial Analysis:💡Independent analysis of AI models and API providers.
- Inside language models (from GPT to Olympus)
- LiveBench: a benchmark for LLMs designed to limit test set contamination.
- LLMArena:💡Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots
- LLMprices.dev: Compare prices for models like GPT-4, Claude Sonnet 3.5, Llama 3.1 405b and many more.
- LLM Pre-training and Post-training Paradigms [17 Aug 2024]

- The Big LLM Architecture Comparison✍️:💡 [19 Jul 2025]
- LLM Architecture Gallery✍️: Visual guide to modern LLM architectures and design tradeoffs. [26 Mar 2026]
| Model | Parameters | Attention Type | MoE | Norm | Positional Encoding | Notable Features |
|---|---|---|---|---|---|---|
| DeepSeek V3 / R1 | 671B | Multi-Head Latent Attention (MLA) | Yes, 256 experts (37B active) | Pre-normalization | RoPE | KV compression via MLA, shared expert, high inference efficiency |
| OLMo 2 | 32B | Multi-Head Attention (MHA) | No | Post-normalization + QK norm (RMSNorm) | RoPE | RMSNorm scaling after attention & FF, training stability |
| Gemma 3 / 3n | 27B / 4B | Sliding Window + Grouped-Query Attention | No | Pre + Post RMSNorm | RoPE | Sliding window attention; Gemma 3n: Per-Layer Embedding (PLE), MatFormer slices |
| Mistral Small 3.1 | 24B | Grouped-Query Attention | No | Pre-normalization | RoPE | Optimized for low latency, simpler than Gemma 3 |
| Llama 4 Maverick | 400B | Grouped-Query Attention | Yes, fewer & larger experts | Pre-normalization | RoPE | Alternating MoE & dense layers, 17B active parameters |
| Qwen3 (Dense) | 0.6–32B | Grouped-Query Attention | No | Pre-normalization | RoPE | Deep architecture, small memory footprint |
| Qwen3 (MoE) | 30B–235B | Grouped-Query Attention | Yes, no shared expert | Pre-normalization | RoPE | Sparse MoE, optimized for large-scale inference |
| SmolLM3 | 3B | Grouped-Query Attention | No | Pre-normalization | NoPE (No Positional Embedding) | Good small-scale performance, improved length generalization |
| Kimi K2 | 1T | MLA | Yes, more experts than DeepSeek | Pre-normalization | RoPE | Muon optimizer, very high modeling performance, open-weight |
| gpt-oss | 20B / 120B | Grouped-Query + Sliding Window | Yes, few large experts | Pre-normalization | RoPE | Wider architecture, attention sinks, bias units |
| Grok 2.5 | 70B | Grouped-Query Attention | Yes | Pre-normalization | RoPE | Standard large-scale architecture |
| GLM-4.5 | 130B | Grouped-Query Attention | Yes | Pre-normalization | RoPE | Standard architecture with high performance |
| Qwen3-Next | - | Grouped-Query Attention | Yes | Pre-normalization | RoPE | Expert size & number tuned, Gated DeltaNet + Gated Attention Hybrid, Multi-Token Prediction |
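Most models in the table use Grouped-Query Attention (GQA), which shrinks the KV cache by letting several query heads share one key/value head. A minimal sketch of the idea (shapes and head counts are illustrative, not taken from any specific model):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), n_kv_heads < n_q_heads.
    Each contiguous group of query heads attends to one shared KV head."""
    n_q, seq, d = q.shape
    n_kv = k.shape[0]
    group = n_q // n_kv          # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q):
        kv = h // group          # index of the shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        out[h] = softmax(scores) @ v[kv]
    return out

# 8 query heads share 2 KV heads -> KV cache is 4x smaller than full MHA
q = np.random.randn(8, 16, 32)
k = np.random.randn(2, 16, 32)
v = np.random.randn(2, 16, 32)
print(grouped_query_attention(q, k, v).shape)  # (8, 16, 32)
```

MLA (DeepSeek, Kimi K2) pushes the same idea further by caching a low-rank latent instead of full K/V tensors.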
- Beyond Standard LLMs✍️:💡Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers [04 Nov 2025]

| Architecture Type | Key Models | Attention Mechanism | Main Advantage | Main Limitation | Use Case |
|---|---|---|---|---|---|
| Standard Transformer | GPT-5, DeepSeek V3/R1, Llama 4, Qwen3, Gemini 2.5, MiniMax-M2 | Quadratic O(n²) scaled dot-product | Proven, SOTA performance, mature tooling | Expensive training & inference, quadratic complexity | General-purpose LLM tasks |
| Linear Attention Hybrids | Qwen3-Next, Kimi Linear, MiniMax-M1, DeepSeek V3.2 | Gated DeltaNet + Full Attention (3:1 ratio) | 75% KV cache reduction, 6× decoding throughput, linear O(n) | Trades accuracy for efficiency, added complexity | Long-context tasks, resource-constrained environments |
| Text Diffusion | LLaDA, Gemini Diffusion | Bidirectional (no causal mask) | Parallel token generation, faster responses | Can't stream, tricky tool-calling, quality degradation with fewer steps | Fast inference, on-device LLMs |
| Code World Models | CWM (32B) | Standard sliding-window attention | Simulates code execution, improves reasoning | Limited to code domain, added latency from execution traces | Code generation, debugging, test-time scaling |
| Small Recursive Transformers | TRM (7M), HRM (28M) | Standard attention with recursive refinement | Very small (7M params), strong puzzle solving, <$500 training cost | Special-purpose, limited to structured tasks (Sudoku, ARC, Maze) | Domain-specific reasoning, tool-calling modules |
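The O(n) claim for linear attention hybrids comes from replacing the growing KV cache with a constant-size state that is updated step by step. A toy causal linear-attention recurrence (the feature map and the gating/decay used by DeltaNet-style models are omitted for brevity):

```python
import numpy as np

def causal_linear_attention(q, k, v):
    """Per step: state S += outer(k_t, v_t); output o_t = q_t @ S.
    The state has fixed size (d x d_v), independent of sequence length,
    unlike a KV cache that grows with every token."""
    seq, d = q.shape
    d_v = v.shape[1]
    S = np.zeros((d, d_v))
    out = np.zeros((seq, d_v))
    for t in range(seq):
        S += np.outer(k[t], v[t])   # accumulate key-value associations
        out[t] = q[t] @ S           # query the running state
    return out

q = np.random.randn(10, 8)
k = np.random.randn(10, 8)
v = np.random.randn(10, 16)
print(causal_linear_attention(q, k, v).shape)  # (10, 16)
```

Hybrids like Qwen3-Next interleave such linear layers with full-attention layers (roughly 3:1) to recover accuracy on tasks where exact token-to-token attention matters.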
| Feature | GPT-2 | GPT-OSS |
|---|---|---|
| Release & Size | 2019, up to 1.5B params | 2025, 20B & 120B params (MoE) |
| Architecture | Dense transformer decoder | Mixture-of-Experts (MoE) decoder |
| Activation & Dropout | GELU activation, uses dropout | SwiGLU (Swish-gated) activation, no dropout |
| Parameter Efficiency | All params active per token | Sparse activation of experts |
| Deployment & License | MIT license | Open-weight local runs, Apache 2.0 |
| Reasoning & Tools | Basic generation | Built-in chain-of-thought & tool use |
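The "sparse activation of experts" row is the key efficiency difference: an MoE decoder routes each token to only a few experts, so most parameters sit idle per token. A minimal top-k routing sketch (expert count and sizes are illustrative):

```python
import numpy as np

def moe_forward(x, experts_w, router_w, top_k=2):
    """Sparse MoE for one token: the router scores all experts, keeps the
    top-k, renormalizes their gates with a softmax, and mixes only those
    experts' outputs. The other experts are never evaluated."""
    logits = x @ router_w                     # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]         # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                      # softmax over selected experts
    return sum(g * (x @ experts_w[e]) for g, e in zip(gates, top))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)                   # one token's hidden state
experts = rng.standard_normal((8, 16, 16))    # 8 experts; only 2 run per token
router = rng.standard_normal((16, 8))
print(moe_forward(x, experts, router).shape)  # (16,)
```

This is why gpt-oss-120b can have 120B total parameters while keeping per-token compute close to a much smaller dense model.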
- Evolutionary Graph of LLaMA Family

- LLM evolutionary tree

- Timeline of SLMs

- A Comprehensive Survey of Small Language Models in the Era of Large Language Models📑 / git [4 Nov 2024]
- LLM evolutionary tree📑: A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers) git [26 Apr 2023]
- A Survey of Large Language Models📑: /git [31 Mar 2023] contd.
- An overview of different fields of study and recent developments in NLP. 🗄️ / ✍️ [24 Sep 2023]
Exploring the Landscape of Natural Language Processing Research ref📑 [20 Jul 2023]

- NLP taxonomy
- Ai2 (Allen Institute for AI)
- Founded in 2014 by Paul Allen, the co-founder of Microsoft.
- DR Tulu: 8B. Deep Research (DR) model trained for long-form DR tasks. [Nov 2025]
- OLMo📑:💡Truly open language model and framework to build, study, and advance LMs, along with the training data, training and evaluation code, intermediate model checkpoints, and training logs. git [Feb 2024]
- OLMo 2 [26 Nov 2024]
- OLMo 3✍️: Fully open models including the entire flow. [20 Nov 2025]
- OLMoE: fully-open LLM leverages sparse Mixture-of-Experts [Sep 2024]
- TÜLU 3📑:💡Pushing Frontiers in Open Language Model Post-Training git / demo:✍️ [22 Nov 2024]
- Alibaba
- Qwen (通义千问: Universal Intelligence that can answer a thousand questions) git Flagship Models✍️
- Qwen model family: Qwen first model released in [April 2023]
- Qwen3 Technical Report📑: Unified thinking and non-thinking modes across dense and MoE models. [May 2025]
- Qwen-Image-Edit [18 Aug 2025]
- Qwen3-Max: over 1 trillion parameters. 256K tokens. [5 Sep 2025]
- Amazon
- Amazon Nova Foundation Models: Text only - Micro, Multimodal - Light, Pro [3 Dec 2024]
- The Amazon Nova Family of Models: Technical Report and Model Card📑 [17 Mar 2025]
- Anthropic
- Claude 3✍️, the largest version of the new LLM, outperforms rivals GPT-4 and Google’s Gemini 1.0 Ultra. Three variants: Opus, Sonnet, and Haiku. [Mar 2024]
- Claude 3.7 Sonnet and Claude Code✍️: the first hybrid reasoning model. ✍️ [25 Feb 2025]
- Claude 4✍️: Claude Opus 4 (72.5% on SWE-bench), Claude Sonnet 4 (72.7% on SWE-bench). Extended Thinking Mode (Beta). Parallel Tool Use & Memory. Claude Code SDK. AI agents: code execution, MCP connector, Files API, and 1-hour prompt caching. [23 May 2025]
- Claude 4.5✍️: Major upgrades in autonomous coding, tool use, context handling, memory, and long-horizon reasoning; supports over 30 hours of continuous operation. [30 Sep 2025]
- Claude Opus 4.5✍️: SWE-bench Verified (80.9%). $5/$25 per million tokens [25 Nov 2025]
- anthropic/cookbook
- Apple
- OpenELM: Apple released a Transformer-based language model. Four sizes of the model: 270M, 450M, 1.1B, and 3B parameters. [April 2024]
- Apple Intelligence Foundation Language Models: 1. A 3B on-device model used for language tasks like summarization and Writing Tools. 2. A large Server model used for language tasks too complex to do on-device. [10 Jun 2024]
- Baidu
- ERNIE Bot's official website: ERNIE X1 (deep-thinking reasoning) and ERNIE 4.5 (multimodal) [16 Mar 2025]
- A list of models & libraries: git
- Chatbot Arena🤗
- Chatbot Arena🤗: Benchmarking LLMs in the Wild with Elo Ratings
- Cohere
- Founded in 2019. Canadian multinational tech.
- Command R+🤗: The performant model for RAG capabilities, multilingual support, and tool use. [Aug 2024]
- An Overview of Cohere’s Models | Playground
- Databricks
- DBRX: MoE, open, general-purpose LLM created by Databricks. [27 Mar 2024]
- DeepSeek
- Founded in 2023, DeepSeek is a Chinese AI company dedicated to AGI.
- DeepSeek-V3: Mixture-of-Experts (MoE) with 671B. [26 Dec 2024]
- DeepSeek-V3 Technical Report📑: 671B MoE model with MLA and auxiliary-loss-free load balancing. [Dec 2024]
- DeepSeek-R1:💡an open source reasoning model. Group Relative Policy Optimization (GRPO). Base -> RL -> SFT -> RL -> SFT -> RL [20 Jan 2025] ref📑: A Review of DeepSeek Models' Key Innovative Techniques [14 Mar 2025]
- Janus: Multimodal understanding and visual generation. [28 Jan 2025]
- DeepSeek-V3🤗: 671B. Top-tier performance in coding and reasoning tasks [25 Mar 2025]
- DeepSeek-Prover-V2: Mathematical reasoning [30 Apr 2025]
- DeepSeek-v3.1🤗: Think/Non‑Think hybrid reasoning. 128K and MoE. Agent abilities. [19 Aug 2025]
- DeepSeek-V3.2📑: DeepSeek Sparse Attention (DSA) cuts complexity from O(L²) to O(Lk). [12 Dec 2025]
- DeepSeek-V3.2-Exp [Sep 2025]
- DeepSeek-OCR: Convert long text into an image, compresses it into visual tokens, and sends those to the LLM — cutting cost and expanding context capacity. [Oct 2025]
- DeepSeekMath-V2: a Self-Verifiable Mathematical Reasoning model [27 Nov 2025]
- mHC (Manifold-Constrained Hyper-Connections)📑 [31 Dec 2025]: Controlled layer updates for stable deep models. `next_state = current_state + constrained_update` (vs. residuals: `F(x) + x` -> Hyper-Connections: unconstrained -> mHC: constrained)
- Engram (Conditional Memory Module): Adds a native memory lookup alongside neural computation, letting frequent patterns be retrieved in constant time. `output = compute(x) + memory_lookup(x)` (vs. attention: recomputing patterns every time -> Engram)
- A list of models: git
- EleutherAI
- Founded in July 2020. United States tech. GPT-Neo, GPT-J, GPT-NeoX, and The Pile dataset.
- Pythia📑: How do large language models (LLMs) develop and evolve over the course of training and change as models scale? A suite of decoder-only autoregressive language models ranging from 70M to 12B parameters git [Apr 2023]
- Google
- Foundation Models: Gemini, Veo, Gemma etc.
- Gemma: Open weights LLM from Google DeepMind. git / Pytorch git [Feb 2024]
- Gemma 2 2B, 9B, 27B ref: releases [Jun 2024]
- Gemma 3: Single GPU. Context length of 128K tokens, SigLIP encoder, Reasoning ✍️ [12 Mar 2025]
- Gemini: Rebranding: Bard -> Gemini [8 Feb 2024]
- Gemini 1.5✍️: 1 million token context window, 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. [Feb 2024]
- Gemini 2 Flash✍️: Multimodal LLM with multilingual inputs/outputs, real-time capabilities (Project Astra), complex task handling (Project Mariner), and developer tools (Jules) [11 Dec 2024]
- Gemini 2.0 Flash Thinking Experimental [19 Dec 2024]
- Gemini 2.5✍️: strong reasoning and code. 1 million token context [25 Mar 2025] -> I/O 2025✍️ Deep Think, 1M-token context, Native audio output, Project Mariner: AI-powered computer control. [20 May 2025] Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.📑
- Gemma 3n: The next generation of Gemini Nano. Gemma 3n uses DeepMind’s Per-Layer Embeddings (PLE) to run 5B/8B models at 2GB/3GB RAM. [20 May 2025]
- gemini/cookbook
- Gemini 3 Pro✍️: Deep Think reasoning, Advanced multimodal understanding, spatial reasoning, and agentic capabilities up 30% from 2.5 Pro — reaching 37.5% on Humanity’s Last Exam (41% in Deep Think mode). [18 Nov 2025]
- Groq
- Founded in 2016. American tech company building low-latency AI inference hardware.
- Llama-3-Groq-Tool-Use: a model optimized for function calling [Jul 2024]
- Huggingface
- Open R1: A fully open reproduction of DeepSeek-R1. [25 Jan 2025]
- Huggingface Open LLM Leaderboard🤗
- IBM
- Granite Guardian: a collection of models designed to detect risks in prompts and responses [10 Dec 2024]
- Jamba: AI21's SSM-Transformer Model. Mamba + Transformer + MoE [28 Mar 2024]
- KoAlpaca: Alpaca for korean [Mar 2023]
- Llama variants emerged in 2023
- Falcon LLM Apache 2.0 license [Mar 2023]
- Alpaca: Fine-tuned from the LLaMA 7B model [Mar 2023]
- vicuna: 90% ChatGPT Quality [Mar 2023]
- dolly: Databricks [Mar 2023]
- Cerebras-GPT: 7 GPT models ranging from 111m to 13b parameters. [Mar 2023]
- Koala: Focus on dialogue data gathered from the web. [Apr 2023]
- StableVicuna First Open Source RLHF LLM Chatbot [Apr 2023]
- Upstage's 70B Language Model Outperforms GPT-3.5: ✍️ [1 Aug 2023]
- LLM Collection: promptingguide.ai
- Meta
- Most OSS LLM models have been built on the Llama / ✍️ / git
- Llama 2🤗: 1) 40% more data than Llama. 2)7B, 13B, and 70B. 3) Trained on over 1 million human annotations. 4) double the context length of Llama 1: 4K 5) Grouped Query Attention, KV Cache, and Rotary Positional Embedding were introduced in Llama 2 [18 Jul 2023] demo🤗
- Llama 3: 1) 7X more data than Llama 2. 2) 8B, 70B, and 400B. 3) 8K context length [18 Apr 2024]
- MEGALODON: Long Sequence Model. Unlimited context length. Outperforms Llama 2 model. [Apr 2024]
- Llama 3.1: 405B, context length extended to 128K, support added across eight languages. The first OSS model to outperform GPT-4o. [23 Jul 2024]
- Llama 3.2: Multimodal. Include text-only models (1B, 3B) and text-image models (11B, 90B), with quantized versions of 1B and 3B [Sep 2024]
- NotebookLlama: An Open Source version of NotebookLM [28 Oct 2024]
- Llama 3.3: a text-only 70B instruction-tuned model. Llama 3.3 70B approaches the performance of Llama 3.1 405B. [6 Dec 2024]
- Llama 4: Mixture of Experts (MoE). Llama 4 Scout (17B active / 109B total, 10M context, single GPU), Llama 4 Maverick (17B active / 400B total, 1M context) git: Model Card [5 Apr 2025]
- ModernBERT📑: ModernBERT can handle sequences up to 8,192 tokens and utilizes sparse attention mechanisms to efficiently manage longer context lengths. [18 Dec 2024]
- Microsoft
- MAI-1✍️: MAI-Voice-1, MAI-1-preview. Microsoft in-house models. [28 Aug 2025]
- phi-series: cost-effective small language models (SLMs) ✍️ git: Cookbook
- Phi-1📑: Despite being small in size, phi-1 attained 50.6% on HumanEval and 55.5% on MBPP. Textbooks Are All You Need. ✍️ [20 Jun 2023]
- Phi-1.5📑: Textbooks Are All You Need II. Phi 1.5 is trained solely on synthetic data. Despite having a mere 1 billion parameters compared to Llama 7B's much larger model size, Phi 1.5 often performs better in benchmark tests. [11 Sep 2023]
- phi-2: open source, and 50% better at mathematical reasoning. 🤗 [Dec 2023]
- phi-3-vision (multimodal), phi-3-small, phi-3 (7b), phi-sillica (Copilot+PC designed for NPUs)
- Phi-3📑: Phi-3-mini, with 3.8 billion parameters, supports 4K and 128K context, instruction tuning, and hardware optimization. [22 Apr 2024] ✍️
- phi-3.5-MoE-instruct: 🤗 [Aug 2024]
- Phi-4📑: Specializing in Complex Reasoning ✍️ [12 Dec 2024]
- Phi-4-multimodal / mini🤗 5.6B. speech, vision, and text processing into a single, unified architecture. [26 Feb 2025]
- Phi-4-reasoning✍️: Phi-4-reasoning, Phi-4-reasoning-plus, Phi-4-mini-reasoning [30 Apr 2025]
- Phi-4-mini-flash-reasoning✍️: 3.8B, 64K context, Single GPU, Decoder-Hybrid-Decoder architecture [9 Jul 2025]
- MiniMaxAI
- Founded in Dec 2021. Shanghai, China.
- MiniMax-M2: Coding and agent tasks. 230B (10B active) MoE; sets a new high among open models, ahead of DeepSeek-V3.2 and Kimi K2
- Mistral
- Founded in April 2023. French tech.
- Model overview ✍️
- NeMo: 12B model with 128k context length that outperforms LLama 3 8B [18 Jul 2024]
- Mistral OCR: Precise text recognition with up to 99% accuracy. Multimodal. Browser based [6 Mar 2025]
- Mistral Large 3✍️: Flagship multimodal model for reasoning, coding, and enterprise assistants. [Mar 2025]
- Moonshot AI
- Moonshot AI is a Beijing-based Chinese AI company founded in March 2023
- Kimi-K2: 1T parameter MoE model. MuonClip Optimizer. Agentic Intelligence. [11 Jul 2025]
- Kimi K2 Thinking✍️: The first open-source model beats GPT-5 in Agent benchmark. [7 Nov 2025]
- Kimi-K2.5: Open-source multimodal agentic model by Moonshot AI. [Jan 2026]
- NVIDIA
- Nemotron-4 340B: Synthetic Data Generation for Training Large Language Models [14 Jun 2024]
- Ollama: ollama-supported models
- Open-Sora: Democratizing Efficient Video Production for All [Mar 2024]
- OpenAI
- gpt-oss:💡gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI. [Aug 2025]
- Qualcomm
- Qualcomm’s on-device AI models🤗: Bring generative AI to mobile devices [Feb 2024]
- Tencent
- Founded in 1998, Tencent is a Chinese company dedicated to various technology sectors, including social media, gaming, and AI development.
- Hunyuan-Large: An open-source MoE model with open weights. [4 Nov 2024] git
- Hunyuan-T1: Reasoning model [21 Mar 2025]
- A list of models: git
- The LLM Index: A list of large language models (LLMs)
- The mother of all spreadsheets for anyone into LLMs [17 Dec 2024]
- The Open Source AI Definition [28 Oct 2024]
- xAI
- xAI is an American AI company founded by Elon Musk in March 2023
- Grok: 314B parameter Mixture-of-Experts (MoE) model, released under the Apache 2.0 license. Training code not included. Developed with JAX. git [17 Mar 2024]
- Grok-2 and Grok-2 mini [13 Aug 2024]
- Grok-2.5: Grok 2.5 Goes Open Source [24 Aug 2025]
- Grok-3: 200,000 GPUs to train. Grok 3 beats GPT-4o on AIME, GPQA. Grok 3 Reasoning and Grok 3 mini Reasoning. [17 Feb 2025]
- Grok-4: Humanity’s Last Exam, Grok 4 Heavy scored 44.4% [9 Jul 2025]
- Grok 4.1✍️ [17 Nov 2025]
- Xiaomi
- Founded in 2010, Xiaomi is a Chinese company known for its innovative consumer electronics and smart home products.
- MiMo: 7B. Advanced reasoning for code and math [30 Apr 2025]
- Z.ai
- AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County📑: a fine-tuned open LLM to detect racial covenants in 24 million housing documents, cutting 86,500 hours of manual work. [12 Feb 2025]
- AlphaChip: Reinforcement learning-based model for designing physical chip layouts. [26 Sep 2024]
- AlphaFold3: Open source implementation of AlphaFold3 [Nov 2023] / OpenFold: PyTorch reproduction of AlphaFold 2 [Sep 2021]
- AlphaGenome: DeepMind’s advanced AI model, launched in June 2025, is designed to analyze the regulatory “dark matter” of the genome—specifically, the 98% of DNA that does not code for proteins but instead regulates when and how genes are expressed. [June 2025]
- BioGPT📑: Generative Pre-trained Transformer for Biomedical Text Generation and Mining git [19 Oct 2022]
- BloombergGPT📑: A Large Language Model for Finance [30 Mar 2023]
- Chai-1: a multi-modal foundation model for molecular structure prediction [Sep 2024]
- Code Llama📑: Built on top of Llama 2, free for research and commercial use. ✍️ / git [24 Aug 2023]
- DeepSeek-Coder-V2: Open-source Mixture-of-Experts (MoE) code language model [17 Jun 2024]
- Devin AI: Devin is an AI software engineer developed by Cognition AI [12 Mar 2024]
- EarthGPT📑: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain [30 Jan 2024]
- ESM3: A frontier language model for biology: Simulating 500 million years of evolution git / ✍️ [31 Dec 2024]
- FrugalGPT📑: LLM with budget constraints, requests are cascaded from low-cost to high-cost LLMs. git [9 May 2023]
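The cascading idea behind FrugalGPT can be sketched in a few lines: try cheap models first and escalate only when a scorer rejects the answer. The model callables and acceptance check below are illustrative stand-ins, not the paper's exact scoring function:

```python
def frugal_cascade(prompt, models, accept):
    """Query models from cheapest to most expensive, returning the first
    answer the acceptance check approves. `models` is a list of
    (name, call) pairs; `accept` is a stand-in for a learned scorer."""
    for name, call in models:
        answer = call(prompt)
        if accept(answer):
            return name, answer
    return name, answer  # fall back to the last (most capable) model's answer

# toy usage: the cheap model's answer is rejected, the mid-tier one accepted
models = [("cheap", lambda p: "maybe"), ("mid", lambda p: "yes!")]
print(frugal_cascade("q", models, accept=lambda a: a.endswith("!")))  # ('mid', 'yes!')
```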
- Galactica📑: A Large Language Model for Science [16 Nov 2022]
- Gemma series
- Gemma series in Huggingface🤗
- PaliGemma📑: a 3B VLM [10 Jul 2024]
- DataGemma✍️ [12 Sep 2024] / NotebookLM✍️: LLM-powered notebook. free to use, not open-source. [12 Jul 2023]
- PaliGemma 2📑: VLMs at 3 different sizes (3B, 10B, 28B) [4 Dec 2024]
- TxGemma: Therapeutics development [25 Mar 2025]
- Dolphin Gemma✍️: Decode dolphin communication [14 Apr 2025]
- MedGemma: Model fine-tuned for biomedical text and image understanding. [20 May 2025]
- SignGemma: Vision-language model for sign language recognition and translation. [27 May 2025]
- Huggingface StarCoder: A State-of-the-Art LLM for Code🤗 [May 2023]
- MechGPT📑: Language Modeling Strategies for Mechanics and Materials git [16 Oct 2023]
- MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers [27 Nov 2023]
- OpenCoder: 1.5B and 8B base and open-source Code LLM, supporting both English and Chinese. [Oct 2024]
- Prithvi WxC📑: In collaboration with NASA, IBM is releasing an open-source foundation model for Weather and Climate ✍️ [20 Sep 2024]
- Qwen2-Math: math-specific LLM / Qwen2-Audio: large-scale audio-language model [Aug 2024] / Qwen 2.5-Coder [18 Sep 2024]
- Qwen3-Coder: Qwen3-Coder is the code version of Qwen3, the large language model series developed by Qwen team, Alibaba Cloud. [Jul 2025]
- GLM-5🤗: Model card for Z.ai's latest GLM family release.
- SaulLM-7B📑: A pioneering Large Language Model for Law [6 Mar 2024]
- TimeGPT: The First Foundation Model for Time Series Forecasting git [Mar 2023]
- Video LLMs for Temporal Reasoning in Long Videos📑: TemporalVLM, a video LLM excelling in temporal reasoning and fine-grained understanding of long videos, using time-aware features and validated on datasets like TimeIT and IndustryASM for superior performance. [4 Dec 2024]
- Apple
- 4M-21📑: An Any-to-Any Vision Model for Tens of Tasks and Modalities. [13 Jun 2024]
- Awesome Multimodal Large Language Models: Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation. [Jun 2023]
- Benchmarking Multimodal LLMs.
- LLaVA-1.5 achieves SoTA on a broad range of 11 tasks incl. SEED-Bench.
- SEED-Bench📑: Benchmarking Multimodal LLMs git [30 Jul 2023]
- BLIP-2📑 [30 Jan 2023]: Salesforce Research, Querying Transformer (Q-Former) / git / 🤗 / 📺 / BLIP📑: git [28 Jan 2022]
Q-Former (Querying Transformer): A transformer consisting of two submodules that share the same self-attention layers: an image transformer that interacts with a frozen image encoder for visual feature extraction, and a text transformer that can function as both a text encoder and a text decoder. Q-Former is a lightweight transformer that employs a set of learnable query vectors to extract visual features from the frozen image encoder, acting as an information bottleneck between the frozen image encoder and the frozen LLM.
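The bottleneck mechanism reduces to cross-attention from a small set of learnable queries onto the frozen image features. A toy sketch (dimensions and patch counts are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def qformer_bottleneck(image_feats, queries, Wq, Wk, Wv):
    """Learnable queries cross-attend to frozen image features, compressing
    them to a fixed number of tokens regardless of how many patches come in."""
    q, k, v = queries @ Wq, image_feats @ Wk, image_feats @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_queries, n_patches)
    return attn @ v                                 # (n_queries, d)

# e.g. 257 patch features compressed into 32 query tokens
img, queries = np.random.randn(257, 64), np.random.randn(32, 64)
Wq, Wk, Wv = (np.random.randn(64, 64) for _ in range(3))
out = qformer_bottleneck(img, queries, Wq, Wk, Wv)
print(out.shape)  # (32, 64)
```

Because the output size depends only on the number of queries, the frozen LLM always sees a short, fixed-length visual prefix.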
- CLIP📑: CLIP (Contrastive Language-Image Pretraining), Trained on a large number of internet text-image pairs and can be applied to a wide range of tasks with zero-shot learning. git [26 Feb 2021]
- Drag Your GAN📑: Interactive Point-based Manipulation on the Generative Image Manifold git [18 May 2023]
- GroundingDINO📑: DINO with Grounded Pre-Training for Open-Set Object Detection git [9 Mar 2023]
- Hugging Face
- LLaVa📑: Large Language-and-Vision Assistant git [17 Apr 2023]
- Simple linear layer to connect image features into the word embedding space. A trainable projection matrix W is applied to the visual features Zv, transforming them into visual embedding tokens Hv. These tokens are then concatenated with the language embedding sequence Hq to form a single sequence. Note that Hv and Hq are not multiplied or added, but concatenated, both are same dimensionality.
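The connector described above amounts to one matrix multiply and a concatenation; the shapes below are illustrative, not LLaVA's exact dimensions:

```python
import numpy as np

# Trainable projection W maps visual features Zv into the word-embedding
# space, producing visual tokens Hv that are concatenated (not added) with
# the language embeddings Hq to form one input sequence for the LLM.
Zv = np.random.randn(256, 1024)   # visual features from the vision encoder
W = np.random.randn(1024, 4096)   # trainable projection matrix
Hq = np.random.randn(32, 4096)    # language instruction embeddings
Hv = Zv @ W                       # visual embedding tokens, now 4096-dim
seq = np.concatenate([Hv, Hq], axis=0)
print(seq.shape)  # (288, 4096)
```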
- LLaVA-CoT📑: (FKA. LLaVA-o1) Let Vision Language Models Reason Step-by-Step. git [15 Nov 2024]
- Meta (aka. Facebook)
- facebookresearch/ImageBind📑: ImageBind One Embedding Space to Bind Them All git [9 May 2023]
- facebookresearch/segment-anything(SAM)📑: The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. git [5 Apr 2023]
- facebookresearch/SeamlessM4T📑: SeamlessM4T is the first all-in-one multilingual multimodal AI translation and transcription model. This single model can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages depending on the task. ✍️ [22 Aug 2023]
- Chameleon📑: Early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. The unified approach uses fully token-based representations for both image and textual modalities. no vision-encoder. [16 May 2024]
- Models and libraries
- facebookresearch/ImageBind📑: ImageBind One Embedding Space to Bind Them All git [9 May 2023]
- Microsoft
- Language Is Not All You Need: Aligning Perception with Language Models Kosmos-1📑: [27 Feb 2023]
- Kosmos-2📑: Grounding Multimodal Large Language Models to the World [26 Jun 2023]
- Kosmos-2.5📑: A Multimodal Literate Model [20 Sep 2023]
- BEiT-3📑: Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks [22 Aug 2022]
- TaskMatrix.AI📑: TaskMatrix connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. [29 Mar 2023]
- Florence-2📑: Advancing a unified representation for various vision tasks, demonstrating specialized models like `CLIP` for classification, `GroundingDINO` for object detection, and `SAM` for segmentation. 🤗 [10 Nov 2023]
- LLM2CLIP: Directly integrating LLMs into CLIP causes catastrophic performance drops. We propose LLM2CLIP, a caption contrastive fine-tuning method that leverages LLMs to enhance CLIP. [7 Nov 2024]
- Florence-VL📑: A multimodal large language model (MLLM) that integrates Florence-2. [5 Dec 2024]
- Magma: Magma: A Foundation Model for Multimodal AI Agents [18 Feb 2025]
- MiniCPM-o: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone [15 Jan 2025]
- MiniCPM-V: MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone [Jan 2024]
- MiniGPT-4 & MiniGPT-v2📑: Enhancing Vision-language Understanding with Advanced Large Language Models git [20 Apr 2023]
- mini-omni2: ✍️: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities. [15 Oct 2024]
- Molmo and PixMo📑: Open Weights and Open Data for State-of-the-Art Multimodal Models ✍️ [25 Sep 2024]
- moondream: an OSS tiny vision language model. Built using SigLIP, Phi-1.5, LLaVA dataset. [Dec 2023]
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants📑: A comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities. Specific-Purpose 1. Visual understanding tasks 2. Visual generation tasks General-Purpose 3. General-purpose interface. [18 Sep 2023]
- Optimizing Memory Usage for Training LLMs and Vision Transformers: When applying 10 techniques to a vision transformer, we reduced the memory consumption 20x on a single GPU. ✍️ / git [2 Jul 2023]
- openai/shap-e📑 Generate 3D objects conditioned on text or images [3 May 2023] git
- TaskMatrix, aka. VisualChatGPT📑: Microsoft TaskMatrix git; GroundingDINO + SAM📑 / git [8 Mar 2023]
- Ultravox: A fast multimodal LLM for real-time voice [May 2024]
- Understanding Multimodal LLMs✍️:💡Two main approaches to building multimodal LLMs: 1. Unified Embedding Decoder Architecture approach; 2. Cross-modality Attention Architecture approach. [3 Nov 2024]

- Video-ChatGPT📑: a video conversation model capable of generating meaningful conversation about videos. / git [8 Jun 2023]
- Vision capability to a LLM ✍️:
The model has three sub-models: A model to obtain image embeddings -> A text model to obtain text embeddings -> A model to learn the relationships between them [22 Aug 2023]
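The three-model recipe above is essentially CLIP-style contrastive alignment: embed images and text separately, then score pairs in a shared space. A toy sketch (random embeddings stand in for real encoder outputs):

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# sub-model 1: image embeddings; sub-model 2: text embeddings;
# sub-model 3 learns to make matching pairs score highest in this space
img_emb = normalize(np.random.randn(4, 128))   # 4 image embeddings
txt_emb = normalize(np.random.randn(4, 128))   # 4 caption embeddings
sims = img_emb @ txt_emb.T                     # (4, 4) cosine similarities
best_text = sims.argmax(axis=1)                # best caption per image
print(sims.shape, best_text.shape)
```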
- A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications📑: a summary detailing the prompting methodology, its applications.🏆Taxonomy of prompt engineering techniques in LLMs. [5 Feb 2024]
- Chain of Draft: Thinking Faster by Writing Less📑: Chain-of-Draft prompting condenses the reasoning process into minimal, abstract representations. `Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most.` [25 Feb 2025]
- Chain of Thought (CoT)📑:💡Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. ReAct and Self-Consistency also inherit the CoT concept. [28 Jan 2022]
- Family of CoT: Self-Consistency (CoT-SC) > Tree of Thought (ToT) > Graph of Thoughts (GoT) > Iteration of Thought (IoT)📑 [19 Sep 2024], Diagram of Thought (DoT)📑 [16 Sep 2024] / To CoT or not to CoT?📑: Meta-analysis of 100+ papers shows CoT significantly improves performance in math and logic tasks. [18 Sep 2024]
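Self-Consistency, the simplest member of the family, just samples several reasoning paths and majority-votes their final answers. A toy sketch (the lambda stands in for one sampled chain of thought ending in an answer):

```python
from collections import Counter

def self_consistency(sample_answer, n=5):
    """CoT-SC: sample n reasoning paths and return the most common answer."""
    answers = [sample_answer() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# stand-in for five sampled chains of thought with extracted final answers
samples = iter(["42", "41", "42", "42", "41"])
print(self_consistency(lambda: next(samples), n=5))  # "42" wins 3-2
```

ToT and GoT generalize this from independent samples to explicit tree or graph search over intermediate thoughts.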
- Chain-of-Verification reduces Hallucination in LLMs📑: A four-step process that consists of generating a baseline response, planning verification questions, executing verification questions, and generating a final verified response based on the verification results. [20 Sep 2023]
- ChatGPT: "user", "assistant", and "system" messages. The ChatGPT API differentiates between the three roles:
  - The model always obeys "system" messages.
  - All end-user input goes in the "user" messages.
  - "assistant" messages carry previous chat responses from the assistant.
  - Presumably, the model is trained to treat the user messages as human messages, system messages as some system-level configuration, and assistant messages as previous chat responses from the assistant. ✍️ [2 Mar 2023]
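The three roles map directly onto the message payload sent to the API; the conversation content below is illustrative:

```python
# OpenAI-style chat payload using the roles described above
messages = [
    {"role": "system", "content": "You are a terse math tutor."},  # system-level config
    {"role": "user", "content": "What is 7 * 8?"},                 # end-user input
    {"role": "assistant", "content": "56."},                       # prior model reply
    {"role": "user", "content": "And 7 * 80?"},                    # follow-up turn
]
print(len(messages))
```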
- Does Prompt Formatting Have Any Impact on LLM Performance?📑: GPT-3.5-turbo's performance in code translation varies by 40% depending on the prompt template, while GPT-4 is more robust. [15 Nov 2024]
- Few-shot: OpenAI: Language Models are Few-Shot Learners📑 [28 May 2020]
- FireAct📑: Toward Language Agent Fine-tuning. 1. This work takes an initial step to show multiple advantages of fine-tuning LMs for agentic uses. 2. During fine-tuning, successful trajectories are converted into the ReAct format to fine-tune a smaller LM. 3. This work is an initial step toward language agent fine-tuning, constrained to a single type of task (QA) and a single tool (Google search). / git [9 Oct 2023]
- Graph of Thoughts (GoT)📑: Solving Elaborate Problems with Large Language Models git [18 Aug 2023]

- Is the new norm for NLP papers "prompt engineering" papers?: "how can we make LLM 1 do this without training?" Is this the new norm? The CL section of arXiv is overwhelming with papers like "how come LLaMA can't understand numbers?" [2 Aug 2024]
- Large Language Models as Optimizers📑:💡Optimization by PROmpting (OPRO). Adding `Take a deep breath and work on this problem step-by-step.` improves accuracy. [7 Sep 2023]
- Language Models as Compilers📑: With extensive experiments on seven algorithmic reasoning tasks, Think-and-Execute is effective. It enhances large language models' reasoning by using task-level logic and pseudocode, outperforming instance-specific methods. [20 Mar 2023]
- Many-Shot In-Context Learning📑: Transitioning from few-shot to many-shot In-Context Learning (ICL) can lead to significant performance gains across a wide variety of generative and discriminative tasks [17 Apr 2024]
- NLEP (Natural Language Embedded Programs) for Hybrid Language Symbolic Reasoning📑: Use code as a scaffold for reasoning. NLEP achieves over 90% accuracy when prompting GPT-4. [19 Sep 2023]
- OpenAI Harmony Response Format: system > developer > user > assistant > tool. git [5 Aug 2025]
- OpenAI Prompt Migration Guide:💡OpenAI Cookbook. By leveraging GPT‑4.1, refine your prompts to ensure that each instruction is clear, specific, and closely matches your intended outcomes. [26 Jun 2025]
- Plan-and-Solve Prompting📑: Develop a plan, and then execute each step in that plan. [6 May 2023]
- Power of Prompting
- GPT-4 with Medprompt📑: GPT-4, using a method called Medprompt that combines several prompting strategies, has surpassed MedPaLM 2 on the MedQA dataset without the need for fine-tuning. ✍️ [28 Nov 2023]
- promptbase: Scripts demonstrating the Medprompt methodology [Dec 2023]
- Prompt Concept Keywords: Question-Answering | Role-play: `Act as a [ROLE] perform [TASK] in [FORMAT]` | Reasoning | Prompt-Chain
- Prompt Engineering for OpenAI’s O1 and O3-mini Reasoning Models✍️: 1) Keep prompts clear and minimal, 2) Avoid unnecessary few-shot examples, 3) Control length and detail via instructions, 4) Specify output, role, or tone. [05 Feb 2025]
- Prompt Engineering overview 🗣️ [10 Jul 2023]

- Prompt Principle for Instructions📑:💡26 prompt principles: e.g., 1) No need to be polite with an LLM ... 16) Assign a role ... 17) Use delimiters ... [26 Dec 2023]
- Promptist📑: Microsoft's researchers trained an additional language model (LM) that optimizes text prompts for text-to-image generation. [19 Dec 2022]
- For example, instead of simply passing "Cats dancing in a space club" as a prompt, an engineered prompt might be "Cats dancing in a space club, digital painting, artstation, concept art, soft light, hdri, smooth, sharp focus, illustration, fantasy."
- RankPrompt📑: Self-ranking method. Direct Scoring independently assigns scores to each candidate, whereas RankPrompt ranks candidates through a systematic, step-by-step comparative evaluation. [19 Mar 2024]
- ReAct📑: Grounding with external sources. (Reasoning and Act): Combines reasoning and acting ✍️ [6 Oct 2022]
- Re-Reading Improves Reasoning in Large Language Models📑: RE2 (Re-Reading) re-reads the question as input to enhance the LLM's understanding of the problem. `Read the question again` [12 Sep 2023]
- Recursively Criticizes and Improves (RCI)📑: [30 Mar 2023]
- Critique: Review your previous answer and find problems with your answer.
- Improve: Based on the problems you found, improve your answer.
- Reflexion📑: Language Agents with Verbal Reinforcement Learning. 1. Reflexion uses `verbal reinforcement` to help agents learn from prior failings. 2. Reflexion converts binary or scalar feedback from the environment into verbal feedback in the form of a textual summary, which is then added as additional context for the LLM agent in the next episode. 3. It is lightweight and doesn’t require finetuning the LLM. [20 Mar 2023] / git
- Retrieval Augmented Generation (RAG)📑: To address knowledge-intensive tasks, RAG combines an information retrieval component with a text generator model. [22 May 2020]
- Self-Consistency (CoT-SC)📑: The three steps in the self-consistency method: 1) prompt the language model using CoT prompting, 2) sample a diverse set of reasoning paths from the language model, and 3) marginalize out reasoning paths to aggregate final answers and choose the most consistent answer. [21 Mar 2022]
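The three steps above reduce to a sampling-and-majority-vote loop. A minimal sketch, where `sample_reasoning_path` stands in for any CoT-prompted LLM call that returns a final answer (the `fake_llm` below is a hypothetical stand-in, not a real API):

```python
import random
from collections import Counter

def self_consistency(sample_reasoning_path, question, n_paths=5):
    """CoT self-consistency: 1) sample diverse reasoning paths,
    2) keep only their final answers, 3) return the majority answer."""
    answers = [sample_reasoning_path(question) for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for a stochastic CoT-prompted LLM call (hypothetical):
def fake_llm(question):
    return random.choice(["42", "42", "42", "41"])

print(self_consistency(fake_llm, "What is 6 * 7?", n_paths=20))
```

Marginalizing out the reasoning paths means only the final answers are compared, so diverse chains that converge on the same answer reinforce each other.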
- Self-Refine📑, which enables an agent to reflect on its own output [30 Mar 2023]
- Skeleton Of Thought📑: Skeleton-of-Thought (SoT) reduces generation latency by first creating an answer's skeleton, then filling each skeleton point in parallel via API calls or batched decoding. [28 Jul 2023]
- Tree of Thought (ToT)📑: Self-evaluate the progress intermediate thoughts make towards solving a problem [17 May 2023] git / Agora: Tree of Thoughts (ToT) git
- Verbalized Sampling📑: "Generate 5 jokes about coffee and their corresponding probabilities". In creative writing, VS increases diversity by 1.6-2.1x over direct prompting. [1 Oct 2025]
- Zero-shot, one-shot and few-shot ref📑 [28 May 2020]

- Zero-shot: Large Language Models are Zero-Shot Reasoners📑: Let’s think step by step. [24 May 2022]
- Prompt Injection: `Ignore the above directions and ...`
- Prompt Leaking: `Ignore the above instructions ... followed by a copy of the full prompt with exemplars:`
- Jailbreaking: Bypassing a safety policy to elicit unethical instructions when the request is contextualized in a clever way. ✍️
- Random Search (RS): git: 1. Feed the modified prompt (original + suffix) to the model. 2. Compute the log probability of a target token (e.g., `Sure`). 3. Accept the suffix if the log probability increases.
- DAN (Do Anything Now): ✍️
- JailbreakBench: git / ✍️
- Automatic Prompt Engineer (APE)📑: Automatically optimizing prompts. APE has discovered zero-shot Chain-of-Thought (CoT) prompts superior to human-designed prompts like “Let’s think through this step-by-step” (Kojima et al., 2022). The prompt “To get the correct answer, let’s think step-by-step.” triggers a chain of thought. Two approaches to generate high-quality candidates: forward mode and reverse mode generation. [3 Nov 2022] git / ✍️ [Mar 2024]
- Claude Prompt Engineer: Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best. [4 Jul 2023] / Anthropic Helper metaprompt ✍️ / Claude Sonnet 3.5 for Coding
- Cohere’s new Prompt Tuner: Automatically improve your prompts [31 Jul 2024]
- Large Language Models as Optimizers📑: Optimization by PROmpting (OPRO). showcase OPRO on linear regression and traveling salesman problems. git [7 Sep 2023]
- 5 Principles for Writing Effective Prompts✍️: RGTD - Role, Goal, Task, Details Framework [07 Feb 2025]
- Anthropic Prompt Library: Anthropic released a Claude 3 AI prompt library [Mar 2024]
- Anthropic courses > Prompt engineering interactive tutorial: a comprehensive step-by-step guide to key prompting techniques / prompt evaluations [Aug 2024]
- Awesome ChatGPT Prompts [Dec 2022]
- Awesome Prompt Engineering [Feb 2023]
- Awesome-GPTs-Prompts [Jan 2024]
- Azure OpenAI Prompt engineering techniques
- Copilot prompts: Examples of prompts for Microsoft Copilot. [25 Apr 2024]
- DeepLearning.ai ChatGPT Prompt Engineering for Developers
- Fabric: A modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere [Jan 2024]
- In-The-Wild Jailbreak Prompts on LLMs: A dataset consists of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts). Collected from December 2022 to December 2023 [Aug 2023]
- LangChainHub: a collection of all artifacts useful for working with LangChain primitives such as prompts, chains and agents. [Jan 2023]
- Leaked prompts of GPTs [Nov 2023] and Agents [Nov 2023]
- LLM Prompt Engineering Simplified: Online Book [Feb 2024]
- OpenAI Best practices for prompt engineering
- OpenAI Prompt example
- OpenAI Prompt Pack: curated collections of pre-designed prompts tailored for specific roles, industries, or use cases.
- Power Platform GPT Prompts [Mar 2024]
- Prompt Engineering Guide: 🏆Copyright © 2023 DAIR.AI
- Prompt Engineering: Prompt Engineering, also known as In-Context Prompting ... [Mar 2023]
- Prompts for Education: Microsoft Prompts for Education [Jul 2023]
- ShumerPrompt: Discover and share powerful prompts for AI models
- System Prompts and Models of AI Tools: System Prompts, Internal Tools & AI Models collection [Mar 2025]
- TheBigPromptLibrary [Nov 2023]
- Andrew Ng’s Visual Prompting Livestream📺 [24 Apr 2023]
- Chain of Frame (CoF): Reasoning via structured frames. DeepMind proposed CoF in Veo 3 Paper📑. [24 Sep 2025]
- landing.ai: Agentic Object Detection: Agent systems use design patterns to reason at length about unique attributes like color, shape, and texture [6 Feb 2025]
- Motion Prompting📑: motion prompts for flexible video generation, enabling motion control, image interaction, and realistic physics. git [3 Dec 2024]
- Screen AI✍️: ScreenAI, a model designed for understanding and interacting with user interfaces (UIs) and infographics. [Mar 2024]
- Visual Prompting📑 [21 Nov 2022]
- What is Visual Grounding: Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query.
- What is Visual prompting: Similarly to what has happened in NLP, large pre-trained vision transformers have made it possible for us to implement Visual Prompting. 🗄️ [26 Apr 2023]
- The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs📑: An Exhaustive Review of Technologies, Research, Best Practices [23 Aug 2024]
- How to continue pretraining an LLM on new data: `Continued pretraining` can be as effective as `retraining on combined datasets`. [13 Mar 2024]
- Three training methods were compared:
- Regular pretraining: A model is initialized with random weights and pretrained on dataset D1.
- Continued pretraining: The pretrained model from 1) is further pretrained on dataset D2.
- Retraining on combined dataset: A model is initialized with random weights and trained on the combined datasets D1 and D2.
- Continued pretraining can be as effective as retraining on combined datasets. Key strategies for successful continued pretraining include:
- Re-warming: Increasing the learning rate at the start of continued pre-training.
- Re-decaying: Gradually reducing the learning rate afterwards.
- Data Mixing: Adding a small portion (e.g., 5%) of the original pretraining data (D1) to the new dataset (D2) to prevent catastrophic forgetting.
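The re-warming and re-decaying strategies above can be sketched as a learning-rate schedule (a minimal sketch; the peak/minimum rates and step counts are illustrative assumptions, and data mixing happens in the dataloader, not here):

```python
import math

def continued_pretraining_lr(step, warmup_steps=1_000, total_steps=100_000,
                             peak_lr=3e-4, min_lr=3e-5):
    """Re-warming: linearly raise the LR at the start of continued
    pretraining; re-decaying: cosine-decay it back down afterwards."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # re-warming
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    # re-decaying: cosine from peak_lr down to min_lr
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Re-warming avoids the loss spike of resuming at a stale, tiny learning rate; re-decaying then returns training to a stable low-LR regime.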
- LIMA: Less Is More for Alignment📑: fine-tuned with the standard supervised loss on `only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling.` LIMA demonstrates remarkably strong performance, either equivalent or strictly preferred to GPT-4 in 43% of cases. [18 May 2023]
PEFT: Parameter-Efficient Fine-Tuning (📺) [24 Apr 2023]
- PEFT🤗: Parameter-Efficient Fine-Tuning. PEFT is an approach to fine tuning only a few parameters. [10 Feb 2023]
- Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning📑: [28 Mar 2023]
- PEFT Category: Pseudo Code ✍️ [22 Sep 2023]
- Adapters: Additional layers inserted into the transformer block. Inference can be slower.

  ```python
  def transformer_with_adapter(x):
      residual = x
      x = SelfAttention(x)
      x = FFN(x)  # adapter
      x = LN(x + residual)
      residual = x
      x = FFN(x)  # transformer FFN
      x = FFN(x)  # adapter
      x = LN(x + residual)
      return x
  ```

- Soft Prompts: Prompt-Tuning - learnable text prompts prepended to the input. Not always the desired results.

  ```python
  def soft_prompted_model(input_ids):
      x = Embed(input_ids)
      soft_prompt_embedding = SoftPromptEmbed(task_based_soft_prompt)
      x = concat([soft_prompt_embedding, x], dim=seq)
      return model(x)
  ```

- Selective: BitFit - update only the bias parameters. Fast, but limited.

  ```python
  params = (p for n, p in model.named_parameters() if "bias" in n)
  optimizer = Optimizer(params)
  ```

- Reparametrization: LoRA - low-rank decomposition. Efficient, but complex to implement.

  ```python
  def lora_linear(x):
      h = x @ W  # regular linear
      h += x @ W_A @ W_B  # low-rank update
      return scale * h
  ```
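The `lora_linear` pseudocode above can be made concrete with NumPy (a minimal sketch; the dimensions, rank, and scale are illustrative assumptions, and the alpha/rank scaling is applied to the low-rank update only):

```python
import numpy as np

d_in, d_out, rank = 512, 512, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d_in, d_out))           # frozen pretrained weight
W_A = rng.normal(size=(d_in, rank)) * 0.01   # trainable down-projection
W_B = np.zeros((rank, d_out))                # trainable up-projection, zero-init
scale = 2.0                                  # alpha / rank (illustrative)

def lora_linear(x):
    return x @ W + scale * ((x @ W_A) @ W_B)  # frozen path + low-rank update

x = rng.normal(size=(4, d_in))
# Zero-initialized W_B means the adapted layer starts identical to the base model:
assert np.allclose(lora_linear(x), x @ W)

full, lora = d_in * d_out, rank * (d_in + d_out)
print(f"trainable params: {lora:,} vs {full:,} for full fine-tuning ({lora / full:.1%})")
```

With rank 8 on a 512×512 layer, the trainable update matrices hold about 3% of the full weight count, which is where LoRA's memory savings come from.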
- 5 Techniques of LoRA ✍️: LoRA, LoRA-FA, VeRA, Delta-LoRA, LoRA+ [May 2024]
- DoRA📑: Weight-Decomposed Low-Rank Adaptation. Decomposes pre-trained weight into two components, magnitude and direction, for fine-tuning. [14 Feb 2024]
- Fine-tuning a GPT - LoRA: Comprehensive guide for LoRA 🗄️ [20 Jun 2023]
- LoRA: Low-Rank Adaptation of Large Language Models📑: LoRA is a PEFT technique that represents weight updates with two smaller matrices (called update matrices) through low-rank decomposition. git [17 Jun 2021]
- LoRA learns less and forgets less📑: Compared to full training, LoRA has less learning but better retention of original knowledge. [15 May 2024]

- LoRA+📑: Improves LoRA’s performance and fine-tuning speed by setting different learning rates for the LoRA adapter matrices. [19 Feb 2024]
- LoTR📑: Tensor decomposition for gradient update. [2 Feb 2024]
- LoRA Family ✍️ [11 Mar 2024]
  - `LoRA` introduces low-rank matrices A and B that are trained, while the pre-trained weight matrix W is frozen.
  - `LoRA+` suggests a much higher learning rate for B than for A.
  - `VeRA` does not train A and B, but initializes them randomly and trains new vectors d and b on top.
  - `LoRA-FA` only trains matrix B.
  - `LoRA-drop` uses the output of B*A to determine which layers are worth training at all.
  - `AdaLoRA` adapts the ranks of A and B dynamically in different layers, allowing a higher rank in layers where more contribution to the model’s performance is expected.
  - `DoRA` splits the LoRA adapter into two components, magnitude and direction, and allows training them more independently.
  - `Delta-LoRA` changes the weights of W by the gradient of A*B.
- Practical Tips for Finetuning LLMs Using LoRA (Low-Rank Adaptation)✍️ [19 Nov 2023]: The best practical guide to LoRA.
- QLoRA saves 33% memory but increases runtime by 39%, useful if GPU memory is a constraint.
- Optimizer choice for LLM finetuning isn’t crucial. Adam optimizer’s memory-intensity doesn’t significantly impact LLM’s peak memory.
- Apply LoRA across all layers for maximum performance.
- Adjusting the LoRA rank is essential.
- Multi-epoch training on static datasets may lead to overfitting and deteriorate results.
- QLoRA: Efficient Finetuning of Quantized LLMs📑: 4-bit quantized pre-trained language model into Low Rank Adapters (LoRA). git [23 May 2023]
- The Expressive Power of Low-Rank Adaptation📑: Theoretically analyzes the expressive power of LoRA. [26 Oct 2023]
- Training language models to follow instructions with human feedback📑: [4 Mar 2022]
- A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More📑 [23 Jul 2024]
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data📑: Autonomous AI systems capable of self-improvement without human-curated data, using interpreter feedback for code generation and math problem solving. [6 May 2025]
- Direct Preference Optimization (DPO)📑: 1. RLHF can be complex because it requires fitting a reward model and performing significant hyperparameter tuning. DPO instead directly solves a classification problem on human preference data in a single stage of policy training, making it more stable, efficient, and computationally lighter than RLHF. 2. `Your Language Model Is Secretly a Reward Model` [29 May 2023]
- Direct Preference Optimization (DPO) uses two models: a trained model (or policy model) and a reference model (a copy of the trained model). The goal is to have the trained model output higher probabilities for preferred answers and lower probabilities for rejected answers compared to the reference model. ✍️: RLHF vs DPO [2 Jan 2024] / ✍️ [1 Jul 2023]
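The two-model setup described above can be reduced to the DPO objective on a single preference pair (a minimal sketch; the log-probabilities here are toy scalars rather than real model outputs, and `beta` is the usual DPO temperature):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Push the policy's log-prob margin on a (chosen, rejected) pair
    above the reference model's margin, via a logistic loss."""
    policy_margin = logp_chosen - logp_rejected
    ref_margin = ref_logp_chosen - ref_logp_rejected
    return -math.log(1.0 / (1.0 + math.exp(-beta * (policy_margin - ref_margin))))

# Policy prefers the chosen answer more than the reference does -> lower loss
low = dpo_loss(-1.0, -5.0, -2.0, -3.0)
# Policy prefers the rejected answer -> higher loss
high = dpo_loss(-5.0, -1.0, -2.0, -3.0)
assert low < high
```

The loss is minimized by widening the policy's preference margin relative to the frozen reference, which is exactly the classification-style training DPO substitutes for a learned reward model.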
- InstructGPT: Training language models to follow instructions with human feedback📑: is a model trained by OpenAI to follow instructions using human feedback. [4 Mar 2022]


- Libraries: TRL🤗 covers the Supervised Fine-tuning (SFT) step, the Reward Modeling (RM) step, and the Proximal Policy Optimization (PPO) step; trlX; Argilla 🗣️

- The three steps in the process: 1. pre-training on large web-scale data, 2. supervised fine-tuning on instruction data (instruction tuning), and 3. RLHF. ✍️
- Machine learning technique that trains a "reward model" directly from human feedback and uses the model as a reward function to optimize an agent's policy using reinforcement learning.
- OpenAI Spinning Up in Deep RL!: An educational resource to help anyone learn deep reinforcement learning. git [Nov 2018]
- ORPO (odds ratio preference optimization)📑: Monolithic Preference Optimization without Reference Model. A new method that `combines supervised fine-tuning and preference alignment into one process`. git [12 Mar 2024] / Fine-tune Llama 3 with ORPO✍️ [Apr 2024]

- Preference optimization techniques: ✍️ [13 Aug 2024]
  - RLHF (Reinforcement Learning from Human Feedback): Optimizes a reward policy via an objective function.
  - DPO (Direct Preference Optimization): Removes the need for a reward model; minimizes a loss directly, with no reward policy.
  - IPO (Identity Preference Optimization): A change in the objective that is simpler and less prone to overfitting.
  - KTO (Kahneman-Tversky Optimization): Scales to more data by replacing pairs of accepted and rejected generations with a binary label.
  - ORPO (Odds Ratio Preference Optimization): Combines instruction tuning and preference optimization into one training process, which is cheaper and faster.
  - TPO (Thought Preference Optimization): Generates thoughts before the final response, which are then evaluated by a judge model for preference using Direct Preference Optimization (DPO). [14 Oct 2024]
- Reinforcement Learning from AI Feedback (RLAIF)📑: Uses AI feedback to generate instructions for the model. TL;DR: CoT (Chain-of-Thought) improved; few-shot did not. Only explores the task of summarization. After training on a few thousand examples, performance is close to training on the full dataset. RLAIF vs RLHF: in many cases, the two policies produced similar summaries. [1 Sep 2023]
- Reinforcement Learning from Human Feedback (RLHF)📑: A process of pretraining and retraining a language model using human feedback to develop a scoring algorithm that can be reapplied at scale for future training and refinement. Once the algorithm is refined to match the human-provided grading, direct human feedback is no longer needed, and the language model continues learning and improving using algorithmic grading alone. [18 Sep 2019] 🤗 [9 Dec 2022]
  - `Proximal Policy Optimization (PPO)` is a reinforcement learning method using first-order optimization. It modifies the objective function to penalize large policy changes, specifically those that move the probability ratio away from 1, aiming for TRPO (Trust Region Policy Optimization)-level performance without TRPO's complexity, which requires second-order optimization.
- Reinforcement Learning with Verifiable Rewards✍️: Practical RLVR Tutorial [24 Oct 2025]
- SFT vs RL📑: SFT Memorizes, RL Generalizes. RL enhances generalization across text and vision, while SFT tends to memorize and overfit. git [28 Jan 2025]
`Supervised Fine-Tuning (SFT)`: fine-tuning a pre-trained model on a specific task or domain using labeled data. This can cause more significant shifts in the model’s behavior compared to RLHF.

- Supervised Reinforcement Learning (SRL)📑: The Problem: SFT imitates human actions token by token, leading to overfitting; RLVR gives rewards only when successful, with no signal when all attempts fail. This Approach: Each action during RL generates a short reasoning trace and receives a similarity reward at every step. [29 Oct 2025]
- Train your own R1 reasoning model with Unsloth (GRPO)✍️: Unsloth x vLLM > 20x more throughput, 50% VRAM savings. [6 Feb 2025]
- bitsandbytes: 8-bit optimizers git [Oct 2021]
- The Era of 1-bit LLMs📑: All Large Language Models are in 1.58 Bits. BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. [27 Feb 2024]
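The ternary weights in BitNet b1.58 can be illustrated with a round-to-nearest quantizer (a minimal sketch of an absmean-style ternary scheme, not the paper's training procedure):

```python
import numpy as np

def ternary_quantize(W, eps=1e-8):
    """Quantize weights to {-1, 0, 1} with a per-tensor scale;
    dequantize as Wq * scale."""
    scale = np.abs(W).mean() + eps      # absmean scale
    Wq = np.clip(np.round(W / scale), -1, 1)
    return Wq, scale

W = np.random.default_rng(0).normal(size=(4, 4))
Wq, s = ternary_quantize(W)
assert set(np.unique(Wq)) <= {-1.0, 0.0, 1.0}
```

With every weight in {-1, 0, 1}, matrix multiplication reduces to additions and subtractions, which is the source of the claimed efficiency gains.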
- Quantization-aware training (QAT): The model is further trained with quantization in mind after being initially trained in floating-point precision.
- Post-training quantization (PTQ): The model is quantized after it has been trained without further optimization during the quantization process.
  | Method | Pros | Cons |
  | --- | --- | --- |
  | Post-training quantization | Easy to use, no need to retrain the model | May result in accuracy loss |
  | Quantization-aware training | Can achieve higher accuracy than post-training quantization | Requires retraining the model, can be more complex to implement |
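Post-training quantization can be sketched as a symmetric int8 round-trip (a minimal sketch; real PTQ pipelines calibrate scales per channel or per group on sample data):

```python
import numpy as np

def ptq_int8(W):
    """Symmetric int8 post-training quantization: map the largest
    absolute weight to +/-127 and round everything else to that grid."""
    scale = np.abs(W).max() / 127.0
    Wq = np.round(W / scale).astype(np.int8)
    return Wq, scale

def dequantize(Wq, scale):
    return Wq.astype(np.float32) * scale

W = np.random.default_rng(1).normal(size=(8, 8)).astype(np.float32)
Wq, s = ptq_int8(W)
# Rounding error is bounded by half a quantization step:
assert np.abs(dequantize(Wq, s) - W).max() <= s / 2 + 1e-6
```

The bounded rounding error is the "may result in accuracy loss" in the table above: outlier weights stretch the scale and coarsen the grid for everything else.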
- Pruning: The process of removing some of the neurons or layers from a neural network. This can be done by identifying and eliminating neurons or layers that have little or no impact on the network's output.
- Sparsification: A technique used to reduce the size of large language models by removing redundant parameters.
- Wanda Pruning📑: A Simple and Effective Pruning Approach for Large Language Models [20 Jun 2023] ✍️
- Distilled Supervised Fine-Tuning (dSFT)
- Zephyr 7B📑: Zephyr-7B-β is the second model in the series, a fine-tuned version of mistralai/Mistral-7B-v0.1 trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). 🤗 [25 Oct 2023]
- Mistral 7B📑: Outperforms Llama 2 13B on all benchmarks. Uses Grouped-query attention (GQA) for faster inference. Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost. ✍️ [10 Oct 2023]
- Textbooks Are All You Need📑: phi-1 [20 Jun 2023]
- Orca 2📑: Orca learns from rich signals from GPT 4 including explanation traces; step-by-step thought processes; and other complex instructions, guided by teacher assistance from ChatGPT. ✍️ [18 Nov 2023]
- CPU vs GPU vs TPU: Threads are grouped into thread blocks. Each thread block has access to fast shared memory (SRAM). All thread blocks can also share a large global memory built from high-bandwidth memory (HBM). `HBM bandwidth: 1.5-2.0 TB/s vs SRAM bandwidth: 19 TB/s (~10x HBM)` [27 May 2024]
- Flash Attention📑: [27 May 2022]
- In a GPU, A thread is the smallest execution unit, and a group of threads forms a block.
- A block executes the same kernel (function, to simplify), with threads sharing fast SRAM memory.
- All blocks can access the shared global HBM memory.
- First, the query (Q) and key (K) product is computed in threads and returned to HBM. Then, it's redistributed for softmax and returned to HBM.
- Flash attention reduces these movements by caching results in SRAM.
  - `Tiling` splits the attention computation into memory-efficient blocks, while `recomputation` saves memory by recalculating intermediates during backprop. 📺
- FlashAttention-2📑 [17 Jul 2023]: A method that reorders the attention computation and leverages classical techniques (tiling, recomputation). Instead of storing each intermediate result, it uses kernel fusion and runs every operation in a single kernel to avoid memory read/write overhead. git -> Compared to a standard attention implementation in PyTorch, FlashAttention-2 can be up to 9x faster.
- FlashAttention-3📑 [11 Jul 2024]
- PagedAttention📑 : vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention, 24x Faster LLM Inference 🗄️. ✍️: vllm [12 Sep 2023]
- PagedAttention for a prompt “the cat is sleeping in the kitchen and the dog is”. Key-Value pairs of tensors for attention computation are stored in virtual contiguous blocks mapped to non-contiguous blocks in the GPU memory.
- Transformer cache key-value tensors of context tokens into GPU memory to facilitate fast generation of the next token. However, these caches occupy significant GPU memory. The unpredictable nature of cache size, due to the variability in the length of each request, exacerbates the issue, resulting in significant memory fragmentation in the absence of a suitable memory management mechanism.
- To alleviate this issue, PagedAttention was proposed to store the KV cache in non-contiguous memory spaces. It partitions the KV cache of each sequence into multiple blocks, with each block containing the keys and values for a fixed number of tokens.
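The block-partitioning idea above can be sketched as a tiny KV-cache allocator (a minimal sketch; the block size and page-table layout are illustrative assumptions, not vLLM's implementation):

```python
BLOCK_SIZE = 4  # tokens per KV block (illustrative)

class PagedKVCache:
    """Map each sequence's logically contiguous KV cache onto
    non-contiguous fixed-size physical blocks, like OS virtual memory."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))   # pool of physical block ids
        self.block_table = {}                 # seq_id -> [physical block ids]
        self.lengths = {}                     # seq_id -> tokens cached

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:               # current block full -> allocate
            self.block_table.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def free_seq(self, seq_id):
        self.free.extend(self.block_table.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(10):                           # 10 tokens -> ceil(10/4) = 3 blocks
    cache.append_token("prompt-1")
assert len(cache.block_table["prompt-1"]) == 3
```

Because blocks are allocated on demand and returned to the pool when a sequence finishes, the worst-case internal fragmentation is one partially filled block per sequence rather than a whole preallocated maximum-length buffer.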
- TokenAttention an attention mechanism that manages key and value caching at the token level. git [Jul 2023]
- Better & Faster Large Language Models via Multi-token Prediction📑: Suggests training language models to predict multiple future tokens at once. [30 Apr 2024]
- Differential Transformer📑: Amplifies attention to the relevant context while minimizing noise using two separate softmax attention mechanisms. [7 Oct 2024]
- KAN or MLP: A Fairer Comparison📑: In machine learning, computer vision, audio processing, natural language processing, and symbolic formula representation (except for symbolic formula representation tasks), MLP generally outperforms KAN. [23 Jul 2024]
- Kolmogorov-Arnold Networks (KANs)📑: KANs use activation functions on connections instead of nodes like Multi-Layer Perceptrons (MLPs) do. Each weight in KANs is replaced by a learnable 1D spline function. KANs’ nodes simply sum incoming signals without applying any non-linearities. git [30 Apr 2024] / ✍️: A Beginner-friendly Introduction to Kolmogorov Arnold Networks (KAN) [19 May 2024]
- Large Concept Models📑: Focusing on high-level sentence (concept) level rather than tokens. using SONAR for sentence embedding space. [11 Dec 2024]
- Large Language Diffusion Models📑: LLaDA's core is a mask predictor, which uses controlled noise to help models learn to predict missing information from context. ✍️ [14 Feb 2025]
- Large Transformer Model Inference Optimization: Besides the increasing size of SoTA models, there are two main factors contributing to the inference challenge ... [10 Jan 2023]
- Lamini Memory Tuning: Mixture of Millions of Memory Experts (MoME). 95% LLM Accuracy, 10x Fewer Hallucinations. ✍️ [Jun 2024]
- Less is More: Recursive Reasoning with Tiny Networks📑: Tiny neural networks can perform complex recursive reasoning efficiently, achieving strong results with minimal model size. [6 Oct 2025] git
- LLM patterns: 🏆From data to user, from defensive to offensive 🗄️
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces📑 [1 Dec 2023] git: 1. Structured State Space (S4) - a class of sequence models encompassing traits from RNNs, CNNs, and classical state space models. 2. Hardware-aware (optimized for GPU). 3. Integrates selective SSMs and eliminates attention and MLP blocks. ✍️ / A Visual Guide to Mamba and State Space Models ✍️ [19 Feb 2024]
- Mamba-2📑: 2-8X faster [31 May 2024]
- Mixture-of-Depths📑: All tokens should not require the same effort to compute. The idea is to make token passage through a block optional. Each block selects the top-k tokens for processing, and the rest skip it. ✍️ [2 Apr 2024]
- Mixture of experts models: Mixtral 8x7B: Sparse mixture of experts models (SMoE) magnet [Dec 2023]
- Huggingface Mixture of Experts Explained🤗: Mixture of Experts, or MoEs for short [Dec 2023]
- A Visual Guide to Mixture of Experts (MoE) [08 Oct 2024]
- makeMoE: From-scratch implementation of a sparse mixture of experts [Jan 2024]
- The Sparsely-Gated Mixture-of-Experts Layer📑: Introduced sparse expert gating to scale models efficiently without increasing compute cost. [23 Jan 2017]
- Switch Transformers📑: Used a single expert per token to simplify routing, enabling fast, scalable transformer models. `expert capacity = (total tokens / num experts) * capacity factor` [11 Jan 2021]
- ST-MoE (Stable Transformer MoE)📑: By stabilizing the training process, ST-MoE enables more reliable and scalable deep MoE architectures. `z-loss aims to regularize the logits z before passing into the softmax` [17 Feb 2022]
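The `expert capacity` and `z-loss` formulas quoted above can be checked numerically (a minimal sketch; the token counts, capacity factor, and z-loss coefficient are illustrative assumptions):

```python
import math

def expert_capacity(total_tokens, num_experts, capacity_factor=1.25):
    """Switch Transformer: max tokens routed to each expert per batch."""
    return math.ceil(total_tokens / num_experts * capacity_factor)

def z_loss(router_logits, coef=1e-3):
    """ST-MoE z-loss: penalize large router logits z via (log-sum-exp z)^2."""
    lse = math.log(sum(math.exp(z) for z in router_logits))
    return coef * lse ** 2

# 1024 tokens over 8 experts with a 1.25x slack factor:
assert expert_capacity(1024, 8) == 160
```

The capacity factor trades dropped tokens (too small) against wasted padding (too large); the z-loss keeps router logits small so the softmax stays numerically stable.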
- Model Compression for Large Language Models ref📑 [15 Aug 2023]
- Model merging✍️: A technique that combines two or more large language models (LLMs) into a single model, using methods such as SLERP, TIES, DARE, and passthrough. [Jan 2024] git: mergekit

  | Method | Pros | Cons |
  | --- | --- | --- |
  | SLERP | Preserves geometric properties, popular method | Can only merge two models, may decrease magnitude |
  | TIES | Can merge multiple models, eliminates redundant parameters | Requires a base model, may discard useful parameters |
  | DARE | Reduces overfitting, keeps expectations unchanged | May introduce noise, may not work well with large differences |

- Nested Learning: A new ML paradigm for continual learning✍️: A self-modifying architecture. Nested Learning (HOPE) views a model and its training as multiple nested, multi-level optimization problems, each with its own “context flow,” pairing deep optimizers and continuum memory systems for continual, human-like learning. [7 Nov 2025]
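The SLERP merging method mentioned above can be sketched for a single weight vector (a minimal sketch; a real merge applies this per tensor across two checkpoints):

```python
import numpy as np

def slerp(w0, w1, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two weight vectors:
    interpolate along the great circle, preserving angular geometry."""
    a = w0 / (np.linalg.norm(w0) + eps)
    b = w1 / (np.linalg.norm(w1) + eps)
    omega = np.arccos(np.clip(a @ b, -1.0, 1.0))   # angle between the models
    if omega < eps:                                 # nearly parallel -> lerp
        return (1 - t) * w0 + t * w1
    return (np.sin((1 - t) * omega) * w0 + np.sin(t * omega) * w1) / np.sin(omega)

w0 = np.array([1.0, 0.0])
w1 = np.array([0.0, 1.0])
mid = slerp(w0, w1, 0.5)
assert np.allclose(mid, [np.sqrt(2) / 2, np.sqrt(2) / 2])
```

Unlike plain averaging, SLERP follows the arc between the two weight directions, which is the "preserves geometric properties" entry in the table above.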
- RouteLLM: a framework for serving and evaluating LLM routers. [Jun 2024]
- Sakana.ai: Evolutionary Optimization of Model Merging Recipes.📑: A Method to Combine 500,000 OSS Models. git [19 Mar 2024]
- Scaling Synthetic Data Creation with 1,000,000,000 Personas📑 A persona-driven data synthesis methodology using Text-to-Persona and Persona-to-Persona. [28 Jun 2024]
- Simplifying Transformer Blocks📑: Simplified Transformer blocks. Removed several block components (skip connections, projection/value matrices, sequential sub-blocks, and normalisation layers) without loss of training speed. [3 Nov 2023]
- Text-to-LoRA (T2L): Converts text prompts into LoRA models, enabling lightweight fine-tuning of AI models for custom tasks. [01 May 2025]
- Titans + MIRAS: Titans + MIRAS let models update themselves while running by using a human-like surprise metric that skips familiar info and stores only pattern-breaking moments into long-term memory. persistent (fixed knowledge), contextual (on-the-fly), and core-attention (short-term) layers. ✍️ [4 Dec 2025]
- What We’ve Learned From A Year of Building with LLMs:💡A practical guide to building successful LLM products, covering the tactical, operational, and strategic. [8 June 2024]
- AGI: Artificial General Intelligence
- AI 2027🗣️: a speculative scenario, "AI 2027," created by the AI Futures Project. It predicts the rapid evolution of AI, culminating in the emergence of artificial superintelligence (ASI) by 2027. [3 Apr 2025]
- AI+HW 2035: Shaping the Next Decade📑: Ten-year roadmap for co-designing AI algorithms, systems, and hardware. [Mar 2026]
- AI isn’t replacing radiologists✍️: Why AI diagnostic tools are transforming medicine slower than expected. [Feb 2026]
- Anthropic's CEO, Dario Amodei, predicts AGI between 2026 and 2027. ✍️ [13 Nov 2024]
- Artificial General Intelligence Society: a central hub for AGI research, publications, and conference details. ✍️
- Artificial General Intelligence: Concept, State of the Art, and Future Prospects📑 [Jan 2014]
- Claude Code is the Inflection Point✍️: Analysis of AI-authored commits and software engineering workflow shifts. 4% of GitHub public commits are being authored by Claude Code. [Feb 2026]
- Creating Scalable AGI: the Open General Intelligence Framework📑: a new AI architecture designed to enhance flexibility and scalability by dynamically managing specialized AI modules. [24 Nov 2024]
- How Far Are We From AGI📑: A survey discussing AGI's goals, developmental trajectory, and alignment technologies, providing a roadmap for AGI realization. [16 May 2024]
- Investigating Affective Use and Emotional Well-being on ChatGPT✍️: The MIT study found that higher ChatGPT usage correlated with increased loneliness, dependence, and lower socialization. [21 Mar 2025]
- Key figures and their predicted AGI timelines🗣️:💡AGI might be emerging between 2025 to 2030. [19 Nov 2024]
- Levels of AGI for Operationalizing Progress on the Path to AGI📑: Provides a comprehensive discussion on AGI's progress and proposes metrics and benchmarks for assessing AGI systems. [4 Nov 2023]
- Linus Torvalds: 90% of AI marketing is hype🗣️:💡AI is 90% marketing, 10% reality [29 Oct 2024]
- Machine Intelligence Research Institute (MIRI): a leading organization in AGI safety and alignment, focusing on theoretical work to ensure safe AI development. ✍️
- One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era📑 [4 Apr 2023]
- OpenAI's CEO, Sam Altman, predicts AGI could emerge by 2025. ✍️ [9 Nov 2024]
- OpenAI: Planning for AGI and beyond✍️ [24 Feb 2023]
- Shaping AI's Impact on Billions of Lives📑: a framework for assessing AI's potential effects and responsibilities, 18 milestones and 5 guiding principles for responsible AI [3 Dec 2024]
- Sparks of Artificial General Intelligence: Early experiments with GPT-4📑: [22 Mar 2023]
- The General Theory of General Intelligence: A Pragmatic Patternist Perspective📑: a patternist philosophy of mind, arguing for a formal theory of general intelligence based on patterns and complexity. [28 Mar 2021]
- The Impact of Generative AI on Critical Thinking✍️: A survey of 319 knowledge workers shows that higher confidence in Generative AI (GenAI) tools can reduce critical thinking. [Apr 2025]
- There is no Artificial General Intelligence📑: A critical perspective arguing that human-like conversational intelligence cannot be mathematically modeled or replicated by current AGI theories. [9 Jun 2019]
- Thousands of AI Authors on the Future of AI📑: A survey of 2,778 AI researchers predicts a 50% likelihood of machines achieving multiple human-level capabilities by 2028, with wide disagreement about long-term risks and timelines. [5 Jan 2024]
- Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise📑: Tutor CoPilot can scale real-time expertise in education, enhancing outcomes even with less experienced tutors. It is cost-effective, priced at $20 per tutor annually. [3 Oct 2024]
- US Job Market Visualizer: Visual exploration of AI exposure across 342 US occupations.
- We must build AI for people; not to be a person🗣️ [19 August 2025]
- LessWrong & Alignment Forum: Extensive discussions on AGI alignment, with contributions from experts in AGI safety. LessWrong✍️ | Alignment Forum✍️
- AMA (ask me anything) with OpenAI on Reddit🗣️ [1 Nov 2024]
- Humanloop Interview 2023🗣️ : 🗄️ [29 May 2023]
- Model Spec: Desired behavior for the models in the OpenAI API and ChatGPT ✍️ [8 May 2024] ✍️: takeaway
- o3/o4-mini/GPT-5🗣️: "we are going to release o3 and o4-mini after all, probably in a couple of weeks, and then do GPT-5 in a few months." [4 Apr 2025]
- OpenAI’s CEO Says the Age of Giant AI Models Is Already Over ✍️ [17 Apr 2023]
- Q* (pronounced as Q-Star): The model, called Q*, was able to solve basic maths problems it had not seen before, according to the tech news site The Information. ✍️ [23 Nov 2023]
- Reflections on OpenAI🗣️: OpenAI culture. Bottoms-up decision-making. Progress is iterative, not driven by a rigid roadmap. Direction changes quickly based on new information. Slack is the primary communication tool. [16 Jul 2025]
- Sam Altman reveals in an interview with Bill Gates what's coming up in GPT-4.5 (or GPT-5): potential integration with other modes of information beyond text, better logic and analysis capabilities, and consistency in performance over the next two years. ✍️ [12 Jan 2024]
- The Timeline of OpenAI's Founder Journeys✍️ [15 Oct 2024]
- GPT 1: Decoder-only model. 117 million parameters. [Jun 2018] git
- GPT 2: Increased model size and parameters. 1.5 billion. [14 Feb 2019] git
- GPT 3: Introduced few-shot learning. 175B. [11 Jun 2020] git
- GPT 3.5: 3 variants each with 1.3B, 6B, and 175B parameters. [15 Mar 2022] The embedding size of OpenAI's gpt-3.5-turbo is estimated to be about 4,096.
- ChatGPT: GPT-3 fine-tuned with RLHF. 20B or 175B. unverified✍️ [30 Nov 2022]
- GPT 4: Mixture of Experts (MoE). 8 models with 220 billion parameters each, for a total of about 1.76 trillion parameters. unverified✍️ [14 Mar 2023]
- GPT-4V(ision) system card✍️ [25 Sep 2023] / ✍️
- GPT-4: The Dawn of LMMs📑: Preliminary Explorations with GPT-4V(ision) [29 Sep 2023]
GPT-4 details leaked: GPT-4 is a language model with approximately 1.8 trillion parameters across 120 layers, 10x larger than GPT-3. It uses a Mixture of Experts (MoE) architecture with 16 experts, each having about 111 billion parameters. MoE allows more efficient use of resources during inference, needing only about 280 billion parameters and 560 TFLOPs, compared to the 1.8 trillion parameters and 3,700 TFLOPs required for a purely dense model. The model is trained on approximately 13 trillion tokens from various sources, including internet data, books, and research papers. To reduce training costs, OpenAI employs tensor and pipeline parallelism and a large batch size of 60 million. The estimated training cost for GPT-4 is around $63 million. ✍️ [Jul 2023]
- GPT-4o✍️: o stands for Omni. 50% cheaper. 2x faster. Multimodal input and output capabilities (text, audio, vision). supports 50 languages. [13 May 2024] / GPT-4o mini✍️: 15 cents per million input tokens, 60 cents per million output tokens, MMLU of 82%, and fast. [18 Jul 2024]
- A new series of reasoning models✍️: The complex reasoning-specialized model, OpenAI o1 series, excels in math, coding, and science, outperforming GPT-4o on key benchmarks. [12 Sep 2024] / git: Awesome LLM Strawberry (OpenAI o1)
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model📑: 6 types of o1 reasoning patterns: Systematic Analysis (SA), Method Reuse (MR), Divide and Conquer (DC), Self-Refinement (SR), Context Identification (CI), and Emphasizing Constraints (EC). The most commonly used patterns in o1 are DC and SR. [17 Oct 2024]
- o3-mini system card✍️: The first model to reach Medium risk on Model Autonomy. [31 Jan 2025]
- OpenAI o1 system card✍️ [5 Dec 2024]
- o3 preview✍️: 12 Days of OpenAI [20 Dec 2024]
- o3/o4-mini✍️ [16 Apr 2025]
- GPT-4.5✍️: greater “EQ”. better unsupervised learning (world model accuracy and intuition). scalable training from smaller models. ✍️ [27 Feb 2025]
- GPT-4o: 4o image generation✍️: create photorealistic output, replacing DALL·E 3 [25 Mar 2025]
- GPT-4.1 family of models✍️: GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano can process up to 1 million tokens of context. enhanced coding abilities, improved instruction following. [14 Apr 2025]
- gpt-image-1✍️: Image generation model API with designing and editing [23 Apr 2025]
- gpt-oss: gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI. [5 Aug 2025]
- GPT-5✍️: Real-time router orchestrating multiple models. GPT‑5 is the new default in ChatGPT, replacing GPT‑4o, OpenAI o3, OpenAI o4-mini, GPT‑4.1, and GPT‑4.5. [7 Aug 2025]
- GPT 5.1✍️: GPT-5.1 Auto, GPT-5.1 Instant, and GPT-5.1 Thinking. Better instruction-following, More customization for tone and style. [12 Nov 2025]
- GPT-5.1 Codex Max✍️: agentic coding model for long-running, detailed work. [19 Nov 2025]
- GPT 5.2✍️: 70.9% GDPval (knowledge work vs professionals), major gains over GPT-5.1 on SWE-Bench, GPQA Diamond, AIME 2025, ARC-AGI reasoning, and advanced coding/vision tasks. [11 Dec 2025]
- GPT-5.4✍️: Thinking, coding, and native computer-use in a single model. [Mar 2026]
- Agents SDK & Response API✍️: Responses API (Chat Completions + Assistants API), Built-in tools (web search, file search, computer use), Agents SDK for multi-agent workflows, agent workflow observability tools [11 Mar 2025] git
- Building ChatGPT Atlas✍️: OpenAI's approach to building Atlas. OWL: OpenAI’s Web Layer. Mojo Protocol. [Oct 2025]
- ChatGPT agent✍️: Web-browsing, File-editing, Terminal, Email, Spreadsheet, Calendar, API-calling, Automation, Task-chaining, Reasoning. [17 Jul 2025]
- ChatGPT can now see, hear, and speak✍️: It has recently been updated to support multimodal capabilities, including voice and image. [25 Sep 2023] Whisper / CLIP
- ChatGPT Function calling [Jun 2023] > Azure OpenAI supports function calling. ✍️
- ChatGPT Memory✍️: Remembering things you discuss across all chats saves you from having to repeat information and makes future conversations more helpful. [Apr 2024]
- ChatGPT Plugin✍️ [23 Mar 2023]
- CriticGPT✍️: a version of GPT-4 fine-tuned to critique code generated by ChatGPT [27 Jun 2024]
- Codex 5.3✍️: OpenAI Codex with enhanced coding and agentic reasoning. [5 Feb 2026]
- Custom instructions✍️: In a nutshell, the Custom Instructions feature is a cross-session memory that allows ChatGPT to retain key instructions across chat sessions. [20 Jul 2023]
- DALL·E 3✍️ : In September 2023, OpenAI announced their latest image model, DALL-E 3 git [Sep 2023]
- deep research✍️: An agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks [2 Feb 2025]
- GPT-3.5 Turbo Fine-tuning✍️ Fine-tuning for GPT-3.5 Turbo is now available, with fine-tuning for GPT-4 coming this fall. [22 Aug 2023]
- Introducing the GPT Store✍️: Roll out the GPT Store to ChatGPT Plus, Team and Enterprise users GPTs [10 Jan 2024]
- New embedding models✍️: text-embedding-3-small (embedding sizes: 512, 1536); text-embedding-3-large (embedding sizes: 256, 1024, 3072) [25 Jan 2024]
- OpenAI Enterprise: Removes GPT-4 usage caps, and performs up to two times faster ✍️ [28 Aug 2023]
- OpenAI DevDay 2023✍️: GPT-4 Turbo with 128K context, Assistants API (Code interpreter, Retrieval, and function calling), GPTs (Custom versions of ChatGPT: ✍️), Copyright Shield, Parallel Function Calling, JSON Mode, Reproducible outputs [6 Nov 2023]
- OpenAI DevDay 2024✍️: Real-time API (speech-to-speech), Vision Fine-Tuning, Prompt Caching, and Distillation (fine-tuning a small language model using a large language model). ✍️ [1 Oct 2024]
- OpenAI DevDay 2025✍️: ChatGPT Apps + SDK, AgentKit, GPT-5 Pro, Sora 2 video API, upgraded Codex ✍️ [6 Oct 2025]
- OpenAI Frontier✍️: OpenAI’s largest, most capable model tier. [Feb 2026]
- Operator✍️: GUI Agent. Operates embedded virtual environments. Specialized model (Computer-Using Agent). [23 Jan 2025]
- Prism✍️: AI-native workspace for scientists to write and collaborate on research. [27 Jan 2026]
- SearchGPT✍️: AI search [25 Jul 2024] > ChatGPT Search✍️ [31 Oct 2024]
- Sora✍️: Text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. [15 Feb 2024]
- Structured Outputs in the API✍️: a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers. [6 Aug 2024]
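As a sketch, this is the request-body shape using the `response_format` field from the announcement (the schema below is a made-up calendar-event example, and field names may have evolved since; check the current API reference before relying on them):

```python
import json

# hypothetical JSON Schema for a calendar-event extraction task
event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "attendees": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "attendees"],
    "additionalProperties": False,
}

# request body: with strict=True the model's output is constrained
# to exactly match the schema, not merely encouraged to
body = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Alice and Bob meet Friday."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "event", "strict": True, "schema": event_schema},
    },
}
print(json.dumps(body["response_format"], indent=2))
```

Note `"additionalProperties": False` and the full `required` list: strict mode expects every property to be pinned down so decoding can be constrained token by token.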
- Agent Skills: A way to package instructions, scripts, and resources into “skills” that Claude agents can dynamically load. [16 Oct 2025]
- Anthropic CLI (Claude Code): The official command-line interface that lives in your project directory, enabling natural-language code generation, refactoring, and Git automation. [24 Feb 2025]
- Bringing Code Review to Claude Code✍️: Multi-agent PR review dispatches parallel agents and verifies bugs before posting findings. [9 Mar 2026]
- Put Claude to work on your computer✍️: Dispatch carries tasks across phone and desktop while Claude operates your computer. [23 Mar 2026]
- Anthropic killed Tool calling📺: Programmatic Tool Calling / Dynamic Filtering — what changed in Anthropic’s API. [Feb 2026]
- Claude Agent SDK: A toolkit for building multi-step, tool-using agents using the Claude API. [29 Sep 2025]
- Claude Opus 4.6✍️: Advanced reasoning and coding flagship model. [5 Feb 2026]
- Claude Sonnet 4.6✍️: Balanced performance and speed model. [17 Feb 2026]
- Constitutional AI (CAI): Anthropic’s training framework using a “constitution” (AI‑generated rules) to align models toward harmlessness. [15 Dec 2022]
- Cowork: AI agent that accesses local files to automate multi-step desktop tasks like organizing, reporting, and data extraction. [Jan 2026]
- Claude Code Security✍️: Claude Code on the web for scanning codebases and suggesting security patches. [Feb 2026]
- Detecting and preventing distillation attacks✍️: 16M+ fraudulent exchanges scraped from Claude; Anthropic’s detection and prevention. [Feb 2026]
- Frontier AI Safety Research: Foundational research into AI risks, alignment, and interpretability.
- Model Context Protocol (MCP): An open standard for connecting AI assistants to external systems (data, tools, etc.) securely and scalably. [25 Nov 2024]
- Programmatic Tool Calling: Enables Claude to write orchestration code (e.g., Python) to call multiple tools in a sequence, improving efficiency. [24 Nov 2025]
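A hypothetical sketch of what such orchestration code looks like (the tool functions below are stand-ins, not real Anthropic APIs): instead of one model round-trip per tool call, the model emits a single script, and only the final result re-enters its context.

```python
# Stand-in tool stubs: in the real feature these would be tools
# exposed to Claude, callable from model-written code.
def list_orders(customer):
    return [{"id": 1, "late": True}, {"id": 2, "late": False}]

def refund(order_id):
    return {"order_id": order_id, "status": "refunded"}

# Orchestration code of the kind the model might emit: one block
# replaces N model<->tool round-trips, and intermediate results
# (the full order list) never consume model context tokens.
late = [o for o in list_orders("alice") if o["late"]]
results = [refund(o["id"]) for o in late]
print(results)  # only this summary would be returned to the model
```

The efficiency claim in the entry above follows from this shape: filtering happens in code, so the model never sees (or pays tokens for) the unfiltered intermediate data.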
- Tool Use & Agent Orchestration: Advanced tool‑use framework for Claude agents, allowing dynamic API discovery and execution in complex tasks. [24 Nov 2025]
- AlphaMissense: A machine learning tool that classifies the effects of 71 million 'missense' mutations in the human genome to help pinpoint disease causes. [2025]
- CodeMender: An autonomous AI agent leveraging Gemini Deep Think models to automatically find, debug, and fix complex software security vulnerabilities. [Oct 2025]
- Firebase Studio: A web-based IDE that uses Gemini to assist in building, refactoring, and troubleshooting full-stack web and mobile applications. [7 May 2025]
- Gemini CLI: An open-source terminal interface for "vibecoding" that brings Gemini 3 Pro capabilities directly to the command line for script generation and automation. [25 Jun 2025]
- Gemini Code Assist: An enterprise-grade AI assistant for IDEs (VS Code, IntelliJ) that offers context-aware code completion, generation, and chat using Gemini models. [20 May 2025]
- Gemini Code Assist for GitHub: A specialized agent that acts as a code reviewer on Pull Requests, identifying bugs, style issues, and suggesting fixes automatically. [20 May 2025]
- Google AI for Developers: A suite of research tools including AI-powered documentation search and code explanation to accelerate learning and implementation. [Jul 2024]
- Google Antigravity: An "agent-first" IDE platform announced with Gemini 3 that gives autonomous agents direct control over editors, terminals, and browsers to build and verify software. [18 Nov 2025]
- Introducing "vibe design" with Stitch✍️: AI-native design canvas for turning prompts and images into UI drafts. [18 Mar 2026]
- Jules: An autonomous coding agent that integrates with GitHub to plan, execute, and verify multi-step coding tasks like bug fixing and dependency management. [20 May 2025]
- NotebookLM: An AI-powered research and thinking partner that synthesizes complex information and automates online research using the Deep Research agent feature. [13 Nov 2025]
- SIMA 2: (Scalable Instructable Multiworld Agent) A research agent that explores and learns to play across a variety of 3D video game environments, aimed at general-purpose robotics. [13 Nov 2025]
- Vertex AI Codey: A family of foundation models (Code-Bison, Code-Gecko) optimized for code generation and completion, accessible via API. [29 Jun 2023]
- Context Rot: How Increasing Input Tokens Impacts LLM Performance [14 Jul 2025]
- Doc-to-LoRA: Learning to Instantly Internalize Contexts📑: Generates LoRA adapters from long context to cut repeated context cost. [Feb 2026]
- DroPE✍️: Extends LLM context by dropping positional embeddings and brief recalibration, improving long-context performance without retraining. Sakana AI. [13 Dec 2025]
- Giraffe📑: Adventures in Expanding Context Lengths in LLMs. A new truncation strategy for modifying the basis for the position encoding. ✍️ [2 Jan 2024]
- Introducing 100K Context Windows✍️: hundreds of pages, Around 75,000 words; [11 May 2023] demo Anthropic Claude
- Leave No Context Behind📑: Efficient Infinite Context Transformers with Infini-attention. Infini-attention incorporates a compressive memory into the vanilla attention mechanism and integrates both local and global attention. [10 Apr 2024]
- LLM Maybe LongLM📑: Self-Extend LLM Context Window Without Tuning. With only four lines of code modification, the proposed method can effortlessly extend existing LLMs' context window without any fine-tuning. [2 Jan 2024]
- Lost in the Middle: How Language Models Use Long Contexts📑:💡[6 Jul 2023]
- Best performance when relevant information is at the beginning
- Too many retrieved documents will harm performance
- Performance decreases as context length increases
- “Needle in a Haystack” Analysis [21 Nov 2023]: Context Window Benchmarks; Claude 2.1 (200K Context Window) vs GPT-4; Long context prompting for Claude 2.1✍️
adding just one sentence, “Here is the most relevant sentence in the context:”, to the prompt resulted in near-complete fidelity throughout Claude 2.1’s 200K context window. [6 Dec 2023]
- Ring Attention📑: 1. Ring Attention leverages blockwise computation of self-attention to distribute long sequences across multiple devices while overlapping the communication of key-value blocks with the computation of blockwise attention. 2. Ring Attention reduces the memory requirements of Transformers, enabling training on sequences more than 500 times longer than prior memory-efficient state-of-the-art methods, and on sequences exceeding 100 million tokens, without approximating attention. 3. It proposes an enhancement to the blockwise parallel transformers (BPT) framework. git [3 Oct 2023]
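The blockwise/online-softmax core behind Ring Attention can be sketched in NumPy. This is a simplification: the real method shards the key/value blocks across devices arranged in a ring and overlaps KV communication with compute, whereas here the "devices" are just loop iterations.

```python
import numpy as np

def blockwise_attention(q, k, v, block=128):
    """softmax(q k^T / sqrt(d)) v computed one key/value block at a time.

    A running max (m) and normalizer (l) let each block be folded into
    the accumulator without ever materializing the full score matrix.
    """
    d = q.shape[-1]
    acc = np.zeros_like(q, dtype=float)   # unnormalized output
    m = np.full(q.shape[0], -np.inf)      # running row max of scores
    l = np.zeros(q.shape[0])              # running softmax normalizer
    for s in range(0, k.shape[0], block):
        kb, vb = k[s:s + block], v[s:s + block]
        scores = q @ kb.T / np.sqrt(d)
        m_new = np.maximum(m, scores.max(axis=1))
        scale = np.exp(m - m_new)          # rescale old accumulators
        p = np.exp(scores - m_new[:, None])
        acc = acc * scale[:, None] + p @ vb
        l = l * scale + p.sum(axis=1)
        m = m_new
    return acc / l[:, None]
```

Memory per step is O(block) in the sequence length, which is what lets training scale to the sequence lengths the paper reports once the loop is distributed.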
- Rotary Positional Embedding (RoPE)📑:💡/ ✍️ / 🗄️ [20 Apr 2021]
- How is this different from the sinusoidal embeddings used in "Attention is All You Need"?
- Sinusoidal embeddings apply to each coordinate individually, while rotary embeddings mix pairs of coordinates
- Sinusoidal embeddings add a cos or sin term, while rotary embeddings use a multiplicative factor
- Rotary embeddings are applied to the queries (Q) and keys (K) in attention, not to the input embeddings
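A toy NumPy sketch of the rotation (base 10000 as in the paper): adjacent coordinate pairs are rotated by a position-dependent angle, so the dot product of a rotated query and key depends only on their relative offset.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate adjacent coordinate pairs of x by position-dependent angles.

    x: (d,) vector with even d (one query or key at position `pos`).
    Pair (x[2i], x[2i+1]) is rotated by angle pos * base**(-2i/d) --
    the multiplicative analogue of the additive sinusoidal term.
    """
    d = x.shape[0]
    pairs = x.reshape(-1, 2)                       # (d/2, 2) coordinate pairs
    theta = pos * base ** (-np.arange(d // 2) * 2.0 / d)
    cos, sin = np.cos(theta), np.sin(theta)
    rotated = np.stack([pairs[:, 0] * cos - pairs[:, 1] * sin,
                        pairs[:, 0] * sin + pairs[:, 1] * cos], axis=1)
    return rotated.reshape(d)

# key property: the score between rotated q and k depends only on
# the relative offset between their positions, not absolute position
q, k = np.random.randn(8), np.random.randn(8)
assert np.isclose(rope(q, 5) @ rope(k, 3), rope(q, 7) @ rope(k, 5))
```

The final assertion is the whole point of RoPE: shifting both positions by the same amount leaves attention scores unchanged.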
- ALiBi📑: Attention with Linear Biases. ALiBi applies a bias directly to the attention scores. [27 Aug 2021]
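A small sketch of the ALiBi bias using the paper's geometric slope schedule for power-of-two head counts (the schedule for other head counts differs slightly, so treat this as illustrative):

```python
import numpy as np

def alibi_bias(n_heads, seq_len):
    """Per-head linear distance penalty added to attention scores.

    ALiBi uses no positional embeddings at all: head h gets slope
    2**(-8*(h+1)/n_heads) and score[i, j] is biased by -slope * (i - j).
    The causal mask is folded in here as -inf for future positions.
    """
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    rel = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # j - i
    rel = np.where(rel > 0, -np.inf, rel.astype(float))
    return slopes[:, None, None] * rel   # (n_heads, seq_len, seq_len)

bias = alibi_bias(8, 4)
print(bias[0])  # head 0 (slope 0.5): 0 on the diagonal, -0.5 per step back
```

Because the penalty grows linearly with distance and nothing else encodes position, models trained this way extrapolate to sequences longer than those seen in training.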
- NoPE: Transformer Language Models without Positional Encodings Still Learn Positional Information📑: No position embeddings. [30 Mar 2022]
- Sparse Attention: Generating Long Sequences with Sparse Transformer📑:💡Sparse attention computes scores for a subset of pairs, selected via a fixed or learned sparsity pattern, reducing calculation costs. Strided attention: image, audio / Fixed attention: text ✍️ / git [23 Apr 2019]
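The strided pattern can be sketched as a boolean mask (a simplification: the actual Sparse Transformer splits the local and strided parts across separate attention heads rather than unioning them in one mask):

```python
import numpy as np

def strided_mask(seq_len, stride):
    """Boolean mask for the Sparse Transformer strided pattern: query i
    attends to the previous `stride` positions (local part) plus every
    earlier position j with (i - j) % stride == 0 (strided part)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i
    local = (i - j) < stride
    strided = (i - j) % stride == 0
    return causal & (local | strided)

m = strided_mask(16, 4)
print(m.sum(), "of", m.size, "score pairs computed")
```

With stride chosen near sqrt(seq_len), each row keeps O(sqrt(n)) entries, giving the O(n·sqrt(n)) cost the paper targets instead of dense attention's O(n²).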
- Structured Prompting: Scaling In-Context Learning to 1,000 Examples📑: [13 Dec 2022]
- Microsoft's Structured Prompting allows thousands of examples, by first concatenating examples into groups, then inputting each group into the LM. The hidden key and value vectors of the LM's attention modules are cached. Finally, when the user's unaltered input prompt is passed to the LM, the cached attention vectors are injected into the hidden layers of the LM.
- This approach wouldn't work with OpenAI's closed models, because it needs access to the keys and values inside the transformer, which they do not expose. You could implement it yourself on OSS ones. ✍️ [07 Feb 2023]
- Zig-Zag Ring Attention✍️: Long-context attention pattern for more memory-efficient distributed inference and training. [18 Mar 2026]
- 5 Approaches To Solve LLM Token Limits✍️ : 🗄️ [2023]
- Byte-Pair Encoding (BPE)📑: The most widely used tokenization algorithm for text today. BPE adds an end-of-word token, splits words into characters, and iteratively merges the most frequent byte pairs until a stop criterion is met. The final tokens form the vocabulary for encoding and decoding new data. [31 Aug 2015] / ✍️ [13 Aug 2021]
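The merge loop described above fits in a few lines of Python (a toy whole-word trainer, not the byte-level variant used by GPT-style tokenizers):

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE trainer: split words into characters plus an end-of-word
    marker, then repeatedly merge the most frequent adjacent pair,
    weighting pair counts by word frequency."""
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # fuse the pair
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges(["low", "low", "lower", "lowest"], 2)
print(merges)  # the most frequent adjacent pairs get fused first
```

The learned merge list, applied in order, is exactly what an encoder replays on new text; the stop criterion here is simply a fixed number of merges (i.e., a target vocabulary size).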
- Numbers every LLM Developer should know [18 May 2023]

- OpenAI Tokenizer: GPT-3, Codex token counting
- tiktoken: BPE tokeniser for use with OpenAI's models. Token counting. ✍️:💡online app [Dec 2022]
- Tokencost: Token price estimates for 400+ LLMs [Dec 2023]
- What are tokens and how to count them?✍️: OpenAI Articles
- 20 AI Governance Papers📑 [Jan 2025]
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models📑: A comprehensive survey of over thirty-two techniques developed to mitigate hallucination in LLMs [2 Jan 2024]
- AI models collapse when trained on recursively generated data: Model Collapse. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. [24 Jul 2024]
- Alignment Faking✍️: LLMs may pretend to align with training objectives during monitored interactions but revert to original behaviors when unmonitored. [18 Dec 2024] | demo: ✍️ | Alignment Science Blog
- An Approach to Technical AGI Safety and Security📑: Google DeepMind. We focus on technical solutions to misuse and misalignment, two of four key AI risks (the others being mistakes and structural risks). To prevent misuse, we limit access to dangerous capabilities through detection and security. For misalignment, we use two defenses: model-level alignment via training and oversight, and system-level controls like monitoring and access restrictions. ✍️ [2 Apr 2025]
- Anthropic Many-shot jailbreaking✍️: a simple long-context attack that bypasses safety guardrails by bombarding the model with unsafe or harmful questions and answers. [3 Apr 2024]
- Extracting Concepts from GPT-4✍️: Sparse Autoencoders identify key features, enhancing the interpretability of language models like GPT-4. They extract 16 million interpretable features using GPT-4's outputs as input for training. [6 Jun 2024]
- FactTune📑: A procedure that enhances the factuality of LLMs without the need for human feedback. The process involves fine-tuning a separate LLM using methods such as DPO and RLAIF, guided by preferences generated by FActScore. FActScore works by breaking a generation down into a series of atomic facts and computing the percentage of those atomic facts supported by a reliable knowledge source. [14 Nov 2023]
- Frontier Safety Framework: Google DeepMind, a set of protocols designed to identify and mitigate potential harms from future AI systems. [17 May 2024]
- Google SAIF✍️: Secure AI Framework for managing AI security risks. [05 Nov 2025]
- Guardrails Hub: Guardrails for common LLM validation use cases
- Hallucination Index: Tests LLMs with short (≤5k), medium (5k–25k), and long (40k–100k) contexts to evaluate RAG performance [Nov 2023]
- Hallucination Leaderboard: Evaluate how often an LLM introduces hallucinations when summarizing a document. [Nov 2023]
- Hallucinations📑: A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [9 Nov 2023]
- Large Language Models Reflect the Ideology of their Creators📑: When prompted in Chinese, all LLMs favor pro-Chinese figures; Western LLMs similarly align more with Western values, even in English prompts. [24 Oct 2024]
- LlamaFirewall: Scans and filters AI inputs to block prompt injections and malicious content. [29 Apr 2025]
- LLMs Will Always Hallucinate, and We Need to Live With This📑:💡LLMs cannot completely eliminate hallucinations through architectural improvements, dataset enhancements, or fact-checking mechanisms due to fundamental mathematical and logical limitations. [9 Sep 2024]
- Machine unlearning: Machine unlearning: techniques to remove specific data from trained machine learning models.
- Mapping the Mind of a Large Language Model: Anthropic, A technique called "dictionary learning" can help understand model behavior by identifying which features respond to a particular input, thus providing insight into the model's "reasoning." ✍️ [21 May 2024]
- NeMo Guardrails: Building Trustworthy, Safe and Secure LLM Conversational Systems [Apr 2023]
- NIST AI Risk Management Framework: NIST released the first complete version of the NIST AI RMF Playbook on March 30, 2023
- OpenAI Weak-to-strong generalization📑:💡In the superalignment problem, humans must supervise models that are much smarter than them. The paper discusses supervising a GPT-4 or 3.5-level model using a GPT-2-level model. It finds that while strong models supervised by weak models can outperform the weak models, they still don’t perform as well as when supervised by ground truth. git [14 Dec 2023]
- Political biases of LLMs📑: From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. [15 May 2023]

- Red Teaming: The term red teaming has historically described systematic adversarial attacks for testing security vulnerabilities. LLM red teamers should be a mix of people with diverse social and professional backgrounds, demographic groups, and interdisciplinary expertise that fits the deployment context of your AI system. ✍️
- The Foundation Model Transparency Index📑: A comprehensive assessment of the transparency of foundation model developers ✍️ [19 Oct 2023]
- The Instruction Hierarchy📑: Training LLMs to Prioritize Privileged Instructions. The OpenAI highlights the need for instruction privileges in LLMs to prevent attacks and proposes training models to conditionally follow lower-level instructions based on their alignment with higher-level instructions. [19 Apr 2024]
- Tracing the thoughts of a large language model✍️:💡Claude 3.5 Haiku. 1. Universal thought processing (multiple languages): shared concepts exist across languages and are then translated into the respective language. 2. Advance planning (composing poetry): despite generating text word by word, it anticipates rhyming words in advance. 3. Fabricated reasoning (math): produces plausible-sounding arguments even when given an incorrect hint. [27 Mar 2025]
- Trustworthy LLMs📑: Comprehensive overview for assessing LLM trustworthiness; reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. [10 Aug 2023]
- Vibe Hacking✍️: Anthropic reports vibe-hacking attempts. [14 Nov 2025]
- A Categorical Archive of ChatGPT Failures📑: 11 categories of failures, including reasoning, factual errors, math, coding, and bias git [6 Feb 2023]
- A Survey on Employing Large Language Models for Text-to-SQL Tasks📑: a comprehensive overview of LLMs in text-to-SQL tasks [21 Jul 2024]
- Can LLMs Generate Novel Research Ideas?📑: A Large-Scale Human Study with 100+ NLP Researchers. We find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas. However, the study revealed a lack of diversity in AI-generated ideas. [6 Sep 2024]
- Design2Code📑: How Far Are We From Automating Front-End Engineering? In 64% of cases, GPT-4V-generated webpages are considered better than the original reference webpages. [5 Mar 2024]
- Emergent Abilities of Large Language Models📑: Large language models can develop emergent abilities, which are not explicitly trained but appear at scale and are not present in smaller models. These abilities can be enhanced using few-shot and augmented prompting techniques. ✍️ [15 Jun 2022]
- Improving mathematical reasoning with process supervision✍️ [31 May 2023]
- Language Modeling Is Compression📑: Lossless data compression, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%). [19 Sep 2023]
- Large Language Models for Software Engineering📑: Survey and Open Problems, Large Language Models (LLMs) for Software Engineering (SE) applications, such as code generation, testing, repair, and documentation. [5 Oct 2023]
- LLMs for Chip Design📑: Domain-Adapted LLMs for Chip Design [31 Oct 2023]
- LLMs Represent Space and Time📑: Large language models learn world models of space and time from text-only training. [3 Oct 2023]
- Math-solving optimized LLM WizardMath📑: Developed by adapting Evol-Instruct and Reinforcement Learning techniques, these models excel in math-related instructions like GSM8k and MATH. git [18 Aug 2023] / Math solving plugin: Wolfram Alpha
- Multitask Prompted Training Enables Zero-Shot Task Generalization📑: A language model trained on various tasks using prompts can learn and generalize to new tasks in a zero-shot manner. [15 Oct 2021]
- On the Slow Death of Scaling📑: 💡Relying solely on scaling model size and data is becoming less effective, and AI progress now depends on exploring more nuanced, efficient approaches. [12 Dec 2025]
- Testing theory of mind in large language models and humans: Some large language models (LLMs) perform as well as, and in some cases better than, humans when presented with tasks designed to test the ability to track people’s mental states, known as “theory of mind.” 🗣️ [20 May 2024]
- Chain of Draft: Thinking Faster by Writing Less📑: Chain-of-Draft prompting condenses the reasoning process into minimal, abstract drafts.
- Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity📑:💡The Illusion of Thinking findings primarily reflect experimental design limitations rather than fundamental reasoning failures: output token limits, flawed evaluation methods, and unsolvable River Crossing problems. [10 Jun 2025]
- DeepSeek-R1:💡Group Relative Policy Optimization (GRPO). Base -> RL -> SFT -> RL -> SFT -> RL [20 Jan 2025]
- Illusion of Thinking📑: Large Reasoning Models (LRMs) are evaluated using controlled puzzles, where complexity depends on the size of N. Beyond a certain complexity threshold, LRM accuracy collapses, and reasoning effort paradoxically decreases. LRMs outperform standard LLMs on medium-complexity tasks, perform worse on low-complexity ones, and both fail on high-complexity. Apple. [May 2025]
- Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights📑: Evaluate Chain-of-Thought, Tree-of-Thought, and Reasoning as Planning across 11 tasks. While scaling inference-time computation enhances reasoning, no single technique consistently outperforms the others. [18 Feb 2025]
- Is Chain-of-Thought Reasoning of LLMs a Mirage?📑: The paper concludes that CoT is largely mimicry rather than true reasoning. Using DataAlchemy (atom = A–Z; element = e.g., APPLE; transform = (1) ROT (rotation), (2) position shift; compositional transform = combinations of transforms), the model is fine-tuned and evaluated on its ability to generalize to unlearned patterns.
- Mini-R1✍️: Reproduce the DeepSeek R1 "aha moment": an RL tutorial [30 Jan 2025]
- Open R1: A fully open reproduction of DeepSeek-R1. [25 Jan 2025]
- Open Thoughts: Fully Open Data Curation for Thinking Models [28 Jan 2025]
- Reasoning LLMs Guide: The Reasoning LLMs Guide shows how to use advanced AI models for step-by-step thinking, planning, and decision-making in complex tasks.
- S*: Test Time Scaling for Code Generation📑: Parallel scaling (generating multiple solutions) + sequential scaling (iterative debugging). [20 Feb 2025]
- s1: Simple test-time scaling📑: Curated small dataset of 1K. Budget forces stopping termination. Append "Wait" to lengthen. Achieved better reasoning performance. [31 Jan 2025]
- Thinking Machines: A Survey of LLM based Reasoning Strategies📑 [13 Mar 2025]
- Tina: Tiny Reasoning Models via LoRA📑: Low-rank adaptation (LoRA) with Reinforcement learning (RL) on a 1.5B parameter base model [22 Apr 2025]
- A Primer on Large Language Models and their Limitations📑: A primer on LLMs, their strengths, limits, applications, and research, for academia and industry use. [3 Dec 2024]
- A Survey of Large Language Models📑:[v1: 31 Mar 2023 - v15: 13 Oct 2024]
- A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?📑: [9 Aug 2024] git
- A Survey of Transformers📑:[8 Jun 2021]
- Google AI Research Recap
- Gemini✍️ [06 Dec 2023] Three different sizes: Ultra, Pro, Nano. With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU ✍️
- Google AI Research Recap (2022 Edition)
- Themes from 2021 and Beyond
- Looking Back at 2020, and Forward to 2021
- Large Language Models: A Survey📑: 🏆Well organized visuals and contents [9 Feb 2024]
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models📑: git [28 Feb 2025]
- LLM Research Papers: The 2024 List [29 Dec 2024]
- Microsoft Research Recap
- Research at Microsoft 2023✍️: A year of groundbreaking AI advances and discoveries
- Noteworthy LLM Research Papers of 2024 [23 Jan 2025]
- Advancing Reasoning in Large Language Models: Promising Methods and Approaches📑 [5 Feb 2025]
- Agentic Reasoning for Large Language Models📑 [18 Jan 2026]
- Agentic Retrieval-Augmented Generation: Agentic RAG📑 [15 Jan 2025]
- AI Agent Protocols📑 [23 Apr 2025]
- AI-Generated Content (AIGC)📑: A History of Generative AI from GAN to ChatGPT:[7 Mar 2023]
- AIOps in the Era of Large Language Models📑 [23 Jun 2025]
- Aligned LLMs📑:[24 Jul 2023]
- An Overview on Language Models: Recent Developments and Outlook📑:[10 Mar 2023]
- A comprehensive taxonomy of hallucinations in Large Language Models📑 [3 Aug 2025]
- Autonomous Scientific Discovery📑: From AI for Science to Agentic Science [18 Aug 2025]
- Automatic Prompt Optimization Techniques📑 [24 Feb 2025]
- Challenges & Application of LLMs📑:[11 Jun 2023]
- ChatGPT’s One-year Anniversary: Are Open-Source Large Language Models Catching up?📑: Open-Source LLMs vs. ChatGPT; Benchmarks and Performance of LLMs [28 Nov 2023]
- Compression Algorithms for Language Models📑 [27 Jan 2024]
- Context Engineering for Large Language Models📑 [17 Jul 2025]
- Context Engineering 2.0 [30 Oct 2025]
- Data Management For Large Language Models: A Survey📑 [4 Dec 2023]
- Data Synthesis and Augmentation for Large Language Models📑 [16 Oct 2024]
- Efficient Guided Generation for Large Language Models📑:[19 Jul 2023]
- Efficient Training of Transformers📑:[2 Feb 2023]
- Evaluation of Large Language Models📑:[6 Jul 2023]
- Evaluating Large Language Models: A Comprehensive Survey📑:[30 Oct 2023]
- Evaluation of LLM-based Agents📑 [20 Mar 2025]
- Foundation Models in Vision📑:[25 Jul 2023]
- From Google Gemini to OpenAI Q* (Q-Star)📑: Reshaping the Generative Artificial Intelligence (AI) Research Landscape:[18 Dec 2023]
- From Code Foundation Models to Agents and Applications📑: Comprehensive survey and guide to code intelligence. [23 Nov 2025]
- GUI Agents: A Survey📑 [18 Dec 2024]
- Hallucination in LLMs📑:[9 Nov 2023]
- Hallucination in Natural Language Generation📑:[8 Feb 2022]
- Harnessing the Power of LLMs in Practice: ChatGPT and Beyond📑:[26 Apr 2023]
- Harnessing the Reasoning Economy: Efficient Reasoning for Large Language Models📑: Efficient reasoning mechanisms that balance computational cost with performance. [31 Mar 2025]
- In-context Learning📑:[31 Dec 2022]
- Large Language Model-Brained GUI Agents: A Survey📑 [27 Nov 2024]
- LLM-as-a-Judge📑 [23 Nov 2024]
- LLM-based Autonomous Agents📑:[22 Aug 2023]
- LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures📑 [24 Jun 2025]
- LLMs for Healthcare📑:[9 Oct 2023]
- Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges📑 [16 Dec 2024]
- Medical Reasoning in the Era of LLMs📑: A Systematic Review of Enhancement Techniques and Applications [1 Aug 2025]
- Mixture of Experts📑 [26 Jun 2024]
- Mitigating Hallucination in LLMs📑: Summarizes 32 techniques to mitigate hallucination in LLMs [2 Jan 2024]
- Model Compression for LLMs📑:[15 Aug 2023]
- Multimodal Deep Learning📑:[12 Jan 2023]
- Multimodal Large Language Models📑:[23 Jun 2023]
- NL2SQL with Large Language Models: Where are we, and where are we going?📑: [9 Aug 2024] git
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback📑:[27 Jul 2023]
- Overview of Factuality in LLMs📑:[11 Oct 2023]
- Position Paper: Agent AI Towards a Holistic Intelligence📑 [28 Feb 2024]
- Post-training of Large Language Models📑 [8 Mar 2025]
- Prompt Engineering Methods in Large Language Models for Different NLP Tasks📑 [17 Jul 2024]
- Retrieval-Augmented Generation for Large Language Models: A Survey📑 [18 Dec 2023]
- Retrieval And Structuring Augmented Generation with Large Language Models📑 [12 Sep 2025]
- Retrieval-Augmented Text Generation for Large Language Models📑 [17 Apr 2024]
- Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning📑:[28 Mar 2023]
- SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension📑: [30 Jul 2023]
- Self-Supervised Learning: A Cookbook of Self-Supervised Learning📑:[24 Apr 2023]
- Small Language Models: Survey, Measurements, and Insights📑 [24 Sep 2024]
- Small Language Models in the Era of Large Language Models📑 [4 Nov 2024]
- Speed Always Wins: Efficient Architectures for Large Language Models [13 Aug 2025]
- Stop Overthinking: Efficient Reasoning for Large Language Models📑 [20 Mar 2025]
- Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models📑
- Tabular Data Understanding with LLMs: Recent Advances and Challenges [31 Jul 2025]
- Techniques for Optimizing Transformer Inference📑:[16 Jul 2023]
- The Rise and Potential of Large Language Model Based Agents: A Survey📑 [14 Sep 2023]
- Thinking Machines: LLM based Reasoning Strategies📑 [13 Mar 2025]
- Towards Artificial General or Personalized Intelligence? 📑: Personalized federated intelligence (PFI). Foundation Model Meets Federated Learning [11 May 2025]
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems📑: The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving [23 Dec 2023]
- Trustworthy LLMs📑:[10 Aug 2023]
- Universal and Transferable Adversarial Attacks on Aligned Language Models📑:[27 Jul 2023]
- What is the Role of Small Models in the LLM Era: A Survey📑 [10 Sep 2024]
- LLM Papers (≥150 citations)📑: High-citation CS papers (≥150 citations) across 35 LLM topic areas — reasoning, RAG, agents, PEFT, RLHF, scaling laws, multimodal, and more — fetched from Semantic Scholar and ranked by citation count.
- AI-powered success—with more than 1,000 stories of customer transformation and innovation✍️💡[24 July 2025]
- Anthropic Clio✍️: Privacy-preserving insights into real-world AI use [12 Dec 2024]
- Anthropic Economic Index✍️: a research on the labor market impact of technologies. The usage is concentrated in software development and technical writing tasks. [10 Feb 2025]
- Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence📑: Employment of early-career workers (ages 22–25) in AI-exposed jobs fell 13%, while older workers' employment remained stable or grew. [26 Aug 2025]
- Chatbot Interviewers Fill More Jobs✍️: Using chatbots as interviewers improves hiring efficiency and retention in customer service roles. [3 Sep 2025]
- Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise📑: IBM study surveying developer experiences with watsonx Code Assistant (WCA). Most common use: code explanations (71.9%). Rated effective by 57.4%, ineffective by 42.6%. Many described WCA as similar to an “intern” or “junior developer.” [9 Dec 2024]
- Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce📑: A new framework maps U.S. workers’ preferences for AI automation vs. augmentation across 844 tasks. It shows how people want AI to help or replace them. Many jobs need AI to support people, not just take over. [6 Jun 2025]
- Google: 321 real-world gen AI use cases from the world's leading organizations✍️ [19 Dec 2024]
- Google: 60 of our biggest AI announcements in 2024✍️ [23 Dec 2024]
- How people are using ChatGPT✍️: OpenAI. Broadly adopted worldwide, mainly for advice (49%), task completion (40%), and creative expression (11%), with significant work-related use and rapid uptake in lower-income regions. [15 Sep 2025]
- How real-world businesses are transforming with AI✍️:💡Collected over 200 examples of how organizations are leveraging Microsoft’s AI capabilities. [12 Nov 2024]
- Rapid Growth Continues for ChatGPT, Google’s NotebookLM [6 Nov 2024]
- Senior Developers Ship nearly 2.5x more AI Code than Junior Counterparts✍️: About a third of senior developers (10+ years of experience) say over half their shipped code is AI-generated [27 Aug 2025]
- SignalFire State of Talent Report 2025: 1. Entry‑level hiring down sharply since 2019 (-50%) 2. Anthropic dominates mid/senior talent retention 3. Roles labeled “junior” filled by seniors, blocking grads. [20 May 2025]
- State of AI
- Retool: Status of AI: A Report on AI In Production 2023 -> 2024
- The State of Generative AI in the Enterprise [ⓒ2023]
- 1. 96% of AI spend is on inference, not training. 2. Only 10% of enterprises pre-trained their own models. 3. 85% of models in use are closed-source. 4. 60% of enterprises use multiple models.
- Stanford AI Index Annual Report
- State of AI Report 2024 [10 Oct 2024]
- State of AI Report 2025 [9 Oct 2025]
- LangChain > State of AI Agents [19 Dec 2024]
- The leading generative AI companies:💡GPU: Nvidia 92% market share, Generative AI foundational models and platforms: Microsoft 32% market share, Generative AI services: no single dominant [4 Mar 2025]
- Trends – Artificial Intelligence:💡Issued by Bondcap VC. 340 Slides. ChatGPT’s 800 Million Users, 99% Cost Drop within 17 months. [May 2025]
- Who is using AI to code? Global diffusion and impact of generative AI📑: AI wrote 30% of Python functions by U.S. devs in 2024. Adoption is uneven globally but boosts output and innovation. New coders use AI more, and usage drives $9.6–$14.4B in U.S. annual value. [10 Jun 2025]
- An unnecessarily tiny implementation of GPT-2 in NumPy. picoGPT: Transformer Decoder [Jan 2023]
q = x @ w_q # [n_seq, n_embd] @ [n_embd, n_embd] -> [n_seq, n_embd]
k = x @ w_k # [n_seq, n_embd] @ [n_embd, n_embd] -> [n_seq, n_embd]
v = x @ w_v # [n_seq, n_embd] @ [n_embd, n_embd] -> [n_seq, n_embd]
# In picoGPT, w_q, w_k and w_v are combined into a single matrix w_fc
x = x @ w_fc # [n_seq, n_embd] @ [n_embd, 3*n_embd] -> [n_seq, 3*n_embd]
- 4 LLM Text Generation Strategies: Greedy strategy, Multinomial sampling strategy, Beam search, Contrastive search [27 Sep 2025]
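A minimal NumPy sketch of the first two decoding strategies (the toy logits and 5-token vocabulary are made up for illustration, not taken from any of the linked articles):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy next-token logits over a 5-token vocabulary (illustrative values)
logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])
probs = softmax(logits)

# Greedy strategy: always pick the single most probable token
greedy_token = int(np.argmax(probs))  # -> 0

# Multinomial sampling: draw a token in proportion to its probability
rng = np.random.default_rng(0)
sampled_token = int(rng.choice(len(probs), p=probs))
```

Greedy is deterministic; multinomial sampling trades determinism for diversity, which is why it is usually paired with temperature or top-k/top-p truncation in practice.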
- Andrej Karpathy📺: Reproduce the GPT-2 (124M) from scratch. [June 2024] / SebastianRaschka📺: Developing an LLM: Building, Training, Finetuning [June 2024]
- Beam Search [1977] in Transformers is an inference algorithm that maintains the `beam_size` most probable sequences until the end token appears or the maximum sequence length is reached. If `beam_size` (k) is 1, it's a Greedy Search. If k equals the vocabulary size, it's an Exhaustive Search. 🤗 [Mar 2022]
- Build a Large Language Model (From Scratch):🏆Implementing a ChatGPT-like LLM from scratch, step by step
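The beam search description above can be sketched as follows; note the fixed per-step logits table is a toy stand-in (a real model would recompute logits conditioned on each partial sequence):

```python
import numpy as np

def beam_search(step_logits, beam_size):
    """Minimal beam search over a fixed table of per-step logits.

    step_logits: [n_steps, vocab] array of toy logits.
    Returns a list of (token_sequence, cumulative_log_prob), best first.
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for logits in step_logits:
        log_probs = logits - np.log(np.exp(logits).sum())  # log-softmax
        candidates = [
            (seq + [tok], score + log_probs[tok])
            for seq, score in beams
            for tok in range(len(log_probs))
        ]
        # Keep only the beam_size most probable sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams

steps = np.array([[2.0, 1.0, 0.1], [0.3, 1.5, 0.2]])
best_seq = beam_search(steps, beam_size=1)[0][0]  # beam_size=1 reduces to greedy -> [0, 1]
```

With `beam_size` equal to the vocabulary size and no truncation at each step, this degenerates into an exhaustive search, matching the k = 1 and k = vocab limits described above.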
- Einsum is All you Need: Einstein Summation [5 Feb 2018]
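As a small illustration of the einsum notation the article above covers (shapes and variable names here are arbitrary), unnormalized attention scores can be written as a single contraction over the embedding axis:

```python
import numpy as np

rng = np.random.default_rng(0)
n_seq, n_embd = 4, 8
q = rng.standard_normal((n_seq, n_embd))
k = rng.standard_normal((n_seq, n_embd))

# "id,jd->ij": sum over the shared embedding axis d,
# leaving a [n_seq, n_seq] matrix of attention scores
scores = np.einsum("id,jd->ij", q, k)

# Equivalent to a plain matmul with an explicit transpose
assert np.allclose(scores, q @ k.T)
```

The subscript string makes the contraction explicit, which is the article's point: one notation covers transpose, matmul, batched matmul, and trace without remembering separate APIs.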
- lit-gpt: Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed. git [Mar 2023]
- llama3-from-scratch: Implementing Llama3 from scratch [May 2024]
- llm.c: LLM training in simple, raw C/CUDA [Apr 2024]
| Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20 git
- nanochat: a full-stack implementation of an LLM [Oct 2025]
- nanoGPT:💡Andrej Karpathy [Dec 2022] | nanoMoE [Dec 2024]
- nanoVLM: 🤗 The simplest, fastest repository for training/finetuning small-sized VLMs. [May 2025]
- pix2code: Generating Code from a Graphical User Interface Screenshot. Trained dataset as a pair of screenshots and simplified intermediate script for HTML, utilizing image embedding for CNN and text embedding for LSTM, encoder and decoder model. Early adoption of image-to-code. [May 2017]
- Screenshot to code: Turning Design Mockups Into Code With Deep Learning [Oct 2017] ✍️
- Spreadsheets-are-all-you-need: Spreadsheets-are-all-you-need implements the forward pass of GPT2 entirely in Excel using standard spreadsheet functions. [Sep 2023]
- Transformer Explainer: an open-source interactive tool to learn about the inner workings of a Transformer model (GPT-2) git [8 Aug 2024]
- Umar Jamil github:💡LLM Model explanation / building a model from scratch 📺
- You could have designed state of the art positional encoding: Binary Position Encoding, Sinusoidal positional encoding, Absolute vs Relative Position Encoding, Rotary Positional encoding [17 Nov 2024]
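Of the encodings listed above, the sinusoidal scheme is easy to sketch in NumPy (the function name is my own; the formula follows the standard Transformer paper):

```python
import numpy as np

def sinusoidal_positional_encoding(n_seq, n_embd):
    """PE[pos, 2i] = sin(pos / 10000^(2i/n_embd)); PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(n_seq)[:, None]           # [n_seq, 1]
    i = np.arange(0, n_embd, 2)[None, :]      # [1, n_embd/2], the 2i indices
    angle = pos / np.power(10000.0, i / n_embd)
    pe = np.zeros((n_seq, n_embd))
    pe[:, 0::2] = np.sin(angle)  # even dimensions
    pe[:, 1::2] = np.cos(angle)  # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(10, 16)  # row for position 0 is [0, 1, 0, 1, ...]
```

Each dimension oscillates at a different wavelength, so every position gets a unique fingerprint and relative offsets correspond to fixed linear transformations, which is the stepping stone the article uses toward rotary encodings.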
- 13+ Attention Mechanisms You Should Know: [19 Apr 2026]
- ChatGPTやCopilotなど各種生成AI用の日本語の Prompt のサンプル: Japanese prompt samples for generative AI such as ChatGPT and Copilot [Apr 2023]
- LLM 研究プロジェクト✍️: LLM research project: blog post index [27 Jul 2023]
- ブレインパッド社員が投稿した Qiita 記事まとめ✍️: Summary of Qiita articles posted by BrainPad employees [Jul 2023]
- rinna🤗: rinna の 36 億パラメータの日本語 GPT 言語モデル: 3.6 billion parameter Japanese GPT language model [17 May 2023]
- rinna: bilingual-gpt-neox-4b🤗: 日英バイリンガル大規模言語モデル [17 May 2023]
- 法律:生成 AI の利用ガイドライン: Legal: Guidelines for the Use of Generative AI
- New Era of Computing - ChatGPT がもたらした新時代✍️ [May 2023]
- 大規模言語モデルで変わる ML システム開発✍️: ML system development that changes with large-scale language models [Mar 2023]
- GPT-4 登場以降に出てきた ChatGPT/LLM に関する論文や技術の振り返り✍️: Review of ChatGPT/LLM papers and technologies that have emerged since the advent of GPT-4 [Jun 2023]
- LLM を制御するには何をするべきか?✍️: How to control LLM [Jun 2023]
- 1. 生成 AI のマルチモーダルモデルでできること✍️: What can be done with multimodal models of generative AI 2. 生成 AI のマルチモーダリティに関する技術調査✍️ [Jun 2023]
- LLM の推論を効率化する量子化技術調査✍️: Survey of quantization techniques to improve efficiency of LLM reasoning [Sep 2023]
- LLM の出力制御や新モデルについて✍️: About LLM output control and new models [Sep 2023]
- Azure OpenAI を活用したアプリケーション実装のリファレンス: Microsoft Japan reference architecture for application development using Azure OpenAI [Jun 2023]
- 生成 AI・LLM のツール拡張に関する論文の動向調査✍️: Survey of trends in papers on tool extensions for generative AI and LLM [Sep 2023]
- LLM の学習・推論の効率化・高速化に関する技術調査✍️: Technical survey on improving the efficiency and speed of LLM learning and inference [Sep 2023]
- 日本語LLMまとめ - Overview of Japanese LLMs: Summary of publicly available Japanese LLMs (LLMs trained primarily on Japanese) and Japanese LLM evaluation benchmarks [Jul 2023]
- Azure OpenAI Service で始める ChatGPT/LLM システム構築入門: Introduction to building ChatGPT/LLM systems with Azure OpenAI Service: sample programs [Aug 2023]
- Azure OpenAI と Azure Cognitive Search の組み合わせを考える: Considering the combination of Azure OpenAI and Azure Cognitive Search [24 May 2023]
- Matsuo Lab: 人工知能・深層学習を学ぶためのロードマップ: Roadmap for learning artificial intelligence and deep learning ✍️ / 🗄️ [Dec 2023]
- AI事業者ガイドライン: Guidelines for AI Business Operators [Apr 2024]
- LLMにまつわる"評価"を整理する✍️: Organizing the "evaluation" of LLMs [06 Jun 2024]
- コード生成を伴う LLM エージェント✍️: LLM agents with code generation [18 Jul 2024]
- Japanese startup Orange uses Anthropic's Claude to translate manga into English✍️: [02 Dec 2024]
- AWS で実現する安全な生成 AI アプリケーション – OWASP Top 10 for LLM Applications 2025 の活用例✍️: Secure generative AI applications on AWS: applying OWASP Top 10 for LLM Applications 2025 [31 Jan 2025]
- Machine Learning Study 혼자 해보기: Self-study machine learning [Sep 2018]
- LangChain 한국어 튜토리얼: LangChain Korean tutorial [Feb 2024]
- AI 데이터 분석가 ‘물어보새’ 등장 – RAG와 Text-To-SQL 활용✍️: Introducing ‘물어보새’, an AI data analyst using RAG and Text-to-SQL [Jul 2024]
- LLM, 더 저렴하게, 더 빠르게, 더 똑똑하게✍️: LLMs: cheaper, faster, smarter [09 Sep 2024]
- 생성형 AI 서비스: 게이트웨이로 쉽게 시작하기✍️: Generative AI services: getting started easily with a gateway [07 Nov 2024]
- Harness를 이용해 LLM 애플리케이션 평가 자동화하기✍️: Automating LLM application evaluation with Harness [16 Nov 2024]
- 모두를 위한 LLM 애플리케이션 개발 환경 구축 사례✍️: Building an LLM application development environment for everyone [7 Feb 2025]
- LLM 앱의 제작에서 테스트와 배포까지, LLMOps 구축 사례 소개✍️: Building LLMOps: from developing LLM apps to testing and deployment [14 Feb 2025]
- Kanana: Kanana, a series of bilingual language models (developed by Kakao) [26 Feb 2025]
- HyperCLOVA X SEED🤗: Lightweight open-source lineup with a strong focus on Korean language [23 Apr 2025]
- 문의 대응을 효율화하기 위한 RAG 기반 봇 도입하기✍️: Introducing a RAG-based bot to streamline inquiry handling [23 May 2025]
- AI by Hand | Special Lecture - DeepSeek:🏆MoE, Latent Attention implemented in DeepSeek git [30 Jan 2025]
- AI-Crash-Course: AI Crash Course to help busy builders catch up to the public frontier of AI research in 2 weeks [Jan 2025]
- Anti-hype LLM reading list
- Attention Is All You Need: 🏆 The Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. [12 Jun 2017] Illustrated transformer
- Best-of Machine Learning with Python:🏆A ranked list of awesome machine learning Python libraries. [Nov 2020]
- But what is a GPT?📺🏆3blue1brown: Visual intro to transformers [Apr 2024]
- CNN Explainer: Learning Convolutional Neural Networks with Interactive Visualization [Apr 2020]
- Comparing Adobe Firefly, Dalle-2, OpenJourney, Stable Diffusion, and Midjourney✍️: Generative AI for images [20 Jun 2023]
- DAIR.AI:💡Machine learning & NLP research (omarsar github)
- ML Papers of The Week [Jan 2023] | ✍️: NLP Newsletter
- Daily Dose of Data Science [Dec 2022]
- Deep Learning cheatsheets for Stanford's CS 230: Super VIP Cheatsheet: Deep Learning [Nov 2019]
- DeepLearning.ai Short courses: DeepLearning.ai Short courses [2023]
- eugeneyan blog:💡Lessons from A year of Building with LLMs, Patterns for LLM Systems. git
- Foundational concepts like Transformers, Attention, and Vector Database [Feb 2024]
- Foundations of Large Language Models📑: a book about large language models: pre-training, generative models, prompting techniques, and alignment methods. [16 Jan 2025]
- gpt4free for educational purposes only [Mar 2023]
- Hundred-Page Language Models Book by Andriy Burkov [15 Jan 2025]
- IbrahimSobh/llms: Language models introduction with simple code. [Jun 2023]
- Large Language Model Course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. [Jun 2023]
- Large Language Models: Application through Production: A course on edX & Databricks Academy
- LLM FineTuning Projects and notes on common practical techniques [Oct 2023]
- LLM Visualization: A 3D animated visualization of an LLM with a walkthrough
- Machine learning algorithms: ml algorithms or implementation from scratch [Oct 2016]
- Must read: the 100 most cited AI papers in 2022 : 🗄️ [8 Mar 2023]
- Open Problem and Limitation of RLHF📑: Provides an overview of open problems and the limitations of RLHF [27 Jul 2023]
- OpenAI Cookbook Examples and guides for using the OpenAI API
- oumi: Open Universal Machine Intelligence: Everything you need to build state-of-the-art foundation models, end-to-end. [Oct 2024]
- The Best Machine Learning Resources : 🗄️ [20 Aug 2017]
- The Big Book of Large Language Models by Damien Benveniste [30 Jan 2025]
- The Illustrated GPT-OSS [19 Aug 2025]
- What are the most influential current AI Papers?📑: NLLG Quarterly arXiv Report 06/23 git [31 Jul 2023]

