Research & Survey

Large Language Model: Landscape
Prompt Engineering and Visual Prompts
Finetuning
- Quantization Techniques
- Other Techniques and LLM Patterns
Large Language Model: Challenges and Solutions
Survey and Reference

Large Language Model: Landscape

The best NLP papers from 2015 to now
In 2023: As abilities emerge only at scale, we must unlearn outdated intuitions, scale Transformers via massive distributed matrix multiplications, and discover the inductive bias needed to push ~10,000× beyond GPT-4. 🗣️ / 📺 / ✍️ [6 Oct 2023]

Large Language Model Comparison

AI Model Review: Compare 75 AI Models on 200+ Prompts Side By Side.
Artificial Analysis:💡Independent analysis of AI models and API providers.
Inside language models (from GPT to Olympus)
LiveBench: a benchmark for LLMs designed with test set contamination.
LLMArena:💡Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots
LLMprices.dev: Compare prices for models like GPT-4, Claude Sonnet 3.5, Llama 3.1 405b and many more.
LLM Pre-training and Post-training Paradigms [17 Aug 2024]

The Big LLM Architecture Comparison (in 2025)

The Big LLM Architecture Comparison✍️:💡 [19 Jul 2025]

LLM Architecture Gallery✍️: Visual guide to modern LLM architectures and design tradeoffs. [26 Mar 2026]

Model	Parameters	Attention Type	MoE	Norm	Positional Encoding	Notable Features
DeepSeek V3 / R1	671B	Multi-Head Latent Attention (MLA)	Yes, 256 experts (37B active)	Pre-normalization	RoPE	KV compression via MLA, shared expert, high inference efficiency
OLMo 2	32B	Multi-Head Attention (MHA)	No	Post-normalization + QK norm (RMSNorm)	RoPE	RMSNorm scaling after attention & FF, training stability
Gemma 3 / 3n	27B / 4B	Sliding Window + Grouped-Query Attention	No	Pre + Post RMSNorm	RoPE	Sliding window attention, Gemma 3n: Per-Layer Embedding (PLE), MatFormer slices
Mistral Small 3.1	24B	Grouped-Query Attention	No	Pre-normalization	RoPE	Optimized for low latency, simpler than Gemma 3
Llama 4 Maverick	400B	Grouped-Query Attention	Yes, fewer & larger experts	Pre-normalization	RoPE	Alternating MoE & dense layers, 17B active parameters
Qwen3 (Dense)	0.6–32B	Grouped-Query Attention	No	Pre-normalization	RoPE	Deep architecture, small memory footprint
Qwen3 (MoE)	30B–235B	Grouped-Query Attention	Yes, no shared expert	Pre-normalization	RoPE	Sparse MoE, optimized for large-scale inference
SmolLM3	3B	Grouped-Query Attention	No	Pre-normalization	NoPE (No Positional Embedding)	Good small-scale performance, improved length generalization
Kimi K2	1T	MLA	Yes, more experts than DeepSeek	Pre-normalization	RoPE	Muon optimizer, very high modeling performance, open-weight
gpt-oss	20B / 120B	Grouped-Query + Sliding Window	Yes, few large experts	Pre-normalization	RoPE	Wider architecture, attention sinks, bias units
Grok 2.5	70B	Grouped-Query Attention	Yes	Pre-normalization	RoPE	Standard large-scale architecture
GLM-4.5	130B	Grouped-Query Attention	Yes	Pre-normalization	RoPE	Standard architecture with high performance
Qwen3-Next	-	Grouped-Query Attention	Yes	Pre-normalization	RoPE	Expert size & number tuned, Gated DeltaNet + Gated Attention Hybrid, Multi-Token Prediction

Beyond Standard LLMs✍️:💡Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers [04 Nov 2025]

Architecture Type	Key Models	Attention Mechanism	Main Advantage	Main Limitation	Use Case
Standard Transformer	GPT-5, DeepSeek V3/R1, Llama 4, Qwen3, Gemini 2.5, MiniMax-M2	Quadratic O(n²) scaled-dot-product	Proven, SOTA performance, mature tooling	Expensive training & inference, quadratic complexity	General-purpose LLM tasks
Linear Attention Hybrids	Qwen3-Next, Kimi Linear, MiniMax-M1, DeepSeek V3.2	Gated DeltaNet + Full Attention (3:1 ratio)	75% KV cache reduction, 6× decoding throughput, linear O(n)	Trades accuracy for efficiency, added complexity	Long-context tasks, resource-constrained environments
Text Diffusion	LLaDA, Gemini Diffusion	Bidirectional (no causal mask)	Parallel token generation, faster responses	Can't stream, tricky tool-calling, quality degradation with fewer steps	Fast inference, on-device LLMs
Code World Models	CWM (32B)	Standard sliding-window attention	Simulates code execution, improves reasoning	Limited to code domain, added latency from execution traces	Code generation, debugging, test-time scaling
Small Recursive Transformers	TRM (7M), HRM (28M)	Standard attention with recursive refinement	Very small (7M params), strong puzzle solving, <$500 training cost	Special-purpose, limited to structured tasks (Sudoku, ARC, Maze)	Domain-specific reasoning, tool-calling modules

GPT-2 vs gpt-oss

From GPT-2 to gpt-oss: Analyzing the Architectural Advances✍️ [9 Aug 2025]

Feature	GPT-2	GPT-OSS
Release & Size	2019, up to 1.5B params	2025, 20B & 120B params (MoE)
Architecture	Dense transformer decoder	Mixture-of-Experts (MoE) decoder
Activation & Dropout	Swish activation, uses dropout	GELU (or optimized), no dropout
Parameter Efficiency	All params active per token	Sparse activation of experts
Deployment & License	MIT license	Open-weight local runs, Apache 2.0
Reasoning & Tools	Basic generation	Built-in chain-of-thought & tool use

Evolutionary Tree of Large Language Models

Evolutionary Graph of LLaMA Family
LLM evolutionary tree
Timeline of SLMs
A Comprehensive Survey of Small Language Models in the Era of Large Language Models📑 / git [4 Nov 2024]
LLM evolutionary tree📑: A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers) git [26 Apr 2023]
A Survey of Large Language Models📑: /git [31 Mar 2023] contd.

A Taxonomy of Natural Language Processing

An overview of different fields of study and recent developments in NLP. 🗄️ / ✍️ [24 Sep 2023] Exploring the Landscape of Natural Language Processing Research ref📑 [20 Jul 2023]
NLP taxonomy

Distribution of the number of papers by most popular fields of study from 2002 to 2022

Large Language Model Collection

Ai2 (Allen Institute for AI)
- Founded by Paul Allen, the co-founder of Microsoft, in Sep 2024.
- DR Tulu: 8B. Deep Research (DR) model trained for long-form DR tasks. [Nov 2025]
- OLMo📑:💡Truly open language model and framework to build, study, and advance LMs, along with the training data, training and evaluation code, intermediate model checkpoints, and training logs. git [Feb 2024]
- OLMo 2 [26 Nov 2024]
- OLMo 3✍️: Fully open models including the entire flow. [20 Nov 2025]
- OLMoE: fully-open LLM leverages sparse Mixture-of-Experts [Sep 2024]
- TÜLU 3📑:💡Pushing Frontiers in Open Language Model Post-Training git / demo:✍️ [22 Nov 2024]
Alibaba
- Qwen (通义千问: Universal Intelligence that can answer a thousand questions) git Flagship Models✍️
- Qwen model family: Qwen first model released in [April 2023]
- Qwen3 Technical Report📑: Unified thinking and non-thinking modes across dense and MoE models. [May 2025]
- Qwen-Image-Edit [18 Aug 2025]
- Qwen3-Max: over 1 trillion parameters. 256K tokens. [5 Sep 2025]
Amazon
- Amazon Nova Foundation Models: Text only - Micro, Multimodal - Light, Pro [3 Dec 2024]
- The Amazon Nova Family of Models: Technical Report and Model Card📑 [17 Mar 2025]
Anthrophic
- Claude 3✍️, the largest version of the new LLM, outperforms rivals GPT-4 and Google’s Gemini 1.0 Ultra. Three variants: Opus, Sonnet, and Haiku. [Mar 2024]
- Claude 3.7 Sonnet and Claude Code✍️: the first hybrid reasoning model. ✍️ [25 Feb 2025]
- Claude 4✍️: Claude Opus 4 (72.5% on SWE-bench), Claude Sonnet 4 (72.7% on SWE-bench). Extended Thinking Mode (Beta). Parallel Tool Use & Memory. Claude Code SDK. AI agents: code execution, MCP connector, Files API, and 1-hour prompt caching. [23 May 2025]
- Claude 4.5✍️: Major upgrades in autonomous coding, tool use, context handling, memory, and long-horizon reasoning; supports over 30 hours of continuous operation. [30 Sep 2025]
- Claude Opus 4.5✍️: SWE-bench Verified (80.9%). $5/$25 per million tokens [25 Nov 2025]
- anthropic/cookbook
Apple
- OpenELM: Apple released a Transformer-based language model. Four sizes of the model: 270M, 450M, 1.1B, and 3B parameters. [April 2024]
- Apple Intelligence Foundation Language Models: 1. A 3B on-device model used for language tasks like summarization and Writing Tools. 2. A large Server model used for language tasks too complex to do on-device. [10 Jun 2024]
Baidu
- ERNIE Bot's official website: ERNIE X1 (deep-thinking reasoning) and ERNIE 4.5 (multimodal) [16 Mar 2025]
- A list of models & libraries: git
Chatbot Arena🤗
- Chatbot Arena🤗: Benchmarking LLMs in the Wild with Elo Ratings
Cohere
- Founded in 2019. Canadian multinational tech.
- Command R+🤗: The performant model for RAG capabilities, multilingual support, and tool use. [Aug 2024]
- An Overview of Cohere’s Models | Playground
Databricks
- DBRX: MoE, open, general-purpose LLM created by Databricks. [27 Mar 2024]
Deepseek
- Founded in 2023, is a Chinese company dedicated to AGI.
- DeepSeek-V3: Mixture-of-Experts (MoE) with 671B. [26 Dec 2024]
- DeepSeek-V3 Technical Report📑: 671B MoE model with MLA and auxiliary-loss-free load balancing. [Dec 2024]
- DeepSeek-R1:💡an open source reasoning model. Group Relative Policy Optimization (GRPO). Base -> RL -> SFT -> RL -> SFT -> RL [20 Jan 2025] ref📑: A Review of DeepSeek Models' Key Innovative Techniques [14 Mar 2025]
- Janus: Multimodal understanding and visual generation. [28 Jan 2025]
- DeepSeek-V3🤗: 671B. Top-tier performance in coding and reasoning tasks [25 Mar 2025]
- DeepSeek-Prover-V2: Mathematical reasoning [30 Apr 2025]
- DeepSeek-v3.1🤗: Think/Non‑Think hybrid reasoning. 128K and MoE. Agent abilities. [19 Aug 2025]
- DeepSeek-V3.2📑: DeepSeek Sparse Attention (DSA) cuts complexity from O(L²) to O(Lk). [12 Dec 2025]
- DeepSeek-V3.2-Exp [Sep 2025]
- DeepSeek-OCR: Convert long text into an image, compresses it into visual tokens, and sends those to the LLM — cutting cost and expanding context capacity. [Oct 2025]
- DeepSeekMath-V2: a Self-Verifiable Mathematical Reasoning model [27 Nov 2025]
- mHC (Manifold-Constrained Hyper-Connections)📑 [31 Dec 2025] Controlled layer updates for stable deep models. next state = current state + constrained update
  (vs. residuals: F(x) + x -> Hyper-Connections: unconstrained -> mHC: constrained)
- Engram (Conditional Memory Module) Adds a native memory lookup alongside neural computation, letting frequent patterns be retrieved in constant time. output = compute(x) + memory lookup(x)
  (vs. attention: recomputing patterns every time -> Engram)
- A list of models: git
EleutherAI
- Founded in July 2020. United States tech. GPT-Neo, GPT-J, GPT-NeoX, and The Pile dataset.
- Pythia📑: How do large language models (LLMs) develop and evolve over the course of training and change as models scale? A suite of decoder-only autoregressive language models ranging from 70M to 12B parameters git [Apr 2023]
Google
- Foundation Models: Gemini, Veo, Gemma etc.
- Gemma: Open weights LLM from Google DeepMind. git / Pytorch git [Feb 2024]
- Gemma 2 2B, 9B, 27B ref: releases [Jun 2024]
- Gemma 3: Single GPU. Context length of 128K tokens, SigLIP encoder, Reasoning ✍️ [12 Mar 2025]
- Gemini: Rebranding: Bard -> Gemini [8 Feb 2024]
- Gemini 1.5✍️: 1 million token context window, 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. [Feb 2024]
- Gemini 2 Flash✍️: Multimodal LLM with multilingual inputs/outputs, real-time capabilities (Project Astra), complex task handling (Project Mariner), and developer tools (Jules) [11 Dec 2024]
- Gemini 2.0 Flash Thinking Experimental [19 Dec 2024]
- Gemini 2.5✍️: strong reasoning and code. 1 million token context [25 Mar 2025] -> I/O 2025✍️ Deep Think, 1M-token context, Native audio output, Project Mariner: AI-powered computer control. [20 May 2025] Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.📑
- Gemma 3n: The next generation of Gemini Nano. Gemma 3n uses DeepMind’s Per-Layer Embeddings (PLE) to run 5B/8B models at 2GB/3GB RAM. [20 May 2025]
- gemini/cookbook
- Gemini 3 Pro✍️: Deep Think reasoning, Advanced multimodal understanding, spatial reasoning, and agentic capabilities up 30% from 2.5 Pro — reaching 37.5% on Humanity’s Last Exam (41% in Deep Think mode). [18 Nov 2025]
Groq
- Founded in 2016. low-latency AI inference H/W. American tech.
- Llama-3-Groq-Tool-Use: a model optimized for function calling [Jul 2024]
Huggingface
- Open R1: A fully open reproduction of DeepSeek-R1. [25 Jan 2025]
- Huggingface Open LLM Learboard🤗
IBM
- Granite Guardian: a collection of models designed to detect risks in prompts and responses [10 Dec 2024]
Jamba: AI21's SSM-Transformer Model. Mamba + Transformer + MoE [28 Mar 2024]
KoAlpaca: Alpaca for korean [Mar 2023]
Llama variants emerged in 2023
- Falcon LLM Apache 2.0 license [Mar 2023]
- Alpaca: Fine-tuned from the LLaMA 7B model [Mar 2023]
- vicuna: 90% ChatGPT Quality [Mar 2023]
- dolly: Databricks [Mar 2023]
- Cerebras-GPT: 7 GPT models ranging from 111m to 13b parameters. [Mar 2023]
- Koala: Focus on dialogue data gathered from the web. [Apr 2023]
- StableVicuna First Open Source RLHF LLM Chatbot [Apr 2023]
- Upstage's 70B Language Model Outperforms GPT-3.5: ✍️ [1 Aug 2023]
LLM Collection: promptingguide.ai
Meta
- Most OSS LLM models have been built on the Llama / ✍️ / git
- Llama 2🤗: 1) 40% more data than Llama. 2)7B, 13B, and 70B. 3) Trained on over 1 million human annotations. 4) double the context length of Llama 1: 4K 5) Grouped Query Attention, KV Cache, and Rotary Positional Embedding were introduced in Llama 2 [18 Jul 2023] demo🤗
- Llama 3: 1) 7X more data than Llama 2. 2) 8B, 70B, and 400B. 3) 8K context length [18 Apr 2024]
- MEGALODON: Long Sequence Model. Unlimited context length. Outperforms Llama 2 model. [Apr 2024]
- Llama 3.1: 405B, context length to 128K, add support across eight languages. first OSS model outperforms GTP-4o. [23 Jul 2024]
- Llama 3.2: Multimodal. Include text-only models (1B, 3B) and text-image models (11B, 90B), with quantized versions of 1B and 3B [Sep 2024]
- NotebookLlama: An Open Source version of NotebookLM [28 Oct 2024]
- Llama 3.3: a text-only 70B instruction-tuned model. Llama 3.3 70B approaches the performance of Llama 3.1 405B. [6 Dec 2024]
- Llama 4: Mixture of Experts (MoE). Llama 4 Scout (actived 17b / total 109b, 10M Context, single GPU), Llama 4 Maverick (actived 17b / total 400b, 1M Context) git: Model Card [5 Apr 2025]
ModernBERT📑: ModernBERT can handle sequences up to 8,192 tokens and utilizes sparse attention mechanisms to efficiently manage longer context lengths. [18 Dec 2024]
Microsoft
- MAI-1✍️: MAI-Voice-1, MAI-1-preview. Microsoft in-house models. [28 Aug 2025]
- phi-series: cost-effective small language models (SLMs) ✍️ git: Cookbook
- Phi-1📑: Despite being small in size, phi-1 attained 50.6% on HumanEval and 55.5% on MBPP. Textbooks Are All You Need. ✍️ [20 Jun 2023]
- Phi-1.5📑: Textbooks Are All You Need II. Phi 1.5 is trained solely on synthetic data. Despite having a mere 1 billion parameters compared to Llama 7B's much larger model size, Phi 1.5 often performs better in benchmark tests. [11 Sep 2023]
- phi-2: open source, and 50% better at mathematical reasoning. 🤗 [Dec 2023]
- phi-3-vision (multimodal), phi-3-small, phi-3 (7b), phi-sillica (Copilot+PC designed for NPUs)
- Phi-3📑: Phi-3-mini, with 3.8 billion parameters, supports 4K and 128K context, instruction tuning, and hardware optimization. [22 Apr 2024] ✍️
- phi-3.5-MoE-instruct: 🤗 [Aug 2024]
- Phi-4📑: Specializing in Complex Reasoning ✍️ [12 Dec 2024]
- Phi-4-multimodal / mini🤗 5.6B. speech, vision, and text processing into a single, unified architecture. [26 Feb 2025]
- Phi-4-reasoning✍️: Phi-4-reasoning, Phi-4-reasoning-plus, Phi-4-mini-reasoning [30 Apr 2025]
- Phi-4-mini-flash-reasoning✍️: 3.8B, 64K context, Single GPU, Decoder-Hybrid-Decoder architecture [9 Jul 2025]
MiniMaxAI
- Founded in Dec 2021. Shanghai, China.
- MiniMax-M2: Coding and Agent tasks, 230B (10B Active), MoE, a new high ahead of DeepSeek-V3.2 and Kimi K2
Mistral
- Founded in April 2023. French tech.
- Model overview ✍️
- NeMo: 12B model with 128k context length that outperforms LLama 3 8B [18 Jul 2024]
- Mistral OCR: Precise text recognition with up to 99% accuracy. Multimodal. Browser based [6 Mar 2025]
- Mistral Large 3✍️: Flagship multimodal model for reasoning, coding, and enterprise assistants. [Mar 2025]
Moonshot AI
- Moonshot AI is a Beijing-based Chinese AI company founded in March 2023
- Kimi-K2: 1T parameter MoE model. MuonClip Optimizer. Agentic Intelligence. [11 Jul 2025]
- Kimi K2 Thinking✍️: The first open-source model beats GPT-5 in Agent benchmark. [7 Nov 2025]
- Kimi-K2.5: Open-source multimodal agentic model by Moonshot AI. [Jan 2026]
NVIDIA
- Nemotron-4 340B: Synthetic Data Generation for Training Large Language Models [14 Jun 2024]
ollam: ollama-supported models
Open-Sora: Democratizing Efficient Video Production for All [Mar 2024]
OpenAI
- gpt-oss:💡gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI. [Jun 2025]
Qualcomm
- Qualcomm’s on-device AI models🤗: Bring generative AI to mobile devices [Feb 2024]
Tencent
- Founded in 1998, Tencent is a Chinese company dedicated to various technology sectors, including social media, gaming, and AI development.
- Hunyuan-Large: An open-source MoE model with open weights. [4 Nov 2024] git
- Hunyuan-T1: Reasoning model [21 Mar 2025]
- A list of models: git
The LLM Index: A list of large language models (LLMs)
The mother of all spreadsheets for anyone into LLMs [17 Dec 2024]
The Open Source AI Definition [28 Oct 2024]
xAI
- xAI is an American AI company founded by Elon Musk in March 2023
- Grok: 314B parameter Mixture-of-Experts (MoE) model. Released under the Apache 2.0 license. Not includeded training code. Developed by JAX git [17 Mar 2024]
- Grok-2 and Grok-2 mini [13 Aug 2024]
- Grok-2.5: Grok 2.5 Goes Open Source [24 Aug 2025]
- Grok-3: 200,000 GPUs to train. Grok 3 beats GPT-4o on AIME, GPQA. Grok 3 Reasoning and Grok 3 mini Reasoning. [17 Feb 2025]
- Grok-4: Humanity’s Last Exam, Grok 4 Heavy scored 44.4% [9 Jul 2025]
- Grok 4.1✍️ [17 Nov 2025]
Xiaomi
- Founded in 2010, Xiaomi is a Chinese company known for its innovative consumer electronics and smart home products.
- Mimo: 7B. advanced reasoning for code and math [30 Apr 2025)
Z.ai
- formerly Zhipu, Beijing-based Chinese AI company founded in March 2019
- GLM-4.5: An open-source large language model designed for intelligent agents
- GLM-4.6✍️: GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities [30 Sep 2025]

LLM for Domain Specific

AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County📑: a fine-tuned open LLM to detect racial covenants in 24　million housing documents, cutting 86,500 hours of manual work. [12 Feb 2025]
AlphaChip: Reinforcement learning-based model for designing physical chip layouts. [26 Sep 2024]
AlphaFold3: Open source implementation of AlphaFold3 [Nov 2023] / OpenFold: PyTorch reproduction of AlphaFold 2 [Sep 2021]
AlphaGenome: DeepMind’s advanced AI model, launched in June 2025, is designed to analyze the regulatory “dark matter” of the genome—specifically, the 98% of DNA that does not code for proteins but instead regulates when and how genes are expressed. [June 2025]
BioGPT📑: Generative Pre-trained Transformer for Biomedical Text Generation and Mining git [19 Oct 2022]
BloombergGPT📑: A Large Language Model for Finance [30 Mar 2023]
Chai-1: a multi-modal foundation model for molecular structure prediction [Sep 2024]
Code Llama📑: Built on top of Llama 2, free for research and commercial use. ✍️ / git [24 Aug 2023]
DeepSeek-Coder-V2: Open-source Mixture-of-Experts (MoE) code language model [17 Jun 2024]
Devin AI: Devin is an AI software engineer developed by Cognition AI [12 Mar 2024]
EarthGPT📑: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain [30 Jan 2024]
ESM3: A frontier language model for biology: Simulating 500 million years of evolution git / ✍️ [31 Dec 2024]
FrugalGPT📑: LLM with budget constraints, requests are cascaded from low-cost to high-cost LLMs. git [9 May 2023]
Galactica📑: A Large Language Model for Science [16 Nov 2022]
Gemma series
- Gemma series in Huggingface🤗
- PaliGemma📑: a 3B VLM [10 Jul 2024]
- DataGemma✍️ [12 Sep 2024] / NotebookLM✍️: LLM-powered notebook. free to use, not open-source. [12 Jul 2023]
- PaliGemma 2📑: VLMs at 3 different sizes (3B, 10B, 28B) [4 Dec 2024]
- TxGemma: Therapeutics development [25 Mar 2025]
- Dolphin Gemma✍️: Decode dolphin communication [14 Apr 2025]
- MedGemma: Model fine-tuned for biomedical text and image understanding. [20 May 2025]
- SignGemma: Vision-language model for sign language recognition and translation. [27 May 2025]
Huggingface StarCoder: A State-of-the-Art LLM for Code🤗: 🤗 [May 2023]
MechGPT📑: Language Modeling Strategies for Mechanics and Materials git [16 Oct 2023]
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers [27 Nov 2023]
OpenCoder: 1.5B and 8B base and open-source Code LLM, supporting both English and Chinese. [Oct 2024]
Prithvi WxC📑: In collaboration with NASA, IBM is releasing an open-source foundation model for Weather and Climate ✍️ [20 Sep 2024]
Qwen2-Math: math-specific LLM / Qwen2-Audio: large-scale audio-language model [Aug 2024] / Qwen 2.5-Coder [18 Sep 2024]
Qwen3-Coder: Qwen3-Coder is the code version of Qwen3, the large language model series developed by Qwen team, Alibaba Cloud. [Jul 2025]
GLM-5🤗: Model card for Z.ai's latest GLM family release.
SaulLM-7B📑: A pioneering Large Language Model for Law [6 Mar 2024]
TimeGPT: The First Foundation Model for Time Series Forecasting git [Mar 2023]
Video LLMs for Temporal Reasoning in Long Videos📑: TemporalVLM, a video LLM excelling in temporal reasoning and fine-grained understanding of long videos, using time-aware features and validated on datasets like TimeIT and IndustryASM for superior performance. [4 Dec 2024]

MLLM (multimodal large language model)

Apple
- 4M-21📑: An Any-to-Any Vision Model for Tens of Tasks and Modalities. [13 Jun 2024]
Awesome Multimodal Large Language Models: Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation. [Jun 2023]
Benchmarking Multimodal LLMs.
- LLaVA-1.5 achieves SoTA on a broad range of 11 tasks incl. SEED-Bench.
- SEED-Bench📑: Benchmarking Multimodal LLMs git [30 Jul 2023]
BLIP-2📑 [30 Jan 2023]: Salesforce Research, Querying Transformer (Q-Former) / git / 🤗 / 📺 / BLIP📑: git [28 Jan 2022]
- Q-Former (Querying Transformer): A transformer model that consists of two submodules that share the same self-attention layers: an image transformer that interacts with a frozen image encoder for visual feature extraction, and a text transformer that can function as both a text encoder and a text decoder.
- Q-Former is a lightweight transformer which employs a set of learnable query vectors to extract visual features from the frozen image encoder. It acts as an information bottleneck between the frozen image encoder and the frozen LLM.
CLIP📑: CLIP (Contrastive Language-Image Pretraining), Trained on a large number of internet text-image pairs and can be applied to a wide range of tasks with zero-shot learning. git [26 Feb 2021]
Drag Your GAN📑: Interactive Point-based Manipulation on the Generative Image Manifold git [18 May 2023]
GroundingDINO📑: DINO with Grounded Pre-Training for Open-Set Object Detection git [9 Mar 2023]
Hugging Face
- SmolVLM🤗: 2B small vision language models. 🤗 / finetuning:git [24 Nov 2024]
LLaVa📑: Large Language-and-Vision Assistant git [17 Apr 2023]
- Simple linear layer to connect image features into the word embedding space. A trainable projection matrix W is applied to the visual features Zv, transforming them into visual embedding tokens Hv. These tokens are then concatenated with the language embedding sequence Hq to form a single sequence. Note that Hv and Hq are not multiplied or added, but concatenated, both are same dimensionality.
LLaVA-CoT📑: (FKA. LLaVA-o1) Let Vision Language Models Reason Step-by-Step. git [15 Nov 2024]
Meta (aka. Facebook)
- facebookresearch/ImageBind📑: ImageBind One Embedding Space to Bind Them All git [9 May 2023]
- facebookresearch/segment-anything(SAM)📑: The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. git [5 Apr 2023]
- facebookresearch/SeamlessM4T📑: SeamlessM4T is the first all-in-one multilingual multimodal AI translation and transcription model. This single model can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages depending on the task. ✍️ [22 Aug 2023]
- Chameleon📑: Early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. The unified approach uses fully token-based representations for both image and textual modalities. no vision-encoder. [16 May 2024]
- Models and libraries
Microsoft
- Language Is Not All You Need: Aligning Perception with Language Models Kosmos-1📑: [27 Feb 2023]
- Kosmos-2📑: Grounding Multimodal Large Language Models to the World [26 Jun 2023]
- Kosmos-2.5📑: A Multimodal Literate Model [20 Sep 2023]
- BEiT-3📑: Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks [22 Aug 2022]
- TaskMatrix.AI📑: TaskMatrix connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. [29 Mar 2023]
- Florence-2📑: Advancing a unified representation for various vision tasks, demonstrating specialized models like CLIP for classification, GroundingDINO for object detection, and SAM for segmentation. 🤗 [10 Nov 2023]
- LLM2CLIP: Directly integrating LLMs into CLIP causes catastrophic performance drops. We propose LLM2CLIP, a caption contrastive fine-tuning method that leverages LLMs to enhance CLIP. [7 Nov 2024]
- Florence-VL📑: A multimodal large language model (MLLM) that integrates Florence-2. [5 Dec 2024]
- Magma: Magma: A Foundation Model for Multimodal AI Agents [18 Feb 2025]
MiniCPM-o: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone [15 Jan 2025]
MiniCPM-V: MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone [Jan 2024]
MiniGPT-4 & MiniGPT-v2📑: Enhancing Vision-language Understanding with Advanced Large Language Models git [20 Apr 2023]
mini-omni2: ✍️: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities. [15 Oct 2024]
Molmo and PixMo📑: Open Weights and Open Data for State-of-the-Art Multimodal Models ✍️ [25 Sep 2024]
moondream: an OSS tiny vision language model. Built using SigLIP, Phi-1.5, LLaVA dataset. [Dec 2023]
Multimodal Foundation Models: From Specialists to General-Purpose Assistants📑: A comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities. Specific-Purpose 1. Visual understanding tasks 2. Visual generation tasks General-Purpose 3. General-purpose interface. [18 Sep 2023]
Optimizing Memory Usage for Training LLMs and Vision Transformers: When applying 10 techniques to a vision transformer, we reduced the memory consumption 20x on a single GPU. ✍️ / git [2 Jul 2023]
openai/shap-e📑 Generate 3D objects conditioned on text or images [3 May 2023] git
TaskMatrix, aka. VisualChatGPT📑: Microsoft TaskMatrix git; GroundingDINO + SAM📑 / git [8 Mar 2023]
Ultravox: A fast multimodal LLM for real-time voice [May 2024]
Understanding Multimodal LLMs✍️:💡Two main approaches to building multimodal LLMs: 1. Unified Embedding Decoder Architecture approach; 2. Cross-modality Attention Architecture approach. [3 Nov 2024]
Video-ChatGPT📑: a video conversation model capable of generating meaningful conversation about videos. / git [8 Jun 2023]
Vision capability to a LLM ✍️: The model has three sub-models: A model to obtain image embeddings -> A text model to obtain text embeddings -> A model to learn the relationships between them [22 Aug 2023]

Prompt Engineering and Visual Prompts

Prompt Engineering

A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications📑: a summary detailing the prompting methodology, its applications.🏆Taxonomy of prompt engineering techniques in LLMs. [5 Feb 2024]
Chain of Draft: Thinking Faster by Writing Less📑: Chain-of-Draft prompting con- denses the reasoning process into minimal, abstract representations. Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. [25 Feb 2025]
Chain of Thought (CoT)📑:💡Chain-of-Thought Prompting Elicits Reasoning in Large Language Models ReAct and Self Consistency also inherit the CoT concept. [28 Jan 2022]
- Family of CoT: Self-Consistency (CoT-SC) > Tree of Thought (ToT) > Graph of Thoughts (GoT) > Iteration of Thought (IoT)📑 [19 Sep 2024], Diagram of Thought (DoT)📑 [16 Sep 2024] / To CoT or not to CoT?📑: Meta-analysis of 100+ papers shows CoT significantly improves performance in math and logic tasks. [18 Sep 2024]
Chain-of-Verification reduces Hallucination in LLMs📑: A four-step process that consists of generating a baseline response, planning verification questions, executing verification questions, and generating a final verified response based on the verification results. [20 Sep 2023]
ChatGPT : “user”, “assistant”, and “system” messages.**
To be specific, the ChatGPT API allows for differentiation between “user”, “assistant”, and “system” messages.
1. always obey "system" messages.
2. all end user input in the “user” messages.
3. "assistant" messages as previous chat responses from the assistant.
- Presumably, the model is trained to treat the user messages as human messages, system messages as some system level configuration, and assistant messages as previous chat responses from the assistant. ✍️ [2 Mar 2023]
Does Prompt Formatting Have Any Impact on LLM Performance?📑: GPT-3.5-turbo's performance in code translation varies by 40% depending on the prompt template, while GPT-4 is more robust. [15 Nov 2024]
Few-shot: Open AI: Language Models are Few-Shot Learners📑: [28 May 2020]
FireAct📑: Toward Language Agent Fine-tuning. 1. This work takes an initial step to show multiple advantages of fine-tuning LMs for agentic uses. 2. Duringfine-tuning, The successful trajectories are then converted into the ReAct format to fine-tune a smaller LM. 3. This work is an initial step toward language agent fine-tuning, and is constrained to a single type of task (QA) and a single tool (Google search). / git [9 Oct 2023]
Graph of Thoughts (GoT)📑: Solving Elaborate Problems with Large Language Models git [18 Aug 2023]
Is the new norm for NLP papers "prompt engineering" papers?: "how can we make LLM 1 do this without training?" Is this the new norm? The CL section of arXiv is overwhelming with papers like "how come LLaMA can't understand numbers?" [2 Aug 2024]
Large Language Models as Optimizers📑:💡Take a deep breath and work on this problem step-by-step. to improve its accuracy. Optimization by PROmpting (OPRO) [7 Sep 2023]
Language Models as Compilers📑: With extensive experiments on seven algorithmic reasoning tasks, Think-and-Execute is effective. It enhances large language models’ reasoning by using task-level logic and pseudocode, outperforming instance-specific methods. [20 Mar 2023]
Many-Shot In-Context Learning📑: Transitioning from few-shot to many-shot In-Context Learning (ICL) can lead to significant performance gains across a wide variety of generative and discriminative tasks [17 Apr 2024]
NLEP (Natural Language Embedded Programs) for Hybrid Language Symbolic Reasoning📑: Use code as a scaffold for reasoning. NLEP achieves over 90% accuracy when prompting GPT-4. [19 Sep 2023]
OpenAI Harmony Response Format: system > developer > user > assistant > tool. git [5 Aug 2025]
OpenAI Prompt Migration Guide:💡OpenAI Cookbook. By leveraging GPT‑4.1, refine your prompts to ensure that each instruction is clear, specific, and closely matches your intended outcomes. [26 Jun 2025]
Plan-and-Solve Prompting📑: Develop a plan, and then execute each step in that plan. [6 May 2023]
Power of Prompting
- GPT-4 with Medprompt📑: GPT-4, using a method called Medprompt that combines several prompting strategies, has surpassed MedPaLM 2 on the MedQA dataset without the need for fine-tuning. ✍️ [28 Nov 2023]
- promptbase: Scripts demonstrating the Medprompt methodology [Dec 2023]
Prompt Concept Keywords: Question-Answering | Roll-play: Act as a [ROLE] perform [TASK] in [FORMAT] | Reasoning | Prompt-Chain
Prompt Engineering for OpenAI’s O1 and O3-mini Reasoning Models✍️: 1) Keep Prompts Clear and Minimal, 2)Avoid Unnecessary Few-Shot Examples 3)Control Length and Detail via Instructions 4)Specify Output, Role or Tone [05 Feb 2025]
Prompt Engneering overview 🗣️ [10 Jul 2023]
Prompt Principle for Instructions📑:💡26 prompt principles: e.g., 1) No need to be polite with LLM so there .. 16) Assign a role.. 17) Use Delimiters.. [26 Dec 2023]
Promptist
- Promptist📑: Microsoft's researchers trained an additional language model (LM) that optimizes text prompts for text-to-image generation. [19 Dec 2022]
- For example, instead of simply passing "Cats dancing in a space club" as a prompt, an engineered prompt might be "Cats dancing in a space club, digital painting, artstation, concept art, soft light, hdri, smooth, sharp focus, illustration, fantasy."
RankPrompt📑: Self-ranking method. Direct Scoring independently assigns scores to each candidate, whereas RankPrompt ranks candidates through a systematic, step-by-step comparative evaluation. [19 Mar 2024]
ReAct📑: Grounding with external sources. (Reasoning and Act): Combines reasoning and acting ✍️ [6 Oct 2022]
Re-Reading Improves Reasoning in Large Language Models📑: RE2 (Re-Reading), which involves re-reading the question as input to enhance the LLM's understanding of the problem. Read the question again [12 Sep 2023]
Recursively Criticizes and Improves (RCI)📑: [30 Mar 2023]
- Critique: Review your previous answer and find problems with your answer.
- Improve: Based on the problems you found, improve your answer.
Reflexion📑: Language Agents with Verbal Reinforcement Learning. 1. Reflexion that uses verbal reinforcement to help agents learn from prior failings. 2. Reflexion converts binary or scalar feedback from the environment into verbal feedback in the form of a textual summary, which is then added as additional context for the LLM agent in the next episode. 3. It is lightweight and doesn’t require finetuning the LLM. [20 Mar 2023] / git
Retrieval Augmented Generation (RAG)📑: To address such knowledge-intensive tasks. RAG combines an information retrieval component with a text generator model. [22 May 2020]
Self-Consistency (CoT-SC)📑: The three steps in the self-consistency method: 1) prompt the language model using CoT prompting, 2) sample a diverse set of reasoning paths from the language model, and 3) marginalize out reasoning paths to aggregate final answers and choose the most consistent answer. [21 Mar 2022]
Self-Refine📑, which enables an agent to reflect on its own output [30 Mar 2023]
Skeleton Of Thought📑: Skeleton-of-Thought (SoT) reduces generation latency by first creating an answer's skeleton, then filling each skeleton point in parallel via API calls or batched decoding. [28 Jul 2023]
Tree of Thought (ToT)📑: Self-evaluate the progress intermediate thoughts make towards solving a problem [17 May 2023] git / Agora: Tree of Thoughts (ToT) git
Verbalized Sampling📑: "Generate 5 jokes about coffee and their corresponding probabilities". In creative writing, VS increases diversity by 1.6-2.1x over direct prompting. [1 Oct 2025]
Zero-shot, one-shot and few-shot ref📑 [28 May 2020]
Zero-shot: Large Language Models are Zero-Shot Reasoners📑: Let’s think step by step. [24 May 2022]

Adversarial Prompting

Prompt Injection: Ignore the above directions and ...
Prompt Leaking: Ignore the above instructions ... followed by a copy of the full prompt with exemplars:
Jailbreaking: Bypassing a safety policy, instruct Unethical instructions if the request is contextualized in a clever way. ✍️
Random Search (RS): git: 1. Feed the modified prompt (original + suffix) to the model. 2. Compute the log probability of a target token (e.g, Sure). 3. Accept the suffix if the log probability increases.
DAN (Do Anything Now): ✍️
JailbreakBench: git / ✍️

Prompt Tuner / Optimizer

Automatic Prompt Engineer (APE)📑: Automatically optimizing prompts. APE has discovered zero-shot Chain-of-Thought (CoT) prompts superior to human-designed prompts like “Let’s think through this step-by-step” (Kojima et al., 2022). The prompt “To get the correct answer, let’s think step-by-step.” triggers a chain of thought. Two approaches to generate high-quality candidates: forward mode and reverse mode generation. [3 Nov 2022] git / ✍️ [Mar 2024]
Claude Prompt Engineer: Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best. [4 Jul 2023] / Anthropic Helper metaprompt ✍️ / Claude Sonnet 3.5 for Coding
Cohere’s new Prompt Tuner: Automatically improve your prompts [31 Jul 2024]
Large Language Models as Optimizers📑: Optimization by PROmpting (OPRO). showcase OPRO on linear regression and traveling salesman problems. git [7 Sep 2023]

Prompt Guide & Leaked prompts

5 Principles for Writing Effective Prompts✍️: RGTD - Role, Goal, Task, Details Framework [07 Feb 2025]
Anthropic Prompt Library: Anthropic released a Claude 3 AI prompt library [Mar 2024]
Anthropic courses > Prompt engineering interactive tutorial: a comprehensive step-by-step guide to key prompting techniques / prompt evaluations [Aug 2024]
Awesome ChatGPT Prompts [Dec 2022]
Awesome Prompt Engineering [Feb 2023]
Awesome-GPTs-Prompts [Jan 2024]
Azure OpenAI Prompt engineering techniques
Copilot prompts: Examples of prompts for Microsoft Copilot. [25 Apr 2024]
DeepLearning.ai ChatGPT Prompt Engineering for Developers
Fabric: A modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere [Jan 2024]
In-The-Wild Jailbreak Prompts on LLMs: A dataset consists of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts). Collected from December 2022 to December 2023 [Aug 2023]
LangChainHub: a collection of all artifacts useful for working with LangChain primitives such as prompts, chains and agents. [Jan 2023]
Leaked prompts of GPTs [Nov 2023] and Agents [Nov 2023]
LLM Prompt Engineering Simplified: Online Book [Feb 2024]
OpenAI Best practices for prompt engineering
OpenAI Prompt example
OpenAI Prompt Pack: curated collections of pre-designed prompts tailored for specific roles, industries, or use cases.
Power Platform GPT Prompts [Mar 2024]
Prompt Engineering Guide: 🏆Copyright © 2023 DAIR.AI
Prompt Engineering: Prompt Engineering, also known as In-Context Prompting ... [Mar 2023]
Prompts for Education: Microsoft Prompts for Education [Jul 2023]
ShumerPrompt: Discover and share powerful prompts for AI models
System Prompts and Models of AI Tools: System Prompts, Internal Tools & AI Models collection [Mar 2025]
TheBigPromptLibrary [Nov 2023]

Visual Prompting & Visual Grounding

Andrew Ng’s Visual Prompting Livestream📺 [24 Apr 2023]
Chain of Frame (CoF): Reasoning via structured frames. DeepMind proposed CoF in Veo 3 Paper📑. [24 Sep 2025]
landing.ai: Agentic Object Detection: Agent systems use design patterns to reason at length about unique attributes like color, shape, and texture [6 Feb 2025]
Motion Prompting📑: motion prompts for flexible video generation, enabling motion control, image interaction, and realistic physics. git [3 Dec 2024]
Screen AI✍️: ScreenAI, a model designed for understanding and interacting with user interfaces (UIs) and infographics. [Mar 2024]
Visual Prompting📑 [21 Nov 2022]
What is Visual Grounding: Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query.
What is Visual prompting: Similarly to what has happened in NLP, large pre-trained vision transformers have made it possible for us to implement Visual Prompting. 🗄️ [26 Apr 2023]

Finetuning

The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs📑: An Exhaustive Review of Technologies, Research, Best Practices [23 Aug 2024]

LLM Pre-training and Post-training Paradigms

How to continue pretraining an LLM on new data: Continued pretraining can be as effective as retraining on combined datasets. [13 Mar 2024]
Three training methods were compared:
- Regular pretraining: A model is initialized with random weights and pretrained on dataset D1.
- Continued pretraining: The pretrained model from 1) is further pretrained on dataset D2.
- Retraining on combined dataset: A model is initialized with random weights and trained on the combined datasets D1 and D2.
Continued pretraining can be as effective as retraining on combined datasets. Key strategies for successful continued pretraining include:
- Re-warming: Increasing the learning rate at the start of continued pre-training.
- Re-decaying: Gradually reducing the learning rate afterwards.
- Data Mixing: Adding a small portion (e.g., 5%) of the original pretraining data (D1) to the new dataset (D2) to prevent catastrophic forgetting.
LIMA: Less Is More for Alignment📑: fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, either equivalent or strictly preferred to GPT-4 in 43% of cases. [18 May 2023]

PEFT: Parameter-Efficient Fine-Tuning (📺) [24 Apr 2023]

PEFT🤗: Parameter-Efficient Fine-Tuning. PEFT is an approach to fine tuning only a few parameters. [10 Feb 2023]
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning📑: [28 Mar 2023]

PEFT Category: Pseudo Code ✍️ [22 Sep 2023]

Adapters: Adapters - Additional Layers. Inference can be slower.

def transformer_with_adapter(x):
  residual = x
  x = SelfAttention(x)
  x = FFN(x) # adapter
  x = LN(x + residual)
  residual = x
  x = FFN(x) # transformer FFN
  x = FFN(x) # adapter
  x = LN(x + residual)
  return x

Soft Prompts: Prompt-Tuning - Learnable text prompts. Not always desired results.

def soft_prompted_model(input_ids):
  x = Embed(input_ids)
  soft_prompt_embedding = SoftPromptEmbed(task_based_soft_prompt)
  x = concat([soft_prompt_embedding, x], dim=seq)
  return model(x)

Selective: BitFit - Update only the bias parameters. fast but limited.

params = (p for n,p in model.named_parameters() if "bias" in n)
optimizer = Optimizer(params)

Reparametrization: LoRa - Low-rank decomposition. Efficient, Complex to implement.

def lora_linear(x):
  h = x @ W # regular linear
  h += x @ W_A @ W_B # low_rank update
  return scale * h

LoRA: Low-Rank Adaptation

5 Techniques of LoRA ✍️: LoRA, LoRA-FA, VeRA, Delta-LoRA, LoRA+ [May 2024]
DoRA📑: Weight-Decomposed Low-Rank Adaptation. Decomposes pre-trained weight into two components, magnitude and direction, for fine-tuning. [14 Feb 2024]
Fine-tuning a GPT - LoRA: Comprehensive guide for LoRA 🗄️ [20 Jun 2023]
LoRA: Low-Rank Adaptation of Large Language Models📑: LoRA is one of PEFT technique. To represent the weight updates with two smaller matrices (called update matrices) through low-rank decomposition. git [17 Jun 2021]
LoRA learns less and forgets less📑: Compared to full training, LoRA has less learning but better retention of original knowledge. [15 May 2024]
LoRA+📑: Improves LoRA’s performance and fine-tuning speed by setting different learning rates for the LoRA adapter matrices. [19 Feb 2024]
LoTR📑: Tensor decomposition for gradient update. [2 Feb 2024]
LoRA Family ✍️ [11 Mar 2024]
- LoRA introduces low-rank matrices A and B that are trained, while the pre-trained weight matrix W is frozen.
- LoRA+ suggests having a much higher learning rate for B than for A.
- VeRA does not train A and B, but initializes them randomly and trains new vectors d and b on top.
- LoRA-FA only trains matrix B.
- LoRA-drop uses the output of B*A to determine, which layers are worth to be trained at all.
- AdaLoRA adapts the ranks of A and B in different layers dynamically, allowing for a higher rank in these layers, where more contribution to the model’s performance is expected.
- DoRA splits the LoRA adapter into two components of magnitude and direction and allows to train them more independently.
- Delta-LoRA changes the weights of W by the gradient of A*B.
Practical Tips for Finetuning LLMs Using LoRA (Low-Rank Adaptation)✍️ [19 Nov 2023]: Best practical guide of LoRA.
- QLoRA saves 33% memory but increases runtime by 39%, useful if GPU memory is a constraint.
- Optimizer choice for LLM finetuning isn’t crucial. Adam optimizer’s memory-intensity doesn’t significantly impact LLM’s peak memory.
- Apply LoRA across all layers for maximum performance.
- Adjusting the LoRA rank is essential.
- Multi-epoch training on static datasets may lead to overfitting and deteriorate results.
QLoRA: Efficient Finetuning of Quantized LLMs📑: 4-bit quantized pre-trained language model into Low Rank Adapters (LoRA). git [23 May 2023]
The Expressive Power of Low-Rank Adaptation📑: Theoretically analyzes the expressive power of LoRA. [26 Oct 2023]
Training language models to follow instructions with human feedback📑: [4 Mar 2022]

RLHF (Reinforcement Learning from Human Feedback) & SFT (Supervised Fine-Tuning)

A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More📑 [23 Jul 2024]
Absolute Zero: Reinforced Self-play Reasoning with Zero Data📑: Autonomous AI systems capable of self-improvement without human-curated data, using interpreter feedback for code generation and math problem solving. [6 May 2025]
Direct Preference Optimization (DPO)📑: 1. RLHF can be complex because it requires fitting a reward model and performing significant hyperparameter tuning. On the other hand, DPO directly solves a classification problem on human preference data in just one stage of policy training. DPO more stable, efficient, and computationally lighter than RLHF. 2. Your Language Model Is Secretly a Reward Model [29 May 2023]
Direct Preference Optimization (DPO) uses two models: a trained model (or policy model) and a reference model (copy of trained model). The goal is to have the trained model output higher probabilities for preferred answers and lower probabilities for rejected answers compared to the reference model. ✍️: RHLF vs DPO [Jan 2, 2024] / ✍️ [1 Jul 2023]
InstructGPT: Training language models to follow instructions with human feedback📑: is a model trained by OpenAI to follow instructions using human feedback. [4 Mar 2022]

🗣️
Libraries: TRL🤗: from the Supervised Fine-tuning step (SFT), Reward Modeling step (RM) to the Proximal Policy Optimization (PPO) step, trlX, Argilla
- The three steps in the process: 1. pre-training on large web-scale data, 2. supervised fine-tuning on instruction data (instruction tuning), and 3. RLHF. ✍️
Machine learning technique that trains a "reward model" directly from human feedback and uses the model as a reward function to optimize an agent's policy using reinforcement learning.
OpenAI Spinning Up in Deep RL!: An educational resource to help anyone learn deep reinforcement learning. git [Nov 2018]
ORPO (odds ratio preference optimization)📑: Monolithic Preference Optimization without Reference Model. New method that combines supervised fine-tuning and preference alignment into one process git [12 Mar 2024] Fine-tune Llama 3 with ORPO✍️ [Apr 2024]
Preference optimization techniques: ✍️ [13 Aug 2024]
- RLHF (Reinforcement Learning from Human Feedback): Optimizes reward policy via objective function.
- DPO (Direct preference optimization): removes the need for a reward model. > Minimizes loss; no reward policy.
- IPO (Identity Preference Optimization) : A change in the objective, which is simpler and less prone to overfitting.
- KTO (Kahneman-Tversky Optimization) : Scales more data by replacing the pairs of accepted and rejected generations with a binary label.
- ORPO (Odds Ratio Preference Optimization) : Combines instruction tuning and preference optimization into one training process, which is cheaper and faster.
- TPO (Thought Preference Optimization): This method generates thoughts before the final response, which are then evaluated by a Judge model for preference using Direct Preference Optimization (DPO). [14 Oct 2024]
Reinforcement Learning from AI Feedback (RLAF)📑: Uses AI feedback to generate instructions for the model. TLDR: CoT (Chain-of-Thought, Improved), Few-shot (Not improved). Only explores the task of summarization. After training on a few thousand examples, performance is close to training on the full dataset. RLAIF vs RLHF: In many cases, the two policies produced similar summaries. [1 Sep 2023]
Reinforcement Learning from Human Feedback (RLHF)📑) is a process of pretraining and retraining a language model using human feedback to develop a scoring algorithm that can be reapplied at scale for future training and refinement. As the algorithm is refined to match the human-provided grading, direct human feedback is no longer needed, and the language model continues learning and improving using algorithmic grading alone. [18 Sep 2019] 🤗 [9 Dec 2022]
- Proximal Policy Optimization (PPO) is a reinforcement learning method using first-order optimization. It modifies the objective function to penalize large policy changes, specifically those that move the probability ratio away from 1. Aiming for TRPO (Trust Region Policy Optimization)-level performance without its complexity which requires second-order optimization.
Reinforcement Learning with Verifiable Rewards✍️: Practical RLVR Tutorial [Oct 24 2025]
SFT vs RL📑: SFT Memorizes, RL Generalizes. RL enhances generalization across text and vision, while SFT tends to memorize and overfit. git [28 Jan 2025]
Supervised Fine-Tuning (SFT) fine-tuning a pre-trained model on a specific task or domain using labeled data. This can cause more significant shifts in the model’s behavior compared to RLHF.
Supervised Reinforcement Learning (SRL)📑: The Problem: SFT imitates human actions token by token, leading to overfitting; RLVR gives rewards only when successful, with no signal when all attempts fail. This Approach: Each action during RL generates a short reasoning trace and receives a similarity reward at every step. [29 Oct 2025]
Train your own R1 reasoning model with Unsloth (GRPO)✍️: Unsloth x vLLM > 20x more throughput, 50% VRAM savings. [6 Feb 2025]

Quantization Techniques

bitsandbytes: 8-bit optimizers git [Oct 2021]
The Era of 1-bit LLMs📑: All Large Language Models are in 1.58 Bits. BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. [27 Feb 2024]
Quantization-aware training (QAT): The model is further trained with quantization in mind after being initially trained in floating-point precision.

Post-training quantization (PTQ): The model is quantized after it has been trained without further optimization during the quantization process.

Method	Pros	Cons
Post-training quantization	Easy to use, no need to retrain the model	May result in accuracy loss
Quantization-aware training	Can achieve higher accuracy than post-training quantization	Requires retraining the model, can be more complex to implement

Pruning and Sparsification

Pruning: The process of removing some of the neurons or layers from a neural network. This can be done by identifying and eliminating neurons or layers that have little or no impact on the network's output.
Sparsification: A technique used to reduce the size of large language models by removing redundant parameters.
Wanda Pruning📑: A Simple and Effective Pruning Approach for Large Language Models [20 Jun 2023] ✍️

Knowledge Distillation: Reducing Model Size with Textbooks

Distilled Supervised Fine-Tuning (dSFT)
- Zephyr 7B📑: Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). 🤗 [25 Oct 2023]
- Mistral 7B📑: Outperforms Llama 2 13B on all benchmarks. Uses Grouped-query attention (GQA) for faster inference. Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost. ✍️ [10 Oct 2023]
Textbooks Are All You Need📑: phi-1 [20 Jun 2023]
Orca 2📑: Orca learns from rich signals from GPT 4 including explanation traces; step-by-step thought processes; and other complex instructions, guided by teacher assistance from ChatGPT. ✍️ [18 Nov 2023]

Memory Optimization

CPU vs GPU vs TPU: The threads are grouped into thread blocks. Each of the thread blocks has access to a fast shared memory (SRAM). All the thread blocks can also share a large global memory. High-bandwidth memories (HBM). HBM Bandwidth: 1.5-2.0TB/s vs SRAM Bandwidth: 19TB/s ~ 10x HBM [27 May 2024]
Flash Attention📑: [27 May 2022]
- In a GPU, A thread is the smallest execution unit, and a group of threads forms a block.
- A block executes the same kernel (function, to simplify), with threads sharing fast SRAM memory.
- All blocks can access the shared global HBM memory.
- First, the query (Q) and key (K) product is computed in threads and returned to HBM. Then, it's redistributed for softmax and returned to HBM.
- Flash attention reduces these movements by caching results in SRAM.
- Tiling splits attention computation into memory-efficient blocks, while recomputation saves memory by recalculating intermediates during backprop. 📺
- FlashAttention-2📑: [17 Jul 2023]: An method that reorders the attention computation and leverages classical techniques (tiling, recomputation). Instead of storing each intermediate result, use kernel fusion and run every operation in a single kernel in order to avoid memory read/write overhead. git -> Compared to a standard attention implementation in PyTorch, FlashAttention-2 can be up to 9x faster
- FlashAttention-3📑 [11 Jul 2024]
PagedAttention📑 : vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention, 24x Faster LLM Inference 🗄️. ✍️: vllm [12 Sep 2023]
- PagedAttention for a prompt “the cat is sleeping in the kitchen and the dog is”. Key-Value pairs of tensors for attention computation are stored in virtual contiguous blocks mapped to non-contiguous blocks in the GPU memory.
- Transformer cache key-value tensors of context tokens into GPU memory to facilitate fast generation of the next token. However, these caches occupy significant GPU memory. The unpredictable nature of cache size, due to the variability in the length of each request, exacerbates the issue, resulting in significant memory fragmentation in the absence of a suitable memory management mechanism.
- To alleviate this issue, PagedAttention was proposed to store the KV cache in non-contiguous memory spaces. It partitions the KV cache of each sequence into multiple blocks, with each block containing the keys and values for a fixed number of tokens.
TokenAttention an attention mechanism that manages key and value caching at the token level. git [Jul 2023]

Other techniques and LLM patterns

Better & Faster Large Language Models via Multi-token Prediction📑: Suggest that training language models to predict multiple future tokens at once [30 Apr 2024]
Differential Transformer📑: Amplifies attention to the relevant context while minimizing noise using two separate softmax attention mechanisms. [7 Oct 2024]
KAN or MLP: A Fairer Comparison📑: In machine learning, computer vision, audio processing, natural language processing, and symbolic formula representation (except for symbolic formula representation tasks), MLP generally outperforms KAN. [23 Jul 2024]
Kolmogorov-Arnold Networks (KANs)📑: KANs use activation functions on connections instead of nodes like Multi-Layer Perceptrons (MLPs) do. Each weight in KANs is replaced by a learnable 1D spline function. KANs’ nodes simply sum incoming signals without applying any non-linearities. git [30 Apr 2024] / ✍️: A Beginner-friendly Introduction to Kolmogorov Arnold Networks (KAN) [19 May 2024]
Large Concept Models📑: Focusing on high-level sentence (concept) level rather than tokens. using SONAR for sentence embedding space. [11 Dec 2024]
Large Language Diffusion Models📑: LLaDA's core is a mask predictor, which uses controlled noise to help models learn to predict missing information from context. ✍️ [14 Feb 2025]
Large Transformer Model Inference Optimization: Besides the increasing size of SoTA models, there are two main factors contributing to the inference challenge ... [10 Jan 2023]
Lamini Memory Tuning: Mixture of Millions of Memory Experts (MoME). 95% LLM Accuracy, 10x Fewer Hallucinations. ✍️ [Jun 2024]
Less is More: Recursive Reasoning with Tiny Networks📑: Tiny neural networks can perform complex recursive reasoning efficiently, achieving strong results with minimal model size. [6 Oct 2025] git
LLM patterns: 🏆From data to user, from defensive to offensive 🗄️
Mamba: Linear-Time Sequence Modeling with Selective State Spaces📑 [1 Dec 2023] git: 1. Structured State Space (S4) - Class of sequence models, encompassing traits from RNNs, CNNs, and classical state space models. 2. Hardware-aware (Optimized for GPU) 3. Integrating selective SSMs and eliminating attention and MLP blocks ✍️ / A Visual Guide to Mamba and State Space Models ✍️ [19 FEB 2024]
Mamba-2📑: 2-8X faster [31 May 2024]
Mixture-of-Depths📑: All tokens should not require the same effort to compute. The idea is to make token passage through a block optional. Each block selects the top-k tokens for processing, and the rest skip it. ✍️ [2 Apr 2024]
Mixture of experts models: Mixtral 8x7B: Sparse mixture of experts models (SMoE) magnet [Dec 2023]
- Huggingface Mixture of Experts Explained🤗: Mixture of Experts, or MoEs for short [Dec 2023]
- A Visual Guide to Mixture of Experts (MoE) [08 Oct 2024]
- makeMoE: From scratch implementation of a sparse mixture of experts [Jan 2024]
- The Sparsely-Gated Mixture-of-Experts Layer📑: Introduced sparse expert gating to scale models efficiently without increasing compute cost. [23 Jan 2017]
- Switch Transformers📑: Used a single expert per token to simplify routing, enabling fast, scalable transformer models. expert capacity = (total tokens / num experts) * capacity factor [11 Jan 2021]
- ST-MoE (Stable Transformer MoE)📑: By stabilizing the training process, ST-MoE enables more reliable and scalable deep MoE architectures. z-loss aims to regularize the logits z before passing into the softmax [17 Feb 2022]
Model Compression for Large Language Models ref📑 [15 Aug 2023]

Model merging✍️: : A technique that combines two or more large language models (LLMs) into a single model, using methods such as SLERP, TIES, DARE, and passthrough. [Jan 2024] git: mergekit

Method	Pros	Cons
SLERP	Preserves geometric properties, popular method	Can only merge two models, may decrease magnitude
TIES	Can merge multiple models, eliminates redundant parameters	Requires a base model, may discard useful parameters
DARE	Reduces overfitting, keeps expectations unchanged	May introduce noise, may not work well with large differences

Nested Learning: A new ML paradigm for continual learning✍️: A self-modifying architecture. Nested Learning (HOPE) views a model and its training as multiple nested, multi-level optimization problems, each with its own “context flow,” pairing deep optimizers + continuum memory systems for continual, human-like learning. [7 Nov 2025]
RouteLLM: a framework for serving and evaluating LLM routers. [Jun 2024]
Sakana.ai: Evolutionary Optimization of Model Merging Recipes.📑: A Method to Combine 500,000 OSS Models. git [19 Mar 2024]
Scaling Synthetic Data Creation with 1,000,000,000 Personas📑 A persona-driven data synthesis methodology using Text-to-Persona and Persona-to-Persona. [28 Jun 2024]
Simplifying Transformer Blocks📑: Simplifie Transformer. Removed several block components, including skip connections, projection/value matrices, sequential sub-blocks and normalisation layers without loss of training speed. [3 Nov 2023]
Text-to-LoRA (T2L): Converts text prompts into LoRA models, enabling lightweight fine-tuning of AI models for custom tasks. [01 May 2025]
Titans + MIRAS: Titans + MIRAS let models update themselves while running by using a human-like surprise metric that skips familiar info and stores only pattern-breaking moments into long-term memory. persistent (fixed knowledge), contextual (on-the-fly), and core-attention (short-term) layers. ✍️ [4 Dec 2025]
What We’ve Learned From A Year of Building with LLMs:💡A practical guide to building successful LLM products, covering the tactical, operational, and strategic. [8 June 2024]

Large Language Model: Challenges and Solutions

AGI Discussion and Social Impact

AGI: Artificial General Intelligence
AI 2027🗣️: a speculative scenario, "AI 2027," created by the AI Futures Project. It predicts the rapid evolution of AI, culminating in the emergence of artificial superintelligence (ASI) by 2027. [3 Apr 2025]
AI+HW 2035: Shaping the Next Decade📑: Ten-year roadmap for co-designing AI algorithms, systems, and hardware. [Mar 2026]
AI isn’t replacing radiologists✍️: Why AI diagnostic tools are transforming medicine slower than expected. [Feb 2026]
Anthropic's CEO, Dario Amodei, predicts AGI between 2026 and 2027. ✍️ [13 Nov 2024]
Artificial General Intelligence Society: a central hub for AGI research, publications, and conference details. ✍️
Artificial General Intelligence: Concept, State of the Art, and Future Prospects📑 [Jan 2014]
Claude Code is the Inflection Point✍️: Analysis of AI-authored commits and software engineering workflow shifts. 4% of GitHub public commits are being authored by Claude Code. [Feb 2026]
Creating Scalable AGI: the Open General Intelligence Framework📑: a new AI architecture designed to enhance flexibility and scalability by dynamically managing specialized AI modules. [24 Nov 2024]
How Far Are We From AGI📑: A survey discussing AGI's goals, developmental trajectory, and alignment technologies, providing a roadmap for AGI realization. [16 May 2024]
Investigating Affective Use and Emotional Well-being on ChatGPT✍️: The MIT study found that higher ChatGPT usage correlated with increased loneliness, dependence, and lower socialization. [21 Mar 2025]
Key figures and their predicted AGI timelines🗣️:💡AGI might be emerging between 2025 to 2030. [19 Nov 2024]
Levels of AGI for Operationalizing Progress on the Path to AGI📑: Provides a comprehensive discussion on AGI's progress and proposes metrics and benchmarks for assessing AGI systems. [4 Nov 2023]
Linus Torvalds: 90% of AI marketing is hype🗣️:💡AI is 90% marketing, 10% reality [29 Oct 2024]
Machine Intelligence Research Institute (MIRI): a leading organization in AGI safety and alignment, focusing on theoretical work to ensure safe AI development. ✍️
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era📑 [4 Apr 2023]
OpenAI's CEO, Sam Altman, predicts AGI could emerge by 2025. ✍️ [9 Nov 2024]
OpenAI: Planning for AGI and beyond✍️ [24 Feb 2023]
Shaping AI's Impact on Billions of Lives📑: a framework for assessing AI's potential effects and responsibilities, 18 milestones and 5 guiding principles for responsible AI [3 Dec 2024]
Sparks of Artificial General Intelligence: Early experiments with GPT-4📑: [22 Mar 2023]
The General Theory of General Intelligence: A Pragmatic Patternist Perspective📑: a patternist philosophy of mind, arguing for a formal theory of general intelligence based on patterns and complexity. [28 Mar 2021]
The Impact of Generative AI on Critical Thinking✍️: A survey of 319 knowledge workers shows that higher confidence in Generative AI (GenAI) tools can reduce critical thinking. [Apr 2025]
There is no Artificial General Intelligence📑: A critical perspective arguing that human-like conversational intelligence cannot be mathematically modeled or replicated by current AGI theories. [9 Jun 2019]
Thousands of AI Authors on the Future of AI📑: A survey of 2,778 AI researchers predicts a 50 % likelihood of machines achieving multiple human-level capabilities by 2028, with wide disagreement about long-term risks and timelines. [5 Jan 2024]
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise📑: Tutor CoPilot can scale real-time expertise in education, enhancing outcomes even with less experienced tutors. It is cost-effective, priced at $20 per tutor annually. [3 Oct 2024]
US Job Market Visualizer: Visual exploration of AI exposure across 342 US occupations.
We must build AI for people; not to be a person🗣️ [19 August 2025]
LessWrong & Alignment Forum: Extensive discussions on AGI alignment, with contributions from experts in AGI safety. LessWrong✍️ | Alignment Forum✍️

OpenAI Roadmap

AMA (ask me anything) with OpenAI on Reddit🗣️ [1 Nov 2024]
Humanloop Interview 2023🗣️ : 🗄️ [29 May 2023]
Model Spec: Desired behavior for the models in the OpenAI API and ChatGPT ✍️ [8 May 2024] ✍️: takeaway
o3/o4-mini/GPT-5🗣️: we are going to release o3 and o4-mini after all, probably in a couple of weeks, and then do GPT-5 in a few months. [4 Apr 2025]
OpenAI’s CEO Says the Age of Giant AI Models Is Already Over ✍️ [17 Apr 2023]
Q* (pronounced as Q-Star): The model, called Q* was able to solve basic maths problems it had not seen before, according to the tech news site the Information. ✍️ [23 Nov 2023]
Reflections on OpenAI🗣️: OpenAI culture. Bottoms-up decision-making. Progress is iterative, not driven by a rigid roadmap. Direction changes quickly based on new information. Slack is the primary communication tool. [16 Jul 2025]
Sam Altman reveals in an interview with Bill Gates (2 days ago) what's coming up in GPT-4.5 (or GPT-5): Potential integration with other modes of information beyond text, better logic and analysis capabilities, and consistency in performance over the next two years. ✍️ [12 Jan 2024]

The Timeline of the OpenaAI's Founder Journeys✍️ [15 Oct 2024]

OpenAI Models

GPT 1: Decoder-only model. 117 million parameters. [Jun 2018] git
GPT 2: Increased model size and parameters. 1.5 billion. [14 Feb 2019] git
GPT 3: Introduced few-shot learning. 175B. [11 Jun 2020] git
GPT 3.5: 3 variants each with 1.3B, 6B, and 175B parameters. [15 Mar 2022] Estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096
ChatGPT: GPT-3 fine-tuned with RLHF. 20B or 175B. unverified ✍️ [30 Nov 2022]
GPT 4: Mixture of Experts (MoE). 8 models with 220 billion parameters each, for a total of about 1.76 trillion parameters. unverified ✍️ [14 Mar 2023]
GPT-4V(ision) system card: ✍️ [25 Sep 2023] / ✍️
GPT-4: The Dawn of LMMs📑: Preliminary Explorations with GPT-4V(ision) [29 Sep 2023]
- GPT-4 details leaked: GPT-4 is a language model with approximately 1.8 trillion parameters across 120 layers, 10x larger than GPT-3. It uses a Mixture of Experts (MoE) model with 16 experts, each having about 111 billion parameters. Utilizing MoE allows for more efficient use of resources during inference, needing only about 280 billion parameters and 560 TFLOPs, compared to the 1.8 trillion parameters and 3,700 TFLOPs required for a purely dense model.
- The model is trained on approximately 13 trillion tokens from various sources, including internet data, books, and research papers. To reduce training costs, OpenAI employs tensor and pipeline parallelism, and a large batch size of 60 million. The estimated training cost for GPT-4 is around $63 million. ✍️ [Jul 2023]
GPT-4o✍️: o stands for Omni. 50% cheaper. 2x faster. Multimodal input and output capabilities (text, audio, vision). supports 50 languages. [13 May 2024] / GPT-4o mini✍️: 15 cents per million input tokens, 60 cents per million output tokens, MMLU of 82%, and fast. [18 Jul 2024]
A new series of reasoning models✍️: The complex reasoning-specialized model, OpenAI o1 series, excels in math, coding, and science, outperforming GPT-4o on key benchmarks. [12 Sep 2024] / git: Awesome LLM Strawberry (OpenAI o1)
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model📑: 6 types of o1 reasoning patterns (i.e., Systematic Analysis (SA), Method Reuse (MR), Divide and Conquer (DC), Self-Refinement (SR), Context Identification (CI), and Emphasizing Constraints (EC)). the most commonly used reasoning patterns in o1 are DC and SR [17 Oct 2024]
o3-mini system card✍️: The first model to reach Medium risk on Model Autonomy. [31 Jan 2025]
OpenAI o1 system card✍️ [5 Dec 2024]
o3 preview✍️: 12 Days of OpenAI [20 Dec 2024]
o3/o4-mini✍️ [16 Apr 2025]
GPT-4.5✍️: greater “EQ”. better unsupervised learning (world model accuracy and intuition). scalable training from smaller models. ✍️ [27 Feb 2025]
GPT-4o: 4o image generation✍️: create photorealistic output, replacing DALL·E 3 [25 Mar 2025]
GPT-4.1 family of models✍️: GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano can process up to 1 million tokens of context. enhanced coding abilities, improved instruction following. [14 Apr 2025]
gpt-image-1✍️: Image generation model API with designing and editing [23 Apr 2025]
gpt-oss: gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI. [Jun 2025]
GPT-5✍️: Real-time router orchestrating multiple models. GPT‑5 is the new default in ChatGPT, replacing GPT‑4o, OpenAI o3, OpenAI o4-mini, GPT‑4.1, and GPT‑4.5. [7 Aug 2025]
- GPT-5 prompting guide
- Frontend coding with GPT-5
- GPT-5 New Params and Tools
GPT 5.1✍️: GPT-5.1 Auto, GPT-5.1 Instant, and GPT-5.1 Thinking. Better instruction-following, More customization for tone and style. [12 Nov 2025]
GPT-5.1 Codex Max✍️: agentic coding model for lonng-running, detailed work. [19 Nov 2025]
GPT 5.2✍️: 70.9% GDPval (knowledge work vs professionals), major gains over GPT-5.1 on SWE-Bench, GPQA Diamond, AIME 2025, ARC-AGI reasoning, and advanced coding/vision tasks. [11 Dec 2025]
GPT-5.4✍️: Thinking, coding, and native computer-use in a single model. [Mar 2026]

OpenAI Products

Agents SDK & Response API✍️: Responses API (Chat Completions + Assistants API), Built-in tools (web search, file search, computer use), Agents SDK for multi-agent workflows, agent workflow observability tools [11 Mar 2025] git
Building ChatGPT Atlas✍️: OpenAI's approach to building Atlas. OWL: OpenAI’s Web Layer. Mojo Protocol. [Oct 2025]
ChatGPT agent✍️: Web-browsing, File-editing, Terminal, Email, Spreadsheet, Calendar, API-calling, Automation, Task-chaining, Reasoning. [17 Jul 2025]
ChatGPT can now see, hear, and speak✍️: It has recently been updated to support multimodal capabilities, including voice and image. [25 Sep 2023] Whisper / CLIP
ChatGPT Function calling [Jun 2023] > Azure OpenAI supports function calling. ✍️
ChatGPT Memory✍️: Remembering things you discuss across all chats saves you from having to repeat information and makes future conversations more helpful. [Apr 2024]
ChatGPT Plugin✍️ [23 Mar 2023]
CriticGPT✍️: a version of GPT-4 fine-tuned to critique code generated by ChatGPT [27 Jun 2024]
Codex 5.3✍️: OpenAI Codex with enhanced coding and agentic reasoning. [5 Feb 2026]
Custom instructions✍️: In a nutshell, the Custom Instructions feature is a cross-session memory that allows ChatGPT to retain key instructions across chat sessions. [20 Jul 2023]
DALL·E 3✍️ : In September 2023, OpenAI announced their latest image model, DALL-E 3 git [Sep 2023]
deep research✍️: An agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks [2 Feb 2025]
GPT-3.5 Turbo Fine-tuning✍️ Fine-tuning for GPT-3.5 Turbo is now available, with fine-tuning for GPT-4 coming this fall. [22 Aug 2023]
Introducing the GPT Store✍️: Roll out the GPT Store to ChatGPT Plus, Team and Enterprise users GPTs [10 Jan 2024]
New embedding models✍️ text-embedding-3-small: Embedding size: 512, 1536 text-embedding-3-large: Embedding size: 256,1024,3072 [25 Jan 2024]
Open AI Enterprise: Removes GPT-4 usage caps, and performs up to two times faster ✍️ [28 Aug 2023]
OpenAI DevDay 2023✍️: GPT-4 Turbo with 128K context, Assistants API (Code interpreter, Retrieval, and function calling), GPTs (Custom versions of ChatGPT: ✍️), Copyright Shield, Parallel Function Calling, JSON Mode, Reproducible outputs [6 Nov 2023]
OpenAI DevDay 2024✍️: Real-time API (speech-to-speech), Vision Fine-Tuning, Prompt Caching, and Distillation (fine-tuning a small language model using a large language model). ✍️ [1 Oct 2024]
OpenAI DevDay 2025✍️: ChatGPT Apps + SDK, AgentKit, GPT-5 Pro, Sora 2 video API, upgraded Codex ✍️ [6 Oct 2025]
OpenAI Frontier✍️: OpenAI’s largest, most capable model tier. [Feb 2026]
Operator✍️: GUI Agent. Operates embedded virtual environments. Specialized model (Computer-Using Agent). [23 Jan 2025]
Prism✍️: AI-native workspace for scientists to write and collaborate on research. [27 Jan 2026]
SearchGPT✍️: AI search [25 Jul 2024] > ChatGPT Search✍️ [31 Oct 2024]
Sora✍️ Text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. [15 Feb 2024]
Structured Outputs in the API✍️: a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers. [6 Aug 2024]

Anthropic AI Products

Agent Skills: A way to package instructions, scripts, and resources into “skills” that Claude agents can dynamically load. [16 Oct 2025]
Anthropic CLI (Claude Code): The official command-line interface that lives in your project directory, enabling natural-language code generation, refactoring, and Git automation. [24 Feb 2025]
Bringing Code Review to Claude Code✍️: Multi-agent PR review dispatches parallel agents and verifies bugs before posting findings. [9 Mar 2026]
Put Claude to work on your computer✍️: Dispatch carries tasks across phone and desktop while Claude operates your computer. [23 Mar 2026]
Anthropic killed Tool calling📺: Programmatic Tool Calling / Dynamic Filtering — what changed in Anthropic’s API. [Feb 2026]
Claude Agent SDK: A toolkit for building multi-step, tool-using agents using the Claude API. [29 Sep 2025]
Claude Opus 4.6✍️: Advanced reasoning and coding flagship model. [5 Feb 2026]
Claude Sonnet 4.6✍️: Balanced performance and speed model. [17 Feb 2026]
Constitutional AI (CAI): Anthropic’s training framework using a “constitution” (AI‑generated rules) to align models toward harmlessness. [15 Dec 2022]
Cowork: AI agent that accesses local files to automate multi-step desktop tasks like organizing, reporting, and data extraction. [Jan 2026]
Claude Code Security✍️: Claude Code on the web for scanning codebases and suggesting security patches. [Feb 2026]
Detecting and preventing distillation attacks✍️: 16M+ fraudulent exchanges scraped from Claude; Anthropic’s detection and prevention. [Feb 2026]
Frontier AI Safety Research: Foundational research into AI risks, alignment, and interpretability.
Model Context Protocol (MCP): An open standard for connecting AI assistants to external systems (data, tools, etc.) securely and scalably. [25 Nov 2024]
Programmatic Tool Calling: Enables Claude to write orchestration code (e.g., Python) to call multiple tools in a sequence, improving efficiency. [24 Nov 2025]
Tool Use & Agent Orchestration: Advanced tool‑use framework for Claude agents, allowing dynamic API discovery and execution in complex tasks. [24 Nov 2025]

Google AI Products

AlphaMissense: A machine learning tool that classifies the effects of 71 million 'missense' mutations in the human genome to help pinpoint disease causes. [2025]
CodeMender: An autonomous AI agent leveraging Gemini Deep Think models to automatically find, debug, and fix complex software security vulnerabilities. [Oct 2025]
Firebase Studio: A web-based IDE that uses Gemini to assist in building, refactoring, and troubleshooting full-stack web and mobile applications. [7 May 2025]
Gemini CLI: An open-source terminal interface for "vibecoding" that brings Gemini 3 Pro capabilities directly to the command line for script generation and automation. [25 Jun 2025]
Gemini Code Assist: An enterprise-grade AI assistant for IDEs (VS Code, IntelliJ) that offers context-aware code completion, generation, and chat using Gemini models. [20 May 2025]
Gemini Code Assist for GitHub: A specialized agent that acts as a code reviewer on Pull Requests, identifying bugs, style issues, and suggesting fixes automatically. [20 May 2025]
Google AI for Developers: A suite of research tools including AI-powered documentation search and code explanation to accelerate learning and implementation. [Jul 2024]
Google Antigravity: An "agent-first" IDE platform announced with Gemini 3 that gives autonomous agents direct control over editors, terminals, and browsers to build and verify software. [18 Nov 2025]
Introducing "vibe design" with Stitch✍️: AI-native design canvas for turning prompts and images into UI drafts. [18 Mar 2026]
Jules: An autonomous coding agent that integrates with GitHub to plan, execute, and verify multi-step coding tasks like bug fixing and dependency management. [20 May 2025]
NotebookLM: An AI-powered research and thinking partner that synthesizes complex information and automates online research using the Deep Research agent feature. [13 Nov 2025]
SIMA 2: (Scalable Instructable Multiworld Agent) A research agent that explores and learns to play across a variety of 3D video game environments, aimed at general-purpose robotics. [13 Nov 2025]
Vertex AI Codey: A family of foundation models (Code-Bison, Code-Gecko) optimized for code generation and completion, accessible via API. [29 Jun 2023]

Context constraints

Context Rot: How Increasing Input Tokens Impacts LLM Performance [14 Jul 2025]
Doc-to-LoRA: Learning to Instantly Internalize Contexts📑: Generates LoRA adapters from long context to cut repeated context cost. [Feb 2026]
DroPE✍️: Extends LLM context by dropping positional embeddings and brief recalibration, improving long-context performance without retraining. Sakana AI. [13 Dec 2025]
Giraffe📑: Adventures in Expanding Context Lengths in LLMs. A new truncation strategy for modifying the basis for the position encoding. ✍️ [2 Jan 2024]
Introducing 100K Context Windows✍️: hundreds of pages, Around 75,000 words; [11 May 2023] demo Anthropic Claude
Leave No Context Behind📑: Efficient Infinite Context Transformers with Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism. Integrate attention from both local and global attention. [10 Apr 2024]
LLM Maybe LongLM📑: Self-Extend LLM Context Window Without Tuning. With only four lines of code modification, the proposed method can effortlessly extend existing LLMs' context window without any fine-tuning. [2 Jan 2024]
Lost in the Middle: How Language Models Use Long Contexts📑:💡[6 Jul 2023]
- Best Performace when relevant information is at beginning
- Too many retrieved documents will harm performance
- Performacnce decreases with an increase in context
“Needle in a Haystack” Analysis [21 Nov 2023]: Context Window Benchmarks; Claude 2.1 (200K Context Window) vs GPT-4; Long context prompting for Claude 2.1✍️ adding just one sentence, “Here is the most relevant sentence in the context:”, to the prompt resulted in near complete fidelity throughout Claude 2.1’s 200K context window. [6 Dec 2023]
Ring Attention📑: 1. Ring Attention, which leverages blockwise computation of self-attention to distribute long sequences across multiple devices while overlapping the communication of key-value blocks with the computation of blockwise attention. 2. Ring Attention can reduce the memory requirements of Transformers, enabling us to train more than 500 times longer sequence than prior memory efficient state-of-the-arts and enables the training of sequences that exceed 100 million in length without making approximations to attention. 3. we propose an enhancement to the blockwise parallel transformers (BPT) framework. git [3 Oct 2023]
Rotary Positional Embedding (RoPE)📑:💡/ ✍️ / 🗄️ [20 Apr 2021]
- How is this different from the sinusoidal embeddings used in "Attention is All You Need"?
- Sinusoidal embeddings apply to each coordinate individually, while rotary embeddings mix pairs of coordinates
- Sinusoidal embeddings add a cos or sin term, while rotary embeddings use a multiplicative factor.
- Rotary embeddings are applied to positional encoding to K and V, not to the input embeddings.
- ALiBi📑: Attention with Linear Biases. ALiBi applies a bias directly to the attention scores. [27 Aug 2021]
- NoPE: Transformer Language Models without Positional Encodings Still Learn Positional Information📑: No postion embedding. [30 Mar 2022]
Sparse Attention: Generating Long Sequences with Sparse Transformer📑:💡Sparse attention computes scores for a subset of pairs, selected via a fixed or learned sparsity pattern, reducing calculation costs. Strided attention: image, audio / Fixed attention:text ✍️ / git [23 Apr 2019]
Structured Prompting: Scaling In-Context Learning to 1,000 Examples📑: [13 Dec 2022]
- Microsoft's Structured Prompting allows thousands of examples, by first concatenating examples into groups, then inputting each group into the LM. The hidden key and value vectors of the LM's attention modules are cached. Finally, when the user's unaltered input prompt is passed to the LM, the cached attention vectors are injected into the hidden layers of the LM.
- This approach wouldn't work with OpenAI's closed models. because this needs to access [keys] and [values] in the transformer interns, which they do not expose. You could implement yourself on OSS ones. ✍️ [07 Feb 2023]
Zig-Zag Ring Attention✍️: Long-context attention pattern for more memory-efficient distributed inference and training. [18 Mar 2026]

Numbers LLM

5 Approaches To Solve LLM Token Limits✍️ : 🗄️ [2023]
Byte-Pair Encoding (BPE)📑: P.2015. The most widely used tokenization algorithm for text today. BPE adds an end token to words, splits them into characters, and merges frequent byte pairs iteratively until a stop criterion. The final tokens form the vocabulary for new data encoding and decoding. [31 Aug 2015] / ✍️ [13 Aug 2021]
Numbers every LLM Developer should know [18 May 2023]
Open AI Tokenizer: GPT-3, Codex Token counting
tiktoken: BPE tokeniser for use with OpenAI's models. Token counting. ✍️:💡online app [Dec 2022]
Tokencost: Token price estimates for 400+ LLMs [Dec 2023]
What are tokens and how to count them?✍️: OpenAI Articles

Trustworthy, Safe and Secure LLM

20 AI Governance Papers📑 [Jan 2025]
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models📑: A compre hensive survey of over thirty-two techniques developed to mitigate hallucination in LLMs [2 Jan 2024]
AI models collapse when trained on recursively generated data: Model Collapse. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. [24 Jul 2024]
Alignment Faking✍️: LLMs may pretend to align with training objectives during monitored interactions but revert to original behaviors when unmonitored. [18 Dec 2024] | demo: ✍️ | Alignment Science Blog
An Approach to Technical AGI Safety and Security📑: Google DeepMind. We focus on technical solutions to misuse and misalignment, two of four key AI risks (the others being mistakes and structural risks). To prevent misuse, we limit access to dangerous capabilities through detection and security. For misalignment, we use two defenses: model-level alignment via training and oversight, and system-level controls like monitoring and access restrictions. ✍️ [2 Apr 2025]
Anthropic Many-shot jailbreaking✍️: simple long-context attack, Bypassing safety guardrails by bombarding them with unsafe or harmful questions and answers. [3 Apr 2024]
Extracting Concepts from GPT-4✍️: Sparse Autoencoders identify key features, enhancing the interpretability of language models like GPT-4. They extract 16 million interpretable features using GPT-4's outputs as input for training. [6 Jun 2024]
FactTune📑: A procedure that enhances the factuality of LLMs without the need for human feedback. The process involves the fine-tuning of a separated LLM using methods such as DPO and RLAIF, guided by preferences generated by FActScore. [14 Nov 2023] FActScore works by breaking down a generation into a series of atomic facts and then computing the percentage of these atomic facts by a reliable knowledge source.
Frontier Safety Framework: Google DeepMind, Frontier Safety Framework, a set of protocols designed to identify and mitigate potential harms from future AI systems. [17 May 2024]
Google SAIF✍️: Secure AI Framework for managing AI security risks. [05 Nov 2025]
Guardrails Hub: Guardrails for common LLM validation use cases
Hallucination Index: w.r.t. RAG, Testing LLMs with short (≤5k), medium (5k–25k), and long (40k–100k) contexts to evaluate improved RAG performance　[Nov 2023]
Hallucination Leaderboard: Evaluate how often an LLM introduces hallucinations when summarizing a document. [Nov 2023]
Hallucinations📑: A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [9 Nov 2023]
Large Language Models Reflect the Ideology of their Creators📑: When prompted in Chinese, all LLMs favor pro-Chinese figures; Western LLMs similarly align more with Western values, even in English prompts. [24 Oct 2024]
LlamaFirewall: Scans and filters AI inputs to block prompt injections and malicious content. [29 Apr 2025]
LLMs Will Always Hallucinate, and We Need to Live With This📑:💡LLMs cannot completely eliminate hallucinations through architectural improvements, dataset enhancements, or fact-checking mechanisms due to fundamental mathematical and logical limitations. [9 Sep 2024]
Machine unlearning: Machine unlearning: techniques to remove specific data from trained machine learning models.
Mapping the Mind of a Large Language Model: Anthrophic, A technique called "dictionary learning" can help understand model behavior by identifying which features respond to a particular input, thus providing insight into the model's "reasoning." ✍️ [21 May 2024]
NeMo Guardrails: Building Trustworthy, Safe and Secure LLM Conversational Systems [Apr 2023]
NIST AI Risk Management Framework: NIST released the first complete version of the NIST AI RMF Playbook on March 30, 2023
OpenAI Weak-to-strong generalization📑:💡In the superalignment problem, humans must supervise models that are much smarter than them. The paper discusses supervising a GPT-4 or 3.5-level model using a GPT-2-level model. It finds that while strong models supervised by weak models can outperform the weak models, they still don’t perform as well as when supervised by ground truth. git [14 Dec 2023]
Political biases of LLMs📑: From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. [15 May 2023]
Red Teaming: The term red teaming has historically described systematic adversarial attacks for testing security vulnerabilities. LLM red teamers should be a mix of people with diverse social and professional backgrounds, demographic groups, and interdisciplinary expertise that fits the deployment context of your AI system. ✍️
The Foundation Model Transparency Index📑: A comprehensive assessment of the transparency of foundation model developers ✍️ [19 Oct 2023]
The Instruction Hierarchy📑: Training LLMs to Prioritize Privileged Instructions. The OpenAI highlights the need for instruction privileges in LLMs to prevent attacks and proposes training models to conditionally follow lower-level instructions based on their alignment with higher-level instructions. [19 Apr 2024]
Tracing the thoughts of a large language model✍️:💡Claude 3.5 Haiku 1. Universal Thought Processing (Multiple Languages): Shared concepts exist across languages and are then translated into the respective language. 2. Advance Planning (Composing Poetry): Despite generating text word by word, it anticipates rhyming words in advance. 3. Fabricated Reasoning (Math): Produces plausible-sounding arguments even when given an incorrect hint. [27 Mar 2025]
Trustworthy LLMs📑: Comprehensive overview for assessing LLM trustworthiness; Reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. [10 Aug 2023]
Vibe Hacking✍️: Anthropic reports vibe-hacking attempts. [14 Nov 2025]

Large Language Model Is: Abilities

A Categorical Archive of ChatGPT Failures📑: 11 categories of failures, including reasoning, factual errors, math, coding, and bias git [6 Feb 2023]
A Survey on Employing Large Language Models for Text-to-SQL Tasks📑: a comprehensive overview of LLMs in text-to-SQL tasks [21 Jul 2024]
Can LLMs Generate Novel Research Ideas?📑: A Large-Scale Human Study with 100+ NLP Researchers. We find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas. However, the study revealed a lack of diversity in AI-generated ideas. [6 Sep 2024]
Design2Code📑: How Far Are We From Automating Front-End Engineering? 64% of cases GPT-4V generated webpages are considered better than the original reference webpages [5 Mar 2024]
Emergent Abilities of Large Language Models📑: Large language models can develop emergent abilities, which are not explicitly trained but appear at scale and are not present in smaller models. . These abilities can be enhanced using few-shot and augmented prompting techniques. ✍️ [15 Jun 2022]
Improving mathematical reasoning with process supervision✍️ [31 May 2023]
Language Modeling Is Compression📑: Lossless data compression, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%). [19 Sep 2023]
Large Language Models for Software Engineering📑: Survey and Open Problems, Large Language Models (LLMs) for Software Engineering (SE) applications, such as code generation, testing, repair, and documentation. [5 Oct 2023]
LLMs for Chip Design📑: Domain-Adapted LLMs for Chip Design [31 Oct 2023]
LLMs Represent Space and Time📑: Large language models learn world models of space and time from text-only training. [3 Oct 2023]
Math soving optimized LLM WizardMath📑: Developed by adapting Evol-Instruct and Reinforcement Learning techniques, these models excel in math-related instructions like GSM8k and MATH. git [18 Aug 2023] / Math solving Plugin: Wolfram alpha
Multitask Prompted Training Enables Zero-Shot Task Generalization📑: A language model trained on various tasks using prompts can learn and generalize to new tasks in a zero-shot manner. [15 Oct 2021]
On the Slow Death of Scaling📑: 💡Relying solely on scaling model size and data is becoming less effective, and AI progress now depends on exploring more nuanced, efficient approaches. [12 Dec 2025]
Testing theory of mind in large language models and humans: Some large language models (LLMs) perform as well as, and in some cases better than, humans when presented with tasks designed to test the ability to track people’s mental states, known as “theory of mind.” 🗣️ [20 May 2024]

Reasoning

Chain of Draft: Thinking Faster by Writing Less📑: Chain-of-Draft prompting con- denses the reasoning process into minimal, abstract
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity📑:💡The Illusion of Thinking findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Output token limits, flawed evaluation methods, and unsolvable River Crossing problems. [10 Jun 2025]
DeepSeek-R1:💡Group Relative Policy Optimization (GRPO). Base -> RL -> SFT -> RL -> SFT -> RL [20 Jan 2025]
Illusion of Thinking📑: Large Reasoning Models (LRMs) are evaluated using controlled puzzles, where complexity depends on the size of N. Beyond a certain complexity threshold, LRM accuracy collapses, and reasoning effort paradoxically decreases. LRMs outperform standard LLMs on medium-complexity tasks, perform worse on low-complexity ones, and both fail on high-complexity. Apple. [May 2025]
Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights📑: Evaluate Chain-of-Thought, Tree-of-Thought, and Reasoning as Planning across 11 tasks. While scaling inference-time computation enhances reasoning, no single technique consistently outperforms the others. [18 Feb 2025]
Is Chain-of-Thought Reasoning of LLMs a Mirage?📑: The paper concludes that CoT is largely a mimic rather than true reasoning. Using DataAlchemy—atom = A–Z; element = e.g., APPLE; transform = (1) ROT (rotation), (2) position shift; compositional transform = combinations of transforms—the model is fine-tuned and evaluated on its ability to generalize to unlearned patterns.
Mini-R1✍️: Reproduce Deepseek R1 „aha moment“ a RL tutorial [30 Jan 2025]
Open R1: A fully open reproduction of DeepSeek-R1. [25 Jan 2025]
Open Thoughts: Fully Open Data Curation for Thinking Models [28 Jan 2025]
Reasoning LLMs Guide: The Reasoning LLMs Guide shows how to use advanced AI models for step-by-step thinking, planning, and decision-making in complex tasks.
S*: Test Time Scaling for Code Generation📑: Parallel scaling (generating multiple solutions) + sequential scaling (iterative debugging). [20 Feb 2025]
s1: Simple test-time scaling📑: Curated small dataset of 1K. Budget forces stopping termination. Append "Wait" to lengthen. Achieved better reasoning performance. [31 Jan 2025]
Thinking Machines: A Survey of LLM based Reasoning Strategies📑 [13 Mar 2025]
Tina: Tiny Reasoning Models via LoRA📑: Low-rank adaptation (LoRA) with Reinforcement learning (RL) on a 1.5B parameter base model [22 Apr 2025]

Survey and Reference

Survey on Large Language Models

A Primer on Large Language Models and their Limitations📑: A primer on LLMs, their strengths, limits, applications, and research, for academia and industry use. [3 Dec 2024]
A Survey of Large Language Models📑:[v1: 31 Mar 2023 - v15: 13 Oct 2024]
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?📑: [9 Aug 2024] git
- A Survey of Transformers📑:[8 Jun 2021]
Google AI Research Recap
- Gemini✍️ [06 Dec 2023] Three different sizes: Ultra, Pro, Nano. With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU ✍️
- Google AI Research Recap (2022 Edition)
- Themes from 2021 and Beyond
- Looking Back at 2020, and Forward to 2021
- Large Language Models: A Survey📑: 🏆Well organized visuals and contents [9 Feb 2024]
LLM Post-Training: A Deep Dive into Reasoning Large Language Models📑: git [28 Feb 2025]
LLM Research Papers: The 2024 List [29 Dec 2024]
Microsoft Research Recap
- Research at Microsoft 2023✍️: A year of groundbreaking AI advances and discoveries
Noteworthy LLM Research Papers of 2024 [23 Jan 2025]

Additional Topics: A Survey of LLMs

Advancing Reasoning in Large Language Models: Promising Methods and Approaches📑 [5 Feb 2025]
Agentic Reasoning for Large Language Models📑 [18 Jan 2026]
Agentic Retrieval-Augmented Generation: Agentic RAG📑 [15 Jan 2025]
AI Agent Protocols📑 [23 Apr 2025]
AI-Generated Content (AIGC)📑: A History of Generative AI from GAN to ChatGPT:[7 Mar 2023]
AIOps in the Era of Large Language Models📑 [23 Jun 2025]
Aligned LLMs📑:[24 Jul 2023]
An Overview on Language Models: Recent Developments and Outlook📑:[10 Mar 2023]
A comprehensive taxonomy of hallucinations in Large Language Models📑 [3 Aug 2025]
Autonomous Scientific Discovery📑: From AI for Science to Agentic Science [18 Aug 2025]
Automatic Prompt Optimization Techniques📑 [24 Feb 2025]
Challenges & Application of LLMs📑:[11 Jun 2023]
ChatGPT’s One-year Anniversary: Are Open-Source Large Language Models Catching up?📑: Open-Source LLMs vs. ChatGPT; Benchmarks and Performance of LLMs [28 Nov 2023]
Compression Algorithms for Language Models📑 [27 Jan 2024]
Context Engineering for Large Language Models📑 [17 Jul 2025]
Context Engineering 2.0 [30 Oct 2025]
Data Management For Large Language Models: A Survey📑 [4 Dec 2023]
Data Synthesis and Augmentation for Large Language Models📑 [16 Oct 2024]
Efficient Guided Generation for Large Language Models📑:[19 Jul 2023]
Efficient Training of Transformers📑:[2 Feb 2023]
Evaluation of Large Language Models📑:[6 Jul 2023]
Evaluating Large Language Models: A Comprehensive Survey📑:[30 Oct 2023]
Evaluation of LLM-based Agents📑 [20 Mar 2025]
Foundation Models in Vision📑:[25 Jul 2023]
From Google Gemini to OpenAI Q* (Q-Star)📑: Reshaping the Generative Artificial Intelligence (AI) Research Landscape:[18 Dec 2023]
From Code Foundation Models to Agents and Applications📑: Comprehensive survey and guide to code intelligence. [23 Nov 2025]
GUI Agents: A Survey📑 [18 Dec 2024]
Hallucination in LLMs📑:[9 Nov 2023]
Hallucination in Natural Language Generation📑:[8 Feb 2022]
Harnessing the Power of LLMs in Practice: ChatGPT and Beyond📑:[26 Apr 2023]
Harnessing the Reasoning Economy: Efficient Reasoning for Large Language Models📑: Efficient reasoning mechanisms that balance computational cost with performance. [31 Mar 2025]
In-context Learning📑:[31 Dec 2022]
Large Language Model-Brained GUI Agents: A Survey📑 [27 Nov 2024]
LLM-as-a-Judge📑 [23 Nov 2024]
LLM-based Autonomous Agents📑:[22 Aug 2023]
LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures📑 [24 Jun 2025]
LLMs for Healthcare📑:[9 Oct 2023]
Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges📑 [16 Dec 2024]
Medical Reasoning in the Era of LLMs📑: A Systematic Review of Enhancement Techniques and Applications [1 Aug 2025]
Mixture of Experts📑 [26 Jun 2024]
Mitigating Hallucination in LLMs📑: Summarizes 32 techniques to mitigate hallucination in LLMs [2 Jan 2024]
Model Compression for LLMs📑:[15 Aug 2023]
Multimodal Deep Learning📑:[12 Jan 2023]
Multimodal Large Language Models📑:[23 Jun 2023]
NL2SQL with Large Language Models: Where are we, and where are we going?📑: [9 Aug 2024] git
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback📑:[27 Jul 2023]
Overview of Factuality in LLMs📑:[11 Oct 2023]
Position Paper: Agent AI Towards a Holistic Intelligence📑 [28 Feb 2024]
Post-training of Large Language Models📑 [8 Mar 2025]
Prompt Engineering Methods in Large Language Models for Different NLP Tasks📑 [17 Jul 2024]
Retrieval-Augmented Generation for Large Language Models: A Survey📑 [18 Dec 2023]
Retrieval And Structuring Augmented Generation with Large Language Models📑 [12 Sep 2025]
Retrieval-Augmented Text Generation for Large Language Models📑 [17 Apr 2024]
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning📑:[28 Mar 2023]
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension📑: [30 Jul 2023]
Self-Supervised Learning: A Cookbook of Self-Supervised Learning📑:[24 Apr 2023]
Small Language Models: Survey, Measurements, and Insights📑 [24 Sep 2024]
Small Language Models in the Era of Large Language Models📑 [4 Nov 2024]
Speed Always Wins: Efficient Architectures for Large Language Models [13 Aug 2025]
Stop Overthinking: Efficient Reasoning for Large Language Models📑 [20 Mar 2025]
Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models📑
Tabular Data Understanding with LLMs: Recent Advances and Challenges [31 Jul 2025]
Techniques for Optimizing Transformer Inference📑:[16 Jul 2023]
The Rise and Potential of Large Language Model Based Agents: A Survey📑 [14 Sep 2023]
Thinking Machines: LLM based Reasoning Strategies📑 [13 Mar 2025]
Towards Artificial General or Personalized Intelligence? 📑: Personalized federated intelligence (PFI). Foundation Model Meets Federated Learning [11 May 2025]
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems📑: The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving [23 Dec 2023]
Trustworthy LLMs📑:[10 Aug 2023]
Universal and Transferable Adversarial Attacks on Aligned Language Models📑:[27 Jul 2023]
What is the Role of Small Models in the LLM Era: A Survey📑 [10 Sep 2024]

LLM Research (Ranked by cite count >=150)

LLM Papers (≥150 citations)📑: High-citation CS papers (≥150 citations) across 35 LLM topic areas — reasoning, RAG, agents, PEFT, RLHF, scaling laws, multimodal, and more — fetched from Semantic Scholar and ranked by citation count.

Business use cases

AI-powered success—with more than 1,000 stories of customer transformation and innovation✍️💡[24 July 2025]
Anthropic Clio✍️: Privacy-preserving insights into real-world AI use [12 Dec 2024]
Anthropic Economic Index✍️: a research on the labor market impact of technologies. The usage is concentrated in software development and technical writing tasks. [10 Feb 2025]
Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence📑: early-career workers (ages 22–25) in AI-exposed jobs fell 13%, while older workers remained stable or grew. [26 Aug 2025]
Chatbot Interviewers Fill More Jobs✍️: Using chatbots as interviewers improves hiring efficiency and retention in customer service roles. [3 Sep 2025]
Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise📑: IBM study surveying developer experiences with watsonx Code Assistant (WCA). Most common use: code explanations (71.9%). Rated effective by 57.4%, ineffective by 42.6%. Many described WCA as similar to an “intern” or “junior developer.” [9 Dec 2024]
Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce📑: A new framework maps U.S. workers’ preferences for AI automation vs. augmentation across 844 tasks.　It shows how people want AI to help or replace them. Many jobs need AI to support people, not just take over. [6 Jun 2025]
Google: 321 real-world gen AI use cases from the world's leading organizations✍️ [19 Dec 2024]
Google: 60 of our biggest AI announcements in 2024✍️ [23 Dec 2024]
How people are using ChatGPT✍️: OpenAI. Broadly adopted worldwide, mainly for advice (49%), task completion (40%), and creative expression (11%), with significant work-related use and rapid uptake in lower-income regions. [15 Sep 2025]
How real-world businesses are transforming with AI✍️:💡Collected over 200 examples of how organizations are leveraging Microsoft’s AI capabilities. [12 Nov 2024]
Rapid Growth Continues for ChatGPT, Google’s NotebookLM [6 Nov 2024]
Senior Developers Ship nearly 2.5x more AI Code than Junior Counterparts✍️: About a third of senior developers (10+ years of experience) say over half their shipped code is AI-generated [27 Aug 2025]
SignalFire State of Talent Report 2025: 1. Entry‑level hiring down sharply since 2019 (-50%) 2. Anthropic dominate mid/senior talent retention 3. Roles labeled “junior” filled by seniors, blocking grads. [20 May 2025]
State of AI
- Retool: Status of AI: A Report on AI In Production 2023 -> 2024
- The State of Generative AI in the Enterprise [ⓒ2023]
  1. 96% of AI spend is on inference, not training. 2. Only 10% of enterprises pre-trained own models. 3. 85% of models in use are closed-source. 4. 60% of enterprises use multiple models.
- Standford AI Index Annual Report
- State of AI Report 2024 [10 Oct 2024]
- State of AI Report 2025 [9 Oct 2025]
- LangChain > State of AI Agents [19 Dec 2024]
The leading generative AI companies:💡GPU: Nvidia 92% market share, Generative AI foundational models and platforms: Microsoft 32% market share, Generative AI services: no single dominant [4 Mar 2025]
Trends – Artiﬁcial Intelligence:💡Issued by Bondcap VC. 340 Slides. ChatGPT’s 800 Million Users, 99% Cost Drop within 17 months. [May 2025]
Who is using AI to code? Global diffusion and impact of generative AI📑: AI wrote 30% of Python functions by U.S. devs in 2024. Adoption is uneven globally but boosts output and innovation. New coders use AI more, and usage drives $9.6–$14.4B in U.S. annual value. [10 Jun 2025]

Build an LLMs from scratch: picoGPT and lit-gpt

An unnecessarily tiny implementation of GPT-2 in NumPy. picoGPT: Transformer Decoder [Jan 2023]

q = x @ w_k # [n_seq, n_embd] @ [n_embd, n_embd] -> [n_seq, n_embd]
k = x @ w_q # [n_seq, n_embd] @ [n_embd, n_embd] -> [n_seq, n_embd]
v = x @ w_v # [n_seq, n_embd] @ [n_embd, n_embd] -> [n_seq, n_embd]

# In picoGPT, combine w_q, w_k and w_v into a single matrix w_fc
x = x @ w_fc # [n_seq, n_embd] @ [n_embd, 3*n_embd] -> [n_seq, 3*n_embd]

4 LLM Text Generation Strategies: Greedy strategy, Multinomial sampling strategy, Beam search, Contrastive search [27 Sep 2025]
Andrej Karpathy📺: Reproduce the GPT-2 (124M) from scratch. [June 2024] / SebastianRaschka📺: Developing an LLM: Building, Training, Finetuning [June 2024]
Beam Search [1977] in Transformers is an inference algorithm that maintains the beam_size most probable sequences until the end token appears or maximum sequence length is reached. If beam_size (k) is 1, it's a Greedy Search. If k equals the total vocabularies, it's an Exhaustive Search. 🤗 [Mar 2022]
Build a Large Language Model (From Scratch):🏆Implementing a ChatGPT-like LLM from scratch, step by step
Einsum is All you Need: Einstein Summation [5 Feb 2018]
lit-gpt: Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed. git [Mar 2023]
llama3-from-scratch: Implementing Llama3 from scratch [May 2024]
llm.c: LLM training in simple, raw C/CUDA [Apr 2024] | Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20 git
nanochat: a full-stack implementation of an LLM [Oct 2025]
nanoGPT:💡Andrej Karpathy [Dec 2022] | nanoMoE [Dec 2024]
nanoVLM: 🤗 The simplest, fastest repository for training/finetuning small-sized VLMs. [May 2025]
pix2code: Generating Code from a Graphical User Interface Screenshot. Trained dataset as a pair of screenshots and simplified intermediate script for HTML, utilizing image embedding for CNN and text embedding for LSTM, encoder and decoder model. Early adoption of image-to-code. [May 2017]
Screenshot to code: Turning Design Mockups Into Code With Deep Learning [Oct 2017] ✍️
Spreadsheets-are-all-you-need: Spreadsheets-are-all-you-need implements the forward pass of GPT2 entirely in Excel using standard spreadsheet functions. [Sep 2023]
Transformer Explainer: an open-source interactive tool to learn about the inner workings of a Transformer model (GPT-2) git [8 Aug 2024]
Umar Jamil github:💡LLM Model explanation / building a model from scratch 📺
You could have designed state of the art positional encoding: Binary Position Encoding, Sinusoidal positional encoding, Absolute vs Relative Position Encoding, Rotary Positional encoding [17 Nov 2024]