English · 简体中文 · 日本語 · Español · Français · Deutsch · 한국어
A Claude Code Agent Skill that forces disciplined first-principles reasoning — deconstruction, Socratic questioning, inversion, reconstruction, and falsification — before any architecture or implementation is proposed.
"The first basis from which a thing is known." — Aristotle
Instead of reasoning by analogy ("how did others solve this?"), the skill makes the AI reason from bedrock ("what is actually true here, and what can we build from that?"). It activates automatically on architecture decisions, technology selection, hard debugging, performance work, and migrations, and stays dormant on trivial tasks.
Large language models default to analogy-driven design. On architectural prompts, that default pulls them toward the most-discussed solution in their training data (microservices, Kafka, Redis, Saga, mesh), regardless of whether those components are justified by the problem's actual constraints.
This Skill counteracts the pull by inserting a disciplined reasoning pipeline between the prompt and the answer:
- Intake — restate the problem in outcome terms, not solution terms.
- Socratic questioning — surface hidden assumptions with six question types and a red-flag-phrase watchlist.
- Decomposition — tag every component as `[TRUTH]`, `[ASSUMPTION]`, or `[UNKNOWN]`, with a three-question ground-truth test.
- Inversion — "what would guarantee failure?" — to catch gaps the forward analysis misses (Munger's rule).
- Reconstruction — build 2–3 candidate paths using only the verified truths; Chesterton's Fence before removing legacy.
- Verification — falsifiability statement + 5 Whys on the chosen path + explicit reversibility cost.
- Artifact — structured "First Principles Analysis" block that stays in context and guides all downstream work.
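The exact artifact layout is defined in SKILL.md; as a rough illustration, a "First Principles Analysis" block covering the phases above might look like the following (every detail here is invented for the example, not taken from the spec):

```markdown
## First Principles Analysis

Problem (outcome terms): reduce p95 checkout latency below 300 ms.

[TRUTH]      Payment-provider round trip is 120 ms (measured in prod traces).
[ASSUMPTION] Cart reads must be strongly consistent.
[UNKNOWN]    Peak concurrent sessions after the next campaign launch.

Inversion: guaranteed failure = adding a cache with no invalidation plan.
Chosen path + falsifier: in-process read cache; falsified if stale reads > 0.1%.
Reversibility cost: low (behind a feature flag).
```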
See first-principles-thinking/SKILL.md for the full specification, references/techniques.md for the reasoning toolbox, and references/examples.md for worked examples.
Claude Code discovers skills in two locations:
| Scope | Path |
|---|---|
| User-wide | `~/.claude/skills/first-principles-thinking/` (macOS / Linux)<br>`%USERPROFILE%\.claude\skills\first-principles-thinking\` (Windows) |
| Project-only | `<your-repo>/.claude/skills/first-principles-thinking/` |
**macOS / Linux (user-wide):**

```shell
git clone https://github.com/B143KC47/claude-code-first-principles-skill.git
mkdir -p ~/.claude/skills
cp -r claude-code-first-principles-skill/first-principles-thinking ~/.claude/skills/
```

**Windows (PowerShell, user-wide):**

```powershell
git clone https://github.com/B143KC47/claude-code-first-principles-skill.git
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.claude\skills" | Out-Null
Copy-Item -Recurse claude-code-first-principles-skill\first-principles-thinking `
  "$env:USERPROFILE\.claude\skills\"
```

**Project-only:**

```shell
mkdir -p .claude/skills
cp -r path/to/first-principles-thinking .claude/skills/
```

The skill loads automatically the next time Claude Code starts.
Claude Code's Skill system auto-invokes based on the description field. You can also trigger the skill explicitly with any of:
- `first principles` / `FP mode` / `from scratch`
- `challenge my assumptions`
- `why are we doing it this way?`
- `think from bedrock`
The skill also auto-activates on architecture / technology-selection / hard-debugging / migration prompts, and on convention language (best practice, industry standard, everyone uses X).
It explicitly skips on trivial requests (rename X to Y, fix this typo, scaffold a component) — see the Skip Signals section of SKILL.md.
```text
first-principles-thinking/
├── SKILL.md            # Entry file. Phased instructions. ~400 lines.
└── references/
    ├── techniques.md   # Reasoning toolbox: Socratic, 5 Whys, Inversion...
    └── examples.md     # Four worked engineering examples.
```
The skill follows the Claude Code Agent Skills spec:
- Directory name matches the `name` field in the YAML frontmatter.
- `SKILL.md` stays under the recommended 500-line budget.
- Detailed material lives in `references/` and is loaded progressively.
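The Agent Skills spec keys auto-invocation off the frontmatter at the top of `SKILL.md`. A hypothetical sketch of what that frontmatter might look like for this skill (the actual field values live in the repo's `SKILL.md`):

```yaml
---
name: first-principles-thinking
description: >
  Forces disciplined first-principles reasoning before architecture or
  implementation is proposed. Activates on architecture decisions, technology
  selection, hard debugging, performance work, and migrations.
---
```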
This repository includes a runnable A/B benchmark harness for comparing baseline prompting (no skill) vs. skill prompting (inject full SKILL.md).
- `benchmarks/tasks.json` — multi-mode benchmark tasks
- `benchmarks/run_experiment.py` — DeepSeek runner + scoring
- `benchmarks/BENCHMARK_SOURCES.md` — authoritative benchmark provenance + fairness notes
- `EXPERIMENT.md` — methodology, outputs, interpretation guide
Quick start:
```shell
python benchmarks/run_experiment.py --dry-run --runs 1
python benchmarks/run_experiment.py --runs 2 --judge both
```

An automated A/B benchmark was run against the DeepSeek API (deepseek-chat) across 8 engineering-reasoning tasks × 3 runs × 2 conditions = 48 outputs, with fairness controls enabled. Raw artifacts are in `benchmarks/results/20260423-204733/`.
Headline numbers:
| Metric | Baseline | Skill | Delta (Skill - Baseline) |
|---|---|---|---|
| Rule score (composite) | 0.507 | 0.588 | +0.081 |
| Claim-Ledger lanes filled (/5) | 2.92 | 4.13 | +1.21 |
| Structural score | 0.594 | 0.691 | +0.096 |
| Skill-marker coverage | 0.639 | 0.792 | +0.153 |
| Anti-pattern rate (lower is better) | 0.125 | 0.083 | -0.042 |
| Word count | 1882 | 1802 | -80 |
- Paired rule-score delta: mean +0.081, 95% bootstrap CI [+0.044, +0.118].
- Sign of the delta: positive in 18 / 24 paired runs.
- LLM blind judge: 3 skill wins / 2 baseline wins / 19 ties out of 24.
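A percentile bootstrap over paired per-run deltas (as reported above) is straightforward to reproduce; here is a minimal sketch, where the `deltas` values are illustrative placeholders rather than the actual experiment data:

```python
import random

def bootstrap_ci(deltas, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired (skill - baseline) deltas."""
    rng = random.Random(seed)
    # Resample the paired deltas with replacement and record each mean.
    means = sorted(
        sum(rng.choice(deltas) for _ in deltas) / len(deltas)
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot)]
    return lo, hi

# Illustrative paired rule-score deltas, one per (task, run) pair:
deltas = [0.12, 0.05, -0.02, 0.10, 0.08, 0.03, 0.15, 0.07]
lo, hi = bootstrap_ci(deltas)
print(f"mean={sum(deltas) / len(deltas):.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

Pairing by (task, run) before bootstrapping is what lets the CI exclude zero even with modest per-run variance.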
What the experiment shows:
- Explicit skill activation produces a statistically positive rule-score lift.
- The lift is concentrated in process signals, not output length.
- Anti-pattern rate drops: the Skill condition less often reaches for convention-driven defaults without justification.
Honest limitations and threats to validity:
- The baseline is not a true "no-skill" condition, so the measured +0.081 is best read as a conservative lower bound.
- The skill arm was rate-limited: `MAX_TOOL_ITERATIONS = 6`.
- Single model, single provider, single day.
- LLM judge is inconclusive.
- 8 tasks is small.
Full methodology is in EXPERIMENT.md.
Compared to other first-principles skills in the ecosystem, this one makes three explicit design choices:
- Three-label tagging (`TRUTH` / `ASSUMPTION` / `UNKNOWN`) rather than two.
- Explicit Inversion phase separate from forward Socratic questioning.
- Chesterton's Fence + Falsifiability + Reversibility are first-class verification steps.
The Skill is opinionated about process, neutral about solution.
Issues and PRs welcome, especially:
- Additional worked examples in `references/examples.md`
- Blind-scored replications of the experiment on other tasks
- Additional reasoning techniques for `references/techniques.md`
Please keep SKILL.md under 500 lines; add depth by extending the references.
Apache-2.0 — see LICENSE and NOTICE.