English · 简体中文 · 日本語 · Español · Français · Deutsch · 한국어
A Claude Code Agent Skill that forces disciplined first-principles reasoning — deconstruction, Socratic questioning, inversion, reconstruction, and falsification — before any architecture or implementation is proposed.
"The first basis from which a thing is known." — Aristotle
Instead of reasoning by analogy ("how did others solve this?"), the skill makes the AI reason from bedrock ("what is actually true here, and what can we build from that?"). It activates automatically on architecture decisions, technology selection, hard debugging, performance work, and migrations, and stays dormant on trivial tasks.
Large language models default to analogy-driven design. On architectural prompts, that default pulls them toward the most-discussed solution in their training data (microservices, Kafka, Redis, Saga, mesh), regardless of whether those components are justified by the problem's actual constraints.
This Skill counteracts the pull by inserting a disciplined reasoning pipeline between the prompt and the answer:
- Intake — restate the problem in outcome terms, not solution terms.
- Socratic questioning — surface hidden assumptions with six question types and a red-flag-phrase watchlist.
- Decomposition — tag every component as `[TRUTH]`, `[ASSUMPTION]`, or `[UNKNOWN]`, with a three-question ground-truth test.
- Inversion — "what would guarantee failure?" — to catch gaps the forward analysis misses (Munger's rule).
- Reconstruction — build 2–3 candidate paths using only the verified truths; Chesterton's Fence before removing legacy.
- Verification — falsifiability statement + 5 Whys on the chosen path + explicit reversibility cost.
- Artifact — structured "First Principles Analysis" block that stays in context and guides all downstream work.
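The exact artifact layout is defined in SKILL.md; as a rough illustration, a "First Principles Analysis" block covering the phases above might look like the following (every detail here is invented for the example, not taken from the spec):

```markdown
## First Principles Analysis

Problem (outcome terms): reduce p95 checkout latency below 300 ms.

[TRUTH]      Payment-provider round trip is 120 ms (measured in prod traces).
[ASSUMPTION] Cart reads must be strongly consistent.
[UNKNOWN]    Peak concurrent sessions after the next campaign launch.

Inversion: guaranteed failure = adding a cache with no invalidation plan.
Chosen path + falsifier: in-process read cache; falsified if stale reads > 0.1%.
Reversibility cost: low (behind a feature flag).
```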
See first-principles-thinking/SKILL.md for the full specification, references/techniques.md for the reasoning toolbox, and references/examples.md for worked examples.
Claude Code discovers skills in two locations:
| Scope | Path |
|---|---|
| User-wide | `~/.claude/skills/first-principles-thinking/` (macOS / Linux)<br>`%USERPROFILE%\.claude\skills\first-principles-thinking\` (Windows) |
| Project-only | `<your-repo>/.claude/skills/first-principles-thinking/` |
**macOS / Linux (user-wide):**

```shell
git clone https://github.com/B143KC47/claude-code-first-principles-skill.git
mkdir -p ~/.claude/skills
cp -r claude-code-first-principles-skill/first-principles-thinking ~/.claude/skills/
```

**Windows (PowerShell, user-wide):**

```powershell
git clone https://github.com/B143KC47/claude-code-first-principles-skill.git
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.claude\skills" | Out-Null
Copy-Item -Recurse claude-code-first-principles-skill\first-principles-thinking `
  "$env:USERPROFILE\.claude\skills\"
```

**Project-only:**

```shell
mkdir -p .claude/skills
cp -r path/to/first-principles-thinking .claude/skills/
```

The skill loads automatically the next time Claude Code starts.
Claude Code's Skill system auto-invokes based on the description field. You can also trigger the skill explicitly with any of:
- `first principles` / `FP mode` / `from scratch`
- `challenge my assumptions`
- `why are we doing it this way?`
- `think from bedrock`
The skill also auto-activates on architecture / technology-selection / hard-debugging / migration prompts, and on convention language (best practice, industry standard, everyone uses X).
It explicitly skips on trivial requests (rename X to Y, fix this typo, scaffold a component) — see the Skip Signals section of SKILL.md.
```text
first-principles-thinking/
├── SKILL.md            # Entry file. Phased instructions. ~400 lines.
└── references/
    ├── techniques.md   # Reasoning toolbox: Socratic, 5 Whys, Inversion...
    └── examples.md     # Four worked engineering examples.
```
The skill follows the Claude Code Agent Skills spec:
- Directory name matches the `name` field in the YAML frontmatter.
- `SKILL.md` stays under the recommended 500-line budget.
- Detailed material lives in `references/` and is loaded progressively.
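The Agent Skills spec keys auto-invocation off the frontmatter at the top of `SKILL.md`. A hypothetical sketch of what that frontmatter might look like for this skill (the actual field values live in the repo's `SKILL.md`):

```yaml
---
name: first-principles-thinking
description: >
  Forces disciplined first-principles reasoning before architecture or
  implementation is proposed. Activates on architecture decisions, technology
  selection, hard debugging, performance work, and migrations.
---
```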
This repository includes a runnable A/B benchmark harness for comparing baseline prompting (no skill) vs. skill prompting (inject full SKILL.md).
- `benchmarks/tasks.json` — multi-mode benchmark tasks
- `benchmarks/run_experiment.py` — DeepSeek runner + scoring
- `benchmarks/BENCHMARK_SOURCES.md` — authoritative benchmark provenance + fairness notes
- `EXPERIMENT.md` — methodology, outputs, interpretation guide
Quick start:
```shell
python benchmarks/run_experiment.py --dry-run --runs 1
python benchmarks/run_experiment.py --runs 2 --judge both
```

An automated A/B benchmark was run against the DeepSeek API (deepseek-chat) across 8 engineering-reasoning tasks × 3 runs × 2 conditions = 48 outputs, with fairness controls enabled. Raw artifacts are in `benchmarks/results/20260423-204733/`.
Headline numbers:
| Metric | Baseline | Skill | Delta (Skill - Baseline) |
|---|---|---|---|
| Rule score (composite) | 0.507 | 0.588 | +0.081 |
| Claim-Ledger lanes filled (/5) | 2.92 | 4.13 | +1.21 |
| Structural score | 0.594 | 0.691 | +0.096 |
| Skill-marker coverage | 0.639 | 0.792 | +0.153 |
| Anti-pattern rate (lower is better) | 0.125 | 0.083 | -0.042 |
| Word count | 1882 | 1802 | -80 |
- Paired rule-score delta: mean +0.081, 95% bootstrap CI [+0.044, +0.118].
- Sign of the delta: positive in 18 / 24 paired runs.
- LLM blind judge: 3 skill wins / 2 baseline wins / 19 ties out of 24.
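A percentile bootstrap over paired per-run deltas (as reported above) is straightforward to reproduce; here is a minimal sketch, where the `deltas` values are illustrative placeholders rather than the actual experiment data:

```python
import random

def bootstrap_ci(deltas, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired (skill - baseline) deltas."""
    rng = random.Random(seed)
    # Resample the paired deltas with replacement and record each mean.
    means = sorted(
        sum(rng.choice(deltas) for _ in deltas) / len(deltas)
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot)]
    return lo, hi

# Illustrative paired rule-score deltas, one per (task, run) pair:
deltas = [0.12, 0.05, -0.02, 0.10, 0.08, 0.03, 0.15, 0.07]
lo, hi = bootstrap_ci(deltas)
print(f"mean={sum(deltas) / len(deltas):.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

Pairing by (task, run) before bootstrapping is what lets the CI exclude zero even with modest per-run variance.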
What the experiment shows:
- Explicit skill activation produces a statistically positive rule-score lift.
- The lift is concentrated in process signals, not output length.
- Anti-pattern rate drops: the Skill condition less often reaches for convention-driven defaults without justification.
Honest limitations and threats to validity:
- The baseline is not a true "no-skill" condition, so the measured +0.081 is best read as a conservative lower bound.
- The skill arm was rate-limited: `MAX_TOOL_ITERATIONS = 6`.
- Single model, single provider, single day.
- LLM judge is inconclusive.
- 8 tasks is small.
Full methodology is in EXPERIMENT.md.
Compared to other first-principles skills in the ecosystem, this one makes three explicit design choices:
- Three-label tagging (`TRUTH` / `ASSUMPTION` / `UNKNOWN`) rather than two.
- Explicit Inversion phase separate from forward Socratic questioning.
- Chesterton's Fence + Falsifiability + Reversibility are first-class verification steps.
The Skill is opinionated about process, neutral about solution.
Issues and PRs welcome, especially:
- Additional worked examples in `references/examples.md`
- Blind-scored replications of the experiment on other tasks
- Additional reasoning techniques for `references/techniques.md`
Please keep SKILL.md under 500 lines; add depth by extending the references.
Apache-2.0 — see LICENSE and NOTICE.