Merged
28 changes: 28 additions & 0 deletions WIP.md
@@ -428,6 +428,34 @@ WIP.md itself (and other files outside `docs/`) is not rendered through tbdocs a

Python scripts are reserved for non-render concerns: one-off content conversion (e.g. `scripts/convert_em_dash_separators.py`), repo audits, dev tooling, link checks beyond `check.bat`. They are never a prerequisite for the render pipeline.

`wisdom/` — Discord knowledge-harvesting tool (three-phase: export → process → extract). Plans in `wisdom/PLAN-{1,2,3}.md`; implementation under `wisdom/`. Uses only Node.js built-in APIs.

### Wisdom Phase 3 --- Extract invocation

Phase 3 runs Claude agents over the processed thread `.md` files and drafts documentation additions. The default flow is **incremental**: only threads whose Discord-side content has actually changed since the last successful merge are re-extracted. Combined with the incremental `export` and `process` phases, the routine update is three commands with no flags:

```
node wisdom/wisdom.mjs export
node wisdom/wisdom.mjs process
node wisdom/wisdom.mjs extract
```

The `extract` step itself is a three-stage flow that Claude orchestrates: prep → workflow → merge. The merge step grafts new findings into the long-lived `staging.md`, replacing matching sections in place and emitting `[REFINED?]` markers for findings whose prior version has been reviewed and removed --- so pending review work is never clobbered.
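
The grafting behaviour described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not the tool's implementation: the `graft` function, the `key`/`header`/`body` fields, and the `previouslyMerged` set are hypothetical names invented for illustration, and the authoritative rules are the merge semantics in `wisdom/PLAN-3.md`. It also honours the `[LOCKED]` header marker that reviewers can add.

```javascript
// Hypothetical model: `staging` is a Map of section key -> { key, header, body }.
// `previouslyMerged` is a Set of keys that some earlier merge already wrote.
// All of these names are assumptions for illustration only.
function graft(staging, additions, previouslyMerged) {
  const result = new Map(staging); // copy so the caller's Map is not mutated
  for (const add of additions) {
    const existing = result.get(add.key);
    if (existing && existing.header.includes("[LOCKED]")) {
      continue; // the reviewer locked this section: never auto-replace it
    }
    if (existing) {
      // Map.set on an existing key keeps its position, i.e. replaces in place.
      result.set(add.key, add);
    } else if (previouslyMerged.has(add.key)) {
      // Merged before but no longer in staging: the reviewer handled and
      // removed it, so the new version is flagged rather than silently re-added.
      result.set(add.key, { ...add, header: `${add.header} [REFINED?]` });
    } else {
      result.set(add.key, add); // brand-new finding: plain insert
    }
  }
  return result;
}
```

Because matching sections are replaced rather than appended, and locked or already-reviewed sections are treated specially, re-running the merge in this model does not disturb sections the incoming additions do not touch.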

1. **Prep**: `node wisdom/wisdom.mjs extract` filters threads against `wisdom/data/findings/extract-state.json` (the per-thread `last_message_id` + `message_count` watermark, advanced on each successful merge). Only changed threads survive the filter. Shared reference files (`package-summary.txt`, `page-index.json`) are written once; per-batch files contain thread file paths, per-thread file sizes, config, and a `mode` field (`incremental`, `since`, `all`, or `force`) consumed by the merge step.
   - If the filter is empty (no threads have changed), prep exits with `No new threads since the last successful merge` and the merge step is unnecessary.
2. **Run the workflow**: check which prep layout was produced:
   - **Single-batch** (≤200 threads) --- `extract-prep.json` exists: read it, pass its parsed contents as `args` to the Workflow tool with `scriptPath: "wisdom/extract/workflow.mjs"`. Write the returned `{ additions: [...] }` array to `wisdom/data/findings/extract-results-0.json`.
   - **Multi-batch** (>200 threads) --- `extract-manifest.json` exists: read it, then for each entry in `manifest.batches`, read the corresponding `extract-batch-{i}.json`, pass it as `args` to the Workflow tool, and write the returned additions array to `extract-results-{i}.json`. Skip batches whose result file already exists (resumability).
3. **Merge**: `node wisdom/wisdom.mjs extract --merge` reads every `extract-results-*.json`, grafts the additions into `staging.md` per the merge semantics in [wisdom/PLAN-3.md](wisdom/PLAN-3.md#merge-semantics), and advances `extract-state.json`. The write is atomic (temp file plus rename), and the previous `staging.md` is kept as `staging.md.bak` for one generation.
4. **Review**: `wisdom/data/findings/staging.md` is the long-lived human-review file --- grouped by target page, with `[DUPLICATE?]` markers for cross-thread overlaps, `[REFINED?]` markers for previously-processed findings whose source thread has grown, and an `Unmapped Findings` section for findings with no existing doc page. The reviewer can append `[LOCKED]` to any section header to prevent auto-replacement on the next merge.
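
The prep filter and the two prep layouts in the steps above can be sketched as three small pure functions. This is an illustration under assumptions, not the code in `wisdom/`: the function names, the shape of the state object (keyed by thread id), a batch size equal to the 200-thread threshold, and zero-based batch indices are all guesses made for the example.

```javascript
const BATCH_LIMIT = 200; // single-batch threshold quoted above; batch size is assumed equal

// A thread counts as changed when it has no watermark yet, or when either
// watermark field differs from what the last successful merge recorded.
// `state` is assumed to be an object keyed by thread id.
function changedThreads(threads, state) {
  return threads.filter((t) => {
    const mark = state[t.id];
    return (
      !mark ||
      mark.last_message_id !== t.last_message_id ||
      mark.message_count !== t.message_count
    );
  });
}

// Pick the prep layout and list the per-run files it would write.
function planPrep(changed) {
  if (changed.length === 0) return { layout: "none", files: [] };
  if (changed.length <= BATCH_LIMIT) {
    return { layout: "single", files: ["extract-prep.json"] };
  }
  const batchCount = Math.ceil(changed.length / BATCH_LIMIT);
  const files = ["extract-manifest.json"];
  for (let i = 0; i < batchCount; i++) files.push(`extract-batch-${i}.json`);
  return { layout: "multi", files };
}

// Resumability: only batches without a result file still need the workflow.
function pendingBatches(batchCount, existingFiles) {
  const done = new Set(existingFiles);
  const pending = [];
  for (let i = 0; i < batchCount; i++) {
    if (!done.has(`extract-results-${i}.json`)) pending.push(i);
  }
  return pending;
}
```

For instance, with three batches and only `extract-results-1.json` already on disk, `pendingBatches(3, ["extract-results-1.json"])` leaves batches 0 and 2 to run, which is the behaviour that makes an interrupted multi-batch run safe to restart.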

**Diagnostic modes** (mutually exclusive primary flags):
- `extract --since 2026-06-04` --- filter by thread `created` date instead of watermark. State is not touched; merge writes to `staging-since-2026-06-04.md` rather than grafting into the canonical staging file.
- `extract --all` --- bootstrap / re-baseline. Processes every thread regardless of state and ignores the channel filter. Merge still grafts (replacing matching sections, inserting new ones) so an existing `staging.md` is not lost.
- `extract --force` --- ignores the watermark filter but respects `--channel`. Useful when the agent prompt changes and specific channels need re-extraction.
- `extract --dry-run` --- writes the prep file but does not invoke the workflow; state is not touched.
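
How the mode-selecting flags could map onto the `mode` field and the merge target is sketched below. The `resolveMode` function and its return shape are hypothetical, the real CLI's argument parsing is not shown in this document, and `--dry-run` is deliberately left out because the `mode` field lists only four values. That `--force` merges into `staging.md` is also an assumption.

```javascript
// Hypothetical resolver: maps CLI arguments to a mode and a staging file name.
function resolveMode(argv) {
  const flags = ["--since", "--all", "--force"].filter((f) => argv.includes(f));
  if (flags.length > 1) {
    throw new Error(`Mutually exclusive flags: ${flags.join(", ")}`);
  }
  if (flags.length === 0) {
    // Default flow: watermark filter, graft into the canonical staging file.
    return { mode: "incremental", stagingFile: "staging.md" };
  }
  if (flags[0] === "--since") {
    const date = argv[argv.indexOf("--since") + 1];
    if (!/^\d{4}-\d{2}-\d{2}$/.test(date ?? "")) {
      throw new Error("--since expects a YYYY-MM-DD date");
    }
    // Diagnostic output goes to a dated side file, not the canonical one.
    return { mode: "since", date, stagingFile: `staging-since-${date}.md` };
  }
  // --all is documented to graft into the existing staging.md;
  // treating --force the same way is an assumption of this sketch.
  return { mode: flags[0].slice(2), stagingFile: "staging.md" };
}
```

Under these assumptions, `resolveMode(["--since", "2026-06-04"])` yields the side file `staging-since-2026-06-04.md`, while combining `--all` with `--force` is rejected up front.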

## Build pipeline

The site builds via [builder/](builder/), a custom Node.js static site generator (`tbdocs`). See [builder/PLAN.md](builder/PLAN.md) for the architecture overview, [builder/README.md](builder/README.md) for the quickstart, and the [tbdocs Internals](docs/Documentation/Builder.md) site page for the high-level tour.