This repository is being built iteratively with the assistance of coding agents (e.g. Claude Code). All contributors (human or agent) MUST follow these standards.

- Work milestone-by-milestone
  - Implement ONLY the requested milestone scope.
  - Do not start future milestones early.
- No broken builds
  - All changes must build and run locally.
  - All tests must pass before considering work complete.
- No undocumented behaviour
  - If you add or change functionality, update the relevant docs.
- No silent complexity
  - Prefer simple, explicit code over clever abstractions.
  - Leave clear comments where logic is non-obvious.
- Capture workflow instructions in AGENTS.md
  - When the user provides instructions about approaches, processes, or ways of working, add them to this file.

| Component | Dev | Prod |
|---|---|---|
| Language | Python 3.11+ | Python 3.11+ |
| LLM | gpt-oss-20b (Ollama) | GPT-5.2 (OpenAI API) |
| Embeddings | Nomic Embed Text V2 (Ollama) | Nomic Embed Text V2 |
| Vector Store | ChromaDB | ChromaDB |
| CLI Framework | Typer | Typer |
| Config | Pydantic Settings | Pydantic Settings |
| Logging | structlog (JSON) | structlog (JSON) |
| Runtime | Docker Compose | Docker Compose |

Every milestone MUST include automated tests.
- Unit tests — Pure logic (chunker, context builder, citation validator)
- Integration tests — Component interactions (embedder + ChromaDB, LLM client)
- Smoke tests — Full pipeline works end-to-end
- All new logic must have corresponding tests.
- Critical functions (citation validation, guardrails) require high coverage.
- CI checks must pass: `pytest`, `ruff check`, `mypy`.
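
As an illustration of the unit-test tier for pure logic, here is a minimal sketch of a line-based chunker with two pytest-style tests. The names `chunk_lines` and `LineChunk` and the `size`/`overlap` parameters are hypothetical, chosen for this example only; they are not the project's actual chunker API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineChunk:
    """Hypothetical chunk record: 1-indexed, inclusive line range plus its text."""

    start_line: int
    end_line: int
    text: str


def chunk_lines(lines: list[str], size: int = 4, overlap: int = 1) -> list[LineChunk]:
    """Split lines into windows of `size` lines that share `overlap` lines."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    chunks: list[LineChunk] = []
    step = size - overlap  # advancing by less than `size` creates the overlap
    for start in range(0, len(lines), step):
        window = lines[start:start + size]
        chunks.append(LineChunk(start + 1, start + len(window), "\n".join(window)))
        if start + size >= len(lines):
            break  # this window already reaches the end of the input
    return chunks


def test_empty_input_yields_no_chunks() -> None:
    assert chunk_lines([]) == []


def test_consecutive_chunks_overlap() -> None:
    lines = [f"line {i}" for i in range(1, 8)]
    chunks = chunk_lines(lines, size=4, overlap=1)
    assert [(c.start_line, c.end_line) for c in chunks] == [(1, 4), (4, 7)]
```

Functions named `test_*` are collected by `pytest` without extra configuration, so a file like this can sit next to the other unit tests.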
Before considering work complete:
- Rebuild containers: `docker compose build`
- Start services: `docker compose up -d`
- Verify manually: Run CLI commands against real services
- Test edge cases: Empty repos, large files, unsupported questions
# Start development environment
docker compose up -d
# Run tests
pytest
# Run linting
ruff check .
# Run type checking
mypy .
# Run all CI checks
pytest && ruff check . && mypy .
# Build Docker image
docker compose build
# View logs
docker compose logs -f nexus

- `README.md`: Quickstart, usage, architecture overview
- `docs/DECISIONS.md`: Architecture decision records
- `docs/plans/`: Design documents per milestone
Update README.md or create CHANGELOG.md with:
- What was added/changed
- How to verify locally
- Tests added
All public functions must include docstrings:
def retrieve_chunks(query: str, top_k: int = 8) -> list[Chunk]:
    """Retrieve relevant chunks for a query.

    Args:
        query: The natural language question.
        top_k: Maximum number of chunks to return.

    Returns:
        List of Chunk objects with metadata and similarity scores.

    Raises:
        RetrievalError: If ChromaDB connection fails.
    """

Add comments for:
- Complex regex patterns (citation extraction)
- Security-sensitive logic (input validation)
- Non-obvious algorithmic choices (chunking overlap)
All functions must have complete type hints. Do not use `Any` unless it is unavoidable.
- Repository scaffold
- Dependencies pinned
- CLI entrypoints scaffolded
- Docker Compose files
- Configuration setup
- Ingestion pipeline (walker, chunker, embedder, index)
- `nexus ingest <repo>` functional
- Semantic search (top-k, threshold, diversity)
- Context builder (prompt assembly)
- LLM answer generation
- `nexus ask "<question>"` functional
- Post-hoc citation validation with full rejection
- Structured logging (structlog, JSON/console)
- Bracketed inline citation format
- Eval harness with hit-rate reporting
- README finalised with all spec sections
- Docker packaging verified
- Web UI
- Authentication
- Multi-tenancy
- Distributed indexing
Before declaring a milestone complete:
- Summary of changes provided
- Verification steps documented
- Tests added/updated
- CI passes locally
- Docs updated
- User has validated manually
Before committing milestone work:
- Rebuild: `docker compose build`
- Run tests: `pytest`
- Run lints: `ruff check . && mypy .`
- Ask for validation: Provide specific steps
- Wait for approval: Only commit after user confirms
Nomic Embed Text V2 requires instruction prefixes:
- Documents: `search_document: <content>`
- Queries: `search_query: <question>`

Omitting these prefixes significantly degrades retrieval quality.
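
One way to avoid forgetting a prefix is to route all text through two small helpers before it reaches the embedder. This is a minimal sketch; the helper and constant names are hypothetical, and only the two prefix strings come from the rule above.

```python
# The prefix strings are taken from the rule above; the names are illustrative.
DOCUMENT_PREFIX = "search_document: "
QUERY_PREFIX = "search_query: "


def as_document(content: str) -> str:
    """Prefix chunk text before it is embedded for indexing."""
    return f"{DOCUMENT_PREFIX}{content}"


def as_query(question: str) -> str:
    """Prefix a user question before it is embedded for retrieval."""
    return f"{QUERY_PREFIX}{question}"
```

Keeping the prefixes in one module means ingestion and query code cannot drift apart.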
All answers must include citations in this format:
`relative/path/to/file.ext:start_line-end_line`

Examples:
- `src/auth/login.py:45-92`
- `README.md:1-25`

Invalid citations (files not in retrieved context) must be rejected.
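
The format and the rejection rule can be sketched as follows. The regex, the `Citation` dataclass, and both function names are assumptions for this example, not the project's validator. The sketch checks file membership only, as the rule states; whether line ranges must also fall inside retrieved chunks is not specified here.

```python
import re
from dataclasses import dataclass

# path ":" start "-" end. The path may not contain whitespace, colons, or
# square brackets, so a citation wrapped in brackets still parses cleanly.
CITATION_RE = re.compile(r"(?P<path>[^\s:\[\]]+):(?P<start>\d+)-(?P<end>\d+)")


@dataclass(frozen=True)
class Citation:
    """Hypothetical parsed citation with a relative path and a line range."""

    path: str
    start_line: int
    end_line: int


def extract_citations(answer: str) -> list[Citation]:
    """Return every path:start-end citation found in the answer text."""
    return [
        Citation(m["path"], int(m["start"]), int(m["end"]))
        for m in CITATION_RE.finditer(answer)
    ]


def invalid_citations(answer: str, retrieved_paths: set[str]) -> list[Citation]:
    """Return citations whose file is not part of the retrieved context."""
    return [c for c in extract_citations(answer) if c.path not in retrieved_paths]
```

A caller would reject the whole answer whenever `invalid_citations` returns a non-empty list.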