
feat: add Aider chat history markdown normalizer #172

Open
mvanhorn wants to merge 2 commits into MemPalace:develop from mvanhorn:feat/59-aider-normalizer


@mvanhorn
Contributor

@mvanhorn mvanhorn commented Apr 7, 2026

What does this PR do?

Adds a _try_aider_md() parser to normalize.py that extracts user/assistant conversations from Aider's .aider.chat.history.md files. Aider uses #### headings for user messages with assistant responses as plain text between them.

Wired into the normalize() auto-detect chain before JSON parsing. Only activates for .md files with 2+ #### markers.
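
As a rough illustration of the parsing rule described above (a `#### ` line is a user message, and the plain text up to the next `#### ` line is the assistant response), here is a minimal sketch. It is not the code from this PR: the name `parse_aider_history` and the `(role, content)` tuple output are assumptions made for the example, and the real `_try_aider_md()` may return a different shape.

```python
from typing import List, Tuple

USER_PREFIX = "#### "  # marker that starts a user message


def parse_aider_history(text: str) -> List[Tuple[str, str]]:
    """Split Aider-style markdown into (role, content) turns.

    Hypothetical helper for illustration only.
    """
    turns: List[Tuple[str, str]] = []
    assistant_lines: List[str] = []
    seen_user = False

    def flush_assistant() -> None:
        # Emit the buffered assistant text, if any, as one turn.
        body = "\n".join(assistant_lines).strip()
        if body:
            turns.append(("assistant", body))
        assistant_lines.clear()

    for line in text.splitlines():
        if line.startswith(USER_PREFIX):
            flush_assistant()
            turns.append(("user", line[len(USER_PREFIX):].strip()))
            seen_user = True
        elif seen_user:
            # Everything between two user markers is assistant output.
            assistant_lines.append(line)
        # Lines before the first user marker are treated as preamble and skipped.

    flush_assistant()
    return turns
```

Calling `parse_aider_history("#### hi\n\nHello!\n")` yields one user turn followed by one assistant turn.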

Partially addresses #59 (Aider was called out as "lowest-hanging fruit" in the issue).

How to test

pytest tests/test_normalize.py::test_aider_md -v

Or with a real Aider history file:

from mempalace.normalize import normalize
result = normalize(".aider.chat.history.md")
print(result[:500])

Checklist

  • Tests pass (python -m pytest tests/ -v) - 102 tests pass
  • No hardcoded paths
  • Linter passes (ruff check .)

This contribution was developed with AI assistance (Codex).

Add _try_aider_md() parser to normalize.py for .aider.chat.history.md
files. Detects #### headings as user messages with assistant responses
in between. Wired into the normalize() auto-detect chain before JSON
parsing.

Partially addresses MemPalace#59.
@adv3nt3
Contributor

adv3nt3 commented Apr 7, 2026

Nice work — Aider was called out as lowest-hanging fruit in #59 and this is a clean implementation.

A few concerns:

  1. False positive risk on regular markdown files. Any .md file with 2+ #### headings will trigger the parser: CONTRIBUTING.md, CHANGELOG.md, API docs, etc. Aider's history file has a very specific name (.aider.chat.history.md). Checking the filename would be a much safer fingerprint:

     if ext == ".md" and Path(filepath).name == ".aider.chat.history.md":

  2. User messages can span multiple lines. In real Aider sessions, users sometimes paste multi-line code or context after the #### heading. The current parser treats everything after #### as assistant text until the next ####, which would split a multi-line user prompt and misattribute part of it as an assistant response.

  3. No structural fingerprint. The other JSONL/JSON parsers (Codex, Gemini, Pi) check for a session header or unique keys to confirm the format. Relying on #### count alone in .md files is fragile; combining the filename check from point 1 would solve this.
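
For illustration, the combined fingerprint suggested in the points above (exact filename plus a minimum number of `#### ` markers) might be sketched as follows. The function name `looks_like_aider_history` and the `min_markers` parameter are hypothetical and do not come from the PR.

```python
from pathlib import Path

AIDER_HISTORY_NAME = ".aider.chat.history.md"


def looks_like_aider_history(filepath: str, text: str, min_markers: int = 2) -> bool:
    """Return True only for Aider's history file with enough user markers.

    Hypothetical helper: the filename acts as the structural fingerprint,
    and the marker count guards against an empty or truncated file.
    """
    if Path(filepath).name != AIDER_HISTORY_NAME:
        return False
    markers = sum(1 for line in text.splitlines() if line.startswith("#### "))
    return markers >= min_markers
```

With this check, a CONTRIBUTING.md full of `####` headings is rejected on the filename alone, before any content is inspected.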

…arkdown

Check for .aider.chat.history.md specifically instead of matching any .md
file with #### headings. Addresses review feedback: CONTRIBUTING.md,
CHANGELOG.md, and API docs would have falsely triggered the Aider parser.

Add test_aider_rejects_generic_md to verify regular markdown files are not
parsed as Aider chat history.
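
A rejection test along these lines could be sketched as below. This is not the test added in the commit: `is_aider_history` is a stand-in defined here so the example runs on its own, whereas the real test would presumably go through `mempalace.normalize`.

```python
import tempfile
from pathlib import Path


def is_aider_history(path: Path) -> bool:
    # Stand-in for the real detection logic (assumption: filename match only).
    return path.name == ".aider.chat.history.md"


def test_rejects_generic_md_sketch() -> None:
    # Same content in both files: only the filename should decide the outcome.
    body = "#### Setup\n\nInstall deps.\n\n#### Usage\n\nRun the tool.\n"
    with tempfile.TemporaryDirectory() as tmp:
        generic = Path(tmp) / "CONTRIBUTING.md"
        generic.write_text(body, encoding="utf-8")
        history = Path(tmp) / ".aider.chat.history.md"
        history.write_text(body, encoding="utf-8")

        assert not is_aider_history(generic)
        assert is_aider_history(history)
```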
@mvanhorn
Contributor Author

mvanhorn commented Apr 8, 2026

Great catches. Fixed all three in 3277254:

  1. Fingerprint now checks Path(filepath).name == ".aider.chat.history.md" instead of matching any .md with #### headings. CONTRIBUTING.md, CHANGELOG.md, etc. are no longer false positives.

  2. Re: multi-line user messages - in Aider's actual format, the full user prompt lives on the #### line (even long ones). Lines after are always assistant output. But if we see real-world exceptions to this pattern, happy to revisit.

  3. The filename check serves as the structural fingerprint (solves points 1 and 3 together).
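
If real-world exceptions to point 2 do turn up, one possible way to handle them is to merge consecutive `####` lines into a single user turn. The sketch below assumes that a multi-line prompt would appear as several adjacent `####` lines; that assumption is not confirmed anywhere in this thread, and `parse_turns` is a hypothetical name.

```python
from typing import List, Optional, Tuple

PREFIX = "####"


def parse_turns(text: str) -> List[Tuple[str, str]]:
    """Group adjacent '####' lines into one user turn (assumed format)."""
    turns: List[Tuple[str, str]] = []
    role: Optional[str] = None
    buf: List[str] = []

    def flush() -> None:
        # Emit the buffered lines under the current role, skipping empty bodies.
        body = "\n".join(buf).strip()
        if role is not None and body:
            turns.append((role, body))
        buf.clear()

    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped == PREFIX or stripped.startswith(PREFIX + " "):
            new_role, content = "user", stripped[len(PREFIX):].strip()
        elif role is None:
            continue  # preamble before the first user marker
        else:
            new_role, content = "assistant", line
        if new_role != role:
            flush()
            role = new_role
        buf.append(content)

    flush()
    return turns
```

Under this scheme a blank line or any non-`####` line ends the user turn, so single-line prompts behave the same as in the simpler one-marker-per-message rule.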

Added test_aider_rejects_generic_md to verify regular markdown files are not parsed.

@web3guru888 web3guru888 left a comment

Review of #172: feat: add Aider chat history markdown normalizer

Scope: +80/−1 · 2 file(s)

  • mempalace/normalize.py (modified: +40/−1)
  • tests/test_normalize.py (modified: +40/−0)

Strengths

  • ✅ Includes test coverage

🟢 Approved — clean, well-structured PR. Good work @mvanhorn!


🏛️ Reviewed by MemPalace-AGI · Autonomous research system with perfect memory · Showcase: Truth Palace of Atlantis

@bensig bensig changed the base branch from main to develop April 11, 2026 22:23
@igorls igorls added the area/mining (File and conversation mining) and enhancement (New feature or request) labels Apr 14, 2026
Labels

  • area/mining: File and conversation mining
  • enhancement: New feature or request

4 participants