Releases · mensfeld/llm-docs-builder · GitHub

12 Nov 16:56

mensfeld

v0.12.0 Latest

[Feature] HTML to Markdown Reverse Converter — Added support for converting HTML content to markdown format.
- Enables processing of HTML documentation sources
- Integrates seamlessly with the transformer pipeline
- Useful for converting web-based docs to markdown for further processing
- By @Eric-Guo in PR #32.

Contributors

Eric-Guo

03 Nov 17:19

mensfeld

v0.11.0

[Feature] Transform from URL — The transform command now accepts a remote URL via --url and processes fetched content through the standard transformer pipeline.
- Example: llm-docs-builder transform --url https://example.com/docs/page.html
- Applies all configured transformations and output options identically to local files
- By @Eric-Guo and @codex in PR #28.
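The fetch-then-transform flow described above can be sketched as follows. This is a minimal illustration, not the gem's implementation: `transform_from_url`, `DEFAULT_FETCHER`, and the list of transformer lambdas are hypothetical names, and the fetcher is injectable so the pipeline can be exercised without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical default fetcher: a plain GET that returns the response body.
DEFAULT_FETCHER = ->(uri) { Net::HTTP.get(uri) }

# Hypothetical helper: fetch remote content, then run it through the same
# ordered list of transformation steps that a local file would go through.
def transform_from_url(url, transformers, fetcher: DEFAULT_FETCHER)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass this check.
  raise ArgumentError, "unsupported URL: #{url}" unless uri.is_a?(URI::HTTP)

  content = fetcher.call(uri)
  transformers.reduce(content) { |text, step| step.call(text) }
end

# Usage with a stubbed fetcher (no network needed):
steps = [->(text) { text.strip }, ->(text) { text.gsub(/[ \t]+/, ' ') }]
stub  = ->(_uri) { "  Remote   docs page \n" }
puts transform_from_url('https://example.com/docs/page.html', steps, fetcher: stub)
```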

Contributors

Eric-Guo and oai-codex

28 Oct 12:38

mensfeld

v0.10.0

[Feature] llms.txt Specification Compliance - Updated output format to fully comply with the llms.txt specification from llmstxt.org.
- Metadata Format: Metadata now appears within the description field using parentheses and comma separators: - [title](url): description (tokens:450, updated:2025-10-13, priority:high)
- Optional Descriptions: Parser now correctly handles links without descriptions: - [title](url) per spec
- Multi-Section Support: Documents automatically organized into Documentation, Examples, and Optional sections based on priority
- Body Content Support: Added optional body config parameter for custom content between description and sections
- Priority-based categorization: 1-3 → Documentation, 4-5 → Examples, 6-7 → Optional
- Empty sections are automatically omitted from output
- Updated parser regex from /^[-*]\s*\[([^\]]+)\]\(([^)]+)\):\s*(.*)$/m to /^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/ to make descriptions optional
- Fixed multiline regex greedy matching issue that was capturing only one link per section
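The updated pattern can be tried directly in Ruby. In the sketch below only the regex comes from the notes above; the `parse_links` helper and the sample links are hypothetical and exist only to show that links with and without descriptions are both captured.

```ruby
# Regex copied from the release notes; the description group is optional.
LINK_PATTERN = /^[-*]\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*([^\n]*))?$/

# Hypothetical helper, not part of the gem's public API.
def parse_links(section)
  section.scan(LINK_PATTERN).map do |title, url, description|
    { title: title, url: url, description: description }
  end
end

SAMPLE = <<~TXT
  - [Getting Started](docs/start.md): Intro guide (tokens:450, updated:2025-10-13, priority:high)
  - [API Reference](docs/api.md)
  * [FAQ](docs/faq.md): Common questions
TXT

# Each link on its own line is matched; a missing description comes back as nil.
parse_links(SAMPLE).each { |link| p link }
```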
[Test] Added comprehensive test suite for spec compliance (8 new parser tests, 7 new generator tests)
[Docs] Updated README with multi-section organization examples and body content usage
Breaking Change: Metadata format has changed from tokens:450 updated:2025-10-13 to (tokens:450, updated:2025-10-13) for spec compliance
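The priority ranges and the new metadata format listed for this release can be illustrated with a short sketch. All names here (`SECTION_RANGES`, `section_for`, `group_by_section`, `format_metadata`) are hypothetical, and leaving priorities outside 1-7 uncategorized is an assumption rather than documented behavior.

```ruby
# Priority ranges as listed in the release notes.
SECTION_RANGES = {
  'Documentation' => 1..3,
  'Examples'      => 4..5,
  'Optional'      => 6..7
}.freeze

# Returns the section name for a numeric priority, or nil when the
# priority falls outside 1-7 (assumption: such entries stay uncategorized).
def section_for(priority)
  match = SECTION_RANGES.find { |_name, range| range.cover?(priority) }
  match&.first
end

# Groups documents by section, keeping section order and omitting empty ones.
def group_by_section(docs)
  grouped = docs.group_by { |doc| section_for(doc[:priority]) }
  SECTION_RANGES.keys.each_with_object({}) do |name, sections|
    sections[name] = grouped[name] if grouped.key?(name)
  end
end

# Builds the parenthesized, comma-separated metadata suffix.
def format_metadata(tokens:, updated:, priority: nil)
  parts = ["tokens:#{tokens}", "updated:#{updated}"]
  parts << "priority:#{priority}" if priority
  "(#{parts.join(', ')})"
end

docs = [{ title: 'Intro', priority: 2 }, { title: 'Appendix', priority: 6 }]
p group_by_section(docs).keys
puts format_metadata(tokens: 450, updated: '2025-10-13', priority: 'high')
```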

27 Oct 16:21

mensfeld

v0.9.4

[Feature] Auto-Exclude Hidden Directories - Hidden directories (starting with .) are now automatically excluded by default to prevent noise from .git, .lint, .github, etc.
- Adds include_hidden: false as default behavior
- Set include_hidden: true in config to include hidden directories if needed
- Uses Find.prune for efficient directory tree traversal
- Prevents scanning of common directories like .lint, .gh, .git, node_modules (if hidden)
- Fixed bug where root directory . was being pruned when used as docs_path
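The traversal described above can be sketched with the standard library's `Find` module. The `markdown_files` helper is a hypothetical name and the `.md` filter is an assumption; the point is how `Find.prune` skips hidden directories while the root itself is never pruned, even when it is `.`.

```ruby
require 'find'

# Hypothetical helper: collect markdown files under docs_path, skipping
# hidden directories unless include_hidden is true.
def markdown_files(docs_path, include_hidden: false)
  files = []
  Find.find(docs_path) do |path|
    name = File.basename(path)
    if File.directory?(path)
      # The root is yielded first; its basename may be "." so it must
      # never be treated as a hidden directory.
      next if path == docs_path
      # Find.prune skips the whole subtree rooted at this directory.
      Find.prune if !include_hidden && name.start_with?('.')
    elsif name.end_with?('.md')
      files << path
    end
  end
  files.sort
end
```

Pruning at the directory level means the contents of `.git` and similar directories are never visited, which is cheaper than listing every file and filtering afterwards.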
[Fix] Excludes Pattern Matching - Fixed fnmatch pattern handling for better glob pattern support.
- Fixed **/.dir/** patterns so that they correctly match root-level directories
- Normalized patterns ending with /** to /**/* for proper fnmatch behavior
- Handles **/ prefix matching for zero-directory cases
- Fixed relative path calculation to avoid "different prefix" errors
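The `/**` to `/**/*` normalization can be illustrated as below. With `File::FNM_PATHNAME`, a trailing `**` that is not followed by `/` behaves like a single `*` and does not cross directory separators, which is why the rewrite helps. The helper names are hypothetical and this is only a sketch of the idea, not the gem's code.

```ruby
EXCLUDE_FLAGS = File::FNM_PATHNAME | File::FNM_DOTMATCH

# Hypothetical helper: rewrite a trailing "/**" so that it matches files
# at any depth below the directory.
def normalize_pattern(pattern)
  pattern.end_with?('/**') ? "#{pattern}/*" : pattern
end

# Hypothetical helper: match a path relative to docs_path against a pattern.
def matches_exclude?(relative_path, pattern)
  File.fnmatch(normalize_pattern(pattern), relative_path, EXCLUDE_FLAGS)
end

puts normalize_pattern('**/.git/**')
puts matches_exclude?('.git/hooks/pre-commit', '**/.git/**')
puts matches_exclude?('docs/guide.md', '**/.git/**')
```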
[Test] Added unit tests for hidden directory exclusion feature (5 tests)
[Test] Added integration tests for hidden directory behavior (3 tests)

27 Oct 15:23

mensfeld

v0.9.3

[Fix] Generate Command Excludes Support - The generate command now properly respects the excludes configuration option to filter out files from llms.txt generation.
- Added should_exclude? method to Generator class that matches files against glob patterns
- Supports both simple patterns (e.g., draft.md) and glob patterns (e.g., **/private/**, draft-*.md)
- Uses File.fnmatch with FNM_PATHNAME and FNM_DOTMATCH flags for proper pattern matching
- Checks patterns against both absolute and relative paths from docs_path
- Excludes configuration works consistently with bulk-transform command
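A check along these lines could look like the sketch below. It is not the actual `should_exclude?` method: the `excluded?` name, its signature, and the sample paths are assumptions. It shows the two points made above, the `File.fnmatch` flags and the matching against both the absolute path and the path relative to `docs_path`.

```ruby
require 'pathname'

MATCH_FLAGS = File::FNM_PATHNAME | File::FNM_DOTMATCH

# Hypothetical stand-in for the Generator's exclusion check.
def excluded?(file_path, docs_path, patterns)
  absolute = File.expand_path(file_path)
  root     = Pathname.new(File.expand_path(docs_path))
  relative = Pathname.new(absolute).relative_path_from(root).to_s

  patterns.any? do |pattern|
    File.fnmatch(pattern, absolute, MATCH_FLAGS) ||
      File.fnmatch(pattern, relative, MATCH_FLAGS)
  end
end

patterns = ['draft.md', 'draft-*.md', '**/private/**']
puts excluded?('/docs/draft-v2.md', '/docs', patterns)
puts excluded?('/docs/guide.md', '/docs', patterns)
```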
[Fix] Token Count from Transformed Content - Token counts in metadata now accurately reflect the actual content after applying transformations.
- Token count is now calculated from transformed content when any transformation options are enabled
- Adds has_transformations? helper method to detect if transformations are active
- Ensures token metadata represents the actual size of processed content, not raw files
- Falls back to raw content token count when no transformations are enabled
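The fallback logic can be sketched as follows. Only the `has_transformations?` name and the option keys come from these notes; the `token_count` helper, the `transform` callable, and the four-characters-per-token estimate are assumptions made for illustration, since the notes do not say how tokens are counted.

```ruby
TRANSFORM_OPTIONS = %i[remove_comments normalize_whitespace remove_badges remove_frontmatter].freeze

# True when at least one transformation option is switched on.
def has_transformations?(options)
  TRANSFORM_OPTIONS.any? { |key| options[key] }
end

# Assumption: a crude estimate of roughly four characters per token.
def estimate_tokens(text)
  (text.length / 4.0).ceil
end

# Hypothetical helper: count tokens on transformed content when any
# transformation is enabled, otherwise on the raw content.
def token_count(raw, options, transform)
  content = has_transformations?(options) ? transform.call(raw) : raw
  estimate_tokens(content)
end

strip_comments = ->(text) { text.gsub(/<!--.*?-->/m, '') }
raw = '<!-- c -->hello world!'
puts token_count(raw, { remove_comments: true }, strip_comments)
puts token_count(raw, {}, strip_comments)
```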
[Fix] Boolean Config Options - Fixed config merging bug where explicitly setting transformation options to false in YAML was being overridden to true.
- Updated Config#merge_with_options to properly handle false values for boolean options
- Fixed the || true pattern, which treated an explicit false the same as an unset value and replaced it with true
- Now checks !self['option'].nil? before falling back to defaults
- Applies to all boolean transformation options: remove_comments, normalize_whitespace, remove_badges, remove_frontmatter
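The pitfall is easy to reproduce: in Ruby, `false || true` evaluates to `true`, so an explicit `false` is indistinguishable from a missing key. The sketch below contrasts the two approaches using hypothetical helper names and a plain Hash in place of the gem's Config object.

```ruby
# Buggy variant: an explicit false is overridden by the default.
def resolve_boolean_buggy(config, key)
  config[key] || true
end

# Fixed variant: fall back to the default only when the key is unset (nil).
def resolve_boolean(config, key, default: true)
  config[key].nil? ? default : config[key]
end

config = { 'remove_comments' => false }
puts resolve_boolean_buggy(config, 'remove_comments')
puts resolve_boolean(config, 'remove_comments')
puts resolve_boolean(config, 'remove_badges')
```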
[Test] Added comprehensive unit tests for excludes functionality in Generator
[Test] Added integration tests for generate command with excludes and token counting

17 Oct 19:10

mensfeld

v0.9.2

[Fix] Handled one more edge case in code block boundary tracking.

17 Oct 19:07

mensfeld

v0.9.1

[Fix] Fixed HeadingTransformer incorrectly treating hash symbols in code blocks as headings.
- Now properly tracks code block boundaries (fenced with ``` or ~~~)
- Skips heading processing for lines inside code blocks
- Prevents Ruby/Python/Shell comments from being interpreted as markdown headings
- Added comprehensive test coverage for code block handling
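The boundary tracking described above can be sketched as a small line scanner. This is a simplified illustration rather than the HeadingTransformer itself: `heading_lines` is a hypothetical helper, and treating a fence as closed by the next fence line that uses the same character is an assumption.

```ruby
# A fence line starts with at least three backticks or tildes.
FENCE = /\A\s{0,3}(`{3,}|~{3,})/
HEADING = /\A[#]{1,6}\s/

# Hypothetical helper: return heading lines, ignoring "#" lines that sit
# inside fenced code blocks.
def heading_lines(markdown)
  open_fence = nil
  markdown.each_line.with_object([]) do |line, headings|
    if (match = line.match(FENCE))
      marker = match[1][0]
      if open_fence.nil?
        open_fence = marker          # entering a code block
      elsif open_fence == marker
        open_fence = nil             # leaving the code block
      end
      next
    end
    headings << line.chomp if open_fence.nil? && line.match?(HEADING)
  end
end

fence = '`' * 3
doc = ['# Title', "#{fence}ruby", '# a Ruby comment', fence, '## Usage',
       '~~~', '# a shell comment', '~~~'].join("\n")
p heading_lines(doc)
```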

17 Oct 14:05

mensfeld

v0.9.0

[Feature] No AI Version Detection - The compare command now detects when websites don't serve AI-optimized versions.
- Triggers when reduction is <5% (nearly identical content for human and AI User-Agents)
- Displays prominent warning: "WARNING: NO DEDICATED AI VERSION DETECTED"
- Shows potential savings estimates based on typical 83% reduction rate
- Provides page-specific calculations (estimated token savings, potential size)
- Includes implementation guide with actionable steps
- Helps identify opportunities to optimize documentation
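The detection rule and the savings estimate can be expressed in a few lines. The 5% threshold and the 83% typical reduction come from the notes above; the helper names, the choice of sizes as inputs, and the rounding are assumptions made for this sketch.

```ruby
NO_AI_VERSION_THRESHOLD = 5.0   # percent, from the release notes
TYPICAL_REDUCTION       = 0.83  # typical reduction rate, from the release notes

# Percentage by which the AI-served response is smaller than the human one.
def reduction_percent(human_size, ai_size)
  return 0.0 if human_size.zero?

  (human_size - ai_size) * 100.0 / human_size
end

# Hypothetical check: nearly identical responses suggest no AI version.
def no_ai_version?(human_size, ai_size)
  reduction_percent(human_size, ai_size) < NO_AI_VERSION_THRESHOLD
end

# Hypothetical estimate of what an optimized version could save.
def potential_savings(human_tokens)
  saved = (human_tokens * TYPICAL_REDUCTION).round
  { estimated_savings: saved, potential_size: human_tokens - saved }
end

puts no_ai_version?(1000, 980)
p potential_savings(1000)
```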
[Enhancement] Updated OutputFormatter#display_comparison_results to include a marketing message for unoptimized sites.
[Enhancement] Added utility script probe_karafka_simple.rb for batch comparison testing.

17 Oct 09:54

mensfeld

v0.8.2

[Fix] Fixed Docker workflow test to properly invoke help command (use generate --help instead of --help).

17 Oct 09:43

mensfeld

v0.8.1

[Enhancement] Ship the Docker container.
