A flexible, pattern-based content extraction and generation system for creating course materials from markdown source files.
The Manifest-Driven Content Generator extracts specific content from markdown files based on configurable patterns and scopes, then inserts that content into templates to generate output documents.
Key Features:
- ✅ Pattern-based content extraction (find content by markdown heading)
- ✅ Flexible scope filters (bullets, text, code, or everything)
- ✅ Multiple output formats (markdown-list, plain-list, inline)
- ✅ Template-based document generation
- ✅ Comprehensive validation with helpful error messages
- ✅ Multiple source files support
- ✅ Markdown formatting control (keep or strip)
Create `manifest.yml`:

```yaml
primary_source: "../merge-markdown/merged/guide/Guide Template.md"
documents:
  output/website.md:
    template: website.md
    extractions:
      objectives:
        pattern: "#### Objectives"
        scope: bullets
      topics:
        pattern: "## "
        scope: self_plain
```

Create `website.md` with placeholders:
```markdown
# Course Title
## Objectives
//Generated by content-generator
//objectives
## Topics
//Generated by content-generator
//topics
```

Then run:

```bash
# Validate configuration
node manifest-generator.js --validate
# Generate content
node manifest-generator.js
```

How it works:

Pattern → Collect until Next Heading → Filter by Scope → Format → Insert
- Pattern - Defines WHERE to start collecting (e.g., "#### Objectives")
- Scope - Defines WHAT TYPE of content to collect (e.g., bullets, text)
- Collection - Automatically stops at the next heading of any level
- Format - How to output the collected items (markdown-list, plain-list, inline)
- Insert - Replaces template placeholders with generated content
Source Document:

```markdown
# Module 1: Introduction
#### Objectives
- Learn content modeling
- Understand GraphQL
## Activity 1-1
# Module 2: Advanced
#### Objectives
- Build complex queries
```

Manifest Configuration:

```yaml
extractions:
  objectives:
    pattern: "#### Objectives"
    scope: bullets
```

Result:

```markdown
- Learn content modeling
- Understand GraphQL
- Build complex queries
```

All objectives from all modules are combined into one list.
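The combining behavior above can be sketched in plain JavaScript. This is a minimal illustration, not the code in `extractor.js`. The function name `extractBullets` and the heading and bullet regexes are assumptions. Every line that starts with the pattern opens a collection window, and any other heading closes it:

```javascript
// Hypothetical sketch of "pattern → collect until next heading → bullets".
// Not the actual extractor.js implementation.
const HEADING = /^#{1,6}\s/;        // any ATX heading, H1 through H6
const BULLET = /^\s*[-*]\s+(.*)$/;  // "- item" or "* item"

function extractBullets(markdown, pattern) {
  const items = [];
  let collecting = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (line.startsWith(pattern)) {
      // A matching line (re)opens the collection window
      collecting = true;
      continue;
    }
    if (HEADING.test(line)) {
      // Any other heading, at any level, closes the window
      collecting = false;
      continue;
    }
    if (collecting) {
      const match = line.match(BULLET);
      if (match) items.push(match[1]); // keep the bullet text, drop the marker
    }
  }
  return items;
}

const source = [
  "# Module 1: Introduction",
  "#### Objectives",
  "- Learn content modeling",
  "- Understand GraphQL",
  "## Activity 1-1",
  "# Module 2: Advanced",
  "#### Objectives",
  "- Build complex queries",
].join("\n");

console.log(extractBullets(source, "#### Objectives"));
```

Because every matching heading reopens the window, bullets from both modules land in a single array in document order.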
Manifest structure:

```yaml
primary_source: "path/to/source.md"   # Optional: default source for all extractions
documents:
  output/document.md:                 # Output file path
    template: template.md             # Template file with placeholders
    extractions:
      key_name:                       # Must match placeholder key in template
        pattern: "## "                # What to search for
        scope: self_plain             # What type of content to collect
        format: markdown-list         # Optional: output format
        source: custom-source.md      # Optional: override primary_source
```

`pattern`: String to match at the start of a line (case-sensitive).

Examples:
- `"## "` matches any H2 heading
- `"#### Objectives"` matches the H4 heading "Objectives"
- `"# Module"` matches H1 headings starting with "Module"
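The matching rule is a case-sensitive test at the start of each line. The sketch below assumes that this rule reduces to `String.prototype.startsWith`. The helper `findMatches` is hypothetical. It returns the 1-based line numbers a pattern would hit:

```javascript
// Hypothetical helper: 1-based line numbers whose text starts with `pattern`.
// startsWith is case-sensitive, so "#### objectives" does not match "#### Objectives".
function findMatches(markdown, pattern) {
  const matches = [];
  markdown.split(/\r?\n/).forEach((line, index) => {
    if (line.startsWith(pattern)) matches.push(index + 1);
  });
  return matches;
}

const doc = "# Module 1\n## Activity 1-1\n### Notes\n## Activity 1-2\n#### objectives";
console.log(findMatches(doc, "## "));             // → [2, 4]; "### Notes" is not an H2
console.log(findMatches(doc, "#### Objectives")); // → []; case differs
```

The trailing space in `"## "` matters here. Without it, the pattern `"##"` would also match H3 and deeper headings.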
`scope`: Defines what type of content to collect, from the pattern match until the next heading.

Options:

| Scope | Collects | Formatting |
|---|---|---|
| `self` | Just the matched line | Keeps markdown |
| `self_plain` | Just the matched line | Strips markdown |
| `bullets` | All bullet points (`-` or `*`) | Content only |
| `bullets_plain` | All bullet points | Stripped |
| `text` | Paragraph text only | Keeps markdown |
| `text_plain` | Paragraph text only | Stripped |
| `code` | Code blocks | With code-fence markers |
| `code_plain` | Code blocks | Without code-fence markers |
| `all` | Everything | Keeps markdown |
| `all_plain` | Everything | Stripped |
See SCOPE_TYPES.md for detailed examples of each scope type.
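The `_plain` scopes differ from their counterparts only because they strip markdown. The generator's exact stripping rules are not given here, so this sketch is an assumption. It covers a common subset of inline syntax: heading and bullet markers, links, inline code, bold, and asterisk italics. The name `stripMarkdown` is hypothetical:

```javascript
// Hypothetical sketch of markdown stripping for the *_plain scopes.
// Covers a common subset of inline syntax; the generator's real rules may differ.
function stripMarkdown(text) {
  return text
    .replace(/^#{1,6}\s+/, "")               // leading heading marker: "## "
    .replace(/^\s*[-*]\s+/, "")              // leading bullet marker: "- " or "* "
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // [label](url) -> label
    .replace(/`([^`]*)`/g, "$1")             // `code` -> code
    .replace(/\*\*(.*?)\*\*/g, "$1")         // **bold** -> bold
    .replace(/\*(.*?)\*/g, "$1")             // *italic* -> italic
    .trim();
}

console.log(stripMarkdown("## Activity 1-1: **Build** a `query`"));
console.log(stripMarkdown("- See [the docs](https://example.com) for *details*"));
```

Underscore emphasis is deliberately left alone so that identifiers such as `self_plain` survive intact.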
`format`: How to format the output.

Options:
- `markdown-list`: Output as markdown bullets (`- Item 1\n- Item 2`)
- `plain-list`: One item per line (`Item 1\nItem 2`)
- `inline`: Comma-separated (`Item 1, Item 2`)
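The three formats differ only in how the collected items are joined. This sketch assumes that the items arrive as an array of strings. The name `formatItems` is hypothetical, and so is throwing on an unknown format. The real tool rejects invalid format values during validation:

```javascript
// Hypothetical formatter for the three documented output formats.
function formatItems(items, format) {
  switch (format) {
    case "markdown-list":
      return items.map((item) => `- ${item}`).join("\n"); // "- Item 1\n- Item 2"
    case "plain-list":
      return items.join("\n");                            // "Item 1\nItem 2"
    case "inline":
      return items.join(", ");                            // "Item 1, Item 2"
    default:
      throw new Error(`Unknown format: ${format}`);
  }
}

const items = ["Item 1", "Item 2"];
for (const format of ["markdown-list", "plain-list", "inline"]) {
  console.log(`${format}:\n${formatItems(items, format)}\n`);
}
```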
`source`: Overrides `primary_source` for this specific extraction.

```yaml
extractions:
  special_content:
    source: custom-file.md # Use a different source for this extraction
    pattern: "## "
    scope: self_plain
```

Templates use a two-line comment pattern to mark where content should be inserted:
```markdown
## Section Title
//Generated by content-generator
//extraction_key
```

- Line 1: `//Generated by content-generator` (marker)
- Line 2: `//extraction_key` (must match a key in the manifest)

Both lines are replaced with the generated content.
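Here is a minimal sketch of the two-line replacement. It assumes that the generated content for each key is already available as a string. The name `fillTemplate` is hypothetical. Leaving unknown keys untouched is a choice made only for this sketch, because the real generator reports unmatched placeholders during validation:

```javascript
// Hypothetical sketch: replace each two-line placeholder
//   //Generated by content-generator
//   //<key>
// with generated[key]. Placeholders whose key is unknown are left in place.
const MARKER = "//Generated by content-generator";

function fillTemplate(template, generated) {
  const lines = template.split(/\r?\n/);
  const out = [];
  for (let i = 0; i < lines.length; i++) {
    const next = lines[i + 1];
    if (lines[i].trim() === MARKER && next !== undefined && next.trim().startsWith("//")) {
      const key = next.trim().slice(2);
      if (Object.prototype.hasOwnProperty.call(generated, key)) {
        out.push(generated[key]); // both placeholder lines become the content
        i++;                      // skip the key line
        continue;
      }
    }
    out.push(lines[i]);
  }
  return out.join("\n");
}

const template = [
  "## What You'll Learn",
  MARKER,
  "//objectives",
  "## Additional Info",
  "This section is manually written.",
].join("\n");

console.log(fillTemplate(template, { objectives: "- Learn content modeling\n- Understand GraphQL" }));
```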
Example Template:

```markdown
# Course Overview
## What You'll Learn
//Generated by content-generator
//objectives
## Course Topics
//Generated by content-generator
//topics
## Additional Info
This section is manually written.
```

Usage:

```bash
node manifest-generator.js
```

Uses `manifest.yml` in the current directory.

```bash
node manifest-generator.js --manifest path/to/config.yml
```

Uses the manifest at the given path.

```bash
node manifest-generator.js --validate
```

Checks the configuration without generating files.

```bash
node manifest-generator.js --help
```

Shows usage help.

The generator performs comprehensive validation before generating content:
- Valid YAML syntax
- Required keys present
- Valid scope and format values
- Primary source exists
- Template files exist
- Custom source files exist
- Placeholders have matching extractions
- Extractions have matching placeholders
- Proper placeholder format
See VALIDATION.md for complete validation rules and error messages.
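Two of the checks above reduce to comparing two sets of keys: placeholders that have matching extractions, and extractions that have matching placeholders. This sketch assumes that the template text and the extraction keys are already loaded. The names `placeholderKeys` and `crossCheck` and the message wording are hypothetical. VALIDATION.md lists the generator's actual messages:

```javascript
// Hypothetical cross-check between template placeholders and manifest extraction keys.
const MARKER = "//Generated by content-generator";

function placeholderKeys(template) {
  const lines = template.split(/\r?\n/).map((line) => line.trim());
  const keys = [];
  lines.forEach((line, i) => {
    const next = lines[i + 1];
    // A placeholder is the marker line followed by a "//key" line
    if (line === MARKER && next && next !== MARKER && next.startsWith("//")) {
      keys.push(next.slice(2));
    }
  });
  return keys;
}

function crossCheck(template, extractionKeys) {
  const placeholders = new Set(placeholderKeys(template));
  const extractions = new Set(extractionKeys);
  const errors = [];
  for (const key of placeholders) {
    if (!extractions.has(key)) errors.push(`Placeholder "${key}" has no matching extraction`);
  }
  for (const key of extractions) {
    if (!placeholders.has(key)) errors.push(`Extraction "${key}" has no matching placeholder`);
  }
  return errors;
}

const template = `# Course\n${MARKER}\n//objectives\n${MARKER}\n//topics`;
console.log(crossCheck(template, ["objectives", "modules"]));
```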
Common extraction patterns:

Module titles:

```yaml
extractions:
  modules:
    pattern: "# Module"
    scope: self_plain
```

Learning objectives:

```yaml
extractions:
  objectives:
    pattern: "#### Objectives"
    scope: bullets
```

Activity titles:

```yaml
extractions:
  activities:
    pattern: "## Activity"
    scope: self_plain
```

Code samples:

```yaml
extractions:
  code_samples:
    pattern: "## Code Example"
    scope: code
```

Introduction text:

```yaml
extractions:
  intro:
    pattern: "## Introduction"
    scope: text
```

Generate multiple output files from one manifest:
```yaml
primary_source: "merged.md"
documents:
  output/website.md:
    template: website-template.md
    extractions:
      objectives:
        pattern: "#### Objectives"
        scope: bullets
      topics:
        pattern: "## "
        scope: self_plain
  output/agenda.md:
    template: agenda-template.md
    extractions:
      modules:
        pattern: "# Module"
        scope: self_plain
  output/code-samples.md:
    template: code-template.md
    extractions:
      examples:
        pattern: "## Example"
        scope: code
```

Problem: No content extracted

Possible Causes:
- Pattern doesn't match any lines (check case-sensitivity)
- Scope doesn't match content type (bullets vs text)
- Content is immediately followed by heading (no content to collect)
Solution:
- Verify pattern matches lines in source file
- Check that scope type is appropriate
- Ensure there's content between pattern and next heading
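When an extraction comes back empty, check for lines that would match the pattern if case or leading whitespace were ignored. The helper below is a hypothetical diagnostic and is not part of the generator. It reports exact matches and near misses separately:

```javascript
// Hypothetical diagnostic: exact matches vs. near misses for a pattern.
function diagnosePattern(markdown, pattern) {
  const exact = [];
  const nearMisses = [];
  markdown.split(/\r?\n/).forEach((line, index) => {
    if (line.startsWith(pattern)) {
      exact.push(index + 1);
    } else if (line.trimStart().toLowerCase().startsWith(pattern.toLowerCase())) {
      // Would match if the comparison ignored case or leading whitespace
      nearMisses.push({ line: index + 1, text: line });
    }
  });
  return { exact, nearMisses };
}

const doc = "#### objectives\n- Learn content modeling\n  #### Objectives\n- Understand GraphQL";
console.log(diagnosePattern(doc, "#### Objectives"));
```

An empty `exact` list combined with a non-empty `nearMisses` list usually means that the pattern or the source heading needs a small correction.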
Problem: Too much content extracted

Possible Causes:
- Pattern too broad (e.g., `"## "` matches ALL H2 headings)
- Wrong scope type
- Collection includes content from sub-sections

Solution:
- Make the pattern more specific (e.g., `"## Activity"`)
- Use an appropriate scope (`bullets` for lists, `text` for paragraphs)
- Remember: collection stops at ANY heading level
Problem: Placeholder not replaced

Possible Causes:
- Placeholder key doesn't match the extraction key
- Missing `//Generated by content-generator` line
- Template file path incorrect

Solution:
- Ensure keys match exactly (case-sensitive)
- Use the two-line placeholder format
- Verify the template path in the manifest
Extract objectives and topics for a course website:

```yaml
primary_source: "../merged/guide/Course.md"
documents:
  output/website.md:
    template: website.md
    extractions:
      objectives:
        pattern: "#### Objectives"
        scope: bullets
      topics:
        pattern: "## Activity"
        scope: self_plain
```

Extract just module titles:
```yaml
primary_source: "guide.md"
documents:
  output/agenda.md:
    template: agenda-template.md
    extractions:
      modules:
        pattern: "# Module"
        scope: self_plain
        format: markdown-list
```

Extract all code examples:
```yaml
primary_source: "guide.md"
documents:
  output/code-samples.md:
    template: code-template.md
    extractions:
      examples:
        pattern: "## Code Example"
        scope: code
```

Files:
- `manifest-generator.js`: Main entry point
- `extractor.js`: Content extraction engine
- `template-processor.js`: Template processing
- `manifest.yml`: Configuration file
- `SCOPE_TYPES.md`: Scope type reference
- `VALIDATION.md`: Validation rules reference
Dependencies:

```json
{
  "js-yaml": "^4.1.0"
}
```

Install with:

```bash
npm install
```

When adding new scope types:
- Add a detection function to `extractor.js`
- Add a case to the `extractByScope` switch statement
- Document it in `SCOPE_TYPES.md`
- Add validation to `validateManifest`
- Add an example to this README
Future enhancements:
- Regex pattern support
- Custom boundary markers (not just headings)
- Content filters (exclude lines with TODO)
- Content transformations (uppercase, title case)
- Multiple patterns per extraction (OR logic)
- Conditional extractions based on markers
- Plugin system for custom scope types
See repository root for license information.