Manifest-Driven Content Generator

A flexible, pattern-based content extraction and generation system for creating course materials from markdown source files.

Overview

The Manifest-Driven Content Generator extracts specific content from markdown files based on configurable patterns and scopes, then inserts that content into templates to generate output documents.

Key Features:

  • ✅ Pattern-based content extraction (find content by markdown heading)
  • ✅ Flexible scope filters (bullets, text, code, or everything)
  • ✅ Multiple output formats (markdown-list, plain-list, inline)
  • ✅ Template-based document generation
  • ✅ Comprehensive validation with helpful error messages
  • ✅ Support for multiple source files
  • ✅ Markdown formatting control (keep or strip)

Quick Start

1. Create a Manifest

Create manifest.yml:

primary_source: "../merge-markdown/merged/guide/Guide Template.md"

documents:
  output/website.md:
    template: website.md
    extractions:
      objectives:
        pattern: "#### Objectives"
        scope: bullets
      topics:
        pattern: "## "
        scope: self_plain

2. Create a Template

Create website.md with placeholders:

# Course Title

## Objectives

//Generated by content-generator
//objectives

## Topics

//Generated by content-generator
//topics

3. Generate Content

# Validate configuration
node manifest-generator.js --validate

# Generate content
node manifest-generator.js

How It Works

The Core Concept

Pattern → Collect until Next Heading → Filter by Scope → Format → Insert

  1. Pattern - Defines WHERE to start collecting (e.g., "#### Objectives")
  2. Scope - Defines WHAT TYPE of content to collect (e.g., bullets, text)
  3. Collection - Automatically stops at the next heading of any level
  4. Format - How to output the collected items (markdown-list, plain-list, inline)
  5. Insert - Replaces template placeholders with generated content

Example

Source Document:

# Module 1: Introduction

#### Objectives
- Learn content modeling
- Understand GraphQL

## Activity 1-1

# Module 2: Advanced

#### Objectives
- Build complex queries

Manifest Configuration:

extractions:
  objectives:
    pattern: "#### Objectives"
    scope: bullets

Result:

- Learn content modeling
- Understand GraphQL
- Build complex queries

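Every line that matches the pattern starts a new collection, so the objectives from all modules are combined into one list. Each collection ends at the next heading; for Module 1 that is "## Activity 1-1".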
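The example above can be sketched in JavaScript. This is only an illustration of the pattern → collect → filter flow for the bullets scope, not the code in extractor.js; the function name extractBullets is hypothetical, and the sketch assumes that a heading is any line starting with one to six # characters followed by a space.

```javascript
// Hypothetical sketch of the "bullets" scope; not the actual extractor.js code.
function extractBullets(markdown, pattern) {
  const items = [];
  let collecting = false;

  for (const line of markdown.split('\n')) {
    if (line.startsWith(pattern)) {
      // Pattern matched: start collecting from the next line.
      collecting = true;
    } else if (/^#{1,6} /.test(line)) {
      // Any other heading, at any level, ends the current collection.
      collecting = false;
    } else if (collecting && /^\s*[-*] /.test(line)) {
      // Keep only the bullet content, without the "- " or "* " marker.
      items.push(line.replace(/^\s*[-*] /, ''));
    }
  }
  return items;
}

const source = [
  '# Module 1: Introduction',
  '',
  '#### Objectives',
  '- Learn content modeling',
  '- Understand GraphQL',
  '',
  '## Activity 1-1',
  '',
  '# Module 2: Advanced',
  '',
  '#### Objectives',
  '- Build complex queries',
].join('\n');

const objectives = extractBullets(source, '#### Objectives');
console.log(objectives.map((item) => `- ${item}`).join('\n'));
```

Running the sketch prints the same three bullets shown in the result above. It does not handle # comment lines inside fenced code blocks, which a real extractor would need to tell apart from headings.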

Manifest Configuration

Structure

primary_source: "path/to/source.md"  # Optional: default source for all extractions

documents:
  output/document.md:                # Output file path
    template: template.md            # Template file with placeholders
    extractions:
      key_name:                      # Must match placeholder key in template
        pattern: "## "               # What to search for
        scope: self_plain            # What type of content to collect
        format: markdown-list        # Optional: output format
        source: custom-source.md     # Optional: override primary_source

Parameters

pattern (required)

String to match at the start of a line (case-sensitive).

Examples:

  • "## " - Matches any H2 heading
  • "#### Objectives" - Matches exact H4 heading "Objectives"
  • "# Module" - Matches H1 headings starting with "Module"
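A small sketch of how such matching might behave, assuming the pattern is compared as a plain, case-sensitive prefix of each line. The helper name matchesPattern and the sample lines are illustrative and not taken from the project.

```javascript
// Assumption: a pattern matches when the line begins with it, compared case-sensitively.
function matchesPattern(line, pattern) {
  return line.startsWith(pattern);
}

const lines = [
  '# Module 1: Introduction',
  '## Activity 1-1',
  '### Notes',
  '#### objectives',
];

const h2Matches = lines.filter((line) => matchesPattern(line, '## '));
const moduleMatches = lines.filter((line) => matchesPattern(line, '# Module'));
const objectiveMatches = lines.filter((line) => matchesPattern(line, '#### Objectives'));

console.log(h2Matches);        // [ '## Activity 1-1' ]
console.log(moduleMatches);    // [ '# Module 1: Introduction' ]
console.log(objectiveMatches); // []
```

Under this assumption "## " does not match "### Notes", because the third character differs, and the lowercase "#### objectives" is skipped. A prefix comparison would also accept longer headings that begin with the pattern, such as "#### Objectives and Outcomes".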

scope (required)

Defines what type of content to collect from the pattern match until the next heading.

Options:

| Scope | Collects | Formatting |
| --- | --- | --- |
| self | Just the matched line | Keeps markdown |
| self_plain | Just the matched line | Strips markdown |
| bullets | All bullet points (- or *) | Content only |
| bullets_plain | All bullet points | Stripped |
| text | Paragraph text only | Keeps markdown |
| text_plain | Paragraph text only | Stripped |
| code | Code blocks | With ``` markers |
| code_plain | Code blocks | Without ``` |
| all | Everything | Keeps markdown |
| all_plain | Everything | Stripped |

See SCOPE_TYPES.md for detailed examples of each scope type.
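To make the difference between scopes concrete, here is a hedged sketch of how collected lines could be classified. The function filterByScope is hypothetical and covers only four of the scopes; the exact rules in extractor.js, and what the _plain variants strip from text and bullets, are documented in SCOPE_TYPES.md and may differ.

```javascript
// Hypothetical sketch of scope filtering; the real extractor.js may differ.
const FENCE = '`'.repeat(3); // the three-backtick code fence marker

function filterByScope(lines, scope) {
  const result = [];
  let inCode = false;

  for (const line of lines) {
    const isFence = line.startsWith(FENCE);
    if (isFence) inCode = !inCode; // toggle on opening and closing fences
    const isCode = isFence || inCode;
    const isBullet = !isCode && /^\s*[-*] /.test(line);
    const isText = !isCode && !isBullet && line.trim() !== '';

    if (scope === 'code' && isCode) result.push(line);
    if (scope === 'code_plain' && isCode && !isFence) result.push(line);
    if (scope === 'bullets' && isBullet) result.push(line.replace(/^\s*[-*] /, ''));
    if (scope === 'text' && isText) result.push(line);
  }
  return result;
}

// Lines collected between a matched heading and the next heading.
const collected = [
  'Run the query below.',
  '',
  '- First step',
  '* Second step',
  FENCE + 'graphql',
  '{ course { title } }',
  FENCE,
];

console.log(filterByScope(collected, 'bullets'));    // [ 'First step', 'Second step' ]
console.log(filterByScope(collected, 'text'));       // [ 'Run the query below.' ]
console.log(filterByScope(collected, 'code_plain')); // [ '{ course { title } }' ]
```

The same collected lines yield different results depending on the scope, which is why a scope that does not fit the content type produces empty output.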

format (optional, default: markdown-list)

How to format the output.

Options:

  • markdown-list - Output as markdown bullets: - Item 1\n- Item 2
  • plain-list - One item per line: Item 1\nItem 2
  • inline - Comma-separated: Item 1, Item 2
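The three formats can be illustrated with a short helper. The name formatItems is hypothetical, and the sketch assumes each collected item is already a single string.

```javascript
// Hypothetical helper illustrating the three documented formats.
function formatItems(items, format = 'markdown-list') {
  switch (format) {
    case 'markdown-list':
      return items.map((item) => `- ${item}`).join('\n');
    case 'plain-list':
      return items.join('\n');
    case 'inline':
      return items.join(', ');
    default:
      throw new Error(`Unknown format: ${format}`);
  }
}

const items = ['Item 1', 'Item 2'];
console.log(formatItems(items));               // "- Item 1\n- Item 2"
console.log(formatItems(items, 'plain-list')); // "Item 1\nItem 2"
console.log(formatItems(items, 'inline'));     // "Item 1, Item 2"
```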

source (optional)

Override the primary_source for this specific extraction.

extractions:
  special_content:
    source: custom-file.md  # Use different source for this extraction
    pattern: "## "
    scope: self_plain
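The fallback between the two settings might look like the sketch below. resolveSource is a hypothetical name, and raising an error when neither value is set is an assumption; this README does not say how the generator reports that case.

```javascript
// Hypothetical sketch: a per-extraction source wins over primary_source.
function resolveSource(manifest, extraction) {
  const source = extraction.source ?? manifest.primary_source;
  if (!source) {
    // Assumption: with no source at all there is nothing to read from.
    throw new Error('No source file: set primary_source or extraction.source');
  }
  return source;
}

const manifest = { primary_source: 'merged.md' };
console.log(resolveSource(manifest, { pattern: '## ', scope: 'self_plain' }));
console.log(resolveSource(manifest, { source: 'custom-file.md', pattern: '## ', scope: 'self_plain' }));
```

The first call falls back to merged.md; the second uses custom-file.md.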

Template Format

Templates use a two-line comment pattern to mark where content should be inserted:

## Section Title

//Generated by content-generator
//extraction_key

  • Line 1: //Generated by content-generator (marker)
  • Line 2: //extraction_key (must match key in manifest)

Both lines are replaced with the generated content.
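A minimal sketch of this replacement, assuming the marker line is followed directly by the key line. The function fillTemplate and the contentByKey object are illustrative names; the actual logic lives in template-processor.js and may differ.

```javascript
// Hypothetical sketch of placeholder replacement; not the actual template-processor.js.
const MARKER = '//Generated by content-generator';

function fillTemplate(template, contentByKey) {
  const lines = template.split('\n');
  const output = [];

  for (let i = 0; i < lines.length; i++) {
    const next = lines[i + 1];
    if (lines[i].trim() === MARKER && next !== undefined && next.trim().startsWith('//')) {
      const key = next.trim().slice(2);
      if (Object.prototype.hasOwnProperty.call(contentByKey, key)) {
        output.push(contentByKey[key]); // one insertion replaces both lines
        i++; // skip the key line
        continue;
      }
    }
    output.push(lines[i]);
  }
  return output.join('\n');
}

const template = [
  '## Objectives',
  '',
  '//Generated by content-generator',
  '//objectives',
  '',
  '## Notes',
].join('\n');

const filled = fillTemplate(template, {
  objectives: '- Learn content modeling\n- Understand GraphQL',
});
console.log(filled);
```

In this sketch a placeholder whose key has no generated content is left untouched, so a mismatch stays visible in the output.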

Example Template:

# Course Overview

## What You'll Learn

//Generated by content-generator
//objectives

## Course Topics

//Generated by content-generator
//topics

## Additional Info

This section is manually written.

Usage

Basic Generation

node manifest-generator.js

Uses manifest.yml in the current directory.

Custom Manifest

node manifest-generator.js --manifest path/to/config.yml

Validation Only

node manifest-generator.js --validate

Checks configuration without generating files.

Help

node manifest-generator.js --help

Validation

The generator performs comprehensive validation before generating content:

Phase 1: Manifest Structure

  • Valid YAML syntax
  • Required keys present
  • Valid scope and format values
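The structural checks could be sketched as follows, working on a manifest that has already been parsed into a plain object. The function name validateStructure and the error wording are assumptions for illustration; see VALIDATION.md for the messages the generator actually emits.

```javascript
// Hypothetical sketch of Phase 1 checks on an already-parsed manifest object.
const VALID_SCOPES = [
  'self', 'self_plain', 'bullets', 'bullets_plain', 'text',
  'text_plain', 'code', 'code_plain', 'all', 'all_plain',
];
const VALID_FORMATS = ['markdown-list', 'plain-list', 'inline'];

function validateStructure(manifest) {
  if (!manifest || typeof manifest.documents !== 'object' || manifest.documents === null) {
    return ['Manifest must define a "documents" mapping'];
  }
  const errors = [];
  for (const [output, doc] of Object.entries(manifest.documents)) {
    const { template, extractions = {} } = doc ?? {};
    if (!template) errors.push(`${output}: missing "template"`);
    for (const [key, extraction] of Object.entries(extractions)) {
      if (!extraction.pattern) errors.push(`${output}.${key}: missing "pattern"`);
      if (!VALID_SCOPES.includes(extraction.scope)) {
        errors.push(`${output}.${key}: invalid scope "${extraction.scope}"`);
      }
      if (extraction.format !== undefined && !VALID_FORMATS.includes(extraction.format)) {
        errors.push(`${output}.${key}: invalid format "${extraction.format}"`);
      }
    }
  }
  return errors;
}

const structureErrors = validateStructure({
  documents: {
    'output/website.md': {
      template: 'website.md',
      extractions: {
        objectives: { pattern: '#### Objectives', scope: 'bullet', format: 'csv' },
      },
    },
  },
});
console.log(structureErrors);
```

The sample manifest misspells the scope and uses an unsupported format, so the sketch reports two errors.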

Phase 2: File Existence

  • Primary source exists
  • Template files exist
  • Custom source files exist

Phase 3: Template Matching

  • Placeholders have matching extractions
  • Extractions have matching placeholders
  • Proper placeholder format
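Template matching amounts to comparing two sets of keys. The sketch below shows one way to do that; findPlaceholderKeys and checkTemplateMatching are hypothetical names, and the real checks may be stricter about placeholder format.

```javascript
// Hypothetical sketch of Phase 3: compare template placeholders with extraction keys.
const MARKER = '//Generated by content-generator';

function findPlaceholderKeys(template) {
  const lines = template.split('\n').map((line) => line.trim());
  const keys = [];
  lines.forEach((line, i) => {
    const next = lines[i + 1];
    if (line === MARKER && next !== undefined && next.startsWith('//')) {
      keys.push(next.slice(2));
    }
  });
  return keys;
}

function checkTemplateMatching(template, extractionKeys) {
  const placeholders = findPlaceholderKeys(template);
  return {
    // Placeholders in the template with no extraction in the manifest.
    missingExtractions: placeholders.filter((key) => !extractionKeys.includes(key)),
    // Extractions in the manifest with no placeholder in the template.
    unusedExtractions: extractionKeys.filter((key) => !placeholders.includes(key)),
  };
}

const template = [
  '//Generated by content-generator',
  '//objectives',
  '',
  '//Generated by content-generator',
  '//topics',
].join('\n');

const matching = checkTemplateMatching(template, ['objectives', 'modules']);
console.log(matching);
```

Here "topics" has a placeholder but no extraction, and "modules" has an extraction but no placeholder, so both directions of the check report a mismatch.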

See VALIDATION.md for complete validation rules and error messages.

Common Use Cases

Extract Module Titles

extractions:
  modules:
    pattern: "# Module"
    scope: self_plain

Extract All Objectives

extractions:
  objectives:
    pattern: "#### Objectives"
    scope: bullets

Extract Activity Titles

extractions:
  activities:
    pattern: "## Activity"
    scope: self_plain

Extract Code Examples

extractions:
  code_samples:
    pattern: "## Code Example"
    scope: code

Extract Introduction Text

extractions:
  intro:
    pattern: "## Introduction"
    scope: text

Multiple Documents

Generate multiple output files from one manifest:

primary_source: "merged.md"

documents:
  output/website.md:
    template: website-template.md
    extractions:
      objectives:
        pattern: "#### Objectives"
        scope: bullets
      topics:
        pattern: "## "
        scope: self_plain
  
  output/agenda.md:
    template: agenda-template.md
    extractions:
      modules:
        pattern: "# Module"
        scope: self_plain
  
  output/code-samples.md:
    template: code-template.md
    extractions:
      examples:
        pattern: "## Example"
        scope: code

Troubleshooting

No Content Generated

Possible Causes:

  1. Pattern doesn't match any lines (check case-sensitivity)
  2. Scope doesn't match content type (bullets vs text)
  3. Matched line is immediately followed by another heading (no content to collect)

Solution:

  • Verify pattern matches lines in source file
  • Check that scope type is appropriate
  • Ensure there's content between pattern and next heading

Wrong Content Extracted

Possible Causes:

  1. Pattern too broad (e.g., "## " matches ALL H2)
  2. Wrong scope type
  3. Expected content sits in a sub-section, which collection never reaches

Solution:

  • Make pattern more specific (e.g., "## Activity")
  • Use appropriate scope (bullets for lists, text for paragraphs)
  • Remember: collection stops at ANY heading level

Content Not Inserted

Possible Causes:

  1. Placeholder key doesn't match extraction key
  2. Missing //Generated by content-generator line
  3. Template file path incorrect

Solution:

  • Ensure keys match exactly (case-sensitive)
  • Use proper two-line placeholder format
  • Verify template path in manifest

Examples

Website Metadata Document

Extract objectives and topics for a course website:

primary_source: "../merged/guide/Course.md"

documents:
  output/website.md:
    template: website.md
    extractions:
      objectives:
        pattern: "#### Objectives"
        scope: bullets
      topics:
        pattern: "## Activity"
        scope: self_plain

Course Agenda

Extract just module titles:

primary_source: "guide.md"

documents:
  output/agenda.md:
    template: agenda-template.md
    extractions:
      modules:
        pattern: "# Module"
        scope: self_plain
        format: markdown-list

Code Samples Collection

Extract all code examples:

primary_source: "guide.md"

documents:
  output/code-samples.md:
    template: code-template.md
    extractions:
      examples:
        pattern: "## Code Example"
        scope: code

Files

  • manifest-generator.js - Main entry point
  • extractor.js - Content extraction engine
  • template-processor.js - Template processing
  • manifest.yml - Configuration file
  • SCOPE_TYPES.md - Scope type reference
  • VALIDATION.md - Validation rules reference

Dependencies

{
  "js-yaml": "^4.1.0"
}

Install with:

npm install

Contributing

When adding new scope types:

  1. Add detection function to extractor.js
  2. Add case to extractByScope switch statement
  3. Document in SCOPE_TYPES.md
  4. Add validation to validateManifest
  5. Add example to this README
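As an illustration of the first two steps, here is a sketch of a made-up "numbered" scope. The scope itself, the function isNumberedItem, and the simplified extractByScopeSketch are all hypothetical; the real extractByScope signature is not shown in this README and may look different.

```javascript
// Hypothetical detection function for an illustrative "numbered" scope.
function isNumberedItem(line) {
  return /^\s*\d+\. /.test(line);
}

// Simplified stand-in for the switch statement in extractByScope.
function extractByScopeSketch(lines, scope) {
  switch (scope) {
    case 'bullets':
      return lines
        .filter((line) => /^\s*[-*] /.test(line))
        .map((line) => line.replace(/^\s*[-*] /, ''));
    case 'numbered': // the newly added case
      return lines
        .filter(isNumberedItem)
        .map((line) => line.replace(/^\s*\d+\. /, ''));
    default:
      throw new Error(`Unsupported scope: ${scope}`);
  }
}

const steps = extractByScopeSketch(['1. Install', '- note', '2. Run'], 'numbered');
console.log(steps); // [ 'Install', 'Run' ]
```

The remaining steps (documentation, validation, README example) keep the new scope discoverable and ensure a typo in its name is caught before generation.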

Future Enhancements

  • Regex pattern support
  • Custom boundary markers (not just headings)
  • Content filters (exclude lines with TODO)
  • Content transformations (uppercase, title case)
  • Multiple patterns per extraction (OR logic)
  • Conditional extractions based on markers
  • Plugin system for custom scope types

License

See repository root for license information.