
Feature: Output Schema Specification for Custom Templates #40

@sirkirby


RFC: Output Schema Specification for Custom Templates

Status: Proposed
Created: 2025-10-27
Priority: Medium (Foundation complete, enables future features)
Effort: Large (multi-phase implementation)
Impact: High (enables structured data features, improves custom template experience)

Problem Statement

Currently, Ten Second Tom supports custom prompt templates, allowing users to define their own prompts for daily summaries and weekly reviews. However, the system has no way to describe or validate the output format these custom templates are expected to produce.

Current Limitations

  1. Parsing is coupled to default templates: The ParseDailySummary and ParseWeeklySummary methods are designed around the embedded default template formats
  2. Best-effort parsing only: With custom templates, structured parsing may fail silently, returning empty lists
  3. No contract between template and parser: Users can't specify what structure they expect in the LLM output
  4. Limited structured data extraction: Future features (search, analytics, insights) can't reliably access structured data from custom templates

Why This Matters

As identified during code review, the parsing logic assumes a specific markdown format (e.g., "## Top 3 Accomplishments", "## Key Events"). When users create custom templates with different output formats, structured parsing fails. The raw LLM response is still saved (the correct behavior), but we lose the ability to extract structured data for:

  • Search and filtering
  • Aggregations and analytics
  • Cross-referencing entries
  • Future ML/AI features

Proposed Solution

Add optional output schema specifications to prompt templates, allowing users to define the expected structure of LLM responses.

Design Approach

1. Schema in Template YAML Front Matter

Extend the YAML front matter to include an optional outputSchema section:

---
templateType: weekly
title: My Custom Weekly Review
description: A template focused on wins and learnings
version: 1.0
outputSchema:
  type: structured  # or 'freeform' for no parsing
  fields:
    - name: accomplishments
      type: list
      minItems: 1
      maxItems: 5
      sectionMarker: "## My Wins This Week"
      required: true
    - name: challenges
      type: list
      minItems: 0
      maxItems: 3
      sectionMarker: "## Areas for Improvement"
      required: false
    - name: insights
      type: list
      sectionMarker: ["## Key Insights", "## Learnings"]  # Multiple possible headers
      required: false
    - name: goals
      type: list
      sectionMarker: "## Next Week Focus"
      required: false
---

# Your prompt content here...
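Before the schema can be read, the front matter has to be separated from the prompt body. The sketch below is a minimal illustration. It assumes the file opens with a --- line and that the front matter ends at the next line consisting only of ---. FrontMatter.Split is a hypothetical helper, not existing Ten Second Tom code. Turning the YAML text into an OutputSchema is left to whichever YAML library YamlFrontMatterParser already uses.

```csharp
using System;

// Hypothetical helper (not existing project code): splits a template file into
// its YAML front matter and the prompt body. Assumes the first line is "---"
// and the front matter ends at the next line that is exactly "---".
public static class FrontMatter
{
    public static (string? Yaml, string Body) Split(string template)
    {
        // Normalize line endings so CRLF files behave like LF files.
        var lines = template.Replace("\r\n", "\n").Split('\n');

        if (lines[0].Trim() != "---")
        {
            // No front matter: the whole file is prompt content.
            return (null, template);
        }

        for (var i = 1; i < lines.Length; i++)
        {
            if (lines[i].Trim() == "---")
            {
                var yaml = string.Join("\n", lines, 1, i - 1);
                var body = string.Join("\n", lines, i + 1, lines.Length - i - 1);
                // Drop blank lines between the closing delimiter and the prompt.
                return (yaml, body.TrimStart('\n'));
            }
        }

        // Opening delimiter without a closing one: malformed front matter.
        throw new FormatException("Front matter is missing its closing '---' line.");
    }
}
```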

2. Schema-Aware Parser

Create a new StructuredOutputParser that:

  • Accepts an outputSchema configuration
  • Uses the schema to guide parsing (find sections, extract items)
  • Validates extracted data against schema constraints (min/max items, required fields)
  • Returns parsing errors if schema validation fails
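Here is a minimal sketch of such a parser. It rests on three simplifying assumptions:

- Each field's section markers are already normalized to a list of strings.
- A section runs until the next markdown heading.
- Only bullet or numbered list items are extracted.

FieldSpec, ParseResult, and the Parse signature are hypothetical names. Validation problems are collected and returned rather than thrown, which fits the rule that parsing failures are warnings and the raw response is always saved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Simplified, hypothetical mirror of the proposed OutputField. SectionMarkers is
// already normalized to a list (the RFC model uses object? for string or string[]).
public sealed record FieldSpec(
    string Name,
    IReadOnlyList<string> SectionMarkers,
    bool Required = false,
    int? MinItems = null,
    int? MaxItems = null);

public sealed record ParseResult(
    IReadOnlyDictionary<string, IReadOnlyList<string>> Values,
    IReadOnlyList<string> Errors);

public static class StructuredOutputParser
{
    // Matches "- item", "* item", "+ item", "1. item" and "1) item".
    private static readonly Regex ListItem = new(@"^\s*(?:[-*+]|\d+[.)])\s+(.+?)\s*$");

    public static ParseResult Parse(string markdown, IReadOnlyList<FieldSpec> fields)
    {
        var lines = markdown.Replace("\r\n", "\n").Split('\n');
        var values = new Dictionary<string, IReadOnlyList<string>>();
        var errors = new List<string>();

        foreach (var field in fields)
        {
            // Find the first line that exactly matches any of the field's markers.
            var start = Array.FindIndex(lines, l => field.SectionMarkers.Contains(l.Trim()));
            if (start < 0)
            {
                if (field.Required)
                    errors.Add($"Missing required section for '{field.Name}'.");
                continue;
            }

            // Collect list items until the next heading (any line starting with '#').
            var items = new List<string>();
            for (var i = start + 1; i < lines.Length && !lines[i].TrimStart().StartsWith('#'); i++)
            {
                var match = ListItem.Match(lines[i]);
                if (match.Success)
                    items.Add(match.Groups[1].Value);
            }

            // Validate the item count against the schema constraints.
            if (field.MinItems is int min && items.Count < min)
                errors.Add($"'{field.Name}' has {items.Count} item(s); at least {min} expected.");
            if (field.MaxItems is int max && items.Count > max)
                errors.Add($"'{field.Name}' has {items.Count} item(s); at most {max} expected.");

            values[field.Name] = items;
        }

        // Errors are returned, not thrown, so callers can log them as warnings.
        return new ParseResult(values, errors);
    }
}
```

Matching markers by exact trimmed text keeps the sketch predictable. A more forgiving matcher (case-insensitive, tolerant of heading level) is a separate design decision.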

3. Backward Compatibility

  • Templates without outputSchema continue to use best-effort parsing (current behavior)
  • Default embedded templates should be annotated with schemas
  • Parsing failures are logged as warnings, not errors (raw response is always saved)
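The three backward-compatibility rules can be expressed as a small dispatch on the schema type. This is a sketch only, under these assumptions:

- The two parsers are passed in as delegates.
- Structured output is flattened to a single list for brevity.
- TemplateOutputHandler and EntryParseOutcome are hypothetical names.

In every branch the raw response is carried through unchanged, and schema violations surface as warnings rather than exceptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical outcome of processing one LLM response.
public sealed record EntryParseOutcome(
    string RawResponse,               // always persisted; the source of truth
    IReadOnlyList<string> Items,      // extracted data, flattened for brevity; may be empty
    IReadOnlyList<string> Warnings);  // parse problems, never fatal

public static class TemplateOutputHandler
{
    // schemaType is null when the template has no outputSchema section.
    public static EntryParseOutcome Process(
        string rawResponse,
        string? schemaType,
        Func<string, IReadOnlyList<string>> bestEffortParse,
        Func<string, (IReadOnlyList<string> Items, IReadOnlyList<string> Errors)> schemaParse)
    {
        switch (schemaType)
        {
            case null:
                // No schema: keep today's best-effort behavior.
                return new(rawResponse, bestEffortParse(rawResponse), Array.Empty<string>());
            case "freeform":
                // Explicitly unstructured: skip parsing entirely.
                return new(rawResponse, Array.Empty<string>(), Array.Empty<string>());
            case "structured":
                var (items, errors) = schemaParse(rawResponse);
                // Schema violations become warnings; the entry is still created.
                return new(rawResponse, items, errors);
            default:
                return new(rawResponse, Array.Empty<string>(),
                    new[] { $"Unknown schema type '{schemaType}'; parsing skipped." });
        }
    }
}
```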

4. Model Updates

Update PromptTemplate model to include:

public record PromptTemplate
{
    // ... existing properties ...
    
    /// <summary>
    /// Optional output schema defining expected LLM response structure.
    /// </summary>
    public OutputSchema? OutputSchema { get; init; }
}

public record OutputSchema
{
    /// <summary>
    /// Schema type: 'structured' or 'freeform'
    /// </summary>
    public required string Type { get; init; }
    
    /// <summary>
    /// Field definitions for structured schemas.
    /// </summary>
    public IReadOnlyList<OutputField>? Fields { get; init; }
}

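public record OutputField
{
    /// <summary>
    /// Key under which the extracted data is stored (e.g. "accomplishments").
    /// </summary>
    public required string Name { get; init; }

    /// <summary>
    /// Field type: list, text, number, etc.
    /// </summary>
    public required string Type { get; init; }

    /// <summary>
    /// Optional bounds on the number of extracted items (list fields).
    /// </summary>
    public int? MinItems { get; init; }
    public int? MaxItems { get; init; }

    /// <summary>
    /// Markdown header(s) that introduce the section: a string or string[].
    /// </summary>
    public object? SectionMarker { get; init; }

    /// <summary>
    /// Whether the section must be present in the LLM response.
    /// </summary>
    public bool Required { get; init; }
}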
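SectionMarker is typed as object? so that it can accept either a single string or a list. Consumers therefore need one place that turns it into a uniform list of headers.

The hypothetical helper below handles these cases:

- null
- a single string
- a string collection
- a list of untyped objects that are all strings, which is a shape YAML libraries commonly produce when a sequence is bound to an object-typed property

Anything else is rejected so the problem can be reported when the template is installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SectionMarkerNormalizer
{
    // Hypothetical helper: normalizes OutputField.SectionMarker (declared as object?)
    // into a list of header strings, rejecting any other shape.
    public static IReadOnlyList<string> Normalize(object? sectionMarker) => sectionMarker switch
    {
        null => Array.Empty<string>(),
        string single => new[] { single },
        IEnumerable<string> many => many.ToArray(),
        // Untyped sequence (e.g. List<object>) whose elements are all strings.
        IEnumerable<object> items when items.All(i => i is string) => items.Cast<string>().ToArray(),
        _ => throw new ArgumentException(
            $"sectionMarker must be a string or a list of strings, got {sectionMarker.GetType().Name}.")
    };
}
```

An alternative is to declare the property as IReadOnlyList<string> and have the YAML layer coerce a single scalar into a one-element list. That removes object? from the model at the cost of a custom deserialization step.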

Implementation Phases

Phase 1: Foundation (Immediate)

  • DONE: Make parsing lenient (don't fail on empty results)
  • DONE: Update model documentation (raw response is source of truth)
  • DONE: Update tests to reflect lenient parsing

Phase 2: Schema Definition (Next)

  • Define OutputSchema model classes
  • Add schema parsing to YamlFrontMatterParser
  • Add validation for schema structure
  • Update default templates with schemas
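The Phase 2 validation step checks the schema itself rather than any LLM output, so it can run when a template is installed. The sketch below is built on these assumptions:

- FieldDef and SchemaDef are simplified, hypothetical mirrors of the proposed records.
- The allowed field types (list, text, number) are taken from the examples in the OutputField comment.
- The specific rules are illustrative, not a settled specification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical mirrors of the proposed schema records.
public sealed record FieldDef(string Name, string Type, IReadOnlyList<string> SectionMarkers,
    int? MinItems = null, int? MaxItems = null, bool Required = false);

public sealed record SchemaDef(string Type, IReadOnlyList<FieldDef>? Fields = null);

public static class OutputSchemaValidator
{
    private static readonly string[] SchemaTypes = { "structured", "freeform" };
    // Assumed set of field types; the RFC lists "list, text, number, etc."
    private static readonly string[] FieldTypes = { "list", "text", "number" };

    // Checks the schema definition itself (not LLM output); intended for install time.
    public static IReadOnlyList<string> Validate(SchemaDef schema)
    {
        var errors = new List<string>();

        if (!SchemaTypes.Contains(schema.Type))
            errors.Add($"Unknown schema type '{schema.Type}'.");

        var fields = schema.Fields ?? Array.Empty<FieldDef>();
        if (schema.Type == "structured" && fields.Count == 0)
            errors.Add("A structured schema must define at least one field.");

        foreach (var dup in fields.GroupBy(f => f.Name).Where(g => g.Count() > 1))
            errors.Add($"Field name '{dup.Key}' is defined more than once.");

        foreach (var f in fields)
        {
            if (string.IsNullOrWhiteSpace(f.Name))
                errors.Add("Every field needs a name.");
            if (!FieldTypes.Contains(f.Type))
                errors.Add($"Field '{f.Name}' has unsupported type '{f.Type}'.");
            if (f.SectionMarkers.Count == 0)
                errors.Add($"Field '{f.Name}' needs at least one sectionMarker.");
            // Lifted comparisons: a null bound never triggers these checks.
            if (f.MinItems < 0 || f.MaxItems < 0)
                errors.Add($"Field '{f.Name}' has a negative item bound.");
            if (f.MinItems > f.MaxItems)
                errors.Add($"Field '{f.Name}' has minItems greater than maxItems.");
            if (f.Type != "list" && (f.MinItems is not null || f.MaxItems is not null))
                errors.Add($"Field '{f.Name}' uses minItems/maxItems but is not a list.");
        }

        return errors;
    }
}
```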

Phase 3: Schema-Aware Parsing (Future)

  • Create StructuredOutputParser class
  • Integrate schema-driven parsing into handlers
  • Add schema validation during template installation
  • Provide helpful error messages for schema violations

Phase 4: Advanced Features (Long-term)

  • Schema editor/validator CLI tool (tom template validate)
  • Template testing framework (provide sample input, validate output)
  • Community template repository with schema verification
  • AI-powered schema generation from examples

Success Criteria

  1. Flexibility: Users can define custom output formats with confidence
  2. Reliability: Structured data extraction works predictably for schemas
  3. Backward Compatibility: Existing templates continue to work
  4. Developer Experience: Clear documentation and helpful error messages
  5. Future-Ready: Foundation for advanced features (search, analytics, etc.)

Alternative Approaches Considered

1. Strict JSON Output Mode

Force the LLM to return JSON instead of markdown.

Pros: Reliable parsing, no schema needed
Cons: Less human-readable, requires prompt engineering, loses markdown formatting benefits

2. LLM-Based Extraction

Use a second LLM call to extract structured data from the first response.

Pros: Flexible, works with any format
Cons: Expensive (two LLM calls per response) and adds latency

3. Regex-Based Extraction

Use regex patterns to extract data.

Pros: Fast, no schema needed
Cons: Brittle, hard to maintain, fails on format variations

Recommended: Output schemas provide the best balance of flexibility, reliability, and user control.

Related Issues/PRs

  • Original discussion: Code review of CreateWeeklyReviewHandler.ParseWeeklySummary
  • Related spec: 007-improved-prompt-template-management (introduced custom templates)

Technical Notes

  • Consider using JSON Schema as inspiration for schema validation
  • Schemas should be optional - not all templates need structured output
  • Parsing errors should be informative but not block entry creation
  • Validation of the schema definition itself should happen at template install time, not at runtime

Documentation Requirements

  • Add "Custom Template Output Schemas" guide to docs/
  • Update template examples with schema annotations
  • Add schema reference documentation
  • Include troubleshooting guide for parsing issues

Implementation Checklist

  • Phase 1: Foundation (✅ Complete)
  • Phase 2: Schema Definition
    • Define OutputSchema model classes
    • Add YAML parsing support
    • Add validation
    • Update default templates
  • Phase 3: Schema-Aware Parsing
    • Create StructuredOutputParser
    • Integrate with handlers
    • Add error handling
  • Phase 4: Advanced Features
    • Template validator CLI
    • Testing framework
    • Documentation

Open Questions

  1. Should we support multiple schema versions for template evolution?
  2. How do we handle schema changes for existing entries?
  3. Should schemas be strictly validated or advisory only?
  4. Do we need schema migration tooling?

Labels: enhancement, templates, architecture
Milestone: Future Enhancement

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests