Feature: Output Schema Specification for Custom Templates #40

# RFC: Output Schema Specification for Custom Templates

**Status**: Proposed  
**Created**: 2025-10-27  
**Priority**: Medium (Foundation complete, enables future features)  
**Effort**: Large (multi-phase implementation)  
**Impact**: High (enables structured data features, improves custom template experience)

## Problem Statement

Currently, Ten Second Tom supports custom prompt templates, allowing users to define their own prompts for daily summaries and weekly reviews. However, the system has no way to understand or validate the expected output format from these custom templates.

### Current Limitations

1. **Parsing is coupled to default templates**: The `ParseDailySummary` and `ParseWeeklySummary` methods are designed around the embedded default template formats
2. **Best-effort parsing only**: With custom templates, structured parsing may fail silently, returning empty lists
3. **No contract between template and parser**: Users can't specify what structure they expect in the LLM output
4. **Limited structured data extraction**: Future features (search, analytics, insights) can't reliably access structured data from custom templates

### Why This Matters

As identified during code review, the parsing logic assumes a specific markdown format (e.g., "## Top 3 Accomplishments", "## Key Events"). When users create custom templates with different output formats, structured parsing fails. The raw LLM response is always saved, which is the correct behavior, but we lose the ability to extract structured data for:

- Search and filtering
- Aggregations and analytics
- Cross-referencing entries
- Future ML/AI features

## Proposed Solution

Add optional **output schema specifications** to prompt templates, allowing users to define the expected structure of LLM responses.

### Design Approach

#### 1. Schema in Template YAML Front Matter

Extend the YAML front matter to include an optional `outputSchema` section:

```yaml
---
templateType: weekly
title: My Custom Weekly Review
description: A template focused on wins and learnings
version: 1.0
outputSchema:
  type: structured  # or 'freeform' for no parsing
  fields:
    - name: accomplishments
      type: list
      minItems: 1
      maxItems: 5
      sectionMarker: "## My Wins This Week"
      required: true
    - name: challenges
      type: list
      minItems: 0
      maxItems: 3
      sectionMarker: "## Areas for Improvement"
      required: false
    - name: insights
      type: list
      sectionMarker: ["## Key Insights", "## Learnings"]  # Multiple possible headers
      required: false
    - name: goals
      type: list
      sectionMarker: "## Next Week Focus"
      required: false
---

# Your prompt content here...
```

#### 2. Schema-Aware Parser

Create a new `StructuredOutputParser` that:

- Accepts an `outputSchema` configuration
- Uses the schema to guide parsing (find sections, extract items)
- Validates extracted data against schema constraints (min/max items, required fields)
- Returns parsing errors if schema validation fails
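A minimal sketch of how `StructuredOutputParser` might walk a response. It assumes each field's `sectionMarker` has already been normalized to a list of header strings, and that list items use `-`, `*`, or `1.` markers. `FieldSpec` and `ParseResult` are hypothetical stand-ins, not the proposed model. Violations are collected rather than thrown, so the raw response can still be saved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified field description: SectionMarkers is assumed to be
// pre-normalized from the YAML string-or-list form.
public sealed record FieldSpec(
    string Name,
    IReadOnlyList<string> SectionMarkers,
    bool Required = false,
    int? MinItems = null,
    int? MaxItems = null);

public sealed record ParseResult(
    IReadOnlyDictionary<string, IReadOnlyList<string>> Fields,
    IReadOnlyList<string> Errors);

public static class StructuredOutputParser
{
    public static ParseResult Parse(string markdown, IReadOnlyList<FieldSpec> fields)
    {
        string[] lines = markdown.Split('\n').Select(l => l.TrimEnd('\r')).ToArray();
        var extracted = new Dictionary<string, IReadOnlyList<string>>();
        var errors = new List<string>();

        foreach (FieldSpec field in fields)
        {
            // Locate the first line whose trimmed text equals one of the field's headers.
            int start = Array.FindIndex(lines, candidate => field.SectionMarkers.Any(
                marker => candidate.Trim().Equals(marker.Trim(), StringComparison.OrdinalIgnoreCase)));

            if (start < 0)
            {
                if (field.Required)
                    errors.Add($"Required section for '{field.Name}' was not found.");
                continue;
            }

            // Collect list items until the next markdown header.
            var items = new List<string>();
            for (int i = start + 1; i < lines.Length; i++)
            {
                string line = lines[i].Trim();
                if (line.StartsWith('#'))
                    break;
                if (TryGetListItem(line, out string item))
                    items.Add(item);
            }

            // Check item-count constraints; violations are reported, not thrown.
            if (field.MinItems is int min && items.Count < min)
                errors.Add($"'{field.Name}' has {items.Count} item(s); expected at least {min}.");
            if (field.MaxItems is int max && items.Count > max)
                errors.Add($"'{field.Name}' has {items.Count} item(s); expected at most {max}.");

            extracted[field.Name] = items;
        }

        return new ParseResult(extracted, errors);
    }

    // Accepts "- item", "* item", and "1. item" lines; anything else is ignored.
    private static bool TryGetListItem(string line, out string item)
    {
        item = string.Empty;
        if (line.StartsWith("- ") || line.StartsWith("* "))
        {
            item = line[2..].Trim();
            return true;
        }

        int dot = line.IndexOf(". ", StringComparison.Ordinal);
        if (dot > 0 && line[..dot].All(char.IsDigit))
        {
            item = line[(dot + 2)..].Trim();
            return true;
        }
        return false;
    }
}
```

Matching headers by exact, case-insensitive text keeps the behavior predictable. Looser matching could be added later without changing the schema format.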

#### 3. Backward Compatibility

- Templates without `outputSchema` continue to use best-effort parsing (current behavior)
- Default embedded templates should be annotated with schemas
- Parsing failures are logged as warnings, not errors (raw response is always saved)
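The fallback rules above amount to a single branch on the schema type. The sketch below assumes two hypothetical delegates: `bestEffortParse` stands in for the current parsing methods, and `schemaParse` stands in for a future `StructuredOutputParser`. For brevity, structured data is flattened to one list of items. The raw response is stored in every case, and schema errors are returned as warnings for the caller to log.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entry shape: RawResponse is always kept as the source of truth.
public sealed record SummaryEntry(
    string RawResponse,
    IReadOnlyList<string> Items,
    IReadOnlyList<string> Warnings);

public static class SummaryBuilder
{
    // schemaType is null when the template has no outputSchema,
    // otherwise "structured" or "freeform".
    public static SummaryEntry Build(
        string rawResponse,
        string? schemaType,
        Func<string, IReadOnlyList<string>> bestEffortParse,
        Func<string, (IReadOnlyList<string> Items, IReadOnlyList<string> Errors)> schemaParse)
    {
        switch (schemaType)
        {
            case null:
                // No schema: keep today's best-effort behavior; empty results are acceptable.
                return new SummaryEntry(rawResponse, bestEffortParse(rawResponse), Array.Empty<string>());
            case "freeform":
                // Freeform: skip structured parsing entirely.
                return new SummaryEntry(rawResponse, Array.Empty<string>(), Array.Empty<string>());
            default:
                // "structured": schema errors become warnings and never block the entry.
                var (items, errors) = schemaParse(rawResponse);
                return new SummaryEntry(rawResponse, items, errors);
        }
    }
}
```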

#### 4. Model Updates

Update `PromptTemplate` model to include:

```csharp
public record PromptTemplate
{
    // ... existing properties ...
    
    /// <summary>
    /// Optional output schema defining expected LLM response structure.
    /// </summary>
    public OutputSchema? OutputSchema { get; init; }
}

public record OutputSchema
{
    /// <summary>
    /// Schema type: 'structured' or 'freeform'
    /// </summary>
    public required string Type { get; init; }
    
    /// <summary>
    /// Field definitions for structured schemas.
    /// </summary>
    public IReadOnlyList<OutputField>? Fields { get; init; }
}

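public record OutputField
{
    /// <summary>Key under which extracted data is stored (e.g. "accomplishments").</summary>
    public required string Name { get; init; }
    public required string Type { get; init; }  // list, text, number, etc.
    /// <summary>Item-count bounds for list fields; null means unbounded.</summary>
    public int? MinItems { get; init; }
    public int? MaxItems { get; init; }
    /// <summary>Markdown header(s) that open this field's section: a string or string[] in YAML.</summary>
    public object? SectionMarker { get; init; }
    /// <summary>Whether a missing section counts as a schema violation.</summary>
    public bool Required { get; init; }
}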
```
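`SectionMarker` is typed `object?` so that it can hold either a single header or a list, which means consumers will need to normalize it. The helper below is one possible approach. It assumes the YAML deserializer yields a `string` for a scalar and some `IEnumerable` (for example, a `List<object>`) for a sequence. `SectionMarkers.Normalize` is a hypothetical name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class SectionMarkers
{
    // Turns the string-or-list SectionMarker value into a flat list of header strings.
    public static IReadOnlyList<string> Normalize(object? sectionMarker) => sectionMarker switch
    {
        null => Array.Empty<string>(),
        // Checked before IEnumerable, because string itself implements IEnumerable.
        string single => new[] { single },
        IEnumerable many => many.Cast<object?>()
            .Select(m => m?.ToString())
            .Where(m => !string.IsNullOrWhiteSpace(m))
            .Select(m => m!)
            .ToArray(),
        _ => throw new ArgumentException(
            $"Unsupported sectionMarker type: {sectionMarker.GetType().Name}", nameof(sectionMarker)),
    };
}
```

An alternative is to declare the property as `IReadOnlyList<string>` and normalize it during YAML parsing, which would keep `object?` out of the public model.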

### Implementation Phases

#### Phase 1: Foundation (Immediate)
- ✅ **DONE**: Make parsing lenient (don't fail on empty results)
- ✅ **DONE**: Update model documentation (raw response is source of truth)
- ✅ **DONE**: Update tests to reflect lenient parsing

#### Phase 2: Schema Definition (Next)
- Define `OutputSchema` model classes
- Add schema parsing to `YamlFrontMatterParser`
- Add validation for schema structure
- Update default templates with schemas
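The "validation for schema structure" step could look roughly like the following, run once when a template is installed. The rules are illustrative assumptions. The accepted field types (`list`, `text`, `number`) are taken from the comment in the proposed model. Requiring a `sectionMarker` on every field matches the example front matter, but the RFC does not state it as a rule. The records repeat the proposed `OutputSchema`/`OutputField` shape so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the proposed model (trimmed to the members the validator reads).
public record OutputSchema
{
    public required string Type { get; init; }
    public IReadOnlyList<OutputField>? Fields { get; init; }
}

public record OutputField
{
    public required string Name { get; init; }
    public required string Type { get; init; }
    public int? MinItems { get; init; }
    public int? MaxItems { get; init; }
    public object? SectionMarker { get; init; }
    public bool Required { get; init; }
}

public static class OutputSchemaValidator
{
    // Assumed set of field types; the RFC lists "list, text, number, etc."
    private static readonly HashSet<string> KnownFieldTypes =
        new(StringComparer.OrdinalIgnoreCase) { "list", "text", "number" };

    // Returns human-readable problems; an empty list means the schema is usable.
    public static IReadOnlyList<string> Validate(OutputSchema schema)
    {
        var errors = new List<string>();

        if (schema.Type is not ("structured" or "freeform"))
        {
            errors.Add($"outputSchema.type must be 'structured' or 'freeform', got '{schema.Type}'.");
            return errors;
        }

        IReadOnlyList<OutputField> fields = schema.Fields ?? Array.Empty<OutputField>();
        if (schema.Type == "structured" && fields.Count == 0)
            errors.Add("A 'structured' schema must declare at least one field.");

        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (OutputField f in fields)
        {
            if (string.IsNullOrWhiteSpace(f.Name)) { errors.Add("Every field needs a name."); continue; }
            if (!seen.Add(f.Name)) errors.Add($"Duplicate field name '{f.Name}'.");
            if (!KnownFieldTypes.Contains(f.Type)) errors.Add($"Field '{f.Name}': unknown type '{f.Type}'.");
            if (f.SectionMarker is null) errors.Add($"Field '{f.Name}': sectionMarker is required.");
            // Lifted comparisons on int? are false when either side is null.
            if (f.MinItems < 0 || f.MaxItems < 0) errors.Add($"Field '{f.Name}': minItems/maxItems cannot be negative.");
            if (f.MinItems > f.MaxItems) errors.Add($"Field '{f.Name}': minItems ({f.MinItems}) exceeds maxItems ({f.MaxItems}).");
        }

        return errors;
    }
}
```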

#### Phase 3: Schema-Aware Parsing (Future)
- Create `StructuredOutputParser` class
- Integrate schema-driven parsing into handlers
- Add schema validation during template installation
- Provide helpful error messages for schema violations

#### Phase 4: Advanced Features (Long-term)
- Schema editor/validator CLI tool (`tom template validate`)
- Template testing framework (provide sample input, validate output)
- Community template repository with schema verification
- AI-powered schema generation from examples

## Success Criteria

1. **Flexibility**: Users can define custom output formats with confidence
2. **Reliability**: Structured data extraction works predictably for templates that declare a schema
3. **Backward Compatibility**: Existing templates continue to work
4. **Developer Experience**: Clear documentation and helpful error messages
5. **Future-Ready**: Foundation for advanced features (search, analytics, etc.)

## Alternative Approaches Considered

### 1. Strict JSON Output Mode
Force the LLM to return JSON instead of markdown.

**Pros**: Reliable parsing, no schema needed  
**Cons**: Less human-readable, requires prompt engineering, loses markdown formatting benefits

### 2. LLM-Based Extraction
Use a second LLM call to extract structured data from the first response.

**Pros**: Flexible, works with any format  
**Cons**: Expensive (two LLM calls per entry) and slower, because the second call adds latency

### 3. Regex-Based Extraction
Use regex patterns to extract data.

**Pros**: Fast, no schema needed  
**Cons**: Brittle, hard to maintain, fails on format variations

**Recommended**: **Output schemas** provide the best balance of flexibility, reliability, and user control.

## Related Issues/PRs

- Original discussion: Code review of `CreateWeeklyReviewHandler.ParseWeeklySummary`
- Related spec: 007-improved-prompt-template-management (introduced custom templates)

## Technical Notes

- Consider using [JSON Schema](https://json-schema.org/) as inspiration for schema validation
- Schemas should be optional - not all templates need structured output
- Parsing errors should be informative but not block entry creation
- Validation of the schema definition itself should happen at template install time, not at runtime

## Documentation Requirements

- Add "Custom Template Output Schemas" guide to docs/
- Update template examples with schema annotations  
- Add schema reference documentation
- Include troubleshooting guide for parsing issues

## Implementation Checklist

- [x] Phase 1: Foundation (✅ Complete)
- [ ] Phase 2: Schema Definition
  - [ ] Define `OutputSchema` model classes
  - [ ] Add YAML parsing support
  - [ ] Add validation
  - [ ] Update default templates
- [ ] Phase 3: Schema-Aware Parsing
  - [ ] Create `StructuredOutputParser`
  - [ ] Integrate with handlers
  - [ ] Add error handling
- [ ] Phase 4: Advanced Features
  - [ ] Template validator CLI
  - [ ] Testing framework
  - [ ] Documentation

## Open Questions

1. Should we support multiple schema versions for template evolution?
2. How do we handle schema changes for existing entries?
3. Should schemas be strictly validated or advisory only?
4. Do we need schema migration tooling?

---

**Labels**: enhancement, templates, architecture  
**Milestone**: Future Enhancement


