Feature: Local LLM Summarization via OpenAI-Compatible API (llama.cpp / Ollama / LM Studio) #52

@sirkirby

Description

Architectural decision TL;DR

  • LLM as a slice: yes. Mirror Audio – give LLM a proper vertical slice.
  • Providers as infra: all concrete LLM providers (OpenAI, Anthropic, local OpenAI-compatible, etc.) stay in Infra.Llm.
  • Setup as a feature: Local LLM setup is a feature slice (or sub-slice of existing Setup), bootstrapped via MediatR, orchestrating infra services and scripts.

So:

  • Features/Audio → ffmpeg, Whisper, commands/queries for recording/STT
  • Features/Llm → summarization, templates, config, setup flows (MediatR)
  • Infra/Llm → provider implementations (OpenAiLlmProvider, AnthropicLlmProvider, LocalOpenAiCompatibleLlmProvider)

This keeps the domain-y flows (summarization, setup UX) in slices and the “talk to remote things over HTTP / spawn processes” in infra, which is consistent with how Audio is isolated.


Summary

Add a local LLM provider that uses an OpenAI-compatible HTTP API (llama.cpp server, Ollama, LM Studio, etc.) and expose it via a dedicated LLM feature slice.

Key points:

  • Keep Whisper.cpp inside the Audio slice for STT.
  • Introduce an LLM slice that owns summarization workflows, templates, and setup commands.
  • Keep concrete LLM providers (OpenAI, Anthropic, Local/OpenAI-compatible) in Infra.Llm.
  • Add guided local LLM setup wired through MediatR just like other feature slices.

Goals

  1. Promote LLM behavior to a first-class feature slice, not a random infra detail.
  2. Add a local LLM provider using OpenAI-compatible HTTP APIs.
  3. Provide MediatR-based setup flows for local LLM:
    • Detect llama.cpp (or compatible server).
    • Optionally download & configure a default GGUF model.
    • Wire config into the LLM slice.
  4. Maintain clear Infra vs Slice boundaries:
    • Slices express “what the app does”.
    • Infra expresses “how we talk to the outside world / system”.

Slices & Layout

New/Refined Structure

Features/
  Audio/
    Commands/
      RecordAudioCommand.cs
      TranscribeAudioCommand.cs
    ...
  Llm/
    Commands/
      SummarizeNoteCommand.cs
      SetupLocalLlmCommand.cs
    Queries/
      GetLlmStatusQuery.cs
    Templates/
      NoteSummaries/
        DefaultNoteSummaryTemplate.txt (or similar)
    ...
  Setup/
    Commands/
      RunInitialSetupWizardCommand.cs
      RunAudioSetupCommand.cs
      RunLocalLlmSetupCommand.cs  // orchestrates Llm.SetupLocalLlmCommand

Infra/
  Audio/
    FfmpegRecorder.cs
    WhisperCppSttProvider.cs
    OpenAiWhisperSttProvider.cs
  Llm/
    OpenAiLlmProvider.cs
    AnthropicLlmProvider.cs
    LocalOpenAiCompatibleLlmProvider.cs
    LlmProviderFactory.cs
  Scripts/
    LocalLlmModelDownloader.cs (invokes bash/pwsh or raw HTTP download)
    ProcessRunner.cs
Config/
  LlmOptions.cs
  AudioOptions.cs
  SetupOptions.cs

Architecture Decision: Slice vs Infra

Why LLM belongs in a slice (like Audio)

  • Audio today:

    • Feature slice coordinates “record audio”, “transcribe audio”, etc.
    • Infra does the ugly bits (ffmpeg, Whisper.cpp CLI).
  • LLM should mirror that:

    • Slice coordinates “summarize note”, “generate tags”, “check LLM status”, “setup local LLM”.
    • Infra does HTTP calls and process spawning for different providers.

Current smell: LLM is mostly hiding in infra as LlmProvider implementations. That makes summarization feel like an infra concern instead of a first-class behavior of the app.

Refactor direction:

  • Lift or re-home LLM-specific behaviors (summarization, template selection, provider selection) into Features/Llm.
  • Keep raw provider details (OpenAiLlmProvider, AnthropicLlmProvider, LocalOpenAiCompatibleLlmProvider) in Infra.Llm.

Result: app logic = slices; IO / external calls = infra. Symmetric with Audio.


Feature Design

1. LLM Slice: Core Responsibilities

Features/Llm should own:

  • Commands/queries describing intent:

    • SummarizeNoteCommand
    • SummarizeTranscriptCommand
    • SetupLocalLlmCommand (driven by Setup slice)
    • TestLlmProviderCommand
    • GetLlmStatusQuery
  • Templates & policies:

    • Prompt templates for summaries/tags.
    • Rules for which provider to use (local vs cloud) and fallbacks.
  • Config mapping:

    • Interpret LlmOptions and decide which provider to use via ILlmProviderFactory.

The LLM slice should not know:

  • How HTTP is executed.
  • How to spawn llama.cpp.
  • How exactly OpenAI/Anthropic JSON is structured.

All that lives in infra.

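For orientation, the boundary the slice programs against might look like this. A minimal sketch: only the ILlmProvider and ILlmProviderFactory names come from this issue; the member shapes are assumptions.

// Sketch only. The slice depends on these abstractions; infra implements them.
public sealed record LlmCompletionRequest(string Prompt);
public sealed record LlmCompletionResult(string Text);

public interface ILlmProvider
{
    Task<LlmCompletionResult> CompleteAsync(LlmCompletionRequest request, CancellationToken ct);
}

// Infra-side factory that turns LlmOptions entries into concrete providers.
public interface ILlmProviderFactory
{
    ILlmProvider Create(string providerName);
}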

2. Infra LLM Provider: LocalOpenAiCompatibleLlmProvider

In Infra/Llm:

Config

public sealed class OpenAiCompatibleLocalLlmProviderConfig
{
    public string BaseUrl { get; init; } = "http://127.0.0.1:11434/v1";
    public string Model { get; init; } = "local-llama";
    public string? ApiKey { get; init; } = null;
    public int TimeoutSeconds { get; init; } = 60;
}

Provider Type Enum

public enum LlmProviderType
{
    OpenAi,
    Anthropic,
    OpenAiCompatibleLocal
}

Provider Implementation

LocalOpenAiCompatibleLlmProvider:

  • Implements ILlmProvider.

  • Accepts OpenAiCompatibleLocalLlmProviderConfig.

  • Sends OpenAI-style POST /v1/chat/completions (or /v1/completions) to BaseUrl.

  • Used for:

    • llama.cpp server
    • LM Studio
    • Ollama (if configured in compatible mode)

This preserves your existing provider abstraction and simply adds a new variant.

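A minimal sketch of the provider under those assumptions (the ILlmProvider / LlmCompletionRequest shapes follow the sketch above; retry and error handling are elided):

using System.Net.Http.Json; // PostAsJsonAsync / ReadFromJsonAsync

public sealed class LocalOpenAiCompatibleLlmProvider : ILlmProvider
{
    private readonly HttpClient _http;
    private readonly OpenAiCompatibleLocalLlmProviderConfig _config;

    public LocalOpenAiCompatibleLlmProvider(HttpClient http, OpenAiCompatibleLocalLlmProviderConfig config)
    {
        _config = config;
        _http = http;
        _http.BaseAddress = new Uri(config.BaseUrl.TrimEnd('/') + "/");
        _http.Timeout = TimeSpan.FromSeconds(config.TimeoutSeconds);
        if (!string.IsNullOrEmpty(config.ApiKey))
            _http.DefaultRequestHeaders.Authorization = new("Bearer", config.ApiKey);
    }

    public async Task<LlmCompletionResult> CompleteAsync(LlmCompletionRequest request, CancellationToken ct)
    {
        // OpenAI-style chat completion body; the same payload works against
        // llama.cpp server, LM Studio, and Ollama's /v1 endpoint.
        var payload = new
        {
            model = _config.Model,
            messages = new object[]
            {
                new { role = "user", content = request.Prompt }
            }
        };

        using var response = await _http.PostAsJsonAsync("chat/completions", payload, ct);
        response.EnsureSuccessStatusCode();

        var body = await response.Content.ReadFromJsonAsync<ChatCompletionResponse>(cancellationToken: ct)
            ?? throw new InvalidOperationException("Empty response from local LLM server.");

        return new LlmCompletionResult(body.Choices[0].Message.Content);
    }

    // Only the fields we read from the OpenAI chat completions schema.
    private sealed record ChatCompletionResponse(List<Choice> Choices);
    private sealed record Choice(Message Message);
    private sealed record Message(string Content);
}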

3. LLM Slice: Summarization Flow

Command example:

public sealed record SummarizeNoteCommand(Guid NoteId) : IRequest<SummarizeNoteResult>;

Handler responsibilities:

  1. Load note & transcript from persistence.

  2. Compose prompt using a template (e.g., via INoteSummaryTemplateRenderer).

  3. Select provider (via ILlmOrchestrator / ILlmProviderSelector):

    • Try defaultProvider.
    • Fall back according to LlmOptions.FallbackOrder.
  4. Call ILlmProvider.CompleteAsync(...).

  5. Persist summary + tags.

  6. Return result.

The handler doesn’t know if it’s OpenAI, Anthropic, or local llama – it just uses the abstraction.

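Sketched out as a MediatR handler, assuming an INoteRepository for persistence, a result record with the obvious shape, and single-method forms of the abstractions named above:

using MediatR;

// Sketch; SummarizeNoteResult's shape and INoteRepository are assumptions.
public sealed record SummarizeNoteResult(Guid NoteId, string Summary);

public sealed class SummarizeNoteCommandHandler : IRequestHandler<SummarizeNoteCommand, SummarizeNoteResult>
{
    private readonly INoteRepository _notes;
    private readonly INoteSummaryTemplateRenderer _templates;
    private readonly ILlmProviderSelector _providerSelector;

    public SummarizeNoteCommandHandler(
        INoteRepository notes,
        INoteSummaryTemplateRenderer templates,
        ILlmProviderSelector providerSelector)
    {
        _notes = notes;
        _templates = templates;
        _providerSelector = providerSelector;
    }

    public async Task<SummarizeNoteResult> Handle(SummarizeNoteCommand command, CancellationToken ct)
    {
        // 1. Load note & transcript.
        var note = await _notes.GetAsync(command.NoteId, ct);

        // 2. Compose the prompt from a template.
        var prompt = _templates.Render(note);

        // 3-4. The selector encapsulates DefaultProvider + FallbackOrder; the
        // handler never sees a concrete provider type.
        var provider = _providerSelector.Select();
        var completion = await provider.CompleteAsync(new LlmCompletionRequest(prompt), ct);

        // 5-6. Persist and return.
        await _notes.SaveSummaryAsync(command.NoteId, completion.Text, ct);
        return new SummarizeNoteResult(command.NoteId, completion.Text);
    }
}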

4. Setup Flow via MediatR

We reuse the existing pattern: Setup is a slice that orchestrates other slices via MediatR commands.

4.1. Commands

In Features/Llm/Commands/SetupLocalLlmCommand.cs:

public sealed record SetupLocalLlmCommand(bool ForceRedownload = false) : IRequest<SetupLocalLlmResult>;

Handler steps (LLM slice):

  1. Ask an infra service to check for llama.cpp:

    • ILocalLlmEnvironmentChecker → LocalLlmEnvironmentStatus:

      • HasLlamaBinary
      • HasConfiguredModel
      • ConfiguredModelPath
  2. If no binary:

    • Return result indicating missing dependency (so Setup slice / CLI can show instructions).
  3. If no model:

    • Emit a SetupLocalLlmModelRequired state for CLI to prompt user for consent to download.
  4. If user consents (CLI passes a new SetupLocalLlmCommand with a flag or additional info):

    • Delegate to ILocalLlmModelInstaller (infra) to download & register the model.
  5. Update LlmOptions (or your config storage) to point to the new model & provider.

  6. Optionally run a TestLlmProviderCommand to verify.

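Possible shapes for the status and result types used in the steps above (only LocalLlmEnvironmentStatus and its three fields appear in this issue; the outcome enum and result record are assumptions):

public sealed record LocalLlmEnvironmentStatus(
    bool HasLlamaBinary,
    bool HasConfiguredModel,
    string? ConfiguredModelPath);

public enum SetupLocalLlmOutcome
{
    Ready,                 // binary + model already configured
    MissingLlamaBinary,    // step 2: show install instructions
    ModelDownloadRequired, // step 3: CLI prompts for download consent
    Installed,             // steps 4-5: model downloaded, config updated
    Failed
}

public sealed record SetupLocalLlmResult(
    SetupLocalLlmOutcome Outcome,
    string? ModelPath = null,
    string? Error = null);
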
In Features/Setup/Commands/RunLocalLlmSetupCommand.cs:

  • Orchestrates the human-facing sequence in CLI:

    • Print diagnostics.
    • Ask for confirmation.
    • Dispatch SetupLocalLlmCommand.
    • React to outcome and print friendly messages.

This follows the same pattern as other feature slices that use MediatR for setup flows.

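A rough sketch of that orchestration (MediatR 12-style handler; the console interaction is simplified, and consent signaling via a re-dispatched command is one option among several):

using MediatR;

public sealed record RunLocalLlmSetupCommand : IRequest;

public sealed class RunLocalLlmSetupCommandHandler : IRequestHandler<RunLocalLlmSetupCommand>
{
    private readonly IMediator _mediator;

    public RunLocalLlmSetupCommandHandler(IMediator mediator) => _mediator = mediator;

    public async Task Handle(RunLocalLlmSetupCommand command, CancellationToken ct)
    {
        var result = await _mediator.Send(new SetupLocalLlmCommand(), ct);

        switch (result.Outcome)
        {
            case SetupLocalLlmOutcome.MissingLlamaBinary:
                Console.WriteLine("llama.cpp not found. Install it first (e.g. brew install llama.cpp).");
                return;

            case SetupLocalLlmOutcome.ModelDownloadRequired:
                Console.Write("No local model configured. Download the default GGUF model? [y/N] ");
                if (Console.ReadLine()?.Trim().Equals("y", StringComparison.OrdinalIgnoreCase) == true)
                {
                    // Re-dispatch after consent; the real command would carry
                    // an explicit consent flag or extra info (see step 4 above).
                    result = await _mediator.Send(new SetupLocalLlmCommand(), ct);
                }
                break;
        }

        Console.WriteLine($"Local LLM setup finished: {result.Outcome}");
    }
}
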
4.2. Infra Services for Setup

In Infra/Llm (or Infra/Scripts):

  • LocalLlmEnvironmentChecker:

    • Knows how to check for:

      • llama-server / llama-cli on PATH.
      • Whether a configured GGUF model file exists.
  • LocalLlmModelInstaller:

    • Either:

      • Invokes embedded bash/powershell scripts (ProcessRunner).
      • Or performs the HTTP download directly via .NET (safer, more testable).
    • Writes model to:

      • macOS: ~/Library/Application Support/ten-second-tom/models
      • Fallback: ~/.local/share/ten-second-tom/models
    • Returns installed path & model metadata.

This keeps all OS/process ugliness out of the LLM slice and Setup slice.

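A sketch of the installer surface and the path selection just described (the interface and record names are assumptions; the directories come from the list above):

public interface ILocalLlmModelInstaller
{
    Task<ModelInstallResult> InstallDefaultModelAsync(bool forceRedownload, CancellationToken ct);
}

public sealed record ModelInstallResult(string ModelPath, string ModelName);

public static class LocalLlmModelPaths
{
    public static string GetModelsDirectory()
    {
        var home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);

        // macOS: ~/Library/Application Support/ten-second-tom/models
        if (OperatingSystem.IsMacOS())
            return Path.Combine(home, "Library", "Application Support", "ten-second-tom", "models");

        // Fallback: ~/.local/share/ten-second-tom/models
        return Path.Combine(home, ".local", "share", "ten-second-tom", "models");
    }
}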

5. Homebrew + Dependency Story

Update sirkirby/homebrew-ten-second-tom:

  • Add dependency:

    depends_on "llama.cpp"
  • Ensure llama-server / llama-cli is installed on PATH on macOS.

LLM setup flow:

  1. brew install sirkirby/tap/ten-second-tom

  2. ten-second-tom initial run:

    • Setup slice runs RunInitialSetupWizardCommand.

    • That wizard includes an option:

      • “Configure local LLM (llama.cpp) for on-device summarization?”
  3. If accepted:

    • Dispatch RunLocalLlmSetupCommand → SetupLocalLlmCommand → LocalLlmModelInstaller.

6. Config Model

LlmOptions (in Config):

public sealed class LlmOptions
{
    public string DefaultProvider { get; init; } = "local-llama";
    public IReadOnlyList<string> FallbackOrder { get; init; } = new[] { "local-llama", "openai", "anthropic" };

    public Dictionary<string, LlmProviderConfig> Providers { get; init; } = new();
}

public sealed class LlmProviderConfig
{
    public string Type { get; init; } = default!; // "openai", "anthropic", "openai-compatible-local", etc.
    public object RawConfig { get; init; } = default!;
}

The LLM slice consumes LlmOptions. The Infra factory turns them into concrete ILlmProvider instances.

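For illustration, a local-provider entry built in code (the model name and the llama.cpp default port 8080 are placeholder values):

var options = new LlmOptions
{
    DefaultProvider = "local-llama",
    FallbackOrder = new[] { "local-llama", "openai", "anthropic" },
    Providers =
    {
        ["local-llama"] = new LlmProviderConfig
        {
            Type = "openai-compatible-local",
            RawConfig = new OpenAiCompatibleLocalLlmProviderConfig
            {
                BaseUrl = "http://127.0.0.1:8080/v1", // llama.cpp server default
                Model = "llama-3.1-8b-instruct-q4"    // placeholder model name
            }
        }
    }
};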

Testing

LLM Slice

  • SummarizeNoteCommandHandlerTests:

    • Uses a fake ILlmProvider and ILlmProviderSelector.

    • Asserts:

      • Correct prompt template is used.
      • Provider selection obeys DefaultProvider + FallbackOrder.
      • Summaries and tags persisted correctly.
  • SetupLocalLlmCommandHandlerTests:

    • Fake ILocalLlmEnvironmentChecker + ILocalLlmModelInstaller.

    • Covers:

      • Missing llama binary.
      • Missing model → prompts for install.
      • Successful install updates config.
      • Failure bubbles a clear result.

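For example, the missing-binary case might look like this (xUnit with hand-rolled fakes; the handler constructor and the checker interface shape are assumptions, other types reuse the sketches above):

using Xunit;

// Assumed shape of the environment-checker abstraction named earlier.
public interface ILocalLlmEnvironmentChecker
{
    Task<LocalLlmEnvironmentStatus> CheckAsync(CancellationToken ct);
}

public sealed class FakeEnvironmentChecker : ILocalLlmEnvironmentChecker
{
    private readonly LocalLlmEnvironmentStatus _status;
    public FakeEnvironmentChecker(LocalLlmEnvironmentStatus status) => _status = status;
    public Task<LocalLlmEnvironmentStatus> CheckAsync(CancellationToken ct) => Task.FromResult(_status);
}

public sealed class FakeModelInstaller : ILocalLlmModelInstaller
{
    public bool InstallWasCalled { get; private set; }

    public Task<ModelInstallResult> InstallDefaultModelAsync(bool forceRedownload, CancellationToken ct)
    {
        InstallWasCalled = true;
        return Task.FromResult(new ModelInstallResult("/tmp/model.gguf", "default"));
    }
}

public class SetupLocalLlmCommandHandlerTests
{
    [Fact]
    public async Task MissingBinary_ReturnsMissingDependencyResult()
    {
        var checker = new FakeEnvironmentChecker(
            new LocalLlmEnvironmentStatus(HasLlamaBinary: false, HasConfiguredModel: false, ConfiguredModelPath: null));
        var installer = new FakeModelInstaller();
        var handler = new SetupLocalLlmCommandHandler(checker, installer);

        var result = await handler.Handle(new SetupLocalLlmCommand(), CancellationToken.None);

        Assert.Equal(SetupLocalLlmOutcome.MissingLlamaBinary, result.Outcome);
        Assert.False(installer.InstallWasCalled); // never download without the binary
    }
}
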
Infra

  • LocalOpenAiCompatibleLlmProviderTests:

    • Use a test HTTP server to emulate the OpenAI API.
    • Validate request/response handling + error behavior.
  • LocalLlmEnvironmentCheckerTests:

    • With IFileSystem / IProcessRunner fakes.
  • LocalLlmModelInstallerTests:

    • Validate path selection.
    • Validate configuration update hooks.

Acceptance Criteria

  • Architecture

    • Features/Llm slice exists and owns summarization and LLM setup commands/queries.
    • Infra/Llm owns all concrete provider implementations and setup helpers.
  • Local Provider

    • LocalOpenAiCompatibleLlmProvider implemented.
    • Can connect to an OpenAI-compatible local server (llama.cpp, LM Studio, Ollama).
  • Setup via MediatR

    • SetupLocalLlmCommand in LLM slice.
    • RunLocalLlmSetupCommand in Setup slice orchestrating CLI UX.
  • Homebrew

    • Tap updated to depend on llama.cpp.
  • Config

    • LlmOptions supports DefaultProvider, FallbackOrder, and provider registry.
    • Local provider configuration persisted and loaded correctly.
  • Fallback Behavior

    • When the local LLM fails or is unreachable, summarization falls back according to FallbackOrder.
  • Docs

    • Updated documentation explaining:

      • LLM slice responsibilities.
      • How to enable local LLM via setup.
      • Example config for llama.cpp, LM Studio, and Ollama (all via openai-compatible-local).
