Feature: Local LLM Summarization via OpenAI-Compatible API (llama.cpp / Ollama / LM Studio) #52

@sirkirby

Description

Architectural decision TL;DR

  • LLM as a slice: yes. Mirror Audio – give LLM a proper vertical slice.
  • Providers as infra: all concrete LLM providers (OpenAI, Anthropic, local OpenAI-compatible, etc.) stay in Infra.Llm.
  • Setup as a feature: Local LLM setup is a feature slice (or sub-slice of existing Setup), bootstrapped via MediatR, orchestrating infra services and scripts.

So:

  • Features/Audio → ffmpeg, Whisper, commands/queries for recording/STT
  • Features/Llm → summarization, templates, config, setup flows (MediatR)
  • Infra/Llm → provider implementations (OpenAiLlmProvider, AnthropicLlmProvider, LocalOpenAiCompatibleLlmProvider)

This keeps the domain-y flows (summarization, setup UX) in slices and the “talk to remote things over HTTP / spawn processes” in infra, which is consistent with how Audio is isolated.


Summary

Add a local LLM provider that uses an OpenAI-compatible HTTP API (llama.cpp server, Ollama, LM Studio, etc.) and expose it via a dedicated LLM feature slice.

Key points:

  • Keep Whisper.cpp inside the Audio slice for STT.
  • Introduce an LLM slice that owns summarization workflows, templates, and setup commands.
  • Keep concrete LLM providers (OpenAI, Anthropic, Local/OpenAI-compatible) in Infra.Llm.
  • Add guided local LLM setup wired through MediatR just like other feature slices.

Goals

  1. Promote LLM behavior to a first-class feature slice, not a random infra detail.
  2. Add a local LLM provider using OpenAI-compatible HTTP APIs.
  3. Provide MediatR-based setup flows for local LLM:
    • Detect llama.cpp (or compatible server).
    • Optionally download & configure a default GGUF model.
    • Wire config into the LLM slice.
  4. Maintain clear Infra vs Slice boundaries:
    • Slices express “what the app does”.
    • Infra expresses “how we talk to the outside world / system”.

Slices & Layout

New/Refined Structure

Features/
  Audio/
    Commands/
      RecordAudioCommand.cs
      TranscribeAudioCommand.cs
    ...
  Llm/
    Commands/
      SummarizeNoteCommand.cs
      SetupLocalLlmCommand.cs
    Queries/
      GetLlmStatusQuery.cs
    Templates/
      NoteSummaries/
        DefaultNoteSummaryTemplate.txt (or similar)
    ...
  Setup/
    Commands/
      RunInitialSetupWizardCommand.cs
      RunAudioSetupCommand.cs
      RunLocalLlmSetupCommand.cs  // orchestrates Llm.SetupLocalLlmCommand

Infra/
  Audio/
    FfmpegRecorder.cs
    WhisperCppSttProvider.cs
    OpenAiWhisperSttProvider.cs
  Llm/
    OpenAiLlmProvider.cs
    AnthropicLlmProvider.cs
    LocalOpenAiCompatibleLlmProvider.cs
    LlmProviderFactory.cs
  Scripts/
    LocalLlmModelDownloader.cs (invokes bash/pwsh or raw HTTP download)
    ProcessRunner.cs
Config/
  LlmOptions.cs
  AudioOptions.cs
  SetupOptions.cs

Architecture Decision: Slice vs Infra

Why LLM belongs in a slice (like Audio)

  • Audio today:

    • Feature slice coordinates “record audio”, “transcribe audio”, etc.
    • Infra does the ugly bits (ffmpeg, Whisper.cpp CLI).
  • LLM should mirror that:

    • Slice coordinates “summarize note”, “generate tags”, “check LLM status”, “setup local LLM”.
    • Infra does HTTP calls and process spawning for different providers.

Current smell: LLM is mostly hiding in infra as LlmProvider implementations. That makes summarization feel like an infra concern instead of a first-class behavior of the app.

Refactor direction:

  • Lift or re-home LLM-specific behaviors (summarization, template selection, provider selection) into Features/Llm.
  • Keep raw provider details (OpenAiLlmProvider, AnthropicLlmProvider, LocalOpenAiCompatibleLlmProvider) in Infra.Llm.

Result: app logic = slices; IO / external calls = infra. Symmetric with Audio.


Feature Design

1. LLM Slice: Core Responsibilities

Features/Llm should own:

  • Commands/queries describing intent:

    • SummarizeNoteCommand
    • SummarizeTranscriptCommand
    • SetupLocalLlmCommand (driven by Setup slice)
    • TestLlmProviderCommand
    • GetLlmStatusQuery
  • Templates & policies:

    • Prompt templates for summaries/tags.
    • Rules for which provider to use (local vs cloud) and fallbacks.
  • Config mapping:

    • Interpret LlmOptions and decide which provider to use via ILlmProviderFactory.

The LLM slice should not know:

  • How HTTP is executed.
  • How to spawn llama.cpp.
  • How exactly OpenAI/Anthropic JSON is structured.

All that lives in infra.

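For orientation, the boundary the slice programs against might look like this. A minimal sketch: only the ILlmProvider and ILlmProviderFactory names come from this issue; the member shapes are assumptions.

// Sketch only. The slice depends on these abstractions; infra implements them.
public sealed record LlmCompletionRequest(string Prompt);
public sealed record LlmCompletionResult(string Text);

public interface ILlmProvider
{
    Task<LlmCompletionResult> CompleteAsync(LlmCompletionRequest request, CancellationToken ct);
}

// Infra-side factory that turns LlmOptions entries into concrete providers.
public interface ILlmProviderFactory
{
    ILlmProvider Create(string providerName);
}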

2. Infra LLM Provider: LocalOpenAiCompatibleLlmProvider

In Infra/Llm:

Config

public sealed class OpenAiCompatibleLocalLlmProviderConfig
{
    public string BaseUrl { get; init; } = "http://127.0.0.1:11434/v1";
    public string Model { get; init; } = "local-llama";
    public string? ApiKey { get; init; } = null;
    public int TimeoutSeconds { get; init; } = 60;
}

Provider Type Enum

public enum LlmProviderType
{
    OpenAi,
    Anthropic,
    OpenAiCompatibleLocal
}

Provider Implementation

LocalOpenAiCompatibleLlmProvider:

  • Implements ILlmProvider.

  • Accepts OpenAiCompatibleLocalLlmProviderConfig.

  • Sends OpenAI-style POST /v1/chat/completions (or /v1/completions) to BaseUrl.

  • Used for:

    • llama.cpp server
    • LM Studio
    • Ollama (if configured in compatible mode)

This preserves your existing provider abstraction and simply adds a new variant.

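A minimal sketch of the provider under those assumptions (the ILlmProvider / LlmCompletionRequest shapes follow the sketch above; retry and error handling are elided):

using System.Net.Http.Json; // PostAsJsonAsync / ReadFromJsonAsync

public sealed class LocalOpenAiCompatibleLlmProvider : ILlmProvider
{
    private readonly HttpClient _http;
    private readonly OpenAiCompatibleLocalLlmProviderConfig _config;

    public LocalOpenAiCompatibleLlmProvider(HttpClient http, OpenAiCompatibleLocalLlmProviderConfig config)
    {
        _config = config;
        _http = http;
        _http.BaseAddress = new Uri(config.BaseUrl.TrimEnd('/') + "/");
        _http.Timeout = TimeSpan.FromSeconds(config.TimeoutSeconds);
        if (!string.IsNullOrEmpty(config.ApiKey))
            _http.DefaultRequestHeaders.Authorization = new("Bearer", config.ApiKey);
    }

    public async Task<LlmCompletionResult> CompleteAsync(LlmCompletionRequest request, CancellationToken ct)
    {
        // OpenAI-style chat completion body; the same payload works against
        // llama.cpp server, LM Studio, and Ollama's /v1 endpoint.
        var payload = new
        {
            model = _config.Model,
            messages = new object[]
            {
                new { role = "user", content = request.Prompt }
            }
        };

        using var response = await _http.PostAsJsonAsync("chat/completions", payload, ct);
        response.EnsureSuccessStatusCode();

        var body = await response.Content.ReadFromJsonAsync<ChatCompletionResponse>(cancellationToken: ct)
            ?? throw new InvalidOperationException("Empty response from local LLM server.");

        return new LlmCompletionResult(body.Choices[0].Message.Content);
    }

    // Only the fields we read from the OpenAI chat completions schema.
    private sealed record ChatCompletionResponse(List<Choice> Choices);
    private sealed record Choice(Message Message);
    private sealed record Message(string Content);
}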

3. LLM Slice: Summarization Flow

Command example:

public sealed record SummarizeNoteCommand(Guid NoteId) : IRequest<SummarizeNoteResult>;

Handler responsibilities:

  1. Load note & transcript from persistence.

  2. Compose prompt using a template (e.g., via INoteSummaryTemplateRenderer).

  3. Select provider (via ILlmOrchestrator / ILlmProviderSelector):

    • Try defaultProvider.
    • Fall back according to LlmOptions.FallbackOrder.
  4. Call ILlmProvider.CompleteAsync(...).

  5. Persist summary + tags.

  6. Return result.

The handler doesn’t know if it’s OpenAI, Anthropic, or local llama – it just uses the abstraction.

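Sketched out as a MediatR handler, assuming an INoteRepository for persistence, a result record with the obvious shape, and single-method forms of the abstractions named above:

using MediatR;

// Sketch; SummarizeNoteResult's shape and INoteRepository are assumptions.
public sealed record SummarizeNoteResult(Guid NoteId, string Summary);

public sealed class SummarizeNoteCommandHandler : IRequestHandler<SummarizeNoteCommand, SummarizeNoteResult>
{
    private readonly INoteRepository _notes;
    private readonly INoteSummaryTemplateRenderer _templates;
    private readonly ILlmProviderSelector _providerSelector;

    public SummarizeNoteCommandHandler(
        INoteRepository notes,
        INoteSummaryTemplateRenderer templates,
        ILlmProviderSelector providerSelector)
    {
        _notes = notes;
        _templates = templates;
        _providerSelector = providerSelector;
    }

    public async Task<SummarizeNoteResult> Handle(SummarizeNoteCommand command, CancellationToken ct)
    {
        // 1. Load note & transcript.
        var note = await _notes.GetAsync(command.NoteId, ct);

        // 2. Compose the prompt from a template.
        var prompt = _templates.Render(note);

        // 3-4. The selector encapsulates DefaultProvider + FallbackOrder; the
        // handler never sees a concrete provider type.
        var provider = _providerSelector.Select();
        var completion = await provider.CompleteAsync(new LlmCompletionRequest(prompt), ct);

        // 5-6. Persist and return.
        await _notes.SaveSummaryAsync(command.NoteId, completion.Text, ct);
        return new SummarizeNoteResult(command.NoteId, completion.Text);
    }
}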

4. Setup Flow via MediatR

We reuse the existing pattern: Setup is a slice that orchestrates other slices via MediatR commands.

4.1. Commands

In Features/Llm/Commands/SetupLocalLlmCommand.cs:

public sealed record SetupLocalLlmCommand(bool ForceRedownload = false) : IRequest<SetupLocalLlmResult>;

Handler steps (LLM slice):

  1. Ask an infra service to check for llama.cpp:

    • ILocalLlmEnvironmentChecker → LocalLlmEnvironmentStatus:

      • HasLlamaBinary
      • HasConfiguredModel
      • ConfiguredModelPath
  2. If no binary:

    • Return result indicating missing dependency (so Setup slice / CLI can show instructions).
  3. If no model:

    • Emit a SetupLocalLlmModelRequired state for CLI to prompt user for consent to download.
  4. If user consents (CLI passes a new SetupLocalLlmCommand with a flag or additional info):

    • Delegate to ILocalLlmModelInstaller (infra) to download & register the model.
  5. Update LlmOptions (or your config storage) to point to the new model & provider.

  6. Optionally run a TestLlmProviderCommand to verify.

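Possible shapes for the status and result types used in the steps above (only LocalLlmEnvironmentStatus and its three fields appear in this issue; the outcome enum and result record are assumptions):

public sealed record LocalLlmEnvironmentStatus(
    bool HasLlamaBinary,
    bool HasConfiguredModel,
    string? ConfiguredModelPath);

public enum SetupLocalLlmOutcome
{
    Ready,                 // binary + model already configured
    MissingLlamaBinary,    // step 2: show install instructions
    ModelDownloadRequired, // step 3: CLI prompts for download consent
    Installed,             // steps 4-5: model downloaded, config updated
    Failed
}

public sealed record SetupLocalLlmResult(
    SetupLocalLlmOutcome Outcome,
    string? ModelPath = null,
    string? Error = null);
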
In Features/Setup/Commands/RunLocalLlmSetupCommand.cs:

  • Orchestrates the human-facing sequence in CLI:

    • Print diagnostics.
    • Ask for confirmation.
    • Dispatch SetupLocalLlmCommand.
    • React to outcome and print friendly messages.

This follows the same pattern as other feature slices that use MediatR for setup flows.

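A rough sketch of that orchestration (MediatR 12-style handler; the console interaction is simplified, and consent signaling via a re-dispatched command is one option among several):

using MediatR;

public sealed record RunLocalLlmSetupCommand : IRequest;

public sealed class RunLocalLlmSetupCommandHandler : IRequestHandler<RunLocalLlmSetupCommand>
{
    private readonly IMediator _mediator;

    public RunLocalLlmSetupCommandHandler(IMediator mediator) => _mediator = mediator;

    public async Task Handle(RunLocalLlmSetupCommand command, CancellationToken ct)
    {
        var result = await _mediator.Send(new SetupLocalLlmCommand(), ct);

        switch (result.Outcome)
        {
            case SetupLocalLlmOutcome.MissingLlamaBinary:
                Console.WriteLine("llama.cpp not found. Install it first (e.g. brew install llama.cpp).");
                return;

            case SetupLocalLlmOutcome.ModelDownloadRequired:
                Console.Write("No local model configured. Download the default GGUF model? [y/N] ");
                if (Console.ReadLine()?.Trim().Equals("y", StringComparison.OrdinalIgnoreCase) == true)
                {
                    // Re-dispatch after consent; the real command would carry
                    // an explicit consent flag or extra info (see step 4 above).
                    result = await _mediator.Send(new SetupLocalLlmCommand(), ct);
                }
                break;
        }

        Console.WriteLine($"Local LLM setup finished: {result.Outcome}");
    }
}
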
4.2. Infra Services for Setup

In Infra/Llm (or Infra/Scripts):

  • LocalLlmEnvironmentChecker:

    • Knows how to check for:

      • llama-server / llama-cli on PATH.
      • Whether a configured GGUF model file exists.
  • LocalLlmModelInstaller:

    • Either:

      • Invokes embedded bash/powershell scripts (ProcessRunner).
      • Or performs the HTTP download directly via .NET (safer, more testable).
    • Writes model to:

      • macOS: ~/Library/Application Support/ten-second-tom/models
      • Fallback: ~/.local/share/ten-second-tom/models
    • Returns installed path & model metadata.

This keeps all OS/process ugliness out of the LLM slice and Setup slice.

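A sketch of the installer surface and the path selection just described (the interface and record names are assumptions; the directories come from the list above):

public interface ILocalLlmModelInstaller
{
    Task<ModelInstallResult> InstallDefaultModelAsync(bool forceRedownload, CancellationToken ct);
}

public sealed record ModelInstallResult(string ModelPath, string ModelName);

public static class LocalLlmModelPaths
{
    public static string GetModelsDirectory()
    {
        var home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);

        // macOS: ~/Library/Application Support/ten-second-tom/models
        if (OperatingSystem.IsMacOS())
            return Path.Combine(home, "Library", "Application Support", "ten-second-tom", "models");

        // Fallback: ~/.local/share/ten-second-tom/models
        return Path.Combine(home, ".local", "share", "ten-second-tom", "models");
    }
}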

5. Homebrew + Dependency Story

Update sirkirby/homebrew-ten-second-tom:

  • Add dependency:

    depends_on "llama.cpp"
  • Ensure llama-server / llama-cli is installed on PATH on macOS.

LLM setup flow:

  1. brew install sirkirby/tap/ten-second-tom

  2. ten-second-tom initial run:

    • Setup slice runs RunInitialSetupWizardCommand.

    • That wizard includes an option:

      • “Configure local LLM (llama.cpp) for on-device summarization?”
  3. If accepted:

    • Dispatch RunLocalLlmSetupCommand → SetupLocalLlmCommand → LocalLlmModelInstaller.

6. Config Model

LlmOptions (in Config):

public sealed class LlmOptions
{
    public string DefaultProvider { get; init; } = "local-llama";
    public IReadOnlyList<string> FallbackOrder { get; init; } = new[] { "local-llama", "openai", "anthropic" };

    public Dictionary<string, LlmProviderConfig> Providers { get; init; } = new();
}

public sealed class LlmProviderConfig
{
    public string Type { get; init; } = default!; // "openai", "anthropic", "openai-compatible-local", etc.
    public object RawConfig { get; init; } = default!;
}

The LLM slice consumes LlmOptions. The Infra factory turns them into concrete ILlmProvider instances.

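For illustration, a local-provider entry built in code (the model name and the llama.cpp default port 8080 are placeholder values):

var options = new LlmOptions
{
    DefaultProvider = "local-llama",
    FallbackOrder = new[] { "local-llama", "openai", "anthropic" },
    Providers =
    {
        ["local-llama"] = new LlmProviderConfig
        {
            Type = "openai-compatible-local",
            RawConfig = new OpenAiCompatibleLocalLlmProviderConfig
            {
                BaseUrl = "http://127.0.0.1:8080/v1", // llama.cpp server default
                Model = "llama-3.1-8b-instruct-q4"    // placeholder model name
            }
        }
    }
};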

Testing

LLM Slice

  • SummarizeNoteCommandHandlerTests:

    • Uses a fake ILlmProvider and ILlmProviderSelector.

    • Asserts:

      • Correct prompt template is used.
      • Provider selection obeys DefaultProvider + FallbackOrder.
      • Summaries and tags persisted correctly.
  • SetupLocalLlmCommandHandlerTests:

    • Fake ILocalLlmEnvironmentChecker + ILocalLlmModelInstaller.

    • Covers:

      • Missing llama binary.
      • Missing model → prompts for install.
      • Successful install updates config.
      • Failure bubbles a clear result.

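For example, the missing-binary case might look like this (xUnit with hand-rolled fakes; the handler constructor and the checker interface shape are assumptions, other types reuse the sketches above):

using Xunit;

// Assumed shape of the environment-checker abstraction named earlier.
public interface ILocalLlmEnvironmentChecker
{
    Task<LocalLlmEnvironmentStatus> CheckAsync(CancellationToken ct);
}

public sealed class FakeEnvironmentChecker : ILocalLlmEnvironmentChecker
{
    private readonly LocalLlmEnvironmentStatus _status;
    public FakeEnvironmentChecker(LocalLlmEnvironmentStatus status) => _status = status;
    public Task<LocalLlmEnvironmentStatus> CheckAsync(CancellationToken ct) => Task.FromResult(_status);
}

public sealed class FakeModelInstaller : ILocalLlmModelInstaller
{
    public bool InstallWasCalled { get; private set; }

    public Task<ModelInstallResult> InstallDefaultModelAsync(bool forceRedownload, CancellationToken ct)
    {
        InstallWasCalled = true;
        return Task.FromResult(new ModelInstallResult("/tmp/model.gguf", "default"));
    }
}

public class SetupLocalLlmCommandHandlerTests
{
    [Fact]
    public async Task MissingBinary_ReturnsMissingDependencyResult()
    {
        var checker = new FakeEnvironmentChecker(
            new LocalLlmEnvironmentStatus(HasLlamaBinary: false, HasConfiguredModel: false, ConfiguredModelPath: null));
        var installer = new FakeModelInstaller();
        var handler = new SetupLocalLlmCommandHandler(checker, installer);

        var result = await handler.Handle(new SetupLocalLlmCommand(), CancellationToken.None);

        Assert.Equal(SetupLocalLlmOutcome.MissingLlamaBinary, result.Outcome);
        Assert.False(installer.InstallWasCalled); // never download without the binary
    }
}
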
Infra

  • LocalOpenAiCompatibleLlmProviderTests:

    • Use a test HTTP server to emulate the OpenAI API.
    • Validate request/response handling + error behavior.
  • LocalLlmEnvironmentCheckerTests:

    • With IFileSystem / IProcessRunner fakes.
  • LocalLlmModelInstallerTests:

    • Validate path selection.
    • Validate configuration update hooks.

Acceptance Criteria

  • Architecture

    • Features/Llm slice exists and owns summarization and LLM setup commands/queries.
    • Infra/Llm owns all concrete provider implementations and setup helpers.
  • Local Provider

    • LocalOpenAiCompatibleLlmProvider implemented.
    • Can connect to an OpenAI-compatible local server (llama.cpp, LM Studio, Ollama).
  • Setup via MediatR

    • SetupLocalLlmCommand in LLM slice.
    • RunLocalLlmSetupCommand in Setup slice orchestrating CLI UX.
  • Homebrew

    • Tap updated to depend on llama.cpp.
  • Config

    • LlmOptions supports DefaultProvider, FallbackOrder, and provider registry.
    • Local provider configuration persisted and loaded correctly.
  • Fallback Behavior

    • When the local LLM fails or is unreachable, summarization falls back according to FallbackOrder.
  • Docs

    • Updated documentation explaining:

      • LLM slice responsibilities.
      • How to enable local LLM via setup.
      • Example config for llama.cpp, LM Studio, and Ollama (all via openai-compatible-local).
