
fix(metering): default local-provider pricing to $0 for uncataloged models#1055

Open
AL-ZiLLA wants to merge 2 commits into RightNow-AI:main from AL-ZiLLA:fix/local-driver-zero-cost

Conversation

@AL-ZiLLA
Contributor

Bug

estimate_cost_with_catalog falls back to (1.0, 3.0) per million tokens for any model not present in the builtin catalog. That unconditional fallback treats locally-served models — custom Ollama Modelfiles, vLLM variants, LM Studio / Lemonade / llama.cpp endpoints — as paid cloud models, even though they run on the user's own hardware and cost $0 per call.

On my deployment, two agents on a custom Ollama Modelfile (gemma4-agent:latest) were tripping the $2/hr and $8/day budget quotas with entirely fictional cost. Ledger shows 895 calls / $43.59 of phantom burn across two weeks — actual cost: $0.

Repro on main:

```rust
let catalog = ModelCatalog::new();
let cost = MeteringEngine::estimate_cost_with_catalog(
    &catalog,
    "my-custom-ollama-modelfile",
    1_000_000, 1_000_000,
);
assert_eq!(cost, 0.0); // FAILS — returns 4.0
```
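The worked arithmetic behind the repro's 4.0: at the (1.0, 3.0) per-million fallback, 1M input + 1M output tokens come to $4.00, while a (0.0, 0.0) local fallback yields $0. The `estimate` function below is an illustrative stand-in, not the real MeteringEngine API:

```rust
// Illustrative per-million-token cost arithmetic; `estimate` is a stand-in,
// not the actual MeteringEngine signature.
fn estimate(input_tokens: u64, output_tokens: u64, pricing: (f64, f64)) -> f64 {
    let (input_per_m, output_per_m) = pricing;
    (input_tokens as f64 / 1_000_000.0) * input_per_m
        + (output_tokens as f64 / 1_000_000.0) * output_per_m
}

fn main() {
    // 1M in + 1M out at the cloud fallback: 1.0 + 3.0 = $4.00 of phantom cost.
    assert_eq!(estimate(1_000_000, 1_000_000, (1.0, 3.0)), 4.0);
    // Same traffic at the local fallback: $0.
    assert_eq!(estimate(1_000_000, 1_000_000, (0.0, 0.0)), 0.0);
    println!("ok");
}
```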

Proof from my usage_events table:

| model        | calls | total cost | $/M tokens | note                    |
|--------------|-------|------------|------------|-------------------------|
| gemma4:26b   | 390   | $0.00      | 0.00       | catalog hit             |
| gemma4-agent | 895   | $43.59     | 1.01       | catalog miss → fallback |

Both are local Ollama models producing zero-cost inference. The only difference is that gemma4-agent is a user-built Modelfile alias and isn't in the builtin catalog.

Fix

Thread the provider string through estimate_cost_with_catalog and pick the fallback based on whether the provider runs inference locally:

  • Local providers (ollama, vllm, lmstudio, lm-studio, lemonade, llamacpp, llama.cpp, local) → fallback (0.0, 0.0)
  • Cloud providers → fallback unchanged at (1.0, 3.0) so an unknown cloud model surfaces a cost estimate rather than hiding it

Catalog pricing always wins if the model is registered — a known cloud model won't get silenced by a mislabeled provider hint.
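A minimal sketch of that precedence, with illustrative names and an assumed signature (the real implementation lives in crates/openfang-kernel/src/metering.rs and may differ):

```rust
use std::collections::HashMap;

// Stand-in for the PR's is_local_provider helper; this provider list and
// signature are assumptions taken from the description above.
fn is_local_provider(provider: &str) -> bool {
    matches!(
        provider.to_ascii_lowercase().as_str(),
        "ollama" | "vllm" | "lmstudio" | "lm-studio" | "lemonade"
            | "llamacpp" | "llama.cpp" | "local"
    )
}

// Per-million (input, output) pricing: a catalog hit always wins; otherwise
// fall back to $0/$0 for local providers and $1/$3 for cloud providers.
fn pricing_for(
    catalog: &HashMap<&str, (f64, f64)>,
    model: &str,
    provider: &str,
) -> (f64, f64) {
    if let Some(&prices) = catalog.get(model) {
        return prices; // registered model: provider hint is ignored
    }
    if is_local_provider(provider) {
        (0.0, 0.0)
    } else {
        (1.0, 3.0)
    }
}

fn main() {
    let mut catalog = HashMap::new();
    catalog.insert("gemma4:26b", (0.0, 0.0));
    // Uncataloged local Modelfile alias: free fallback, case-insensitive.
    assert_eq!(pricing_for(&catalog, "gemma4-agent", "Ollama"), (0.0, 0.0));
    // Uncataloged cloud model: a cost estimate still surfaces.
    assert_eq!(pricing_for(&catalog, "mystery-model", "openai"), (1.0, 3.0));
    // Cataloged model: catalog pricing wins over any provider hint.
    assert_eq!(pricing_for(&catalog, "gemma4:26b", "anthropic"), (0.0, 0.0));
    println!("ok");
}
```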

Impact

  • Fixes false quota trips on custom Ollama Modelfiles and locally-served LLMs
  • Does not change pricing for any cataloged model
  • Does not change behavior for unknown cloud models (still $1/$3 fallback)

Callers updated

Three call sites in crates/openfang-kernel/src/kernel.rs now pass &manifest.model.provider. The manifest ModelConfig already carries this field (crates/openfang-types/src/agent.rs:375), so no storage or config changes are required.

Tests

Added three new tests and updated three existing ones in crates/openfang-kernel/src/metering.rs:

  • test_estimate_cost_with_catalog_unknown_local_is_free — every supported local provider string returns $0 for an unknown model; verifies case-insensitive matching
  • test_estimate_cost_with_catalog_known_model_ignores_provider_hint — catalog pricing wins over the provider hint
  • test_is_local_provider — unit test for the helper
  • Updated existing alias / catalog-hit / unknown-cloud tests to pass a provider argument

The full workspace `cargo test --release` run passes (1,300+ tests, 0 failures), and `cargo clippy -p openfang-runtime -p openfang-kernel -p openfang-api -- -D warnings` is clean.

ALZiLLA and others added 2 commits April 14, 2026 11:36
1. CSP: x-frame-options SAMEORIGIN + frame-ancestors for localhost:3000
   (allows Command Center iframe embedding)
2. reasoning serde alias: accept both "reasoning" (Gemma 4 via Ollama)
   and "reasoning_content" (DeepSeek-R1, Qwen3) in non-streaming responses

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…odels

estimate_cost_with_catalog previously fell back to ($1/M input, $3/M
output) for any model not in the builtin catalog. Custom local
Modelfiles — e.g. an Ollama `gemma4-agent:latest` built via the
Ollama CLI — miss the catalog and so were charged as if they were
a paid cloud model, tripping budget quotas on zero-cost inference.

Fix: thread the provider string through estimate_cost_with_catalog
and pick the fallback based on whether the provider runs inference
locally. For ollama/vllm/lmstudio/lemonade/llamacpp/local, default
to ($0, $0). Cloud providers still default to ($1, $3) so an
unknown cloud model surfaces a cost estimate rather than hiding it.

Catalog pricing always wins if the model IS registered — a known
model tagged with a local provider hint still uses catalog prices.

Added unit tests covering: local-unknown is free, cloud-unknown
uses default, known model ignores the provider hint, case-insensitive
provider matching, and every supported local-provider string.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@jaberjaber23
Member

Thanks @AL-ZiLLA — the metering fix is genuinely useful (local-GPU users running uncataloged Ollama/vLLM model IDs were getting phantom cloud pricing in budgets). But this PR bundles three unrelated concerns and can't land as one.

Ask: please split into 3 PRs

  1. fix(metering): default local-provider pricing to $0 — the core change in crates/openfang-kernel/src/metering.rs. Double-check is_local_provider against the full set of local identifiers (ollama, llamafile, vllm, lmstudio, localai, tabby, plus custom base_url-driven "openai-compatible" aliases). This is the one we want to land first.
  2. chore(api): iframe / CSP policy update — the X-Frame-Options: SAMEORIGIN + frame-ancestors changes in crates/openfang-api/src/middleware.rs. This is a security-relevant change (clickjacking posture) and needs its own security sign-off. Relaxing frame-ancestors to http://localhost:3000 is fine for local dev but we need to be explicit about the threat model if the dashboard is ever exposed on non-localhost.
  3. fix(openai): accept reasoning / reasoning_content aliases — the serde alias in drivers/openai.rs.

CI is currently red on this branch; after splitting and rebasing on post-#1041 main the metering PR should go green quickly.

Thanks again — looking forward to landing (1) fast.
