fix(metering): default local-provider pricing to $0 for uncataloged models #1055
Open
AL-ZiLLA wants to merge 2 commits into RightNow-AI:main from
Conversation
1. CSP: x-frame-options SAMEORIGIN + frame-ancestors for localhost:3000 (allows Command Center iframe embedding)
2. reasoning serde alias: accept both "reasoning" (Gemma 4 via Ollama) and "reasoning_content" (DeepSeek-R1, Qwen3) in non-streaming responses

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…odels

estimate_cost_with_catalog previously fell back to ($1/M input, $3/M output) for any model not in the builtin catalog. Custom local Modelfiles — e.g. an Ollama `gemma4-agent:latest` built via the Ollama CLI — miss the catalog and so were charged as if they were a paid cloud model, tripping budget quotas on zero-cost inference.

Fix: thread the provider string through estimate_cost_with_catalog and pick the fallback based on whether the provider runs inference locally. For ollama/vllm/lmstudio/lemonade/llamacpp/local, default to ($0, $0). Cloud providers still default to ($1, $3) so an unknown cloud model surfaces a cost estimate rather than hiding it. Catalog pricing always wins if the model IS registered — a known model tagged with a local provider hint still uses catalog prices.

Added unit tests covering: local-unknown is free, cloud-unknown uses default, known model ignores the provider hint, case-insensitive provider matching, and every supported local-provider string.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Member
Thanks @AL-ZiLLA — the metering fix is genuinely useful (local-GPU users running uncataloged Ollama/vLLM model IDs were getting phantom cloud pricing in budgets). But this PR bundles three unrelated concerns and can't land as one. Ask: please split into 3 PRs.
CI is currently red on this branch; please re-check it after splitting and rebasing on post-#1041 main. Thanks again — looking forward to landing (1) fast.
Bug
`estimate_cost_with_catalog` falls back to `(1.0, 3.0)` per million tokens for any model not present in the builtin catalog. That unconditional fallback treats locally-served models — custom Ollama Modelfiles, vLLM variants, LM Studio / Lemonade / llama.cpp endpoints — as paid cloud models, even though they run on the user's own hardware and cost $0 per call.

On my deployment, two agents on a custom Ollama Modelfile (`gemma4-agent:latest`) were tripping the $2/hr and $8/day budget quotas with entirely fictional cost. Ledger shows 895 calls / $43.59 of phantom burn across two weeks — actual cost: $0.

Repro on `main`:

Proof from my usage_events table:

- `gemma4:26b`
- `gemma4-agent`

Both are local Ollama models producing zero-cost inference. The only difference is that `gemma4-agent` is a user-built Modelfile alias and isn't in the builtin catalog.

Fix
Thread the provider string through `estimate_cost_with_catalog` and pick the fallback based on whether the provider runs inference locally:

- Local providers (`ollama`, `vllm`, `lmstudio`, `lm-studio`, `lemonade`, `llamacpp`, `llama.cpp`, `local`) → fallback `(0.0, 0.0)`
- Cloud providers → fallback `(1.0, 3.0)` so an unknown cloud model surfaces a cost estimate rather than hiding it

Catalog pricing always wins if the model is registered — a known cloud model won't get silenced by a mislabeled provider hint.
Impact
Callers updated
Three call sites in `crates/openfang-kernel/src/kernel.rs` now pass `&manifest.model.provider`. The manifest `ModelConfig` already carries this field (`crates/openfang-types/src/agent.rs:375`), so no storage or config changes are required.
Added three new tests and updated three existing ones in `crates/openfang-kernel/src/metering.rs`:

- `test_estimate_cost_with_catalog_unknown_local_is_free` — every supported local provider string returns $0 for an unknown model; verifies case-insensitive matching
- `test_estimate_cost_with_catalog_known_model_ignores_provider_hint` — catalog pricing wins over the provider hint
- `test_is_local_provider` — unit test for the helper

Full workspace

`cargo test --release` passes (1,300+ tests, 0 failures). `cargo clippy -p openfang-runtime -p openfang-kernel -p openfang-api -- -D warnings` is clean.