
Add support for max_model_len and gpu_memory_utilization in OrpheusModel #293

Open
BorisFaj wants to merge 1 commit into canopyai:main from BorisFaj:fix/issue-290-max_model_len

Conversation


BorisFaj commented Dec 4, 2025

Summary

This PR adds support for passing max_model_len and gpu_memory_utilization to OrpheusModel, matching what the README already documents. These values are now stored on the instance and forwarded to AsyncEngineArgs when the vLLM engine is initialized.

Additionally, a small formatting cleanup was applied to engine_class.py without changing any other logic.

Motivation

The README examples use:

OrpheusModel(
    model_name="...",
    max_model_len=2048,
    gpu_memory_utilization=0.8,
)

…but the current implementation of OrpheusModel neither accepts these parameters nor forwards them to the underlying vLLM engine. As a result, users cannot control the context length or VRAM usage, and the README examples fail unless the library is patched locally.

This PR brings the implementation in line with the documented API.

Changes

Added max_model_len and gpu_memory_utilization as optional parameters in OrpheusModel.__init__.

Passed these parameters into AsyncEngineArgs inside _setup_engine.

Minor formatting adjustments in engine_class.py (no behavioral changes).
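The forwarding described in the changes above can be sketched as follows. OrpheusModel, _setup_engine, and AsyncEngineArgs come from this PR and from vLLM; the reduced stand-in dataclass and the constructor details are illustrative assumptions, not the actual diff:

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for vllm.AsyncEngineArgs, reduced to the fields relevant here.
# (The real class has many more engine options.)
@dataclass
class AsyncEngineArgs:
    model: str
    max_model_len: Optional[int] = None
    gpu_memory_utilization: float = 0.9  # vLLM's documented default

class OrpheusModel:
    def __init__(
        self,
        model_name: str,
        max_model_len: Optional[int] = None,
        gpu_memory_utilization: Optional[float] = None,
    ):
        # Store the new optional parameters on the instance...
        self.model_name = model_name
        self.max_model_len = max_model_len
        self.gpu_memory_utilization = gpu_memory_utilization
        self.engine_args = self._setup_engine()

    def _setup_engine(self) -> AsyncEngineArgs:
        # ...and forward them to AsyncEngineArgs, overriding vLLM's
        # defaults only when the caller actually supplied a value.
        kwargs = {"model": self.model_name}
        if self.max_model_len is not None:
            kwargs["max_model_len"] = self.max_model_len
        if self.gpu_memory_utilization is not None:
            kwargs["gpu_memory_utilization"] = self.gpu_memory_utilization
        return AsyncEngineArgs(**kwargs)

model = OrpheusModel(
    model_name="...",  # placeholder, as in the README example
    max_model_len=2048,
    gpu_memory_utilization=0.8,
)
```

Passing the values through only when they are not None keeps vLLM's own defaults in effect for callers who omit the new parameters, so existing code is unaffected.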

Testing

No automated tests were added or modified. The behavior was verified manually by running Orpheus with custom max_model_len and gpu_memory_utilization values and confirming that vLLM receives and applies them.
