Add support for cache hit threshold #300

@mayabar

Description

vLLM is going to add a command-line parameter, --global-cache-hit-threshold. The simulator needs to support it too.
The new feature should be in sync with --enable-kvcache, so that the threshold value is considered only if KV cache support is enabled in the simulator.
Add logic that checks whether a request actually satisfies the threshold and returns the finish reason accordingly. For a request whose cached prefix is long enough, run the regular request processing.
Pay attention to the request's doRemoteDecode field: when it is set, the request is a prefill execution, so don't check the threshold.
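
A rough sketch of the check described above, to make the intended order of conditions concrete. Everything here is an assumption for illustration: the `config` and `request` types, the `earlyFinishReason` helper, and the `finishReasonCacheThreshold` value are hypothetical names, and the sketch assumes the threshold is a fraction of prompt tokens in the range 0 to 1. The actual vLLM semantics and finish reason string should be taken from the vLLM change once it lands.

```go
package main

// finishReasonCacheThreshold is a hypothetical placeholder; the issue does
// not name the finish reason value that should be returned.
const finishReasonCacheThreshold = "cache_threshold"

// config holds only the two settings relevant to this issue (assumed shape).
type config struct {
	EnableKVCache           bool    // mirrors --enable-kvcache
	GlobalCacheHitThreshold float64 // mirrors --global-cache-hit-threshold, assumed in [0, 1]
}

// request is a hypothetical, simplified view of an incoming request.
type request struct {
	PromptTokens       int  // total number of prompt tokens
	CachedPrefixTokens int  // prompt tokens already present in the KV cache
	DoRemoteDecode     bool // true means this is a prefill execution
}

// earlyFinishReason reports whether the request should be finished early
// because its cache hit rate is below the threshold. When it returns false,
// the caller runs the regular request processing.
func earlyFinishReason(cfg config, req request) (string, bool) {
	// The threshold only applies when KV cache support is enabled.
	if !cfg.EnableKVCache || cfg.GlobalCacheHitThreshold <= 0 {
		return "", false
	}
	// Prefill executions skip the threshold check entirely.
	if req.DoRemoteDecode {
		return "", false
	}
	// An empty prompt is treated as a hit rate of 0 (an assumption).
	hitRate := 0.0
	if req.PromptTokens > 0 {
		hitRate = float64(req.CachedPrefixTokens) / float64(req.PromptTokens)
	}
	if hitRate < cfg.GlobalCacheHitThreshold {
		return finishReasonCacheThreshold, true
	}
	return "", false
}
```

With a threshold of 0.5, a 100-token prompt with 20 cached tokens would be finished early, while the same prompt with 60 cached tokens, or with doRemoteDecode set, would go through regular processing.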
