Add support for cache hit threshold

vLLM is going to add a command-line parameter, `--global-cache-hit-threshold`. The simulator needs to support it too.
The new parameter should work together with `--enable-kvcache`: the threshold is considered only if KV-cache support is enabled in the simulator.
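A minimal Go sketch of that gating, assuming hypothetical config field names and assuming the threshold is a fraction in [0, 1] where 0 means "disabled" (neither the range nor the default is specified in this issue):

```go
package main

import (
	"errors"
	"fmt"
)

// SimConfig is a hypothetical subset of the simulator configuration.
type SimConfig struct {
	EnableKVCache           bool    // mirrors --enable-kvcache
	GlobalCacheHitThreshold float64 // mirrors --global-cache-hit-threshold (assumed fraction in [0, 1])
}

// validate rejects thresholds outside the assumed [0, 1] range.
func (c *SimConfig) validate() error {
	if c.GlobalCacheHitThreshold < 0 || c.GlobalCacheHitThreshold > 1 {
		return errors.New("global-cache-hit-threshold must be in [0, 1]")
	}
	return nil
}

// thresholdEnabled reports whether the threshold applies at all:
// only when KV-cache support is on and a positive threshold was given.
func (c *SimConfig) thresholdEnabled() bool {
	return c.EnableKVCache && c.GlobalCacheHitThreshold > 0
}

func main() {
	cfg := SimConfig{EnableKVCache: false, GlobalCacheHitThreshold: 0.5}
	// A valid threshold is still ignored because KV-cache support is off.
	fmt.Println(cfg.validate() == nil, cfg.thresholdEnabled())
}
```

Keeping the check in one helper means the request path never has to re-derive whether the flag is active.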
Add logic that checks whether a request actually satisfies the threshold and, if it does not, returns the corresponding finish reason. Requests whose cached prefix is long enough are processed normally.
Pay attention to the request's `doRemoteDecode` field: when it is set, this is a prefill execution, so the threshold must not be checked.
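The per-request decision could be sketched as follows. The `request` struct, the `cache_threshold` finish-reason string, and the hit-rate formula (cached prompt tokens divided by total prompt tokens, with a hit rate equal to the threshold counting as satisfied) are all hypothetical assumptions for illustration; the issue does not define them.

```go
package main

import "fmt"

// Hypothetical finish reason; the string vLLM will actually use is not given in this issue.
const finishReasonCacheThreshold = "cache_threshold"

// request is a hypothetical, trimmed-down view of a simulator request.
type request struct {
	PromptTokens   int
	CachedTokens   int  // prompt tokens found in the KV cache (prefix hit)
	DoRemoteDecode bool // set means this is a prefill execution
}

// checkCacheThreshold returns the finish reason and whether the request
// should be finished early instead of being processed normally.
func checkCacheThreshold(req request, kvCacheEnabled bool, threshold float64) (string, bool) {
	// The threshold only applies when KV-cache support is enabled and a threshold is set.
	if !kvCacheEnabled || threshold <= 0 {
		return "", false
	}
	// Prefill executions skip the check.
	if req.DoRemoteDecode {
		return "", false
	}
	// Avoid dividing by zero on an empty prompt (assumed: process it normally).
	if req.PromptTokens == 0 {
		return "", false
	}
	hitRate := float64(req.CachedTokens) / float64(req.PromptTokens)
	if hitRate < threshold {
		return finishReasonCacheThreshold, true
	}
	// Cached prefix is long enough: run the real request processing.
	return "", false
}

func main() {
	r := request{PromptTokens: 100, CachedTokens: 30}
	reason, reject := checkCacheThreshold(r, true, 0.5)
	fmt.Println(reason, reject)
}
```

Checking `doRemoteDecode` before computing the hit rate keeps prefill requests on the normal path regardless of their cache state.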


Add support for cache hit threshold #300