vLLM is going to add a command-line parameter, --global-cache-hit-threshold. The simulator needs to support it too.
The new parameter should work together with --enable-kvcache: the threshold value is considered only if KV-cache support is enabled in the simulator.
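A minimal sketch of how the flag could be wired up, using Go's standard flag package and a hypothetical SimConfig struct (the simulator's real CLI parsing and config types may differ). It assumes the threshold is a fraction of prompt tokens in [0, 1], with 0 meaning "disabled", and it makes the threshold inactive whenever --enable-kvcache is off.

```go
package main

import (
	"flag"
	"fmt"
)

// SimConfig is a hypothetical subset of the simulator configuration.
type SimConfig struct {
	EnableKVCache           bool    // --enable-kvcache
	GlobalCacheHitThreshold float64 // --global-cache-hit-threshold (assumed fraction in [0, 1])
}

// parseConfig parses the two flags and validates the threshold range.
func parseConfig(args []string) (*SimConfig, error) {
	cfg := &SimConfig{}
	fs := flag.NewFlagSet("sim", flag.ContinueOnError)
	fs.BoolVar(&cfg.EnableKVCache, "enable-kvcache", false, "enable KV-cache support")
	fs.Float64Var(&cfg.GlobalCacheHitThreshold, "global-cache-hit-threshold", 0,
		"minimal fraction of prompt tokens that must hit the KV cache (0 disables the check)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if t := cfg.GlobalCacheHitThreshold; t < 0 || t > 1 {
		return nil, fmt.Errorf("--global-cache-hit-threshold must be in [0, 1], got %v", t)
	}
	return cfg, nil
}

// EffectiveThreshold returns the threshold to enforce; 0 means "no check".
// The threshold is ignored unless KV-cache support is enabled.
func (c *SimConfig) EffectiveThreshold() float64 {
	if !c.EnableKVCache {
		return 0
	}
	return c.GlobalCacheHitThreshold
}

func main() {
	cfg, err := parseConfig([]string{"--enable-kvcache", "--global-cache-hit-threshold=0.8"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("effective threshold:", cfg.EffectiveThreshold())
}
```

Keeping the flag value as configured but exposing only an "effective" threshold means the rest of the request path never has to look at --enable-kvcache itself.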
Add logic that checks whether a request actually satisfies the threshold requirement and returns the finish reason accordingly. For a request whose cached prefix is long enough, process the request normally.
Pay attention to the request's doRemoteDecode field: when it is set, this is a prefill execution, so the threshold must not be checked.
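A sketch of the per-request check, assuming the cache-hit ratio is the number of cached prefix tokens divided by the number of prompt tokens. Request, its fields, FinishReasonCacheThreshold and checkCacheHitThreshold are hypothetical names, and the finish-reason string is a placeholder that should be aligned with whatever vLLM returns. The threshold argument is expected to be 0 when KV-cache support is disabled.

```go
package main

import "fmt"

// Request is a hypothetical subset of an incoming completion request.
type Request struct {
	PromptTokens   int  // number of prompt tokens
	CachedTokens   int  // prompt-prefix tokens already present in the KV cache
	DoRemoteDecode bool // set for the prefill leg of a disaggregated request
}

// FinishReasonCacheThreshold is a placeholder finish reason (hypothetical).
const FinishReasonCacheThreshold = "cache_threshold"

// checkCacheHitThreshold reports whether the request must be finished early.
// It returns (finishReason, true) to stop, or ("", false) to process normally.
func checkCacheHitThreshold(req Request, threshold float64) (string, bool) {
	// Disabled threshold, or prefill execution: never check.
	if threshold <= 0 || req.DoRemoteDecode {
		return "", false
	}
	if req.PromptTokens == 0 {
		return "", false
	}
	hitRatio := float64(req.CachedTokens) / float64(req.PromptTokens)
	if hitRatio < threshold {
		return FinishReasonCacheThreshold, true
	}
	// Cached prefix is long enough: run the regular request processing.
	return "", false
}

func main() {
	req := Request{PromptTokens: 100, CachedTokens: 40}
	reason, stop := checkCacheHitThreshold(req, 0.5)
	fmt.Println(reason, stop)
}
```

In this sketch a hit ratio exactly equal to the threshold counts as satisfying it; whether the comparison should be strict depends on how vLLM defines the parameter.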