We needed weight dtype override support in the vLLM plugin. To test LLMs through tt-inference-server, we need to be able to override weight dtypes on a per-layer basis (e.g. bf16 for router weights, bfp_bf4 for expert weights, and bfp_bf8 as the default).
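To make the intended behavior concrete, here is a minimal sketch of how a per-layer override could be resolved. It is not the plugin's actual schema or API: the glob patterns, the parameter names, `WEIGHT_DTYPE_OVERRIDES`, `DEFAULT_WEIGHT_DTYPE`, and `resolve_weight_dtype` are all hypothetical and chosen only for illustration. The dtype strings are the ones from the example above.

```python
from fnmatch import fnmatchcase

# Hypothetical override table: glob pattern on the parameter name -> dtype string.
# Patterns are assumptions for illustration, not the plugin's real configuration format.
WEIGHT_DTYPE_OVERRIDES = {
    "*.router.*": "bf16",      # router weights keep higher precision
    "*.experts.*": "bfp_bf4",  # expert weights use the most compact format
}

# Dtype used when no override pattern matches.
DEFAULT_WEIGHT_DTYPE = "bfp_bf8"


def resolve_weight_dtype(param_name, overrides=None, default=DEFAULT_WEIGHT_DTYPE):
    """Return the dtype of the first matching pattern, or the default."""
    if overrides is None:
        overrides = WEIGHT_DTYPE_OVERRIDES
    # Dicts preserve insertion order, so earlier patterns take precedence.
    for pattern, dtype in overrides.items():
        if fnmatchcase(param_name, pattern):
            return dtype
    return default
```

With this sketch, a name such as `model.layers.0.mlp.router.weight` would resolve to `bf16`, an expert weight to `bfp_bf4`, and anything else to `bfp_bf8`. First-match-wins ordering is one possible design choice; an actual implementation may prefer exact layer names or most-specific-match semantics.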