[vLLM] Add weight_dtype_overrides support to TTConfig #4365

@dgolubovicTT

Description

We need weight dtype override support in the vLLM plugin. To test LLMs through tt-inference-server, we need to be able to override weight dtypes for different layers (e.g. bf16 for router weights, bfp_bf4 for expert weights, and bfp_bf8 as the default).
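To make the request concrete, here is a minimal sketch of how such overrides could be resolved per parameter. Everything in it is an assumption for illustration: the `WeightDtypeConfig` class, the `default_weight_dtype` field, the `resolve` method, the glob-pattern matching, and the parameter names are all hypothetical and do not describe the actual `TTConfig` API. Only the field name `weight_dtype_overrides` and the dtype names come from this issue.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class WeightDtypeConfig:
    """Hypothetical stand-in for the relevant part of TTConfig."""

    # Assumed default, applied when no override pattern matches.
    default_weight_dtype: str = "bfp_bf8"
    # Assumed shape: glob pattern over parameter names -> dtype name.
    # Patterns are checked in insertion order; the first match wins.
    weight_dtype_overrides: dict[str, str] = field(default_factory=dict)

    def resolve(self, param_name: str) -> str:
        """Return the dtype name to use for the given parameter."""
        for pattern, dtype in self.weight_dtype_overrides.items():
            if fnmatchcase(param_name, pattern):
                return dtype
        return self.default_weight_dtype


# Example mirroring the use case above; parameter names are made up.
config = WeightDtypeConfig(
    weight_dtype_overrides={
        "*.router.weight": "bf16",
        "*.experts.*": "bfp_bf4",
    },
)

router_dtype = config.resolve("model.layers.0.mlp.router.weight")
expert_dtype = config.resolve("model.layers.0.mlp.experts.3.w1.weight")
attn_dtype = config.resolve("model.layers.0.self_attn.q_proj.weight")
```

With this sketch, the router weight resolves to `bf16`, the expert weight to `bfp_bf4`, and the attention weight falls back to the `bfp_bf8` default. Matching on parameter-name patterns is only one possible design; the real implementation may key overrides differently.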
