20 commits
c22f795
consolidating qwen runtimes and models
YouNeedCryDear Mar 4, 2026
4d73735
increase model size support to 122B
YouNeedCryDear Mar 4, 2026
d6a9a84
add fp8 tp8 runtime for 397B model
YouNeedCryDear Mar 4, 2026
4a5f8bc
modify the GPU and CPU resource request for qwen runtimes
YouNeedCryDear Mar 4, 2026
a21b1c9
improve qwen runtime to cover more models
YouNeedCryDear Mar 5, 2026
0f1e999
add qwen3 VL in supported model format
YouNeedCryDear Mar 6, 2026
53ee70d
add sample isvc for supported qwen models
YouNeedCryDear Mar 6, 2026
30bd938
remove old model specific runtimes
YouNeedCryDear Mar 6, 2026
267f69d
use qwen.<MODEL NAME LOWER CASE> as display name for consistency
YouNeedCryDear Mar 6, 2026
cd192cb
remove old isvc samples
YouNeedCryDear Mar 9, 2026
3fbc92b
use smg container with grpc
YouNeedCryDear Mar 11, 2026
0b48a47
fine-tune the engine args
YouNeedCryDear Mar 12, 2026
da0abda
add worker timeout to fp8 runtimes and use http mode for mm runtimes
YouNeedCryDear Mar 17, 2026
cbb7db7
adjust qwen runtimes to vllm
YouNeedCryDear Mar 24, 2026
55afff8
combine runtimes for qwen
YouNeedCryDear Mar 24, 2026
6084f8c
add generation config and optimization in engine arg
YouNeedCryDear Mar 31, 2026
0908907
add more qwen models
YouNeedCryDear Mar 31, 2026
d311dfd
add 512 as max concurrent request for safeguard
YouNeedCryDear Mar 31, 2026
def9d91
include vllm qwen runtimes in kustomize
YouNeedCryDear Mar 31, 2026
27b235f
use -1 for max model len and update smg to 1.4.0
YouNeedCryDear Apr 3, 2026
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen-14B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen-14b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen-14b-chat
+  modelArchitecture: QWenLMHeadModel
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.32.0"
+  modelParameterSize: 14B
+  storage:
+    storageUri: hf://Qwen/Qwen-14B-Chat
+    path: /raid/models/Qwen/Qwen-14B-Chat
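The `metadata.name` and `displayName` values in these manifests appear to follow a fixed pattern relative to the Hugging Face model name: the resource name is the lowercased model name with `.` and `_` replaced by `-`, and the display name is the lowercased vendor, a dot, and the lowercased model name. Below is a minimal Python sketch of that mapping, inferred only from the files in this diff. The helper names `resource_name` and `display_name` are hypothetical and are not part of OME.

```python
import re


def resource_name(model: str) -> str:
    """Derive a Kubernetes-style resource name from a model name.

    Assumption (inferred from the manifests in this diff): lowercase the
    name and replace every character other than a-z, 0-9 and '-' with '-'.
    """
    return re.sub(r"[^a-z0-9-]", "-", model.lower())


def display_name(vendor: str, model: str) -> str:
    """Build '<vendor lower case>.<model name lower case>'.

    Dots and underscores in the model name are kept as-is.
    """
    return f"{vendor.lower()}.{model.lower()}"


if __name__ == "__main__":
    for model in ("Qwen-14B-Chat", "Qwen-1_8B-Chat", "Qwen1.5-MoE-A2.7B-Chat"):
        print(resource_name(model), display_name("Qwen", model))
```

Keeping the two names separate matters because `metadata.name` must be a valid Kubernetes object name (no `.`-free requirement in general, but these manifests consistently avoid `.` and `_`), while `displayName` is free-form and can stay closer to the upstream model name.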
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen-1_8B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen-1-8b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen-1_8b-chat
+  modelArchitecture: QWenLMHeadModel
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.32.0"
+  modelParameterSize: 1.8B
+  storage:
+    storageUri: hf://Qwen/Qwen-1_8B-Chat
+    path: /raid/models/Qwen/Qwen-1_8B-Chat
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen-72B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen-72b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen-72b-chat
+  modelArchitecture: QWenLMHeadModel
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.32.0"
+  modelParameterSize: 72B
+  storage:
+    storageUri: hf://Qwen/Qwen-72B-Chat
+    path: /raid/models/Qwen/Qwen-72B-Chat
2 changes: 2 additions & 0 deletions config/models/Qwen/Qwen-Image-Edit-Plus.yaml
@@ -3,6 +3,8 @@ kind: ClusterBaseModel
 metadata:
   name: qwen-image-edit-plus
 spec:
+  modelCapabilities:
+    - IMAGE_TEXT_TO_IMAGE
   vendor: Qwen
   displayName: qwen.qwen-image-edit-plus
   disabled: false
2 changes: 2 additions & 0 deletions config/models/Qwen/Qwen-Image-Edit.yaml
@@ -3,6 +3,8 @@ kind: ClusterBaseModel
 metadata:
   name: qwen-image-edit
 spec:
+  modelCapabilities:
+    - IMAGE_TEXT_TO_IMAGE
   vendor: Qwen
   displayName: qwen.qwen-image-edit
   disabled: false
2 changes: 2 additions & 0 deletions config/models/Qwen/Qwen-Image.yaml
@@ -3,6 +3,8 @@ kind: ClusterBaseModel
 metadata:
   name: qwen-image
 spec:
+  modelCapabilities:
+    - TEXT_TO_IMAGE
   vendor: Qwen
   displayName: qwen.qwen-image
   disabled: false
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-0.5B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-0-5b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-0.5b-chat
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.37.0"
+  modelParameterSize: 0.5B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-0.5B-Chat
+    path: /raid/models/Qwen/Qwen1.5-0.5B-Chat
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-1.8B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-1-8b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-1.8b-chat
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.37.0"
+  modelParameterSize: 1.8B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-1.8B-Chat
+    path: /raid/models/Qwen/Qwen1.5-1.8B-Chat
2 changes: 1 addition & 1 deletion config/models/Qwen/Qwen1.5-110B-Chat.yaml
@@ -6,7 +6,7 @@ spec:
   modelCapabilities:
     - TEXT_TO_TEXT
   vendor: Qwen
-  displayName: qwen.qwen1-5-110b-chat
+  displayName: qwen.qwen1.5-110b-chat
   disabled: false
   version: "1.0.0"
   modelArchitecture: Qwen2ForCausalLM
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-14B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-14b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-14b-chat
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.37.0"
+  modelParameterSize: 14B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-14B-Chat
+    path: /raid/models/Qwen/Qwen1.5-14B-Chat
2 changes: 1 addition & 1 deletion config/models/Qwen/Qwen1.5-32B-Chat.yaml
@@ -6,7 +6,7 @@ spec:
   modelCapabilities:
     - TEXT_TO_TEXT
   vendor: Qwen
-  displayName: qwen.qwen1-5-32b-chat
+  displayName: qwen.qwen1.5-32b-chat
   disabled: false
   version: "1.0.0"
   modelArchitecture: Qwen2ForCausalLM
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-4B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-4b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-4b-chat
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.37.0"
+  modelParameterSize: 4B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-4B-Chat
+    path: /raid/models/Qwen/Qwen1.5-4B-Chat
2 changes: 1 addition & 1 deletion config/models/Qwen/Qwen1.5-72B-Chat.yaml
@@ -6,7 +6,7 @@ spec:
   modelCapabilities:
     - TEXT_TO_TEXT
   vendor: Qwen
-  displayName: qwen.qwen1-5-72b-chat
+  displayName: qwen.qwen1.5-72b-chat
   disabled: false
   version: "1.0.0"
   modelArchitecture: Qwen2ForCausalLM
2 changes: 1 addition & 1 deletion config/models/Qwen/Qwen1.5-7B-Chat.yaml
@@ -6,7 +6,7 @@ spec:
   modelCapabilities:
     - TEXT_TO_TEXT
   vendor: Qwen
-  displayName: qwen.qwen1-5-7b-chat
+  displayName: qwen.qwen1.5-7b-chat
   disabled: false
   version: "1.0.0"
   modelArchitecture: Qwen2ForCausalLM
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-MoE-A2.7B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-moe-a2-7b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-moe-a2.7b-chat
+  modelArchitecture: Qwen2MoeForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.39.0.dev0"
+  modelParameterSize: 14.3B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-MoE-A2.7B-Chat
+    path: /raid/models/Qwen/Qwen1.5-MoE-A2.7B-Chat
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-0.5B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-0-5b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-0.5b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.40.1"
+  modelParameterSize: 0.5B
+  storage:
+    storageUri: hf://Qwen/Qwen2-0.5B-Instruct
+    path: /raid/models/Qwen/Qwen2-0.5B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-1.5B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-1-5b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-1.5b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.40.1"
+  modelParameterSize: 1.5B
+  storage:
+    storageUri: hf://Qwen/Qwen2-1.5B-Instruct
+    path: /raid/models/Qwen/Qwen2-1.5B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-57B-A14B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-57b-a14b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-57b-a14b-instruct
+  modelArchitecture: Qwen2MoeForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.40.1"
+  modelParameterSize: 57B
+  storage:
+    storageUri: hf://Qwen/Qwen2-57B-A14B-Instruct
+    path: /raid/models/Qwen/Qwen2-57B-A14B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-Math-1.5B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-math-1-5b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-math-1.5b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.43.1"
+  modelParameterSize: 1.5B
+  storage:
+    storageUri: hf://Qwen/Qwen2-Math-1.5B-Instruct
+    path: /raid/models/Qwen/Qwen2-Math-1.5B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-Math-72B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-math-72b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-math-72b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.43.1"
+  modelParameterSize: 72B
+  storage:
+    storageUri: hf://Qwen/Qwen2-Math-72B-Instruct
+    path: /raid/models/Qwen/Qwen2-Math-72B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-Math-7B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-math-7b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-math-7b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.43.1"
+  modelParameterSize: 7B
+  storage:
+    storageUri: hf://Qwen/Qwen2-Math-7B-Instruct
+    path: /raid/models/Qwen/Qwen2-Math-7B-Instruct
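Each new manifest in this diff pairs an `hf://` `storageUri` with a local `path` under `/raid/models` that repeats the repository ID. A small check like the following could catch copy-paste mismatches between the two fields when adding more models. It is only a sketch that assumes this layout holds for every model; `expected_path` and `is_consistent` are hypothetical helpers, not OME APIs.

```python
HF_PREFIX = "hf://"


def expected_path(storage_uri: str, root: str = "/raid/models") -> str:
    """Return the local path implied by an hf:// storage URI.

    Assumption (inferred from the manifests in this diff): the local path
    is '<root>/<org>/<repo>', where '<org>/<repo>' is the part of the URI
    after the 'hf://' prefix.
    """
    if not storage_uri.startswith(HF_PREFIX):
        raise ValueError(f"not an hf:// URI: {storage_uri!r}")
    repo_id = storage_uri[len(HF_PREFIX):]
    return f"{root}/{repo_id}"


def is_consistent(storage_uri: str, path: str) -> bool:
    """True when 'path' matches the path implied by 'storage_uri'."""
    return expected_path(storage_uri) == path


if __name__ == "__main__":
    print(is_consistent(
        "hf://Qwen/Qwen2-Math-7B-Instruct",
        "/raid/models/Qwen/Qwen2-Math-7B-Instruct",
    ))
```

Run against every manifest in `config/models/Qwen/`, a check of this kind would flag a file whose `path` still points at the model it was copied from.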