Commit aae7a64

perf: refactor GLM KV cache and attention, add end-to-end timing instrumentation, and migrate to a unified benchsuite for benchmarking and strict gating
1 parent 6ac0115 commit aae7a64

21 files changed

Lines changed: 2291 additions & 371 deletions

.gitignore

Lines changed: 0 additions & 6 deletions
@@ -2,12 +2,6 @@
 *.iml
 target
 .DS_Store
-DeepSeek-OCR
-DeepSeek-OCR-2
-PaddleOCR-VL
-dots.ocr
-baselines/sample
-baselines/fixtures
 __pycache__
 .venv
 .hf-cache

benchsuite/README.md

Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
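# Benchsuite

A unified benchmarking and gating subproject, structured as a single entry point plus per-model adapters (a Python package).

## Design

- Single entry point: `python -m benchsuite.cli`
- Subcommands:
  - `gate`: strict token-alignment gate
  - `bench-python`: a single benchmark run on the Python side
  - `bench-rust`: a single benchmark run on the Rust CLI side
  - `perf`: runs Python and Rust automatically across the model/device/precision/case matrix, compares the results, and saves the run history
  - `matrix-gate`: runs the strict gate (prompt + token) across the model/device/precision/case matrix
- Model adapters: `benchsuite/models/<model>.py`
  - Current implementation: `GlmAdapter` in `glm.py`
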
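As a rough illustration of what running across the model/device/precision/case matrix means, the sketch below expands the four axes into individual jobs. It is not the orchestrator's actual code: `expand_matrix` and `PY_TO_RUST_DEVICE` are hypothetical names, and the `mps` to `metal` mapping is an assumption inferred from the device choices that `bench-python` and `bench-rust` accept.

```python
from itertools import product

# Assumption: the Python side names the Apple GPU backend "mps" while the
# Rust CLI names it "metal"; the real mapping lives in the orchestrator.
PY_TO_RUST_DEVICE = {"cpu": "cpu", "mps": "metal"}


def expand_matrix(models, devices, precisions, cases):
    """Yield one job dict per (model, case, device, precision) combination."""
    for model, case, device, dtype in product(models, cases, devices, precisions):
        yield {
            "model": model,
            "case": case,
            "py_device": device,
            "rs_device": PY_TO_RUST_DEVICE[device],
            "dtype": dtype,
        }


jobs = list(
    expand_matrix(
        models=["glm-ocr"],
        devices=["cpu", "mps"],
        precisions=["f32", "f16"],
        cases=["formula__image__n8"],
    )
)
print(len(jobs))  # 1 model x 1 case x 2 devices x 2 precisions = 4
```

Each job carries both device names so that the Python and Rust runs of the same case can be paired for comparison.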
## Installation

From the repository root:

```bash
python -m pip install -e '.[bench]'
```

After installation, the unified command is available:

```bash
benchsuite --help
```

Invoking it as a module still works:

```bash
python -m benchsuite.cli --help
```

## Offline constraints

Every subcommand sets the following environment variables:

- `HF_HUB_OFFLINE=1`
- `TRANSFORMERS_OFFLINE=1`
- `HF_HOME=.hf-cache`
- `TRANSFORMERS_CACHE=.hf-cache`
- `DEEPSEEK_OCR_CONFIG_DIR=.cli-config`
- `DEEPSEEK_OCR_CACHE_DIR=.cli-cache`

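One way such settings could be applied to a child process is sketched below. This is only an illustration under stated assumptions: `OFFLINE_ENV` and `offline_env` are hypothetical names, and whether benchsuite resolves the relative paths against the repository root is not shown in this README.

```python
import os

# Values copied from the list above; the helper itself is hypothetical.
OFFLINE_ENV = {
    "HF_HUB_OFFLINE": "1",
    "TRANSFORMERS_OFFLINE": "1",
    "HF_HOME": ".hf-cache",
    "TRANSFORMERS_CACHE": ".hf-cache",
    "DEEPSEEK_OCR_CONFIG_DIR": ".cli-config",
    "DEEPSEEK_OCR_CACHE_DIR": ".cli-cache",
}


def offline_env(base=None):
    """Return a copy of the environment with the offline settings forced on."""
    env = dict(os.environ if base is None else base)
    env.update(OFFLINE_ENV)  # offline values override anything inherited
    return env


env = offline_env({"PATH": "/usr/bin", "HF_HUB_OFFLINE": "0"})
print(env["HF_HUB_OFFLINE"])  # 1
```

The resulting dictionary can be passed as `env=` to `subprocess.run` so that the child process cannot reach the network through the Hugging Face libraries.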
## Usage

### 1) Strict token gate

```bash
python -m benchsuite.cli gate \
  --model glm-ocr \
  --baseline baselines/glm/matrix_v20/formula__image__n8/baseline.json \
  --rust baselines/glm/matrix_v33/formula__image__n8/rust_output.json \
  --output baselines/glm/matrix_v33/formula__image__n8/compare.json
```

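The idea behind a strict token gate can be sketched as an exact, position-by-position comparison of two token-id sequences. The function below is a simplified stand-in, not the adapter's real `compare_tokens` (which takes file paths): only the `match` key is known from the CLI code, and the other report keys are assumptions made for illustration.

```python
def compare_token_ids(baseline_tokens, rust_tokens):
    """Strictly compare two token-id sequences and report the first divergence."""
    first_mismatch = None
    for index, (expected, actual) in enumerate(zip(baseline_tokens, rust_tokens)):
        if expected != actual:
            first_mismatch = index
            break
    # Equal prefixes with different lengths still count as a mismatch.
    if first_mismatch is None and len(baseline_tokens) != len(rust_tokens):
        first_mismatch = min(len(baseline_tokens), len(rust_tokens))
    return {
        "match": first_mismatch is None,
        "first_mismatch": first_mismatch,  # hypothetical key
        "baseline_len": len(baseline_tokens),  # hypothetical key
        "rust_len": len(rust_tokens),  # hypothetical key
    }


report = compare_token_ids([5, 9, 2, 7], [5, 9, 3, 7])
print(report["match"], report["first_mismatch"])  # False 2
```

A gate built this way passes only when `match` is true, which is why the `gate` subcommand can map the result directly to a process exit code.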
### 2) Single Python / Rust benchmark

```bash
python -m benchsuite.cli bench-python \
  --model glm-ocr \
  --model-dir .cli-cache/models/glm-ocr \
  --image baselines/sample/images/test.png \
  --prompt "Formula Recognition:" \
  --device cpu \
  --dtype f32 \
  --max-new-tokens 8 \
  --output baselines/glm/perf_py_v22/formula__test__n8/cpu_f32/bench.json

python -m benchsuite.cli bench-rust \
  --model glm-ocr \
  --cli target/release/deepseek-ocr-cli \
  --image baselines/sample/images/test.png \
  --prompt "Formula Recognition:" \
  --device cpu \
  --dtype f32 \
  --max-new-tokens 8 \
  --output baselines/glm/perf_rs_v22/formula__test__n8/cpu_f32/bench.json
```

### 3) One-command perf matrix (runs both sides, compares them, and compares against previous runs)

```bash
python -m benchsuite.cli perf \
  --run v23 \
  --include-models glm-ocr \
  --include-devices cpu mps \
  --include-precision f32 f16
```

Outputs:

- `baselines/benchsuite/runs/<run>/perf/summary.json` (structured results)
- `baselines/benchsuite/runs/<run>/perf/report.txt` (human-readable comparison table)
- `baselines/benchsuite/runs/<run>/perf/<model>/<case>/<device_dtype>/{python,rust,compare}.json`

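The per-case layout above can be expressed as a small path helper. This is a sketch under assumptions: `perf_case_paths` is a hypothetical name, and `<device_dtype>` is taken to be the device and dtype joined by an underscore, as in the `cpu_f32` directories used by the earlier examples.

```python
from pathlib import Path

# Default root taken from the output list above.
RUNS_ROOT = Path("baselines/benchsuite/runs")


def perf_case_paths(run, model, case, device, dtype, root=RUNS_ROOT):
    """Return the python/rust/compare JSON paths for one perf case."""
    case_dir = root / run / "perf" / model / case / f"{device}_{dtype}"
    return {name: case_dir / f"{name}.json" for name in ("python", "rust", "compare")}


paths = perf_case_paths("v23", "glm-ocr", "formula__image__n8", "cpu", "f32")
print(paths["compare"].as_posix())
# baselines/benchsuite/runs/v23/perf/glm-ocr/formula__image__n8/cpu_f32/compare.json
```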
You can also specify a single case explicitly:

```bash
python -m benchsuite.cli perf \
  --run adhoc \
  --include-models glm-ocr \
  --include-devices cpu \
  --include-precision f32 \
  --image baselines/sample/images/test.png \
  --prompt "Formula Recognition:" \
  --max-new-tokens 64
```

For quick iteration (run only the first N cases):

```bash
python -m benchsuite.cli perf \
  --run smoke \
  --include-models glm-ocr \
  --include-devices cpu \
  --include-precision f32 \
  --limit 1
```

### 4) One-command matrix strict gate (24 cases by default)

```bash
python -m benchsuite.cli matrix-gate \
  --run gate_v34 \
  --include-models glm-ocr \
  --include-devices cpu mps \
  --include-precision f32 f16
```

Outputs:

- `baselines/benchsuite/runs/<run>/matrix/summary.json`
- `baselines/benchsuite/runs/<run>/matrix/report.txt`
- `baselines/benchsuite/runs/<run>/matrix/<model>/<case>/<device_dtype>/{python,compare}.json`

Common filters:

```bash
python -m benchsuite.cli matrix-gate \
  --run smoke \
  --include-models glm-ocr \
  --include-devices cpu \
  --include-precision f32 \
  --limit 1

python -m benchsuite.cli matrix-gate \
  --run formula_only \
  --include-models glm-ocr \
  --include-devices cpu \
  --include-precision f32 \
  --cases formula__image__n8 formula__test__n8
```

Ad-hoc single input (bypasses the built-in matrix):

```bash
python -m benchsuite.cli matrix-gate \
  --run adhoc_gate \
  --include-models glm-ocr \
  --include-devices cpu \
  --include-precision f32 \
  --image baselines/sample/images/test.png \
  --prompt "Formula Recognition:" \
  --max-new-tokens 8
```

## Adding a new model

1. Create `benchsuite/models/<name>.py` and implement `<Name>Adapter`
2. Register the name in `benchsuite/registry.py`
3. Reuse the unified entry point; no additional one-off scripts are needed
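
The steps above might look roughly like the following. This is a minimal sketch, not the contents of `benchsuite/registry.py`, which this README does not show: `DemoAdapter`, `demo-ocr`, and `_ADAPTERS` are hypothetical. The three method names and `get_adapter` mirror what `benchsuite/cli.py` calls, but the bodies here are placeholders.

```python
from pathlib import Path


class DemoAdapter:
    """Hypothetical adapter exposing the methods that the CLI calls."""

    name = "demo-ocr"

    def compare_tokens(self, baseline: Path, rust: Path) -> dict:
        raise NotImplementedError("load both JSON files and compare token ids")

    def run_python_bench(self, **kwargs) -> dict:
        raise NotImplementedError("run the Python reference implementation")

    def run_rust_bench(self, **kwargs) -> dict:
        raise NotImplementedError("invoke the Rust CLI and collect timings")


# Assumed registry shape: model name -> adapter class.
_ADAPTERS = {DemoAdapter.name: DemoAdapter}


def get_adapter(name: str):
    """Instantiate the adapter registered under `name`."""
    try:
        return _ADAPTERS[name]()
    except KeyError:
        known = ", ".join(sorted(_ADAPTERS))
        raise ValueError(f"unknown model {name!r}; known models: {known}") from None


adapter = get_adapter("demo-ocr")
print(type(adapter).__name__)  # DemoAdapter
```

Keeping the lookup behind one function means the subcommands only need the `--model` string and never import adapter modules directly.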

benchsuite/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
"""Unified benchmark/gate toolkit for OCR backends."""


benchsuite/cli.py

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
#!/usr/bin/env python3
from __future__ import annotations

import argparse
from pathlib import Path

from benchsuite.common import repo_root, write_json
from benchsuite.orchestrator import BenchOrchestrator
from benchsuite.registry import get_adapter

# tqdm is optional; without it the single-job commands run with no progress bar.
try:
    from tqdm.auto import tqdm
except Exception:  # pragma: no cover
    tqdm = None


_ORCHESTRATOR = BenchOrchestrator()


def _single_job_progress(desc: str):
    """Return a one-step progress bar, or None when tqdm is unavailable."""
    if tqdm is None:
        return None
    return tqdm(total=1, desc=desc, unit="job")


def _run_gate(args: argparse.Namespace) -> int:
    """Compare baseline and Rust tokens; exit code 0 on match, 1 otherwise."""
    adapter = get_adapter(args.model)
    report = adapter.compare_tokens(args.baseline, args.rust)
    # Default to compare.json next to the Rust output when --output is omitted.
    out = args.output if args.output else args.rust.parent / "compare.json"
    write_json(out, report)
    print(out)
    return 0 if report["match"] else 1


def _run_bench_python(args: argparse.Namespace) -> int:
    """Run one Python-side benchmark through the model adapter."""
    adapter = get_adapter(args.model)
    pbar = _single_job_progress("bench-python")
    try:
        payload = adapter.run_python_bench(
            model_dir=args.model_dir,
            image=args.image,
            prompt=args.prompt,
            max_new_tokens=args.max_new_tokens,
            py_device=args.device,
            py_dtype=args.dtype,
            output=args.output,
            repo_root=repo_root(),
        )
        if pbar is not None:
            pbar.update(1)
    finally:
        # Close the bar even if the benchmark raises.
        if pbar is not None:
            pbar.close()
    print(args.output)
    if args.print_json:
        import json

        print(json.dumps(payload, ensure_ascii=False))
    return 0


def _run_bench_rust(args: argparse.Namespace) -> int:
    """Run one Rust CLI benchmark through the model adapter."""
    adapter = get_adapter(args.model)
    pbar = _single_job_progress("bench-rust")
    try:
        payload = adapter.run_rust_bench(
            cli=args.cli,
            image=args.image,
            prompt=args.prompt,
            max_new_tokens=args.max_new_tokens,
            rs_device=args.device,
            rs_dtype=args.dtype,
            output=args.output,
            repo_root=repo_root(),
        )
        if pbar is not None:
            pbar.update(1)
    finally:
        # Close the bar even if the benchmark raises.
        if pbar is not None:
            pbar.close()
    print(args.output)
    if args.print_json:
        import json

        print(json.dumps(payload, ensure_ascii=False))
    return 0


def _run_perf(args: argparse.Namespace) -> int:
    return _ORCHESTRATOR.run_perf(args)


def _run_matrix_gate(args: argparse.Namespace) -> int:
    return _ORCHESTRATOR.run_matrix_gate(args)


def build_parser() -> argparse.ArgumentParser:
    """Build the top-level parser; each subcommand stores its handler in `func`."""
    parser = argparse.ArgumentParser(
        prog="python -m benchsuite.cli",
        description="Unified benchmark + gate CLI with model adapters",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # gate: compare one baseline file against one Rust output file.
    p = sub.add_parser("gate", help="strict token gate: baseline vs rust output")
    p.add_argument("--model", default="glm-ocr")
    p.add_argument("--baseline", required=True, type=Path)
    p.add_argument("--rust", required=True, type=Path)
    p.add_argument("--output", type=Path)
    p.set_defaults(func=_run_gate)

    # bench-python: single Python-side run (device names follow PyTorch: cpu/mps).
    p = sub.add_parser("bench-python", help="run python benchmark for one model case")
    p.add_argument("--model", default="glm-ocr")
    p.add_argument("--model-dir", required=True, type=Path)
    p.add_argument("--image", required=True, type=Path)
    p.add_argument("--prompt", required=True)
    p.add_argument("--device", required=True, choices=["cpu", "mps"])
    p.add_argument("--dtype", required=True, choices=["f32", "f16"])
    p.add_argument("--max-new-tokens", type=int, default=64)
    p.add_argument("--output", required=True, type=Path)
    p.add_argument("--print-json", action="store_true")
    p.set_defaults(func=_run_bench_python)

    # bench-rust: single Rust CLI run (device names follow the CLI: cpu/metal).
    p = sub.add_parser("bench-rust", help="run rust benchmark for one model case")
    p.add_argument("--model", default="glm-ocr")
    p.add_argument("--cli", default=Path("target/release/deepseek-ocr-cli"), type=Path)
    p.add_argument("--image", required=True, type=Path)
    p.add_argument("--prompt", required=True)
    p.add_argument("--device", required=True, choices=["cpu", "metal"])
    p.add_argument("--dtype", required=True, choices=["f32", "f16"])
    p.add_argument("--max-new-tokens", type=int, required=True)
    p.add_argument("--output", required=True, type=Path)
    p.add_argument("--print-json", action="store_true")
    p.set_defaults(func=_run_bench_rust)

    # perf: matrix run of Python + Rust with comparison against run history.
    p = sub.add_parser("perf", help="one-command run: py+rust compare with history")
    p.add_argument("--run", help="run id used under baselines/*/runs/<run>")
    p.add_argument("--tag", default="latest")
    p.add_argument("--include-models", nargs="*", default=[])
    p.add_argument("--include-devices", nargs="*", default=[])
    p.add_argument("--include-precision", nargs="*", default=[])
    p.add_argument("--cli", default=Path("target/release/deepseek-ocr-cli"), type=Path)
    p.add_argument("--model-dir", type=Path)
    p.add_argument("--case-name")
    p.add_argument("--baseline-json", type=Path)
    p.add_argument("--matrix-source", type=Path)
    p.add_argument("--image", type=Path)
    p.add_argument("--prompt")
    p.add_argument("--max-new-tokens", type=int)
    p.add_argument("--cases", nargs="*")
    p.add_argument("--limit", type=int)
    p.add_argument("--output-root", type=Path)
    p.set_defaults(func=_run_perf)

    # matrix-gate: strict gate across the whole matrix.
    p = sub.add_parser("matrix-gate", help="one-command strict matrix gate run")
    p.add_argument("--run", help="run id used under baselines/*/runs/<run>")
    p.add_argument("--tag", default="latest")
    p.add_argument("--include-models", nargs="*", default=[])
    p.add_argument("--include-devices", nargs="*", default=[])
    p.add_argument("--include-precision", nargs="*", default=[])
    p.add_argument("--cli", default=Path("target/release/deepseek-ocr-cli"), type=Path)
    p.add_argument("--model-dir", type=Path)
    p.add_argument("--source-matrix", type=Path)
    p.add_argument("--output-root", type=Path)
    p.add_argument("--case-name", default="adhoc")
    p.add_argument("--image", type=Path)
    p.add_argument("--prompt")
    p.add_argument("--max-new-tokens", type=int)
    p.add_argument("--cases", nargs="*")
    p.add_argument("--limit", type=int)
    p.set_defaults(func=_run_matrix_gate)

    return parser


def main() -> int:
    parser = build_parser()
    args = parser.parse_args()
    return int(args.func(args))


if __name__ == "__main__":
    raise SystemExit(main())
