
[Data] Apply DataProto to vLLM Inference & Align API with SGLang #967

Open
wheresmyhair wants to merge 1 commit into main from lmflow-vllm-dataproto

Conversation


@wheresmyhair wheresmyhair commented Apr 11, 2026

Overview

  • Apply DataProto to the vllm inference pipeline, aligning its API with the sglang inferencer introduced in #960 (Unified data exchange protocol across modules). This unifies data exchange across inference engines and modernizes the vllm integration.
  • Remove the Ray dependency from the vllm path, paving the way for a Ray-less lmflow implementation.

Detailed Description

DataProto integration

  • VLLMInferencer now returns DataProto instead of list[VLLMInferenceResultWithInput], with prompts in non_tensor_batch["inputs"] and generated text in non_tensor_batch["outputs"]
  • prepare_inputs_for_inference creates DataProto for both sglang and vllm through a unified code path
  • __vllm_inference in HFDecoderModel extracts prompts and sampling params from DataProto, converts to vllm.SamplingParams, and stores outputs back into the proto
  • Inference results are saved/loaded as pickle via DataProto.save_to_disk / load_from_disk
  • inference_results_path now accepts a directory — results are automatically saved as inference_results.pkl inside it
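The result layout and directory-based save path described above can be sketched with a minimal stand-in. To be clear, `MiniDataProto` and `resolve_results_path` below are illustrative placeholders, not the actual lmflow `DataProto` implementation; only the `non_tensor_batch["inputs"/"outputs"]` keys, the pickle round-trip, and the fixed `inference_results.pkl` file name come from this PR:

```python
import pickle
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class MiniDataProto:
    """Stand-in for DataProto: non-tensor payloads keyed by name."""
    non_tensor_batch: dict = field(default_factory=dict)

    def save_to_disk(self, path) -> None:
        # The PR persists results as a pickle via DataProto.save_to_disk.
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load_from_disk(path) -> "MiniDataProto":
        with open(path, "rb") as f:
            return pickle.load(f)


def resolve_results_path(inference_results_path: str) -> Path:
    # Per this PR, inference_results_path may be a directory, in which
    # case results land in inference_results.pkl inside it.
    p = Path(inference_results_path)
    if p.is_dir():
        return p / "inference_results.pkl"
    return p


# Prompts go in "inputs", generated text in "outputs".
proto = MiniDataProto(non_tensor_batch={
    "inputs": ["What is 2+2?"],
    "outputs": ["4"],
})
```

This mirrors the round trip the unit tests exercise: save to a directory, load back, and read `non_tensor_batch["outputs"]`.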

API alignment with sglang and modernization

  • VLLMInferencer now mirrors SGLangInferencer
  • Removed the InferencerWithOffloading base class and all Ray-based distributed inference code; vllm >= 0.8 supports data_parallel_size natively in vllm.LLM() via a multiprocessing backend, with no Ray dependency
  • Added --inference_data_parallel_size argument
  • Total GPUs used = tensor_parallel_size × data_parallel_size
  • Removed use_beam_search from sampling params (dropped in vLLM V1), added deprecation warning
  • Fixed deactivate_model_for_inference — old cleanup code referenced llm_engine.model_executor.driver_worker which no longer exists in V1
  • Added --inference_max_model_len to cap context length (prompt and output) for models with large defaults
  • Bumped vllm version constraint from >=0.4.3 to >=0.8.0 in setup.py
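Two of the points above lend themselves to a short sketch: the GPU accounting (total GPUs is the product of the two parallelism degrees) and the handling of the removed `use_beam_search` flag. Both helpers below are hypothetical illustrations, not code from this PR:

```python
import warnings


def total_gpus(tensor_parallel_size: int, data_parallel_size: int = 1) -> int:
    """Total GPUs a run consumes: one tensor-parallel group per
    data-parallel replica (hypothetical helper, not lmflow code)."""
    if tensor_parallel_size < 1 or data_parallel_size < 1:
        raise ValueError("parallel sizes must be >= 1")
    return tensor_parallel_size * data_parallel_size


def to_vllm_sampling_kwargs(params: dict) -> dict:
    """Filter a flat sampling-params dict before building vllm.SamplingParams.

    Hypothetical sketch of the behavior described above: use_beam_search
    was dropped in vLLM V1, so it is stripped with a deprecation warning
    and everything else passes through unchanged.
    """
    params = dict(params)  # avoid mutating the caller's dict
    if params.pop("use_beam_search", None):
        warnings.warn(
            "use_beam_search was removed in vLLM V1 and is ignored",
            DeprecationWarning,
        )
    return params
```

For example, `tensor_parallel_size=2` with `--inference_data_parallel_size 4` occupies 8 GPUs, and a caller still passing `use_beam_search=True` gets a warning rather than a crash.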

Files changed

| File | Change |
| --- | --- |
| src/lmflow/pipeline/vllm_inferencer.py | Major rewrite: DataProto, aligned API, native DP |
| src/lmflow/models/hf_decoder_model.py | DataProto for vllm, unified prepare_inputs |
| src/lmflow/models/hf_model_mixin.py | DP, max_model_len, V1-compatible deactivation |
| src/lmflow/args.py | New args, dir-based results path |
| src/lmflow/pipeline/sglang_inferencer.py | Dir-based results path |
| src/lmflow/pipeline/utils/memory_safe_vllm_inference.py | Simplified to new API |
| examples/vllm_inference.py | Simplified to match sglang pattern |
| scripts/run_vllm_inference.sh | New script |
| scripts/run_sglang_inference.sh | Updated results path |
| setup.py | vllm >= 0.8.0 |
| tests/pipeline/test_vllm_inferencer.py | New, 8 tests |

Downstream impact

MemorySafeVLLMInferencer is updated to return DataProto. iterative_dpo_aligner.py consumes MemorySafeVLLMInferencer and will need a separate update to handle DataProto instead of list[VLLMInferenceResultWithInput].
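A downstream consumer that previously iterated over list[VLLMInferenceResultWithInput] would adapt along these lines. This is a sketch only: it assumes nothing beyond the `non_tensor_batch["inputs"/"outputs"]` layout stated above, and the `pair_results` name is invented for illustration:

```python
def pair_results(proto) -> list:
    """Zip prompts with generations from a DataProto-style result.

    Old-style code indexed a list of result objects; with DataProto the
    prompts and outputs live in parallel arrays under non_tensor_batch,
    so pairing them is a single zip. `proto` can be any object exposing
    that attribute.
    """
    batch = proto.non_tensor_batch
    return list(zip(batch["inputs"], batch["outputs"]))
```

For iterative_dpo_aligner.py this means replacing per-item attribute access with one pass over the paired arrays when the follow-up update lands.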

Tests

  • 6 unit tests pass (no GPU): sampling params parsing, DataProto save/load round-trip, DataProto repeat logic
  • 2 GPU integration tests pass: full inference pipeline + save/load with Qwen3-0.6B on RTX 4090
  • Ran scripts/run_vllm_inference.sh end-to-end with the target model
