
# Allocation Mode

This document describes AReaL's allocation mode system, which controls how GPUs are distributed between inference and training backends during distributed RL training.

## Overview

Each engine component (actor, critic, rollout, ref, teacher) has its own backend configuration field that specifies:

- Which backend to use (SGLang, vLLM for inference; FSDP, Megatron, Archon for training)
- The parallelization strategy
- The total number of GPUs required

AReaL parses each backend string into a ModelAllocation object that drives resource allocation for that specific engine.
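The parsing step can be sketched roughly as follows. This is an illustration only: the field names on `ModelAllocation` and the helper `parse_backend` are hypothetical, not AReaL's actual definitions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sketch: a backend string like "fsdp:d4t2" splits into a
# backend name and a set of parallelism dimensions.
@dataclass
class ModelAllocation:
    backend: str                              # e.g. "fsdp", "sglang"
    dims: dict = field(default_factory=dict)  # e.g. {"d": 4, "t": 2}

def parse_backend(spec: str) -> ModelAllocation:
    backend, _, dim_str = spec.partition(":")
    # Each dimension is an abbreviation (d/t/p/c/e) followed by a size.
    dims = {m.group(1): int(m.group(2))
            for m in re.finditer(r"([dtpce])(\d+)", dim_str)}
    return ModelAllocation(backend=backend, dims=dims)

alloc = parse_backend("fsdp:d4t2")
# alloc.backend == "fsdp", alloc.dims == {"d": 4, "t": 2}
```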

## Configuration

### Per-Engine Backend Fields

Each engine in the YAML config has its own `backend` field:

```yaml
# Rollout (inference) engine
rollout:
  backend: "sglang:d4t2"

# Actor (training) engine
actor:
  backend: "fsdp:d8"

# Critic engine (falls back to actor.backend if empty)
critic:
  backend: ""

# Ref engine (falls back to actor.backend if empty)
ref:
  backend: ""
```

When `critic.backend` or `ref.backend` is empty, it automatically inherits from `actor.backend`.
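The inheritance rule amounts to a simple fallback. The sketch below uses plain dicts rather than AReaL's real config objects, and `resolve_backend` is a hypothetical name:

```python
# Illustrative sketch of the fallback rule, not AReaL's actual code.
def resolve_backend(engine_cfg: dict, actor_cfg: dict) -> str:
    # An empty string means "inherit from actor.backend".
    return engine_cfg.get("backend") or actor_cfg["backend"]

actor = {"backend": "fsdp:d8"}
critic = {"backend": ""}
resolve_backend(critic, actor)  # "fsdp:d8"
```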

**Note:** The top-level `allocation_mode` config field is deprecated and retained only for backward compatibility with legacy SPMD launchers (local/ray/slurm). It is ignored by the single-controller scheduler; use the per-engine `backend` fields shown above instead.

### Backend String Syntax

```
<backend>:<parallelism_dims>
```

For example, `fsdp:d4t2` means: use the FSDP backend with data-parallel size 4 and tensor-parallel size 2.

### Parallelism Dimensions

| Dimension | Abbreviation | Description | Valid For |
|-----------|--------------|-------------|-----------|
| Data | `d` | Number of model replicas | All backends |
| Tensor | `t` | Split operations across GPUs | All backends |
| Pipeline | `p` | Split layers across GPUs in stages | Megatron, Archon |
| Context | `c` | Split sequence length across GPUs | All backends |
| Expert | `e` | Split MoE experts across GPUs | Megatron, Archon |

Dimensions are specified as `<abbrev><size>`, e.g., `d4t2` means data-parallel size 4 and tensor-parallel size 2.

### Calculating GPU Requirements

The total number of GPUs for a component is computed as:

```
world_size = dp × tp × pp × cp
```

Expert parallelism (`e`) does not increase the world size; it only redistributes how experts are placed within the existing GPU mesh.
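In code, this rule amounts to the following sketch; `world_size` here is an illustrative helper, not AReaL's implementation:

```python
# Sketch of the world-size rule: multiply d, t, p, and c sizes.
# "e" (expert parallelism) is deliberately excluded, since it only
# redistributes experts within the existing mesh.
def world_size(dims: dict) -> int:
    size = 1
    for key in ("d", "t", "p", "c"):
        size *= dims.get(key, 1)
    return size

world_size({"d": 2, "p": 2, "t": 4})          # 16
world_size({"d": 2, "p": 2, "t": 4, "e": 4})  # still 16: e is ignored
```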

### Examples

| Backend String | GPUs per Engine | Notes |
|----------------|-----------------|-------|
| `fsdp:d8` | 8 | 8 data-parallel replicas |
| `sglang:d2t4` | 8 | 2 instances × 4 TP GPUs |
| `megatron:d2p2t4` | 16 | 2 DP × 2 PP × 4 TP |
| `megatron:d2p2t4e4` | 16 | Same mesh, 4-way expert parallelism |

### Full Config Example

```yaml
# 16-GPU setup: 8 inference + 8 training
rollout:
  backend: "sglang:d2t4"    # 2 × 4 = 8 GPUs
actor:
  backend: "fsdp:d4t2"      # 4 × 2 = 8 GPUs
```

## Backend Selection

### Inference Backends

| Backend | Supported Dimensions |
|---------|----------------------|
| `sglang` | `d`, `t` |
| `vllm` | `d`, `t`, `p` |

For inference, `d` represents the number of independent server instances, and each instance uses `t` × `p` GPUs.
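This layout rule can be sketched as follows; `inference_layout` is a hypothetical helper, not part of AReaL:

```python
# Sketch: for an inference backend, "d" is the number of server
# instances, and each instance spans t × p GPUs.
def inference_layout(dims: dict) -> tuple[int, int]:
    instances = dims.get("d", 1)
    gpus_per_instance = dims.get("t", 1) * dims.get("p", 1)
    return instances, instances * gpus_per_instance  # (instances, total GPUs)

inference_layout({"d": 2, "t": 4})  # (2, 8): 2 instances, 8 GPUs total
```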

Note that backend-internal configuration does not affect how AReaL allocates GPUs. Given `rollout.backend: "sglang:d4t4"`, you can also configure `sglang.dp_size=4`, `sglang.ep_size=4`, and `sglang.enable_dp_attention=True`. AReaL still launches 4 model replicas, each with 4 GPUs; within each instance, SGLang will use DP attention and expert parallelism to distribute computation across the attention and expert layers.

### Training Backends

| Backend | Supported Dimensions | Use Case |
|---------|----------------------|----------|
| `fsdp` | `d`, `t`, `c` | Default for simple parallelism |
| `megatron` | `d`, `t`, `p`, `c`, `e` | Required for pipeline or expert parallelism |
| `archon` | `d`, `t`, `p`, `c`, `e` | Alternative to Megatron (experimental) |

**Important:** An explicit backend prefix is required in all allocation strings. Bare dimension strings (e.g., `d4t2`) are no longer accepted. Always specify the backend explicitly: `fsdp:d4t2`, `megatron:d2p2t4`, `sglang:d4t2`.

## MoE Hybrid Parallelism

For Mixture-of-Experts models, Megatron and Archon support different parallelism strategies for the attention and FFN (expert) modules using the hybrid syntax:

```
megatron:(attn:<attn_dims>|ffn:<ffn_dims>)
```

This enables MoE Parallel Folding, which reduces the minimum GPU requirement for combined context and expert parallelism.

### Constraints

- Pipeline-parallel size (`p`) must be identical for `attn` and `ffn`
- World sizes must match (if `d` is omitted in the `ffn` section, it is derived automatically)
- Expert parallelism (`e`) is only valid in the `ffn` section
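These constraints can be checked with a small sketch. It assumes, consistent with the worked example in this section, that the `ffn` mesh counts `e` in place of `c` when computing its world size; `hybrid_world_sizes` is an illustrative helper, not AReaL's actual validator:

```python
# Illustrative constraint check for the hybrid syntax. The attn mesh is
# d × t × p × c; the ffn mesh folds e in where attn uses c.
def hybrid_world_sizes(attn: dict, ffn: dict) -> tuple[int, int]:
    attn_ws = (attn.get("d", 1) * attn.get("t", 1)
               * attn.get("p", 1) * attn.get("c", 1))
    ffn_ws = (ffn.get("d", 1) * ffn.get("t", 1)
              * ffn.get("p", 1) * ffn.get("e", 1))
    assert attn.get("p", 1) == ffn.get("p", 1), "p must match for attn and ffn"
    assert attn_ws == ffn_ws, "attn and ffn world sizes must match"
    return attn_ws, ffn_ws

# megatron:(attn:d4p2t2c2|ffn:d2p2t4e2)
hybrid_world_sizes({"d": 4, "p": 2, "t": 2, "c": 2},
                   {"d": 2, "p": 2, "t": 4, "e": 2})  # (32, 32)
```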

### Example

```yaml
actor:
  backend: "megatron:(attn:d4p2t2c2|ffn:d2p2t4e2)"
```

| Module | dp | pp | tp | cp | ep | World Size |
|--------|----|----|----|----|----|------------|
| attn | 4 | 2 | 2 | 2 | – | 32 |
| ffn | 2 | 2 | 4 | – | 2 | 32 |

## See Also