
# Allocation Mode

This document describes AReaL's allocation mode system, which controls how GPUs are distributed between inference and training backends during distributed RL training.

## Overview

Each engine component (actor, critic, rollout, ref, teacher) has its own backend configuration field that specifies:

- Which backend to use (SGLang, vLLM for inference; FSDP, Megatron, Archon for training)
- The parallelization strategy
- The total number of GPUs required

AReaL parses each backend string into a ModelAllocation object that drives resource allocation for that specific engine.
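The parsing step can be sketched roughly as follows. This is an illustration only: the field names on `ModelAllocation` and the helper `parse_backend` are hypothetical, not AReaL's actual definitions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sketch: a backend string like "fsdp:d4t2" splits into a
# backend name and a set of parallelism dimensions.
@dataclass
class ModelAllocation:
    backend: str                              # e.g. "fsdp", "sglang"
    dims: dict = field(default_factory=dict)  # e.g. {"d": 4, "t": 2}

def parse_backend(spec: str) -> ModelAllocation:
    backend, _, dim_str = spec.partition(":")
    # Each dimension is an abbreviation (d/t/p/c/e) followed by a size.
    dims = {m.group(1): int(m.group(2))
            for m in re.finditer(r"([dtpce])(\d+)", dim_str)}
    return ModelAllocation(backend=backend, dims=dims)

alloc = parse_backend("fsdp:d4t2")
# alloc.backend == "fsdp", alloc.dims == {"d": 4, "t": 2}
```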

## Configuration

### Per-Engine Backend Fields

Each engine in the YAML config has its own `backend` field:

```yaml
# Rollout (inference) engine
rollout:
  backend: "sglang:d4t2"

# Actor (training) engine
actor:
  backend: "fsdp:d8"

# Critic engine (falls back to actor.backend if empty)
critic:
  backend: ""

# Ref engine (falls back to actor.backend if empty)
ref:
  backend: ""
```

When `critic.backend` or `ref.backend` is empty, it automatically inherits from `actor.backend`.
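The inheritance rule amounts to a simple fallback. The sketch below uses plain dicts rather than AReaL's real config objects, and `resolve_backend` is a hypothetical name:

```python
# Illustrative sketch of the fallback rule, not AReaL's actual code.
def resolve_backend(engine_cfg: dict, actor_cfg: dict) -> str:
    # An empty string means "inherit from actor.backend".
    return engine_cfg.get("backend") or actor_cfg["backend"]

actor = {"backend": "fsdp:d8"}
critic = {"backend": ""}
resolve_backend(critic, actor)  # "fsdp:d8"
```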

**Note:** The top-level `allocation_mode` config field is deprecated and retained only for backward compatibility with legacy SPMD launchers (local/ray/slurm). It is ignored by the single-controller scheduler; use the per-engine `backend` fields shown above instead.

### Backend String Syntax

```
<backend>:<parallelism_dims>
```

For example, `fsdp:d4t2` means: use the FSDP backend with data-parallel size 4 and tensor-parallel size 2.

### Parallelism Dimensions

| Dimension | Abbreviation | Description | Valid For |
|-----------|--------------|-------------|-----------|
| Data | `d` | Number of model replicas | All backends |
| Tensor | `t` | Split operations across GPUs | All backends |
| Pipeline | `p` | Split layers across GPUs in stages | Megatron, Archon |
| Context | `c` | Split sequence length across GPUs | All backends |
| Expert | `e` | Split MoE experts across GPUs | Megatron, Archon |

Dimensions are specified as `<abbrev><size>`, e.g., `d4t2` means data-parallel size 4 and tensor-parallel size 2.

### Calculating GPU Requirements

The total number of GPUs for a component is computed as:

```
world_size = dp × tp × pp × cp
```

Expert parallelism (`e`) does not increase the world size; it only redistributes how experts are placed within the existing GPU mesh.
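In code, this rule amounts to the following sketch; `world_size` here is an illustrative helper, not AReaL's implementation:

```python
# Sketch of the world-size rule: multiply d, t, p, and c sizes.
# "e" (expert parallelism) is deliberately excluded, since it only
# redistributes experts within the existing mesh.
def world_size(dims: dict) -> int:
    size = 1
    for key in ("d", "t", "p", "c"):
        size *= dims.get(key, 1)
    return size

world_size({"d": 2, "p": 2, "t": 4})          # 16
world_size({"d": 2, "p": 2, "t": 4, "e": 4})  # still 16: e is ignored
```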

### Examples

| Backend String | GPUs per Engine | Notes |
|----------------|-----------------|-------|
| `fsdp:d8` | 8 | 8 data-parallel replicas |
| `sglang:d2t4` | 8 | 2 instances × 4 TP GPUs |
| `megatron:d2p2t4` | 16 | 2 DP × 2 PP × 4 TP |
| `megatron:d2p2t4e4` | 16 | Same mesh, 4-way expert parallelism |

### Full Config Example

```yaml
# 16-GPU setup: 8 inference + 8 training
rollout:
  backend: "sglang:d2t4"    # 2 × 4 = 8 GPUs
actor:
  backend: "fsdp:d4t2"      # 4 × 2 = 8 GPUs
```

## Backend Selection

### Inference Backends

| Backend | Supported Dimensions |
|---------|----------------------|
| `sglang` | `d`, `t` |
| `vllm` | `d`, `t`, `p` |

For inference, `d` represents the number of independent server instances, and each instance uses `t` × `p` GPUs.
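This layout rule can be sketched as follows; `inference_layout` is a hypothetical helper, not part of AReaL:

```python
# Sketch: for an inference backend, "d" is the number of server
# instances, and each instance spans t × p GPUs.
def inference_layout(dims: dict) -> tuple[int, int]:
    instances = dims.get("d", 1)
    gpus_per_instance = dims.get("t", 1) * dims.get("p", 1)
    return instances, instances * gpus_per_instance  # (instances, total GPUs)

inference_layout({"d": 2, "t": 4})  # (2, 8): 2 instances, 8 GPUs total
```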

Note that backend-internal configuration does not affect how AReaL allocates GPUs. Given `rollout.backend: "sglang:d4t4"`, you can also configure `sglang.dp_size=4`, `sglang.ep_size=4`, and `sglang.enable_dp_attention=True`. AReaL still launches 4 model replicas, each with 4 GPUs; within each instance, SGLang will use DP attention and expert parallelism to distribute computation across the attention and expert layers.

### Training Backends

| Backend | Supported Dimensions | Use Case |
|---------|----------------------|----------|
| `fsdp` | `d`, `t`, `c` | Default for simple parallelism |
| `megatron` | `d`, `t`, `p`, `c`, `e` | Required for pipeline or expert parallelism |
| `archon` | `d`, `t`, `p`, `c`, `e` | Alternative to Megatron (experimental) |

**Important:** An explicit backend prefix is required in all allocation strings. Bare dimension strings (e.g., `d4t2`) are no longer accepted. Always specify the backend explicitly: `fsdp:d4t2`, `megatron:d2p2t4`, `sglang:d4t2`.

## MoE Hybrid Parallelism

For Mixture-of-Experts models, Megatron and Archon support different parallelism strategies for the attention and FFN (expert) modules using the hybrid syntax:

```
megatron:(attn:<attn_dims>|ffn:<ffn_dims>)
```

This enables MoE Parallel Folding, which reduces the minimum GPU requirement for combined context and expert parallelism.

### Constraints

- Pipeline-parallel size (`p`) must be identical for `attn` and `ffn`
- World sizes must match (if `d` is omitted in the `ffn` section, it is derived automatically)
- Expert parallelism (`e`) is only valid in the `ffn` section
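These constraints can be checked with a small sketch. It assumes, consistent with the worked example in this section, that the `ffn` mesh counts `e` in place of `c` when computing its world size; `hybrid_world_sizes` is an illustrative helper, not AReaL's actual validator:

```python
# Illustrative constraint check for the hybrid syntax. The attn mesh is
# d × t × p × c; the ffn mesh folds e in where attn uses c.
def hybrid_world_sizes(attn: dict, ffn: dict) -> tuple[int, int]:
    attn_ws = (attn.get("d", 1) * attn.get("t", 1)
               * attn.get("p", 1) * attn.get("c", 1))
    ffn_ws = (ffn.get("d", 1) * ffn.get("t", 1)
              * ffn.get("p", 1) * ffn.get("e", 1))
    assert attn.get("p", 1) == ffn.get("p", 1), "p must match for attn and ffn"
    assert attn_ws == ffn_ws, "attn and ffn world sizes must match"
    return attn_ws, ffn_ws

# megatron:(attn:d4p2t2c2|ffn:d2p2t4e2)
hybrid_world_sizes({"d": 4, "p": 2, "t": 2, "c": 2},
                   {"d": 2, "p": 2, "t": 4, "e": 2})  # (32, 32)
```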

### Example

```yaml
actor:
  backend: "megatron:(attn:d4p2t2c2|ffn:d2p2t4e2)"
```

| Module | dp | pp | tp | cp | ep | World Size |
|--------|----|----|----|----|----|------------|
| attn | 4 | 2 | 2 | 2 | – | 32 |
| ffn | 2 | 2 | 4 | – | 2 | 32 |

## See Also