This repository contains the official implementation for TRACE (Turning Recurrent Agent failures into Capability-targeted training Environments). TRACE is an end-to-end system for environment-specific agent self-improvement.
Large Language Models (LLMs) deployed in complex agentic environments often fail because they lack specific, underlying capabilities. Existing approaches typically rely on generic synthetic data or direct reinforcement learning (RL) on the target environment, which forces the model to learn these capabilities implicitly and inefficiently. TRACE solves this by automatically identifying the exact capabilities an agent lacks, synthesizing targeted training environments to isolate and train those capabilities, and routing to the appropriate adapter at inference time.
The TRACE pipeline consists of four automated steps:
- Capability Selection: An analysis agent contrasts successful and failed trajectories from the base model in the target environment. It identifies and ranks the specific capabilities that meaningfully distinguish successes from failures.
- Synthetic Environment Generation: For each identified deficit, a generation agent constructs a capability-targeted synthetic training environment. This environment preserves the target environment's interface (tool schemas, interaction protocols) while isolating the missing capability for verifiable training.
- GRPO Training: We train a separate Low-Rank Adaptation (LoRA) module for each capability-specific synthetic environment using Group Relative Policy Optimization (GRPO).
- Select & Adapt: At inference, the base model identifies the most relevant capability for the task given natural language descriptions of each capability, and the corresponding LoRA adapter is activated for generation.
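The Select & Adapt step can be sketched as follows. This is a minimal illustration, not the repo's actual routing code: the capability names, descriptions, and helper functions below are all hypothetical.

```python
# Hypothetical sketch of Select & Adapt: the base model is shown natural-language
# capability descriptions, asked to name the one the task needs, and the matching
# LoRA adapter is activated. All names here are illustrative, not from the repo.

CAPABILITIES = {
    "multi_step_transaction_completion": "Carrying a transaction through every required tool call.",
    "policy_grounded_refusal": "Refusing requests that violate the stated policy.",
}

def build_selection_prompt(task: str) -> str:
    """Ask the base model which capability the task most requires."""
    lines = [f"- {name}: {desc}" for name, desc in CAPABILITIES.items()]
    return (
        "Task:\n" + task + "\n\nCapabilities:\n" + "\n".join(lines)
        + "\nReply with the single capability name that best matches."
    )

def parse_selection(reply: str) -> str:
    """Map the model's free-text reply to a registered adapter name."""
    for name in CAPABILITIES:
        if name in reply:
            return name  # the corresponding capability_<name> adapter is activated
    return "base"  # fall back to the un-adapted base model

print(parse_selection("I would use multi_step_transaction_completion here."))
# → multi_step_transaction_completion
```

The fallback to the base model when no capability matches is a design assumption; the paper's pipeline may handle the no-match case differently.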
TRACE demonstrates significant improvements and generalization across different complex environments:
- $\tau^{2}$-Bench (Customer Service): Improves on the base agent by +14.1 points, reaching an overall pass rate of 47.0%. It also scales more efficiently than baselines, outperforming GRPO and GEPA by +9.2 and +7.4 points at the same rollout budget.
- ToolSandBox (Tool Use): Achieves a mean similarity score of 0.552, improving over the base model by +0.141 points and earning 7 more perfect scores.
- Python 3.11+
- CUDA-capable GPUs (1 GPU for vLLM inference server, additional GPUs for training)
- conda or equivalent environment manager
```bash
conda create -n trace python -y
conda activate trace
pip install -r requirements.txt
```

TRACE is designed to be driven by an LLM coding agent (Claude Code, Codex, etc.). You write a short YAML config, render it into step-by-step instructions, and hand those instructions to the agent.
1. Edit the config files in configs/ with your model and evaluation results.
There are two configs — one per pipeline stage:
- `configs/capability_selection.yaml` — model, eval results, selection thresholds
- `configs/environment_generation.yaml` — model, vLLM settings, rollout parameters
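Both configs are plain YAML. A hypothetical `configs/capability_selection.yaml` might look like this (the field names are illustrative assumptions; consult the shipped file for the real schema):

```yaml
# Illustrative field names only, not the repo's actual schema.
experiment_name: my_experiment           # prefixes the rendered prompt files
model: Qwen/Qwen3-30B-A3B-Instruct-2507  # model whose trajectories are analyzed
eval_results_dir: results/tau2_bench/    # successful and failed trajectories
num_labeling_runs: 10                    # independent labeling passes (Phase 2)
```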
2. Render the pipeline documents:
```bash
python render_pipeline.py configs/capability_selection.yaml --stage capability
python render_pipeline.py configs/environment_generation.yaml --stage environment
```

This produces files in `prompts/`:

- `my_experiment_capability_selection.md` — instructions for Step 1
- `my_experiment_environment_generation.md` — instructions for Step 2
3. Hand each file to your coding agent (Claude Code, Codex, etc.) to execute.
Hand the rendered *_capability_selection.md to your coding agent. It will:
- Phase 1 (Discovery): Read trajectories and propose candidate capabilities → `pipeline/candidate_capabilities.json`
- Phase 2 (Labeling): Run N independent labeling passes → `pipeline/run_01.json` ... `pipeline/run_10.json`
- Phase 3 (Aggregation): Run the filtering script → `pipeline/selected_capabilities.json`
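The Phase 3 aggregation can be pictured as a vote across the N labeling runs. The sketch below is an assumed majority-vote formulation, not the actual logic in `aggregate_capabilities.py`; the per-run file format and the `min_votes` threshold are hypothetical.

```python
# Hypothetical Phase 3 aggregation: keep a capability only if enough of the
# N independent labeling runs flagged it as a deficit. The run-file format
# (capability name -> bool) and the threshold are assumptions, not the repo's.
import json
from collections import Counter
from pathlib import Path

def aggregate(run_dir: str, min_votes: int = 6) -> list[str]:
    votes: Counter = Counter()
    for path in sorted(Path(run_dir).glob("run_*.json")):
        labels = json.loads(path.read_text())
        for name, is_deficit in labels.items():
            votes[name] += int(is_deficit)
    # capabilities flagged in at least min_votes runs survive the filter
    return [name for name, n in votes.items() if n >= min_votes]
```

Requiring agreement across independent passes filters out capabilities that a single noisy labeling run happened to flag.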
Hand the rendered *_environment_generation.md to your coding agent. Each invocation processes one capability (the highest-priority PENDING one). Re-invoke to process the next. For each capability, the agent will:
- Generate a synthetic environment file (e.g., `capability_<name>_game.py`)
- Launch a vLLM server and collect validation rollouts
- Validate the reward distribution (target: 20-60% success rate)
- Mark the capability as DONE in `selected_capabilities.json`
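The reward-distribution check above amounts to a simple gate on the validation rollouts. A minimal sketch, assuming a rollout counts as a success when its reward is positive (the function name and success criterion are illustrative):

```python
# Hypothetical validation gate: accept a generated environment only if the
# base model's success rate over validation rollouts falls in the targeted
# 20-60% band (hard enough to expose the deficit, easy enough to learn from).
def passes_reward_gate(rewards: list[float], lo: float = 0.2, hi: float = 0.6) -> bool:
    success_rate = sum(r > 0 for r in rewards) / len(rewards)
    return lo <= success_rate <= hi
```

An environment the base model already solves (or never solves) would yield identical rewards within a GRPO group and therefore no learning signal, which is why both tails are rejected.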
Train a LoRA adapter for each generated environment.
The training loop collects rollouts from a running vLLM server. Start it on a dedicated GPU:
```bash
conda activate trace
export CUDA_VISIBLE_DEVICES=0  # GPU for inference
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
vllm serve <YOUR_MODEL> \
    --host 0.0.0.0 \
    --port 8080 \
    --dtype bfloat16 \
    --max-model-len 32000 \
    --enable-lora \
    --max-loras 2 \
    --gpu-memory-utilization 0.85 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```

Replace `<YOUR_MODEL>` with your HuggingFace model ID (e.g., `Qwen/Qwen3-30B-A3B-Instruct-2507`).
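Because `VLLM_ALLOW_RUNTIME_LORA_UPDATING` is set, trained adapters can be hot-loaded into the running server via vLLM's `/v1/load_lora_adapter` endpoint. The sketch below only builds the request (the adapter name and path are placeholders); it does not assume the server is up.

```python
# Build the POST that registers a LoRA adapter with a running vLLM server.
# Endpoint and payload follow vLLM's dynamic-LoRA API; name/path are placeholders.
import json
from urllib import request

def load_lora(base_url: str, name: str, path: str) -> request.Request:
    payload = json.dumps({"lora_name": name, "lora_path": path}).encode()
    return request.Request(
        f"{base_url}/v1/load_lora_adapter",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send with urllib.request.urlopen(req) once an adapter has been trained.
req = load_lora("http://localhost:8080", "capability_demo", "/tmp/adapters/demo")
print(req.full_url)  # → http://localhost:8080/v1/load_lora_adapter
```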
In a separate terminal, launch training on the remaining GPUs:
```bash
conda activate trace
export CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7  # GPUs for training
export VLLM_BASE_URLS=http://localhost:8080
export VLLM_MODEL=<YOUR_MODEL>
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
torchrun --nproc_per_node=<N_GPUS> --master-port=29501 -m train \
    --game capability_<YOUR_CAPABILITY_NAME> \
    --model <YOUR_MODEL>
```

| Argument | Description |
|---|---|
| `--game` | Registered game name (e.g., `capability_multi_step_transaction_completion`) |
| `--model` | HuggingFace model ID (must match what vLLM is serving) |
| `--group-size` | Number of rollouts per seed in each GRPO group |
| `--groups-per-batch` | Number of groups per training batch |
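The group structure controlled by `--group-size` feeds GRPO's group-relative advantage. The sketch below is the standard textbook formulation, assumed rather than copied from `train/ppo.py`:

```python
# Group-relative advantage as commonly defined for GRPO: each rollout's reward
# is normalized by the mean and std of its own group, so a group where every
# rollout gets the same reward contributes (near-)zero advantage.
def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

This is why the environment-generation step targets a 20-60% success rate: mixed outcomes within a group are what produce non-zero advantages.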
Repeat Steps 3a-3b for each capability-specific environment to produce a set of LoRA adapters.
```
├── configs/
│   ├── capability_selection.yaml    # Config for capability selection stage
│   └── environment_generation.yaml  # Config for environment generation stage
├── prompts/                         # Rendered pipeline instructions (generated)
├── pipeline/
│   ├── trace_capability_selection.md   # Step 1 template
│   ├── trace_environment_generation.md # Step 2 template
│   ├── aggregate_capabilities.py    # Aggregation script for Phase 3
│   ├── candidate_capabilities.json  # Phase 1 output
│   ├── selected_capabilities.json   # Phase 3 output
│   └── run_*.json                   # Phase 2 labeling outputs
├── train/
│   ├── __main__.py                  # Training entry point
│   ├── config.py                    # Hyperparameters (LoRA rank, LR, etc.)
│   ├── train_grpo.py                # GRPO training loop
│   ├── collect_rollouts.py          # Rollout collection against vLLM
│   ├── inference.py                 # vLLM client & prompt building
│   ├── model.py                     # Model loading with LoRA
│   └── ppo.py                       # GRPO loss computation
├── render_pipeline.py               # Renders templates from YAML config
├── game_registry.py                 # Central game/environment registry
├── capability_*_game.py             # Generated synthetic environments
├── requirements.txt                 # Python dependencies
└── gameplay_rollouts/               # Training rollout logs
```
If you find this work helpful in your research, please consider citing our paper:
```bibtex
@misc{kang2026tracecapabilitytargetedagentictraining,
  title={TRACE: Capability-Targeted Agentic Training},
  author={Hangoo Kang and Tarun Suresh and Jon Saad-Falcon and Azalia Mirhoseini},
  year={2026},
  eprint={2604.05336},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2604.05336},
}
```