Codex/0330 #6

Open

PI-33 wants to merge 7 commits into Gen-Verse:main from PI-33:codex/0330

Conversation


@PI-33 PI-33 commented Mar 30, 2026

No description provided.

PI-33 and others added 7 commits October 21, 2025 18:39
…pe-ppo-or-grpo

Refactor RL pipeline for GRPO grouping
- Add environment.yml, requirements.txt for reproducible env setup
- Add start.sh launch script with VLLM_CUDART_SO_PATH for /proc-restricted containers
- Add .gitignore for temp_data, logs, checkpoints, and data files
- Fix optimization_config.py for 2-GPU small-scale training (CodeContests_200 + MBPP)
- Fix openrlhf_deepspeed.py: fallback to torch.optim.AdamW when FusedAdam JIT fails
- Fix trainer.py: handle empty training batches and packed metadata averaging

Co-authored-by: Cursor <cursoragent@cursor.com>
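The FusedAdam fallback in openrlhf_deepspeed.py can be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: the function name `build_optimizer` and the hyperparameter defaults are assumptions; only `deepspeed.ops.adam.FusedAdam` and `torch.optim.AdamW` are real APIs.

```python
import torch


def build_optimizer(params, lr=1e-6, betas=(0.9, 0.95), weight_decay=0.0):
    """Prefer DeepSpeed's FusedAdam, but fall back to torch.optim.AdamW
    when the CUDA extension fails to JIT-compile (e.g. no nvcc, or a
    /proc-restricted container). Illustrative sketch, not the PR's code."""
    try:
        from deepspeed.ops.adam import FusedAdam  # JIT-compiles a CUDA kernel
        return FusedAdam(params, lr=lr, betas=betas, weight_decay=weight_decay)
    except Exception as exc:  # build failures surface as ImportError/RuntimeError
        print(f"FusedAdam unavailable ({exc!r}); using torch.optim.AdamW instead")
        return torch.optim.AdamW(params, lr=lr, betas=betas,
                                 weight_decay=weight_decay)
```

Catching the broad `Exception` is deliberate here: DeepSpeed's JIT build can fail in several ways, and any of them should degrade to the slower but dependency-free AdamW rather than crash the trainer.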
…ging, and 8-GPU config

- Add experiment directory auto-creation with config snapshots and result archiving
- Separate TensorBoard logs into dedicated tb_logs/ dir, pass step index for proper tracking
- Add per-step training metrics logging (policy_loss, kl_loss, clip_ratio, entropy, lr)
- Extend reward.py with estimated reward statistics output
- Update optimization_config for Qwen2.5-7B-Instruct on 8-GPU (4×TP2) setup
- Add config variants (debug, paper-aligned) under optimization/configs/
- Improve .gitignore to exclude logs, experiment checkpoints, and temp data

Made-with: Cursor
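The per-step metrics logging into a dedicated tb_logs/ directory might look like the sketch below. The `StepLogger` class name is hypothetical; the real API used is `torch.utils.tensorboard.SummaryWriter`, with the step index passed as `global_step` so curves stay aligned.

```python
import os

from torch.utils.tensorboard import SummaryWriter


class StepLogger:
    """Write per-step training metrics (policy_loss, kl_loss, clip_ratio,
    entropy, lr, ...) to a dedicated tb_logs/ subdirectory of the
    experiment dir. Illustrative sketch, not the PR's code."""

    def __init__(self, exp_dir):
        self.writer = SummaryWriter(log_dir=os.path.join(exp_dir, "tb_logs"))

    def log(self, step, metrics):
        # Tag each scalar under train/ and key it by the global step index.
        for name, value in metrics.items():
            self.writer.add_scalar(f"train/{name}", value, global_step=step)

    def close(self):
        self.writer.close()
```

Keeping TensorBoard event files in their own tb_logs/ directory (rather than alongside checkpoints) also makes the .gitignore rules and result archiving mentioned above straightforward: the whole directory can be excluded or tarred as a unit.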
- analysis/: training curves, eval summaries, BoN robustness analysis
- figures/: training visualization plots
- scripts: generate_report.py, regenerate_figures.py
- DESIGN_SELF_BOOTSTRAPPED_GRPO.md: self-supervised reward design

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>