XcHan Mr-XcHan

🎯

Focusing

Pinned

ELAPSE Public

ELAPSE: Expand Latent Action Projection Space for Policy Optimization in Offline Reinforcement Learning.

Python 2

AEM Public

Attention Ensemble Mixture.

Python 2

FGO Public

Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization

Jupyter Notebook 1

GRPO_GPT2 Public

A reproduction of DeepSeek's GRPO based on GPT-2.

Python 1

RL_algorithms Public

Basic reinforcement learning algorithms.

Jupyter Notebook 2

OpenRLHF Public

Forked from OpenRLHF/OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python