CVPR EDGE 2026 (Poster)
Kiran Nair, Rodrigue Rizk, KC Santosh
USD Artificial Intelligence Research
Department of Computer Science, University of South Dakota, USA
- Mar. 23, 2026: Accepted as a Poster at CVPR EDGE 2026
- Apr. 04, 2026: Initial codebase released
Large Language Models (LLMs) achieve state-of-the-art performance but incur substantial computational and energy costs due to their dense, fixed-depth Transformer architectures. We introduce the Spike-Gated Residual Unit (S-GRU), a lightweight module that enables dynamic depth adaptation in pretrained Transformers without modifying backbone weights. By inserting spike-gated units into each residual block and optimizing a sparsity-aware objective controlled by a regularization coefficient, S-GRU learns which decoder layers can be skipped at inference time. On TinyLlama-1.1B, it activates 76.5% of the layers on average, yielding a 1.31x theoretical speedup at a modest cost in accuracy and perplexity (see the results table below).
Overview figure: (a) Transformer architecture with S-GRU applied to each decoder layer; (b) spike-gated residual and dynamic gating module.
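The mechanism can be pictured as a gate attached to each residual connection that decides whether the wrapped decoder layer runs or is bypassed. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; the class names (`SpikeGate`, `SGRUBlock`), the pooling scheme, and the straight-through surrogate are assumptions made for exposition and are not taken from the released code.

```python
import torch
import torch.nn as nn


class SpikeGate(nn.Module):
    """Hypothetical spike gate: emits a (near-)binary keep/skip decision per sequence.

    A straight-through estimator keeps the hard threshold trainable while the
    backbone weights stay frozen.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pool over the sequence, then squash to a firing probability in (0, 1).
        prob = torch.sigmoid(self.proj(hidden_states.mean(dim=1)))  # (batch, 1)
        spike = (prob > 0.5).float()                                 # hard 0/1 decision
        # Straight-through: forward pass uses the spike, backward uses the soft prob.
        return spike + prob - prob.detach()


class SGRUBlock(nn.Module):
    """Wraps one frozen decoder layer; the layer contributes only when the gate fires."""

    def __init__(self, decoder_layer: nn.Module, hidden_size: int):
        super().__init__()
        self.layer = decoder_layer
        for p in self.layer.parameters():  # backbone weights are left untouched
            p.requires_grad = False
        self.gate = SpikeGate(hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        g = self.gate(hidden_states).view(-1, 1, 1)  # (batch, 1, 1)
        # For clarity the layer is always evaluated; an efficient implementation
        # would skip its forward pass entirely whenever the gate is silent.
        out = self.layer(hidden_states)
        # Gated residual: fall back to the identity path when the gate is 0.
        return g * out + (1.0 - g) * hidden_states
```

An efficient implementation would avoid computing the skipped layer altogether; the sketch keeps the control flow simple to show only the gating logic.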
- Python 3.10+
- PyTorch
- Transformers (Hugging Face)
Install dependencies:
```bash
pip install -r requirements.txt
```

| Model | Params | HellaSwag ↑ | Wiki-2 PPL ↓ | Active Layers ↓ | Speedup ↑ | C4 PPL ↓ | LAMBADA PPL ↓ | Avg. PPL ↓ |
|---|---|---|---|---|---|---|---|---|
| TinyLlama-1.1B | 1.1B | 44.0% | 10.18 | 100% | 1.00x | 11.00 | 22.30 | 14.49 |
| Phi-2 | 2.7B | 52.0% | 13.14 | 100% | 0.41x* | 16.71 | 37.02 | 22.29 |
| Qwen-2.5-1.5B | 1.5B | 52.0% | 12.49 | 100% | 0.73x* | 19.03 | 28.85 | 20.12 |
| DeepSeek-MoE | 1.3B | 46.0% | 11.26 | 100% | 1.00x* | 14.84 | 23.34 | 16.48 |
| S-GRU (Ours) | 1.1B | 41.0% | 12.45 | 76.5% | 1.31x | 16.50 | 28.68 | 19.21 |
\* Theoretical speedup relative to TinyLlama-1.1B base throughput. Avg. PPL is the mean of the Wiki-2, C4, and LAMBADA perplexities.
Run training:

```bash
python -m src.train \
    --model <model_name> \
    --device <device> \
    --lambda_sparsity <value> \
    --max_steps <steps> \
    --batch_size <batch_size>
```
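The `--lambda_sparsity` flag corresponds to the regularization coefficient mentioned in the abstract. As a hedged sketch of how such a sparsity-aware objective is typically formed (the function below and its signature are illustrative assumptions, not the repository's exact loss), the language-modeling loss can be combined with a penalty on how often the gates fire:

```python
import torch


def sparsity_aware_loss(lm_loss: torch.Tensor,
                        gate_activations: list[torch.Tensor],
                        lambda_sparsity: float) -> torch.Tensor:
    """Illustrative sparsity-aware objective (assumed form, not the repo's exact loss).

    lm_loss          -- standard next-token cross-entropy from the backbone
    gate_activations -- one (batch,) gate tensor per decoder layer
    lambda_sparsity  -- regularization coefficient set via --lambda_sparsity
    """
    # Fraction of layers kept active, averaged over the batch and over layers.
    active_fraction = torch.stack([g.mean() for g in gate_activations]).mean()
    # A larger coefficient pushes the gates toward skipping more layers.
    return lm_loss + lambda_sparsity * active_fraction
```

Under this reading, larger `--lambda_sparsity` values trade perplexity for fewer active layers, which is the trade-off summarized in the results table above.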
For questions or issues, please open a GitHub issue.
For direct contact: 📧 kiran.prasannannair@coyotes.usd.edu

