Spike-Gated Residual Unit (S-GRU)

🔗 Paper (OpenReview)

CVPR EDGE 2026 (Poster)

Kiran Nair, Rodrigue Rizk, KC Santosh
USD Artificial Intelligence Research
Department of Computer Science, University of South Dakota, USA


🚀 News

  • Mar. 23, 2026: Accepted as a Poster at CVPR EDGE 2026
  • Apr. 04, 2026: Initial codebase released

🧠 Abstract

Large Language Models (LLMs) achieve state-of-the-art performance but incur substantial computational and energy costs due to their dense, fixed-depth Transformer architectures. We introduce the Spike-Gated Residual Unit (S-GRU), a lightweight module that enables dynamic depth adaptation in pretrained Transformers without modifying backbone weights. By inserting spike-gated units into each residual block and optimizing a sparsity-aware objective controlled by a regularization coefficient $\lambda_{\text{sparsity}}$, the model learns to selectively bypass redundant layers during inference. Using a gate-only fine-tuning strategy on TinyLlama-1.1B, S-GRU reduces the average active depth while maintaining competitive performance, achieving significant efficiency gains. Layer-wise analysis reveals an emergent hierarchy where early and late layers remain critical, while intermediate layers are dynamically suppressed under sparsity constraints. These results demonstrate that S-GRU provides a practical, software-level pathway toward energy-efficient Transformer inference, enabling controllable trade-offs along an efficiency–intelligence Pareto frontier without requiring full retraining or specialized hardware.
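To make the sparsity-aware objective concrete, here is a minimal sketch of how the gate penalty could be combined with the language-modeling loss. The function name, the use of mean gate activity as the penalty, and the tensor shapes are illustrative assumptions; the exact formulation used in the paper may differ.

import torch

def sgru_objective(lm_loss: torch.Tensor,
                   gate_activity: torch.Tensor,
                   lambda_sparsity: float) -> torch.Tensor:
    # lm_loss: scalar language-modeling loss from the (frozen) backbone.
    # gate_activity: per-layer probability of keeping a layer active,
    #                shape (num_layers,).
    # Penalizing mean gate activity encourages the model to bypass
    # redundant layers; lambda_sparsity controls the trade-off.
    return lm_loss + lambda_sparsity * gate_activity.mean()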


🏗️ Architecture

(a) Transformer architecture with S-GRU applied to each decoder layer
(b) Spike-gated residual and dynamic gating module
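As a rough, unofficial illustration of panel (b), the sketch below wraps a single frozen decoder layer with a learned gate that either executes the layer or lets the residual stream pass through unchanged. The class name, the linear-probe gate with a straight-through threshold, and the simplified layer interface (a module mapping hidden states to hidden states) are assumptions for illustration, not the released implementation.

import torch
import torch.nn as nn

class SpikeGatedResidual(nn.Module):
    """Hypothetical sketch of an S-GRU wrapper around one decoder layer.

    The backbone layer is frozen; only the gate is trained. When the gate
    "spikes" (opens), the layer is executed and its output enters the
    residual stream; otherwise the layer is skipped.
    """

    def __init__(self, layer: nn.Module, hidden_size: int):
        super().__init__()
        self.layer = layer
        for p in self.layer.parameters():      # gate-only fine-tuning:
            p.requires_grad = False            # backbone weights stay frozen
        self.gate = nn.Linear(hidden_size, 1)  # assumed gate parameterization

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Gate decision from a pooled summary of the residual stream.
        p_open = torch.sigmoid(self.gate(hidden_states.mean(dim=1)))  # (batch, 1)

        if not self.training:
            # At inference a closed gate skips the layer's compute entirely.
            if bool((p_open < 0.5).all()):
                return hidden_states
            return self.layer(hidden_states)

        # Training: straight-through estimator -- hard decision in the
        # forward pass, soft gradient in the backward pass.
        hard = (p_open > 0.5).float()
        g = hard + p_open - p_open.detach()
        layer_out = self.layer(hidden_states)
        # Blend between skipping (g = 0) and executing (g = 1) the layer.
        return hidden_states + g.unsqueeze(1) * (layer_out - hidden_states)

Because only self.gate carries gradients, fine-tuning touches a handful of parameters per layer while the pretrained backbone stays fixed, which is the gate-only strategy described in the abstract.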


⚙️ Requirements

  • Python 3.10+
  • PyTorch
  • Transformers (Hugging Face)

Install dependencies:

pip install -r requirements.txt

📊 Results

| Model | Params | HellaSwag ↑ | Wiki-2 ↓ | Active Layers ↓ | Speedup ↑ | C4 ↓ | LAMBADA ↓ | Avg. ↓ |
|---|---|---|---|---|---|---|---|---|
| TinyLlama-1.1B | 1.1B | 44.0% | 10.18 | 100% | 1.00x | 11.00 | 22.30 | 14.49 |
| Phi-2 | 2.7B | 52.0% | 13.14 | 100% | 0.41x* | 16.71 | 37.02 | 22.29 |
| Qwen-2.5-1.5B | 1.5B | 52.0% | 12.49 | 100% | 0.73x* | 19.03 | 28.85 | 20.12 |
| DeepSeek-MoE | 1.3B | 46.0% | 11.26 | 100% | 1.00x* | 14.84 | 23.34 | 16.48 |
| S-GRU (Ours) | 1.1B | 41.0% | 12.45 | 76.5% | 1.31x | 16.50 | 28.68 | 19.21 |

* Theoretical speedup relative to TinyLlama-1.1B base throughput.


🚀 Training & Evaluation

python -m src.train \
  --model <model_name> \
  --device <device> \
  --lambda_sparsity <value> \
  --max_steps <steps> \
  --batch_size <batch_size>
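For example, a run on the TinyLlama backbone might look like the following; the Hugging Face model identifier and all hyperparameter values are placeholders chosen for illustration, not settings reported in the paper.

python -m src.train \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --device cuda \
  --lambda_sparsity 0.01 \
  --max_steps 1000 \
  --batch_size 8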

📬 Contact Information

For questions or issues, please open a GitHub issue.

For direct contact: 📧 kiran.prasannannair@coyotes.usd.edu
