GenRec

License: MIT · Python 3.9+ · PyTorch

A Model Zoo for Generative Recommendation.

Benchmark Results

Evaluation Protocol

Following TIGER, LC-Rec, and OpenOneRec:

  • Dataset: Amazon 2014 with 5-core filtering (users and items with < 5 interactions removed)
  • Split: Leave-one-out (last item for test, second-to-last for validation, rest for training)
  • Ranking: Full-item-set ranking over all items (no negative sampling)
  • Max sequence length: 50 for all models
  • Metrics: Recall@K and NDCG@K (K=5, 10)
  • HSTU: Tested with both full-vocabulary cross-entropy (CE) and sampled softmax (SS, 128 negatives, temp=0.05, L2 norm)
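Under this protocol each user contributes a single held-out item, so Recall@K is a top-K hit indicator and NDCG@K reduces to a rank discount with IDCG = 1. A minimal sketch of these metrics (illustrative only; the repo's own implementation lives in `genrec/modules/metrics.py` and may differ in detail):

```python
import math

def recall_at_k(ranked_ids, target_id, k):
    """Recall@K for one user: 1 if the held-out item is in the top-K."""
    return 1.0 if target_id in ranked_ids[:k] else 0.0

def ndcg_at_k(ranked_ids, target_id, k):
    """NDCG@K under leave-one-out: one relevant item, so IDCG = 1."""
    for rank, item in enumerate(ranked_ids[:k]):
        if item == target_id:
            return 1.0 / math.log2(rank + 2)
    return 0.0

# Full-item-set ranking: score ALL items, no negative sampling.
scores = {101: 0.9, 202: 0.7, 303: 0.4, 404: 0.1}
ranked = sorted(scores, key=scores.get, reverse=True)  # [101, 202, 303, 404]
print(recall_at_k(ranked, 202, 5))  # 1.0
print(ndcg_at_k(ranked, 202, 5))    # 1/log2(3) ≈ 0.631
```

Per-user values are averaged over all test users to produce the table entries below.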

Results by Dataset

Amazon 2014 Beauty

Methods R@5 R@10 N@5 N@10
SASRec (CE) 0.0538 0.0851 0.0320 0.0421
SASRec (BCE) 0.0258 0.0503 0.0137 0.0216
HSTU (CE) 0.0568 0.0859 0.0347 0.0441
HSTU (SS) 0.0414 0.0727 0.0235 0.0335
TIGER 0.0419 0.0644 0.0282 0.0354
LCRec 0.0481 0.0704 0.0331 0.0403
OneRec-SFT (1.7B) 0.0578 0.0816 0.0398 0.0475

Amazon 2014 Sports

Methods R@5 R@10 N@5 N@10
SASRec (CE) 0.0321 0.0495 0.0191 0.0248
SASRec (BCE) 0.0156 0.0291 0.0085 0.0128
HSTU (CE) 0.0283 0.0439 0.0182 0.0232
HSTU (SS) 0.0246 0.0393 0.0143 0.0191
TIGER 0.0236 0.0377 0.0150 0.0195
LCRec 0.0238 0.0360 0.0159 0.0198
OneRec-SFT (1.7B) 0.0299 0.0436 0.0200 0.0244

Amazon 2014 Toys

Methods R@5 R@10 N@5 N@10
SASRec (CE) 0.0613 0.0922 0.0348 0.0448
SASRec (BCE) 0.0353 0.0594 0.0186 0.0264
HSTU (CE) 0.0611 0.0914 0.0363 0.0461
HSTU (SS) 0.0494 0.0795 0.0277 0.0375
TIGER 0.0340 0.0521 0.0214 0.0272
LCRec 0.0433 0.0614 0.0310 0.0368
OneRec-SFT (1.7B) 0.0545 0.0790 0.0383 0.0462

Amazon 2014 Home

Methods R@5 R@10 N@5 N@10
SASRec (CE) 0.0177 0.0277 0.0106 0.0138
SASRec (BCE) 0.0081 0.0143 0.0046 0.0066
HSTU (CE) 0.0129 0.0208 0.0084 0.0109
HSTU (SS) 0.0123 0.0193 0.0079 0.0102
TIGER 0.0145 0.0231 0.0096 0.0123
LCRec 0.0163 0.0234 0.0110 0.0133
OneRec-SFT (1.7B) 0.0166 0.0246 0.0112 0.0138

Features

  • Multiple Models: Implementations of SASRec, HSTU, RQVAE, TIGER, LCRec, COBRA, and NoteLLM
  • Multiple Datasets: Amazon 2014 (Beauty, Sports, Toys, Clothing) and Amazon 2023 (32 categories)
  • Modular Design: Clean separation of models, data, and training logic
  • Flexible Configuration: Gin-config based experiment management
  • Easy Extension: Add custom datasets and models with minimal code
  • Reproducible: Consistent evaluation metrics (Recall@K, NDCG@K) with W&B logging

Models

Model Type Description
SASRec Baseline Self-Attentive Sequential Recommendation
HSTU Baseline Hierarchical Sequential Transduction Unit with temporal bias
RQVAE Generative Residual Quantized VAE for semantic ID generation
TIGER Generative Generative Retrieval with trie-based constrained decoding
LCRec Generative LLM-based recommendation with collaborative semantics
COBRA Generative Cascaded sparse-dense representations
NoteLLM Generative Retrievable LLM for note recommendation (experimental)
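The generative models above share one idea: RQVAE maps each item embedding to a short tuple of discrete codes (a "semantic ID") via residual quantization, and TIGER/LCRec/COBRA then generate those codes autoregressively. A toy sketch of the encode step, assuming already-trained, fixed codebooks (the repo's RQVAE learns them jointly, with k-means initialization):

```python
def residual_quantize(vec, codebooks):
    """Greedy residual quantization: at each level, pick the nearest code
    and quantize what remains. The chosen indices form the semantic ID."""
    residual = list(vec)
    semantic_id = []
    for codebook in codebooks:
        # Nearest code by squared Euclidean distance to the current residual.
        idx = min(range(len(codebook)),
                  key=lambda i: sum((r - c) ** 2
                                    for r, c in zip(residual, codebook[i])))
        semantic_id.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(semantic_id)

# Two levels, two codes each, 2-d embeddings (toy numbers).
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],  # level 0: coarse codes
    [[0.1, 0.0], [0.0, 0.1]],  # level 1: refines the level-0 residual
]
print(residual_quantize([1.05, 0.02], codebooks))  # (0, 0)
```

Because each level quantizes the previous level's residual, early codes capture coarse structure and later codes refine it, which is what makes prefix-shared semantic IDs meaningful for generative retrieval.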

Installation

From Source (Recommended)

git clone https://github.com/phonism/genrec.git
cd genrec
pip install -e .

Full Installation (with Triton, TorchRec, etc.)

pip install -e ".[full]"

Dependencies Only

pip install -r requirements.txt

Quick Start

Train Baseline Models

# SASRec on Amazon 2014
python genrec/trainers/sasrec_trainer.py config/sasrec/amazon.gin --split beauty

# HSTU on Amazon 2014
python genrec/trainers/hstu_trainer.py config/hstu/amazon.gin --split beauty

# SASRec on Amazon 2023
python genrec/trainers/sasrec_trainer.py config/sasrec/amazon2023.gin

# HSTU on Amazon 2023
python genrec/trainers/hstu_trainer.py config/hstu/amazon2023.gin

Train RQVAE (Semantic ID Generator)

# For TIGER pipeline
python genrec/trainers/rqvae_trainer.py config/tiger/amazon/rqvae.gin --split beauty

# For LCRec pipeline
python genrec/trainers/rqvae_trainer.py config/lcrec/amazon/rqvae.gin --split beauty

# For COBRA pipeline
python genrec/trainers/rqvae_trainer.py config/cobra/amazon/rqvae.gin --split beauty

Train TIGER (Generative Retrieval)

# Requires pretrained RQVAE checkpoint
python genrec/trainers/tiger_trainer.py config/tiger/amazon/tiger.gin --split beauty

# On Amazon 2023
python genrec/trainers/tiger_trainer.py config/tiger/amazon2023/tiger.gin
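At inference time TIGER constrains beam search so that only semantic-ID sequences of real items can be generated. A minimal sketch of the underlying prefix-trie idea (illustrative; not the repo's actual decoding API):

```python
def build_trie(semantic_ids):
    """Index valid semantic-ID sequences as nested dicts so decoding can
    be restricted to prefixes of real items."""
    trie = {}
    for seq in semantic_ids:
        node = trie
        for tok in seq:
            node = node.setdefault(tok, {})
    return trie

def allowed_next(trie, prefix):
    """Tokens that may legally follow `prefix` during constrained beam search."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return set()  # prefix matches no item: prune this beam
    return set(node)

item_ids = [(3, 1, 7), (3, 1, 2), (5, 0, 4)]  # toy 3-level semantic IDs
trie = build_trie(item_ids)
print(sorted(allowed_next(trie, (3, 1))))  # [2, 7]
print(sorted(allowed_next(trie, ())))      # [3, 5]
```

At each decoding step, logits for tokens outside `allowed_next` are masked to -inf, so every completed beam corresponds to an item in the catalog.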

Train LCRec (LLM-based)

# Requires pretrained RQVAE checkpoint
python genrec/trainers/lcrec_trainer.py config/lcrec/amazon/lcrec.gin --split beauty

# On Amazon 2023
python genrec/trainers/lcrec_trainer.py config/lcrec/amazon2023/lcrec.gin

Train COBRA

# Requires pretrained RQVAE checkpoint
python genrec/trainers/cobra_trainer.py config/cobra/amazon/cobra.gin --split beauty

Configuration

Dataset Selection

# Amazon 2014 datasets (via --split)
--split beauty    # Beauty
--split sports    # Sports and Outdoors
--split toys      # Toys and Games
--split clothing  # Clothing, Shoes and Jewelry

# Amazon 2023 datasets use dedicated config files
config/sasrec/amazon2023.gin
config/hstu/amazon2023.gin
config/tiger/amazon2023/tiger.gin
config/lcrec/amazon2023/lcrec.gin

Parameter Override

--gin "param=value"

Examples

# Change epochs and batch size
python genrec/trainers/tiger_trainer.py config/tiger/amazon/tiger.gin \
    --split beauty \
    --gin "train.epochs=200" \
    --gin "train.batch_size=128"

# Custom model path for LCRec
python genrec/trainers/lcrec_trainer.py config/lcrec/amazon/lcrec.gin \
    --split beauty \
    --gin "MODEL_HUB_QWEN3_1_7B='/path/to/model'"

Project Structure

genrec/
├── genrec/
│   ├── models/          # Model implementations
│   │   ├── sasrec.py        # SASRec
│   │   ├── hstu.py          # HSTU
│   │   ├── rqvae.py         # RQVAE
│   │   ├── tiger.py         # TIGER
│   │   ├── lcrec.py         # LCRec
│   │   ├── cobra.py         # COBRA
│   │   └── notellm.py       # NoteLLM
│   ├── trainers/        # Training scripts
│   │   ├── sasrec_trainer.py
│   │   ├── hstu_trainer.py
│   │   ├── rqvae_trainer.py
│   │   ├── tiger_trainer.py
│   │   ├── lcrec_trainer.py
│   │   ├── cobra_trainer.py
│   │   └── trainer_utils.py
│   ├── modules/         # Reusable components
│   │   ├── transformer.py   # Transformer blocks
│   │   ├── embedding.py     # Embedding layers
│   │   ├── encoder.py       # Encoder modules
│   │   ├── metrics.py       # Recall@K, NDCG@K
│   │   ├── loss.py          # Loss functions
│   │   ├── scheduler.py     # LR schedulers
│   │   ├── kmeans.py        # K-means for RQVAE init
│   │   ├── gumbel.py        # Gumbel softmax
│   │   └── normalize.py     # Normalization layers
│   └── data/            # Dataset implementations
│       ├── amazon.py        # Amazon 2014 datasets
│       ├── amazon2023.py    # Amazon 2023 datasets (32 categories)
│       ├── amazon_sasrec.py # SASRec-specific data
│       ├── amazon_hstu.py   # HSTU-specific data
│       ├── amazon_lcrec.py  # LCRec-specific data
│       ├── amazon_cobra.py  # COBRA-specific data
│       └── p5_amazon.py     # P5-format data
├── config/              # Gin configuration files
│   ├── base.gin             # Base config
│   ├── sasrec/              # SASRec configs
│   ├── hstu/                # HSTU configs
│   ├── tiger/               # TIGER configs (amazon/, amazon2023/)
│   ├── lcrec/               # LCRec configs (amazon/, amazon2023/)
│   └── cobra/               # COBRA configs
├── scripts/             # Utility scripts
├── docs/                # Documentation (English & Chinese)
├── assets/              # Media assets
└── reference/           # Reference implementations

Documentation

Full documentation is available at https://phonism.github.io/genrec

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Citation

If you find this project useful, please cite:

@software{genrec2025,
  title = {GenRec: A Model Zoo for Generative Recommendation},
  author = {Qi Lu},
  year = {2025},
  url = {https://github.com/phonism/genrec}
}

References

  • SASRec: Self-Attentive Sequential Recommendation
  • HSTU: Hierarchical Sequential Transduction Units
  • TIGER: Recommender Systems with Generative Retrieval
  • RQ-VAE-Recommender by Edoardo Botta
  • LC-Rec: LLM-based Collaborative Recommendation
  • COBRA: Cascaded Sparse-Dense Representations
  • NoteLLM: A Retrievable LLM for Note Recommendation

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

GenRec: Generative Recommender Systems with RQ-VAE semantic IDs, Transformer-based retrieval, and LLM integration. Built on PyTorch with distributed training support.
