This repository provides the implementation for our KDD 2026 paper "How Well Does Generative Recommendation Generalize?"
In this work, we study the memorization and generalization behavior of generative recommendation (GR) models. We introduce a fine-grained evaluation framework that categorizes test instances by memorization and generalization patterns, and a token-level memorization analysis that explains why GR generalizes better but memorizes worse than conventional models. We further propose an adaptive ensemble method that leverages confidence-based indicators to combine GR and conventional models, improving overall performance.
We release instance-level memorization/generalization annotations and saved model checkpoints for the 7 open-source datasets used in the paper.
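To make the idea of instance-level annotation concrete, here is a minimal sketch of one way to split test instances into memorization and generalization cases. It assumes a deliberately simple criterion that is not taken from the paper: an instance counts as "memorization" when the transition from the last history item to the target appears somewhere in the training sequences. The function name `label_test_instances` and the toy data are hypothetical; the released annotations and the paper's definitions remain the reference.

```python
from collections import Counter


def label_test_instances(train_sequences, test_instances):
    """Label each (history, target) pair as memorization or generalization.

    Assumed criterion (illustrative only): the instance is "memorization"
    if the (last history item -> target) transition was seen in training.
    """
    seen_transitions = set()
    for seq in train_sequences:
        # Collect every adjacent (previous item, next item) pair.
        seen_transitions.update(zip(seq, seq[1:]))

    labels = []
    for history, target in test_instances:
        if history and (history[-1], target) in seen_transitions:
            labels.append("memorization")
        else:
            labels.append("generalization")
    return labels


# Toy data: item IDs are arbitrary strings.
train = [["a", "b", "c"], ["b", "c", "d"]]
test = [(["a", "b"], "c"), (["a", "c"], "b"), (["c"], "d")]

labels = label_test_instances(train, test)
print(labels)
print(Counter(labels))
```

Per-category metrics can then be computed by grouping model predictions on these labels, which is the kind of breakdown the evaluation script reports.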
conda env create -f environment.yml
conda activate GenRec
pip install -r requirements.txt

Train SASRec or TIGER on a single GPU:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model=SASRec \
--dataset=AmazonReviews2014 \
--category=Sports_and_Outdoors

CUDA_VISIBLE_DEVICES=0 python main.py \
--model=TIGER \
--dataset=AmazonReviews2014 \
--category=Sports_and_Outdoors

Multi-GPU training with accelerate:
accelerate launch --num_processes=2 --mixed_precision=fp16 main.py \
--model=TIGER \
--dataset=AmazonReviews2014 \
--category=Sports_and_Outdoors

Training parameters can be overridden via command line (see genrec/default.yaml for all options).
Evaluate a trained model with memorization/generalization breakdown:
CUDA_VISIBLE_DEVICES=0 python mem_gen_evaluation.py \
--model=TIGER \
--dataset=AmazonReviews2014 \
--category=Sports_and_Outdoors \
--checkpoint_path=path/to/TIGER.pth \
--sem_ids_path=path/to/semantic_ids.sem_ids \
--eval=test \
--save_inference

To evaluate across all datasets for both models:
bash scripts/eval/eval_mem_gen.sh

Scripts under scripts/analysis/ reproduce the analysis results in the paper. For example, to reproduce the support coverage analysis:
bash scripts/analysis/run_support_coverage.sh

Other analysis scripts include run_performance_analysis.sh, run_codebook_intervention.sh, and run_indicator_validation.sh.
Run inference for both models and perform the adaptive ensemble grid search:
bash scripts/eval/eval_adaptive_ensemble.sh
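
For intuition about what a confidence-based ensemble with a grid search can look like, here is a small self-contained sketch. It is not the procedure implemented in eval_adaptive_ensemble.sh. It assumes one simple indicator (the GR model's maximum softmax probability) and one simple rule (use the GR prediction when that confidence reaches a threshold, otherwise fall back to the conventional model). The names `adaptive_ensemble` and `grid_search_threshold` and the toy scores are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_ensemble(gr_scores, conv_scores, threshold):
    """Return the predicted item index.

    Assumed rule (illustrative only): trust the GR model when its top
    softmax probability is at least `threshold`, else use the
    conventional model.
    """
    gr_probs = softmax(gr_scores)
    conv_probs = softmax(conv_scores)
    chosen = gr_probs if max(gr_probs) >= threshold else conv_probs
    return max(range(len(chosen)), key=chosen.__getitem__)


def grid_search_threshold(instances, thresholds):
    """Pick the threshold with the highest top-1 accuracy.

    `instances` is a list of (gr_scores, conv_scores, target_index).
    """
    best_threshold, best_hits = None, -1
    for t in thresholds:
        hits = sum(adaptive_ensemble(g, c, t) == y for g, c, y in instances)
        if hits > best_hits:
            best_threshold, best_hits = t, hits
    return best_threshold, best_hits / len(instances)


# Toy validation set with three candidate items per instance.
validation = [
    ([4.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0),  # GR confident and correct
    ([0.5, 0.4, 0.0], [0.0, 0.0, 3.0], 2),  # GR unsure, conventional correct
    ([0.2, 0.0, 0.1], [0.0, 2.0, 0.0], 1),  # GR unsure, conventional correct
]

best_threshold, accuracy = grid_search_threshold(validation, [0.0, 0.5, 1.0])
print(best_threshold, accuracy)
```

In this toy setup, a threshold of 0.0 always uses the GR model and a threshold of 1.0 always uses the conventional model, so the grid search compares both single-model baselines against the mixed strategy.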