Alishahryar1/avazu-ctr

🔬 CTR Architecture Research Laboratory

πŸ† Advancing State-of-the-Art Click-Through Rate Prediction via Literature-Hybrid Architectures



| 🎯 Best Private LogLoss | 📊 Best Public LogLoss | ⚡ Training Time |
| --- | --- | --- |
| 0.38484 | 0.38671 | ~45 min (single epoch) |

πŸ› Research Vision

This project serves as a laboratory for exploring and synthesizing state-of-the-art architectures for Click-Through Rate (CTR) prediction. Rather than implementing a single traditional model, we focus on Hybrid Architecture Synthesis: combining orthogonal strengths from several seminal research papers into unified, high-performance encoders.

Our primary goal is to investigate how explicit cross-networks, attention-based encoders, and field-level importance gating can be fused to capture complex feature interactions in high-cardinality sparse datasets like Avazu.


πŸ† Best Results & Optimal Configuration

Our best submission achieved a Private LogLoss of 0.38484 and Public LogLoss of 0.38671 on the Avazu CTR Prediction competition. Below are the optimal hyperparameters discovered through extensive Optuna-based Bayesian optimization.

💡 Tip: You can modify these parameters in config.py to experiment with different configurations.

📋 Full Optimal Configuration

🔧 Model Architecture

| Parameter | Value |
| --- | --- |
| Backbone Type | gated_dcn |
| Diversity Weight | 0.001177 |
| Feature Bagging Ratio | 0.827 |
| Aggregation Method | mean |

🌐 DCN (Deep Cross Network)

| Parameter | Value |
| --- | --- |
| Enabled | ✅ True |
| Layers | 13 |
| Low Rank | 52 |
| LayerNorm | ✅ True |

🧠 MLP Backbone

| Parameter | Value |
| --- | --- |
| Hidden Dims | [1408] |
| Activation | relu |
| Dropout | 0.101 |
| Skip Connections | ✅ True |
| LayerNorm | ✅ True |

🎯 Feature Gating

| Parameter | Value |
| --- | --- |
| Enabled | ✅ True |
| Activation | gelu |
| Low Rank | None |

🔀 Diverse Prediction Heads

| Head | Hidden Dims | Activation | Dropout | LayerNorm |
| --- | --- | --- | --- | --- |
| 1 | [128] | tanh | 0.455 | ❌ |
| 2 | [32] | tanh | 0.383 | ❌ |
| 3 | [512] | silu | 0.413 | ✅ |
| 4 | [16] | mish | 0.068 | ✅ |

⚡ Optimizer Configuration

Dense Parameters (AdamW)

| Parameter | Value |
| --- | --- |
| Learning Rate | 2.234e-4 |
| Weight Decay | 3.203e-5 |
| Warmup Ratio | 0.402 |
| Decay Type | none |

Embedding Parameters (Adagrad)

| Parameter | Value |
| --- | --- |
| Learning Rate | 0.589 |
| Weight Decay | 0.0 |
| Warmup Ratio | 0.346 |
| Decay Type | linear |
| Min LR | 2.04e-7 |
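Both parameter groups warm up for a fraction of training and then optionally decay toward a floor. The exact scheduler lives in the training engine; the shape of a linear-warmup/linear-decay schedule can be sketched as follows (function name and signature are illustrative, not the repo's API):

```python
def lr_at(step, total_steps, base_lr, warmup_ratio, decay="linear", min_lr=0.0):
    """Linear warmup to base_lr, then hold ("none") or linearly decay to min_lr."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # ramp from ~0 up to base_lr over the warmup window
        return base_lr * (step + 1) / max(1, warmup_steps)
    if decay == "none":
        return base_lr
    # linear decay from base_lr down to min_lr over the remaining steps
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + (base_lr - min_lr) * (1.0 - frac)
```

With the values above, the embedding group would follow `lr_at(step, total, 0.589, 0.346, "linear", 2.04e-7)` and the dense group `lr_at(step, total, 2.234e-4, 0.402, "none")`.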

📝 Training Settings

| Parameter | Value |
| --- | --- |
| Batch Size | 4096 |
| Epochs | 1 |
| Gradient Clipping | 4.968 |
| AMP | ✅ float16 |
| Compile | ✅ torch.compile |
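Collected in one place, the winning settings might be laid out in config.py roughly like this (a hypothetical sketch; the repo's actual field names and schema may differ):

```python
# Hypothetical layout of the best configuration -- field names are
# illustrative, not the repo's actual config.py schema.
BEST_CONFIG = {
    "backbone": "gated_dcn",
    "diversity_weight": 1.177e-3,
    "feature_bagging_ratio": 0.827,
    "aggregation": "mean",
    "dcn": {"layers": 13, "low_rank": 52, "layer_norm": True},
    "mlp": {"hidden_dims": [1408], "activation": "relu", "dropout": 0.101,
            "skip_connections": True, "layer_norm": True},
    "heads": [
        {"hidden_dims": [128], "activation": "tanh", "dropout": 0.455, "layer_norm": False},
        {"hidden_dims": [32],  "activation": "tanh", "dropout": 0.383, "layer_norm": False},
        {"hidden_dims": [512], "activation": "silu", "dropout": 0.413, "layer_norm": True},
        {"hidden_dims": [16],  "activation": "mish", "dropout": 0.068, "layer_norm": True},
    ],
    "optim": {
        "dense":     {"opt": "adamw",   "lr": 2.234e-4, "weight_decay": 3.203e-5,
                      "warmup_ratio": 0.402, "decay": "none"},
        "embedding": {"opt": "adagrad", "lr": 0.589,    "weight_decay": 0.0,
                      "warmup_ratio": 0.346, "decay": "linear", "min_lr": 2.04e-7},
    },
    "train": {"batch_size": 4096, "epochs": 1, "grad_clip": 4.968,
              "amp": "float16", "compile": True},
}
```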

📚 Literature-Informed Architectural Pillars

The laboratory implements and synthesizes ideas from several key research directions:

1. Deep & Cross Network Evolution (DCNv2)

  • Source: DCN V2: Improved Deep & Cross Network (Wang et al., 2021)
  • Mechanism: Uses learnable weight matrices to model explicit, bounded-degree polynomial feature interactions.
  • Hybrid Implementation: Supports low-rank decomposition for parameter efficiency and gated units for non-linear interaction selection.
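The low-rank trick factors each cross layer's weight matrix as W ≈ U Vᵀ, so a layer computes x_{l+1} = x_0 ⊙ (U(Vᵀx_l) + b) + x_l. A minimal NumPy sketch of one such layer (the repo's PyTorch implementation will differ in detail):

```python
import numpy as np

def low_rank_cross_layer(x0, xl, U, V, b):
    """One DCNv2 cross layer with low-rank weights W ~ U @ V.T:
    x_{l+1} = x0 * (U @ (V.T @ xl) + b) + xl   (* is element-wise)."""
    return x0 * (U @ (V.T @ xl) + b) + xl

rng = np.random.default_rng(0)
d, r = 8, 3                       # embedding-concat dim, low rank
x0 = rng.standard_normal(d)       # original input to the cross network
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))
b = np.zeros(d)

x = x0
for _ in range(4):                # stack layers (the best config uses 13)
    x = low_rank_cross_layer(x0, x, U, V, b)
```

Each stacked layer raises the maximum polynomial degree of the interactions by one, while the low rank r keeps the per-layer parameter count at 2dr instead of d².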

2. Squeeze-Excitation & Bilinear Interaction (FiBiNET/++)

  • Source: FiBiNET: Combining Feature Importance and Bilinear feature Interaction (Huang et al., 2019)
  • Mechanism: SENet layer dynamically learns field-level importance weights, followed by a Bilinear Interaction layer.
  • Hybrid Implementation: Incorporates multi-mode squeezing (Mean, Max, Min, Std) and grouped squeeze operations.
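The core SENet idea can be sketched with only the mean-pooling squeeze mode (the hybrid also squeezes with max/min/std; all weights and dimensions below are illustrative):

```python
import numpy as np

def senet_reweight(E, W1, W2):
    """SENet-style field gating: squeeze each field embedding to a scalar
    (mean pooling), excite through a two-layer bottleneck, then rescale
    every field's embedding by its learned importance weight."""
    z = E.mean(axis=1)                      # squeeze: (num_fields,)
    a = np.maximum(W1 @ z, 0.0)             # excitation bottleneck (ReLU)
    s = 1.0 / (1.0 + np.exp(-(W2 @ a)))     # per-field importance in (0, 1)
    return E * s[:, None]                   # re-weight each field's embedding

rng = np.random.default_rng(0)
num_fields, emb_dim, bottleneck = 6, 4, 2
E = rng.standard_normal((num_fields, emb_dim))    # field embedding matrix
W1 = rng.standard_normal((bottleneck, num_fields))
W2 = rng.standard_normal((num_fields, bottleneck))
E_w = senet_reweight(E, W1, W2)
```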

3. See-Through Transformer Encoding (STEC)

  • Source: STEC-Transformer: See-Through Transformer-based Encoder for CTR
  • Mechanism: A transformer-based encoder that extracts multi-head group bilinear interactions directly from attention mechanisms.
  • Hybrid Implementation: Features "See-Through" paths that preserve signal flow from all layers to the prediction head.

4. Multi-Head Diversity Enrichment

  • Source: Research into Deep Ensembles & Diversity Regularization
  • Mechanism: Utilizes a Shared Backbone with Diverse Prediction Heads, regularized by a Diversity Loss term.
  • Implementation: Features Feature Bagging (random field masking per head) and gated logit aggregation.
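One common way to implement such a diversity term is to penalize the mean pairwise correlation between head logits over a batch, so heads are pushed toward decorrelated errors. A sketch under that assumption (the repo's exact formulation may differ):

```python
import numpy as np

def diversity_penalty(logits):
    """Mean pairwise correlation between head logits over a batch.
    logits: (num_heads, batch). Returns a value in [-1, 1]; adding
    lambda * penalty to the BCE loss discourages redundant heads."""
    z = logits - logits.mean(axis=1, keepdims=True)        # center per head
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    corr = z @ z.T                                         # (heads, heads)
    h = len(corr)
    return (corr.sum() - np.trace(corr)) / (h * (h - 1))   # off-diagonal mean

rng = np.random.default_rng(0)
head_logits = rng.standard_normal((4, 4096))               # 4 heads, one batch
penalty = diversity_penalty(head_logits)
# total loss would then be: bce(aggregate(head_logits), y) + 0.001177 * penalty
```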

πŸ— The Hybrid: MultiHeadDiversityModel

The flagship architecture of this lab is the MultiHeadDiversityModel. It represents our current best attempt at architectural synthesis:

```mermaid
graph TD
    subgraph Input["🔌 Sparse Input"]
        F1[Fields 1..N] --> EMB[Hybrid Embedding Layer]
        EMB --> BAG[Feature Bagging / Masking]
    end

    subgraph Backbone["🧠 Shared Research Backbone"]
        BAG --> FG[Feature Gating Layer]
        FG --> DCN["DCNv2 Cross Layers<br/>(13 layers, rank 52)"]
        DCN --> MLP["Residual MLP<br/>(1408 units)"]
    end

    subgraph DiverseHeads["🎯 Multi-Head Prediction"]
        MLP --> H1["Head 1: tanh<br/>(128 units)"]
        MLP --> H2["Head 2: tanh<br/>(32 units)"]
        MLP --> H3["Head 3: silu<br/>(512 units)"]
        MLP --> H4["Head 4: mish<br/>(16 units)"]
    end

    subgraph Aggregation["🔗 Adaptive Fusion"]
        H1 & H2 & H3 & H4 --> AGG[Mean Aggregation]
        AGG --> OUT[Final CTR Probability]
    end

    subgraph Optimization["📉 Objective Function"]
        OUT --> BCE[BCE Loss]
        H1 & H2 & H3 & H4 --> DIV["Diversity Regularization<br/>(λ = 0.00118)"]
        BCE & DIV --> LOSS[Total Multi-Objective Loss]
    end
```

🚀 Experimental Framework

Automated Hyperparameter Optimization (Optuna)

We use Optuna to navigate the vast search space (~34 parameters) of our hybrid architectures. Our advanced tuning script supports:

| Feature | Description |
| --- | --- |
| 🌳 TPE Sampler | Tree-structured Parzen Estimator for Bayesian search |
| ✂️ MedianPruner | Aggressive early stopping of unpromising trials |
| 💾 SQLite Persistence | Resume large-scale studies across sessions |
| 📊 Real-time Dashboard | Optuna Dashboard for visualization |

```bash
# Launch a 100-trial optimization study
python misc/tune_hyperparams.py --n-trials 100 --timeout 28800
```

Key Search Dimensions

  • 🔒 Interaction Depth: Number of DCN layers vs. Transformer layers
  • 🎛️ Diversity Calibration: Tuning the weight of diversity regularization
  • 🎨 Per-Head Hyperparameters: Individual activation functions and skip-connection strategies
  • 📏 Embedding Dynamics: Adaptive learning rates for sparse vs. dense parameters

🛠 Project Structure

```
avazu-ctr/
├── 📂 src/
│   ├── 📂 models/
│   │   ├── 📂 architectures/     # Full hybrid implementations (STEC, MultiHeadDiversity, GatedDCN)
│   │   └── 📂 layers/            # Primitive research blocks (CrossNetwork, SENet, FeatureGating)
│   ├── 📂 training/              # Training engine with hybrid optimizer support
│   └── 📂 config_types/          # Type definitions for configuration validation
├── 📂 misc/                      # Research tools (tune_hyperparams.py, EDA scripts)
├── 📂 papers/                    # Foundational research papers
├── 📂 data/                      # Raw and processed datasets
├── 📄 pyproject.toml             # Project config & dependencies (uv)
├── 📄 uv.lock                    # Locked dependency versions
├── 📄 config.py                  # Best hyperparameter configuration
├── 📄 data_processor.py          # Polars-based streaming data pipeline
└── 📄 train.py                   # Main training entry point
```

📈 Getting Started

1️⃣ Environment Setup

This project uses uv for fast, reliable dependency management.

```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Sync dependencies (PyTorch CUDA 13.0). For CPU-only, omit the env var.
UV_TORCH_BACKEND=cu130 uv sync --extra dev
```

2️⃣ Data Pipeline

```bash
# Blazing fast Polars-based streaming processing
uv run python data_processor.py
```

3️⃣ Research Loop

```bash
# 1. Start a tuning study to find architectural sweet spots
uv run python misc/tune_hyperparams.py --n-trials 50

# 2. Train the full model with best config
uv run python train.py

# 3. Analyze results via TensorBoard
uv run tensorboard --logdir=runs
```

Development

```bash
# Run tests
uv run pytest

# Format and lint
uv run ruff format . && uv run ruff check .

# Type check
uv run ty check
```

📊 Performance Highlights

| Metric | Value |
| --- | --- |
| 🎯 Private LogLoss | 0.38484 |
| 📉 Public LogLoss | 0.38671 |
| ⏱️ Training Time | ~45 minutes |
| 💾 Model Parameters | ~50M |
| 🔧 Epochs | 1 (single pass) |
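For reference, the LogLoss metric reported above is plain binary cross-entropy averaged over impressions. A minimal sketch:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy averaged over samples -- the competition metric."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

Lower is better; a model that always predicts the Avazu base CTR (~17%) scores roughly 0.45, so 0.385 reflects substantial per-impression discrimination.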

📄 License & Acknowledgments

  • Foundation: Avazu CTR Prediction Dataset
  • Architecture: Synthesized from DCNv2, FiBiNET, and STEC papers
  • Tools: Built with PyTorch, Polars, and Optuna

Licensed under the MIT License


Built with ❤️ for the CTR research community
