Calibration Under Corruption

This work specifically investigates the intersection of model efficiency (via pruning) and calibration quality under distribution shift, a critical but under-explored area in robust machine learning.

Overview

This repository investigates how calibration techniques perform when deep learning models encounter corrupted or out-of-distribution data. It systematically explores training-time calibration methods (Mixup, Label Smoothing) and post-hoc calibration (Temperature Scaling) on ResNet-18 models evaluated on the CIFAR-10-C corruption benchmark.

Key contributions:

  • Training-time vs. post-hoc calibration under corruption
  • Impact of model pruning on calibration quality
  • Comprehensive evaluation across 19 corruption types × 5 severity levels (95 conditions)

Quick Start

Installation

git clone https://github.com/devangsaraogi/calibration-under-corruption.git
cd calibration-under-corruption
pip install -r requirements.txt

For detailed setup instructions (local + HPC cluster), see SETUP.md.

Run Experiments

# Train baseline model
python scripts/training_with_calibration.py --experiment-name resnet18_baseline --no-mixup --label-smoothing 0.0

# Train with Mixup + Label Smoothing
python scripts/training_with_calibration.py --experiment-name resnet18_mixup_ls --mixup --label-smoothing 0.1

# Evaluate on CIFAR-10-C
python scripts/eval/evaluate_model.py --model-path models/resnet18_baseline.pth --experiment-name resnet18_baseline

# Apply pruning
python scripts/pruning/prune_model.py --model-path models/resnet18_baseline.pth --pruning-amount 0.7

# Temperature scaling
python scripts/calibration/temperature_scaling.py --model-path models/resnet18_baseline.pth
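
The --mixup and --label-smoothing flags correspond to standard implementations of these techniques. As a rough sketch of what one training step looks like (illustrative PyTorch, not the repository's exact code; model, optimizer, and the batch (x, y) are assumed):

import torch
import torch.nn.functional as F

def train_step(model, optimizer, x, y, alpha=0.2, smoothing=0.1):
    # Mixup: blend pairs of examples with lambda ~ Beta(alpha, alpha)
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]

    logits = model(x_mixed)
    # Label smoothing is built into F.cross_entropy (PyTorch >= 1.10);
    # the mixup loss interpolates between the two label sets
    loss = lam * F.cross_entropy(logits, y, label_smoothing=smoothing) \
        + (1.0 - lam) * F.cross_entropy(logits, y[perm], label_smoothing=smoothing)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()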

HPC Cluster

cd cluster/jobs
bsub < train_baseline.sh    # Submit training job
bjobs                        # Check status
tail -f ../../logs/train_*.out  # Monitor output

See SETUP.md for complete cluster instructions.

Repository Structure

calibration-under-corruption/
├── cluster/                    # HPC cluster scripts (LSF jobs)
├── scripts/                    # Core training/evaluation code
├── utils/                      # Calibration metrics (ECE, MCE, NLL)
├── results/                    # Experimental results (32 JSON files)
├── experiments/                # Experiment tracking logs
├── configs/                    # Model configurations
└── logs/                       # Cluster job outputs

Key Findings

Experiments on CIFAR-10-C (19 corruption types × 5 severity levels) revealed:

  1. Training-time calibration improves robustness: Mixup+LS achieved 74.72% accuracy under corruption vs. 71.62% for the baseline (+3.1 points).

  2. Pruning degrades calibration more than accuracy: at 70% pruning, ECE increased significantly while accuracy dropped only moderately.

  3. Temperature scaling recovers calibration: post-hoc calibration reduced ECE across all pruning levels, even at 90% sparsity.

  4. Calibration trades off against clean accuracy: Label Smoothing improved calibration (lower ECE) but slightly reduced clean accuracy.

Full results: 32 evaluation JSONs in results/evaluation_results/ covering 4 training configs × 4 pruning levels × 2 calibration states.

Methods

Calibration Techniques

  • Mixup (α=0.2): Training-time augmentation that linearly interpolates pairs of inputs and labels (see the training-step sketch under Quick Start)
  • Label Smoothing (ε=0.1): Soft target labels to prevent overconfidence
  • Temperature Scaling: Post-hoc logit scaling fit on a validation set (see the sketch after this list)
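
Temperature scaling fits a single scalar T > 0 on held-out validation logits by minimizing NLL, then divides all test-time logits by T before the softmax. A minimal sketch, assuming precomputed val_logits (N × C) and val_labels (N) tensors (illustrative, not the repository's exact script):

import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels):
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        # NLL of temperature-scaled logits on the validation set
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()  # calibrated probabilities: softmax(logits / T)

Because T only rescales the logits, the argmax prediction (and hence accuracy) is unchanged; only the confidence estimates move.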

Model Compression

  • Magnitude Pruning: Global unstructured L1 pruning at 50%, 70%, and 90% sparsity (see the sketch after this list)
  • Fine-tuning: 10 epochs with the Adam optimizer (lr=0.001)
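
A minimal sketch of this pruning setup using torch.nn.utils.prune (illustrative; assumes a loaded ResNet-18 model, not the repository's exact script):

import torch.nn as nn
import torch.nn.utils.prune as prune

def apply_global_pruning(model, amount=0.7):
    # Every Conv2d/Linear weight is a candidate for the global ranking
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    # Zero the `amount` fraction of weights with the smallest |w| globally
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                              amount=amount)
    return params

def finalize_pruning(params):
    # After fine-tuning with the masks in place, fold them into the weights
    for module, name in params:
        prune.remove(module, name)

Fine-tuning happens between the two calls, so the surviving weights can adapt while the pruned ones stay zeroed by the masks.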

Evaluation

  • CIFAR-10-C: 19 corruption types (noise, blur, weather, digital) at 5 severity levels
  • Metrics: Accuracy, ECE (15 bins; see the sketch below), MCE, NLL
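
For reference, ECE with equal-width confidence bins can be computed as below (a sketch with 15 bins, matching the setting above; not necessarily the repository's utils implementation):

import torch

def expected_calibration_error(probs, labels, n_bins=15):
    conf, pred = probs.max(dim=1)             # top-1 confidence and prediction
    correct = pred.eq(labels).float()
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # |accuracy - confidence| in the bin, weighted by bin population
            gap = (correct[in_bin].mean() - conf[in_bin].mean()).abs()
            ece += (gap * in_bin.float().mean()).item()
    return ece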

Citation

If you use this code or findings in your research, please cite:

@misc{saraogi2025calibration,
  author = {Saraogi, Devang},
  title = {Calibration Under Corruption: A Study of Neural Network Calibration Techniques Under Distribution Shift},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/devangsaraogi/calibration-under-corruption}},
  note = {Course project for CSC 591: Deep Learning Beyond Accuracy, NC State University}
}

Acknowledgments

Datasets & Methods:

  • CIFAR-10-C: Benchmarking Neural Network Robustness (Hendrycks & Dietterich, 2019)
  • Temperature Scaling: On Calibration of Modern Neural Networks (Guo et al., 2017)
  • Mixup: Beyond Empirical Risk Minimization (Zhang et al., 2018)
  • Label Smoothing: Rethinking Inception Architecture (Szegedy et al., 2016)

This project was completed as part of CSC 591: Deep Learning Beyond Accuracy at North Carolina State University (Fall 2025), taught by Dr. Jung-Eun Kim.

The course explores deep neural networks with a focus beyond traditional accuracy metrics, emphasizing resource considerations and other dimensions such as fairness, privacy, and sustainability.