Papers with Data

Data reigns supreme 🥇

Every day it becomes more evident that data is the limiting factor for state-of-the-art 📈 machine learning. Your model architecture may be revolutionary, but without high-quality data 📊 to train on, it will be doomed to mediocrity.

Pair ideas with execution and use top-notch data in your next project!

NeurIPS 2023

We've combed through the 2384 papers accepted to NeurIPS in 2023 and compiled a shortlist of papers introducing exciting new datasets.

Title	Tags	Paper	Dataset	Code
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data	`perceptual similarity`, `image`, `synthetic`, `diffusion`, `JND`, `2AFC`
Visual Instruction Tuning	`vision-language`, `llm`, `instruction-tuning`, `image`, `multimodal`
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation	`reward-model`, `image`, `text-to-image`, `synthetic`, `human-preference`, `alignment`
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing	`image-editing`, `synthetic`, `image`, `instruction`
REAL3D-AD	`3D`, `point-cloud`, `anomaly-detection`

WACV 2024

Title	Tags	Paper	Dataset	Code
dacl10k: Benchmark for Semantic Bridge Damage Segmentation	`image`, `semantic segmentation`, `classification`, `construction`, `defect`

ICCV 2023

Title	Tags	Paper	Dataset	Code
Satlas: A Large-Scale, Multi-Task Dataset for Remote Sensing Image Understanding	`image`, `SAR`, `satellite`, `detection`, `climate`
Building3D: An Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds	`3D`, `point cloud`
EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding	`image`, `object`, `ego`
Equivariant Similarity for Vision-Language Foundation Models	`image`, `similarity`, `caption`
MOSE: A New Dataset for Video Object Segmentation in Complex Scenes	`video`, `segmentation`, `tracking`
SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes	`multi-object tracking`, `sports`

CVPR 2023

We've combed through the 2359 papers accepted to CVPR in 2023 and compiled a shortlist of papers introducing exciting new datasets.

Title	Tags	Paper	Dataset	Code
MVImgNet: A Large-scale Dataset of Multi-view Images	`multi-view`, `image`
GeoNet: Benchmarking Unsupervised Adaptation across Geographies	`geolocation`, `image`
Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset	`denoising`, `image`
Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo	`optical flow`, `stereo`, `image`
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing	`image`, `editing`
ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data	`RGB-D`, `segmentation`, `video`
Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification	`low-light`, `cross-modal`, `IR`
JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking	`pose estimation`, `image`, `keypoint`, `tracking`
A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation	`synthetic`, `domain adaptation`, `supervised`

Papers from 2022

Title	Tags	Paper	Dataset	Code
Calving fronts and where to find them: a benchmark dataset and methodology for automatic glacier calving front extraction from synthetic aperture radar imagery	`glacier`, `climate`, `SAR`, `satellite`, `image`, `semantic segmentation`
The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting	`conservation`, `detection`, `SONAR`, `video`, `tracking`, `counting`

Classics

Title	Tags	Paper	Dataset	Code
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases	`x-ray`, `image`, `healthcare`, `detection`

Contributing 👋

We would love your help in making this repository even better! If we missed a paper that introduced a new dataset, or if you can think of any ways to improve the repository, feel free to open an issue or a pull request.

Note

This repository is inspired by paperswithcode, and the template was adapted from top-cvpr-2023-papers.
