Data reigns supreme
Every day it becomes more evident that data is the limiting factor for
state-of-the-art machine learning. Your model architecture may be
revolutionary, but without high-quality data to train on, it will be doomed
to mediocrity.
Pair your ideas with execution and use top-notch data in your next project!
We've combed through the 2384 papers accepted to NeurIPS in 2023 and compiled
a shortlist of papers introducing exciting new datasets.

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data | perceptual similarity, image, synthetic, diffusion, JND, 2AFC | | | |
| Visual Instruction Tuning | vision-language, llm, instruction-tuning, image, multimodal | | | |
| ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation | reward-model, image, text-to-image, synthetic, human-preference, alignment | | | |
| MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing | image-editing, synthetic, image, instruction | | | |
| REAL3D-AD | 3D, point-cloud, anomaly-detection | | | |

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| dacl10k: Benchmark for Semantic Bridge Damage Segmentation | image, semantic segmentation, classification, construction, defect | | | |

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| Satlas: A Large-Scale, Multi-Task Dataset for Remote Sensing Image Understanding | image, SAR, satellite, detection, climate | | | |
| Building3D: An Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds | 3D, point cloud | | | |
| EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding | image, object, ego | | | |
| Equivariant Similarity for Vision-Language Foundation Models | image, similarity, caption | | | |
| MOSE: A New Dataset for Video Object Segmentation in Complex Scenes | video, segmentation, tracking | | | |
| SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes | multi-object tracking, sports | | | |

We've combed through the 2359 papers accepted to CVPR in 2023 and compiled
a shortlist of papers introducing exciting new datasets.

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| MVImgNet: A Large-scale Dataset of Multi-view Images | multi-view, image | | | |
| GeoNet: Benchmarking Unsupervised Adaptation across Geographies | geolocation, image | | | |
| Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset | denoising, image | | | |
| Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo | optical flow, stereo, image | | | |
| ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | image, editing | | | |
| ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data | RGB-D, segmentation, video | | | |
| Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification | low-light, cross-modal, IR | | | |
| JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking | pose estimation, image, keypoint, tracking | | | |
| A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation | synthetic, domain adaptation, supervised | | | |

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| Calving fronts and where to find them: a benchmark dataset and methodology for automatic glacier calving front extraction from synthetic aperture radar imagery | glacier, climate, SAR, satellite, image, semantic segmentation | | | |
| The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting | conservation, detection, SONAR, video, tracking, counting | | | |

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases | x-ray, image, healthcare, detection | | | |

We would love your help in making this repository even better! If we missed a
paper that introduced a new dataset, or if you can think of any ways to improve
the repository, feel free to open an issue or a pull request.
This repository is inspired by paperswithcode,
and the template was adapted from
top-cvpr-2023-papers.