Awesome Feed-Forward 3D

Paper Website GitHub

A curated list of work on feed-forward 3D scene modeling, covering research directions, datasets, and applications.

Table of Contents

Taxonomy

Research Directions

Feature Enhancement

Advanced Encoding Architectures

  • pixelNeRF: Neural Radiance Fields from One or Few Images. [📄 Paper | 💻 Code]
  • IBRNet: Learning Multi-View Image-Based Rendering. [📄 Paper | 💻 Code | 🌐 Project Page]
  • Splatter Image: Ultra-Fast Single-View 3D Reconstruction. [📄 Paper | 💻 Code | 🌐 Project Page]
  • Convolutional Occupancy Networks. [📄 Paper | 💻 Code | 🌐 Project Page]
  • Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering. [📄 Paper | 💻 Code | 🌐 Project Page]
  • Learned Initializations for Optimizing Coordinate-Based Neural Representations. [📄 Paper | 💻 Code | 🌐 Project Page]
  • Neural Rays for Occlusion-aware Image-based Rendering. [📄 Paper | 💻 Code | 🌐 Project Page]
  • C³-GS: Learning Context-aware, Cross-dimension, Cross-scale Feature for Generalizable Gaussian Splatting. [📄 Paper | 💻 Code]
  • Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations. [📄 Paper | 💻 Code | 🌐 Project Page]
  • VisionNeRF: Vision Transformer for NeRF-Based View Synthesis from a Single Input Image. [📄 Paper | 💻 Code | 🌐 Project Page]
  • RePAST: Relative Pose Attention Scene Representation Transformer. [📄 Paper]
  • Is Attention All That NeRF Needs? [📄 Paper | 💻 Code | 🌐 Project Page]
  • Large Reconstruction Model (LRM).
  • Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model.
  • TripoSR: Fast 3D Object Reconstruction from a Single Image. [📄 Paper]
  • GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation.
  • GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting. [📄 Paper | 💻 Code | 🌐 Project Page]
  • MeshLRM: Large Reconstruction Model for High-Quality Meshes.
  • MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model.
  • Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation.
  • LVSM: A Fully Data-Driven Approach to Novel View Synthesis.
  • Depth Anything 3: Recovering the Visual Space from Any Views.
  • Gamba: Marry Gaussian Splatting With Mamba for Single-View 3D Reconstruction.
  • MVGamba: Unify 3D Content Generation as State Space Sequence Modeling.
  • Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats. [📄 Paper | 💻 Code | 🌐 Project Page]

Cross-View Fusion

Integration of Visual Foundation Models

Geometry-aware Improvement

Explicit Geometric Aggregation

  • MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo. [📄 Paper | 💻 Code | 🌐 Project Page]
  • Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes. [📄 Paper | 💻 Code | 🌐 Project Page]
  • GeoNeRF: Generalizing NeRF with Geometry Priors. [📄 Paper | 💻 Code | 🌐 Project Page]
  • BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-Scale Scenes.
  • Generalizable Patch-Based Neural Rendering. [📄 Paper | 💻 Code | 🌐 Project Page]
  • MatchNeRF: Explicit Correspondence Matching for Generalizable Neural Radiance Fields. [📄 Paper | 💻 Code | 🌐 Project Page]
  • GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers. [📄 Paper | 💻 Code | 🌐 Project Page]
  • MuRF: Multi-Baseline Radiance Fields. [📄 Paper | 💻 Code | 🌐 Project Page]
  • SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views.
  • VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction.
  • ReTR: Modeling Depth Distribution for Generalizable Surface Reconstruction.
  • UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets.
  • SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting. [📄 Paper]
  • RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination. [📄 Paper | 💻 Code | 🌐 Project Page]
  • AGG: Amortized Generative 3D Gaussians for Single Image to 3D. [📄 Paper]
  • TGS: Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers.
  • LaRa: Efficient Large-Baseline Radiance Fields. [📄 Paper | 💻 Code | 🌐 Project Page]
  • MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting. [📄 Paper]
  • TranSplat: Geometry-Aware Feed-Forward Gaussian Splatting with Transformation Consistency.
  • H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction. [📄 Paper]
  • MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction.
  • pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction. [📄 Paper | 💻 Code | 🌐 Project Page]
  • MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images. [📄 Paper | 💻 Code | 🌐 Project Page]
  • MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo. [📄 Paper | 💻 Code | 🌐 Project Page]

Refining Predicted 3D Scenes

Pose-Free Reconstruction

Pre-trained Geometric Guidance

Model Efficiency

Feature Efficiency

  • Efficient Neural Radiance Fields for Interactive Free-viewpoint Video. [📄 Paper | 💻 Code | 🌐 Project Page]
  • ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields. [📄 Paper | 💻 Code | 🌐 Project Page]
  • TinySplat: Feedforward Approach for Generating Compact 3D Scene Representation. [📄 Paper]
  • ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS. [📄 Paper | 💻 Code | 🌐 Project Page]
  • FastVGGT: Training-Free Acceleration of Visual Geometry Transformer. [📄 Paper | 💻 Code | 🌐 Project Page]
  • Quantized Visual Geometry Grounded Transformer. [📄 Paper]
  • Faster VGGT with Block-Sparse Global Attention. [📄 Paper]
  • Evict3R: Training-Free Token Eviction for Memory-Bounded Streaming Visual Geometry Transformers. [📄 Paper]
  • LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging. [📄 Paper]
  • Speed3R: Sparse Feed-forward 3D Reconstruction Models. [📄 Paper]
  • SR3R: Rethinking Super-Resolution 3D Reconstruction With Feed-Forward Gaussian Splatting. [📄 Paper]

Representation Compaction

Data & Visual Augmentation

Data Augmentation

Visual Augmentation

  • MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views. [📄 Paper | 💻 Code | 🌐 Project Page]
  • latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction. [📄 Paper | 💻 Code | 🌐 Project Page]
  • ProSplat: Improved Feed-Forward 3D Gaussian Splatting for Wide-Baseline Sparse Views. [📄 Paper]
  • Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models.
  • Reconstruct, Inpaint, Finetune: Dynamic Novel-view Synthesis from Monocular Videos. [📄 Paper]

Temporal-aware Models

Online Streaming

Offline Processing

Interactive Modeling

  • PIXIE: Physics from Pixels for Interactive Feed-Forward Scene Modeling. [📄 Paper]
  • PhysGM: Physical Gaussian Modeling for Interactive 3D Scene Editing. [📄 Paper]

Specialized Tasks

  • DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction. [📄 Paper | 💻 Code | 🌐 Project Page]
  • St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World.

Datasets and Benchmarks

Geometry Oriented

Visual Oriented

Mixed

Applications

Autonomous Driving

Robotics

Manipulation

Navigation

SfM & SLAM

SfM

SLAM

Scene Understanding

Semantic

3D Scene Understanding

Video Generation

Reconstruction-enhanced Video Generation

Video Generation-based Scene Reconstruction

Others

Panorama

Localization

  • Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization. [📄 Paper]
  • A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features. [📄 Paper]
  • Multi-View 3D Point Tracking. [📄 Paper | 🌐 Project Page]
  • SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization. [📄 Paper | 💻 Code | 🌐 Project Page]

Digital Humans

Calibration, Inpainting and Reflection

  • LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models. [📄 Paper | 🌐 Project Page]
  • BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization.
  • InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model. [📄 Paper | 🌐 Project Page]
  • Reflect3r: Single-View 3D Stereo Reconstruction Aided by Mirror Reflections. [📄 Paper | 🌐 Project Page]