A curated list of feed-forward 3D scene modeling methods, covering research directions, datasets, and applications.
## Contents
- Research Directions
- Datasets and Benchmarks
- Applications
## Research Directions
- pixelNeRF: Neural Radiance Fields from One or Few Images. [Paper | Code]
- IBRNet: Learning Multi-View Image-Based Rendering. [Paper | Code | Project Page]
- Splatter Image: Ultra-Fast Single-View 3D Reconstruction. [Paper | Code | Project Page]
- Convolutional Occupancy Networks. [Paper | Code | Project Page]
- Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering. [Paper | Code | Project Page]
- Learned Initializations for Optimizing Coordinate-Based Neural Representations. [Paper | Code | Project Page]
- Neural Rays for Occlusion-aware Image-based Rendering. [Paper | Code | Project Page]
- $C^3$-GS: Learning Context-aware, Cross-dimension, Cross-scale Feature for Generalizable Gaussian Splatting. [Paper | Code]
- Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations. [Paper | Code | Project Page]
- VisionNeRF: Vision Transformer for NeRF-Based View Synthesis from a Single Input Image. [Paper | Code | Project Page]
- RePAST: Relative Pose Attention Scene Representation Transformer. [Paper]
- Is Attention All That NeRF Needs? [Paper | Code | Project Page]
- Large Reconstruction Model (LRM).
- Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model.
- TripoSR: Fast 3D Object Reconstruction from a Single Image. [Paper]
- GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation.
- GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting. [Paper | Code | Project Page]
- MeshLRM: Large Reconstruction Model for High-Quality Meshes.
- MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model.
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation.
- LVSM: A Fully Data-Driven Approach to Novel View Synthesis.
- Depth Anything 3: Recovering the Visual Space from Any Views.
- Gamba: Marry Gaussian Splatting With Mamba for Single-View 3D Reconstruction.
- MVGamba: Unify 3D Content Generation as State Space Sequence Modeling.
- Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats. [Paper | Code | Project Page]
- AttnRend: Learning to Render Novel Views from Wide-Baseline Stereo Pairs. [Paper | Code | Project Page]
- Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis. [Paper | Code | Project Page]
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
- DUSt3R: Geometric 3D Vision Made Easy. [Paper | Code | Project Page]
- Grounding Image Matching in 3D with MASt3R. [Paper | Code | Project Page]
- MV-DUSt3R: Multi-View Dense Stereo 3D Reconstruction.
- MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds. [Paper | Code | Project Page]
- PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence. [Paper | Code | Project Page]
- 3D Reconstruction with Spatial Memory. [Paper | Code | Project Page]
- Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass. [Paper | Code | Project Page]
- MUSt3R: Multi-view Network for Stereo 3D Reconstruction. [Paper | Code | Project Page]
- WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool. [Paper]
- Continuous 3D Perception Model with Persistent State. [Paper | Code | Project Page]
- VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction. [Paper]
- G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration. [Paper]
- TTT3R: 3D Reconstruction as Test-Time Training. [Paper]
- Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory. [Paper]
- VGGT: Visual Geometry Grounded Transformer. [Paper | Code | Project Page]
- iLRM: An Iterative Large 3D Reconstruction Model. [Paper | Code | Project Page]
- Dens3R: A Foundation Model for 3D Geometry Prediction. [Paper]
- MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts. [Paper]
- Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-view Images. [Paper]
- Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning. [Paper]
- Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction. [Paper]
- NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction. [Paper | Code | Project Page]
- IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction.
- ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training. [Paper]
- LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory. [Paper]
- tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction. [Paper]
- VGG-T3: Offline Feed-Forward 3D Reconstruction at Scale. [Paper]
- DUSt3R: Geometric 3D Vision Made Easy. [Paper | Code | Project Page]
- Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction. [Paper]
- Feat2GS: Probing Visual Foundation Models with Gaussian Splatting. [Paper | Code | Project Page]
- CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image. [Paper]
- MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo. [Paper | Code | Project Page]
- Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes. [Paper | Code | Project Page]
- GeoNeRF: Generalizing NeRF with Geometry Priors. [Paper | Code | Project Page]
- BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-Scale Scenes.
- Generalizable Patch-Based Neural Rendering. [Paper | Code | Project Page]
- MatchNeRF: Explicit Correspondence Matching for Generalizable Neural Radiance Fields. [Paper | Code | Project Page]
- GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers. [Paper | Code | Project Page]
- MuRF: Multi-Baseline Radiance Fields. [Paper | Code | Project Page]
- SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views.
- VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction.
- ReTR: Modeling Depth Distribution for Generalizable Surface Reconstruction.
- UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets.
- SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting. [Paper]
- RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination. [Paper | Code | Project Page]
- AGG: Amortized Generative 3D Gaussians for Single Image to 3D. [Paper]
- TGS: Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers.
- LaRa: Efficient Large-Baseline Radiance Fields. [Paper | Code | Project Page]
- MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting. [Paper]
- TranSplat: Geometry-Aware Feed-Forward Gaussian Splatting with Transformation Consistency.
- H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction. [Paper]
- MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction.
- pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction. [Paper | Code | Project Page]
- MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images. [Paper | Code | Project Page]
- MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo. [Paper | Code | Project Page]
- FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes. [Paper | Code | Project Page]
- HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction. [Paper | Code | Project Page]
- PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views. [Paper | Code | Project Page]
- Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images. [Paper | Code | Project Page]
- Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction.
- G3R: Gradient Guided Generalizable Reconstruction.
- LEAP: Liberate Sparse-view 3D Modeling from Camera Poses. [Paper | Code | Project Page]
- Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs. [Paper | Code | Project Page]
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images. [Paper | Code | Project Page]
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting. [Paper | Code | Project Page]
- FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction. [Paper | Code]
- FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views. [Paper | Code | Project Page]
- Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors. [Paper | Code | Project Page]
- RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration. [Paper | Code | Project Page]
- UFV-Splatter: Pose-Free Feed-Forward 3D Gaussian Splatting Adapted to Unfavorable Views. [Paper | Code | Project Page]
- π³: Scalable Permutation-Equivariant Visual Geometry Learning. [Paper | Code | Project Page]
- AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views. [Paper]
- No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views.
- SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views. [Paper]
- PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-forward Planar Splatting.
- YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting. [Paper]
- DepthSplat: Connecting Gaussian Splatting and Depth. [Paper | Code | Project Page]
- Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image. [Paper | Code | Project Page]
- Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View. [Paper | Code | Project Page]
- Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting. [Paper | Code | Project Page]
- Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation. [Paper]
- JointSplat: Joint Depth and Flow Priors for Feed-Forward 3D Gaussian Splatting. [Paper]
- Efficient Neural Radiance Fields for Interactive Free-viewpoint Video. [Paper | Code | Project Page]
- ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields. [Paper | Code | Project Page]
- TinySplat: Feedforward Approach for Generating Compact 3D Scene Representation. [Paper]
- ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS. [Paper | Code | Project Page]
- FastVGGT: Training-Free Acceleration of Visual Geometry Transformer. [Paper | Code | Project Page]
- Quantized Visual Geometry Grounded Transformer. [Paper]
- Faster VGGT with Block-Sparse Global Attention. [Paper]
- Evict3R: Training-Free Token Eviction for Memory-Bounded Streaming Visual Geometry Transformers. [Paper]
- LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging. [Paper]
- Speed3R: Sparse Feed-forward 3D Reconstruction Models. [Paper]
- SR3R: Rethinking Super-Resolution 3D Reconstruction With Feed-Forward Gaussian Splatting. [Paper]
- Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images. [Paper | Code | Project Page]
- PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views. [Paper | Code | Project Page]
- FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction. [Paper | Code | Project Page]
- LongSplat: Online Generalizable 3D Gaussian Splatting from Long Sequence Images. [Paper | Code | Project Page]
- MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data. [Paper | Code | Project Page]
- Puzzles: Unbounded Video-Depth Augmentation for Scalable End-to-End 3D Reconstruction. [Paper | Code | Project Page]
- Aug3D: Augmenting Large Scale Outdoor Datasets for Generalizable Novel View Synthesis. [Paper]
- MVBoost: Boost 3D Reconstruction with Multi-View Refinement.
- MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views. [Paper | Code | Project Page]
- latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction. [Paper | Code | Project Page]
- ProSplat: Improved Feed-Forward 3D Gaussian Splatting for Wide-Baseline Sparse Views. [Paper]
- Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models.
- Reconstruct, Inpaint, Finetune: Dynamic Novel-view Synthesis from Monocular Videos. [Paper]
- StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams. [Paper | Code]
- Continuous 3D Perception Model with Persistent State. [Paper | Code | Project Page]
- DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos. [Paper | Project Page]
- Stream3R: Scalable Sequential 3D Reconstruction with Causal Transformer. [Paper]
- LongStream: Long-Sequence Streaming Autoregressive Visual Geometry. [Paper]
- L4GM: Large 4D Gaussian Reconstruction Model.
- 4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time. [Paper]
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion. [Paper | Code | Project Page]
- Easi3R: Estimating Disentangled Motion from DUSt3R Without Training. [Paper | Code | Project Page]
- 4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos. [Paper | Code | Project Page]
- MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second. [Paper | Code | Project Page]
- 4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation. [Paper | Project Page]
- MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion. [Paper | Code | Project Page]
- Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos. [Paper]
- Feed-forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos. [Paper]
- PIXIE: Physics from Pixels for Interactive Feed-Forward Scene Modeling. [Paper]
- PhysGM: Physical Gaussian Modeling for Interactive 3D Scene Editing. [Paper]
- DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction. [Paper | Code | Project Page]
- St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World.
## Datasets and Benchmarks
- DTU: Large Scale Multi-view Stereopsis Evaluation. [Paper | Project Page]
- 7Scenes: Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images. [Paper | Project Page]
- NeRF-Synthetic: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. [Paper | Code | Project Page | Data Link]
- Neural 3D Mesh Renderer (NMR): Neural 3D Mesh Renderer. [Paper | Code]
- CelebA: Deep Learning Face Attributes in the Wild. [Paper | Project Page | Data Link]
- Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video. [Paper | Code | Project Page | Data Link]
- NYUv2: Indoor Segmentation and Support Inference from RGBD Images. [Paper | Project Page | Data Link]
- Habitat: A Platform for Embodied AI Research. [Paper | Code | Data Link]
- Hot3D: Hand and Object Tracking in 3D from Egocentric Multi-view Videos. [Paper | Code | Project Page | Data Link]
- ACID: Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image. [Paper | Project Page | Data Link]
- ENeRF-Outdoor: Efficient Neural Radiance Fields for Interactive Free-viewpoint Video. [Paper | Code | Project Page | Data Link]
- LLFF: Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines. [Paper | Code | Project Page | Data Link]
- Neural3DV: Neural 3D Video Synthesis from Multi-view Video. [Paper | Code | Project Page | Data Link]
- EgoExo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives. [Paper | Code | Project Page | Data Link]
- DAVIS: A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation. [Paper | Code | Project Page | Data Link]
- YouTube-VOS: Sequence-to-Sequence Video Object Segmentation. [Paper | Project Page | Data Link]
- DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision. [Paper | Code | Project Page | Data Link]
- RealEstate10K: Stereo Magnification: Learning View Synthesis Using Multiplane Images. [Paper | Project Page | Data Link]
- Google Scanned Objects (GSO): A High-Quality Dataset of 3D Scanned Household Items. [Paper | Project Page | Data Link]
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation. [Paper | Code | Project Page | Data Link]
- CO3D: Common Objects in 3D: Large-Scale Learning and Evaluation of Real-Life 3D Category Reconstruction. [Paper | Code | Project Page | Data Link]
- WildRGBD: RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos. [Paper | Code | Project Page | Data Link]
- ShapeNet: An Information-Rich 3D Model Repository. [Paper | Project Page | Data Link]
- MVImgNet: A Large-Scale Dataset of Multi-View Images. [Paper | Code | Data Link]
- Objaverse: A Universe of Annotated 3D Objects. [Paper | Code | Project Page | Data Link]
- Objaverse-XL: A Universe of 10M+ 3D Objects. [Paper | Code | Project Page | Data Link]
- DeepVoxels: Learning Persistent 3D Feature Embeddings. [Paper | Code | Project Page | Data Link]
- MultiShapeNet: Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations. [Paper | Code | Project Page | Data Link]
- Amazon Berkeley Objects (ABO): Dataset and Benchmarks for Real-World 3D Object Understanding. [Paper | Project Page | Data Link]
- Replica: A Digital Replica of Indoor Spaces. [Paper | Code | Data Link]
- TUM RGB-D: Evaluating Egomotion and Structure-from-Motion Approaches Using the TUM RGB-D Benchmark. [Paper | Project Page | Data Link]
- Matterport3D: Learning from RGB-D Data in Indoor Environments. [Paper | Code | Project Page | Data Link]
- Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding. [Paper | Code | Data Link]
- ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. [Paper | Code | Project Page | Data Link]
- ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes. [Paper | Code | Project Page | Data Link]
- ARKitScenes: A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data. [Paper | Code | Data Link]
- Virtual KITTI 2. [Paper | Project Page | Data Link]
- Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo. [Paper | Code | Project Page | Data Link]
- MegaDepth: Learning Single-View Depth Prediction from Internet Photos. [Paper | Code | Project Page | Data Link]
- PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking. [Paper | Code | Project Page | Data Link]
- TartanAir: A Dataset to Push the Limits of Visual SLAM. [Paper | Project Page | Data Link]
- Waymo: Scalability in Perception for Autonomous Driving. [Paper | Code | Project Page | Data Link]
- nuScenes: A Multimodal Dataset for Autonomous Driving. [Paper | Project Page | Data Link]
- Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. [Paper | Code | Project Page | Data Link]
- Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction. [Paper | Project Page]
- ETH3D: A Multi-View Stereo Benchmark with High-Resolution Images and Multi-Camera Videos. [Paper | Code | Project Page | Data Link]
- BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks. [Paper | Code | Data Link]
- DyCheck: Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos. [Paper | Code | Project Page | Data Link]
## Applications
- Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving. [Paper | Code | Project Page]
- InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models. [Paper | Code | Project Page]
- DrivingRecon: Large 4D Gaussian Reconstruction Model for Autonomous Driving. [Paper | Code]
- GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control. [Paper | Code | Project Page]
- DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input. [Paper | Code]
- STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes. [Paper | Code | Project Page]
- SCube: Instant Large-Scale Scene Reconstruction using VoxSplats. [Paper | Code | Project Page]
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis. [Paper | Code | Project Page]
- Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction. [Paper | Code | Project Page]
- BEV-GS: Feed-forward Gaussian Splatting in Bird's-Eye-View for Road Reconstruction. [Paper | Code]
- Efficient Depth-guided Urban View Synthesis. [Paper | Code | Project Page]
- DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion. [Paper | Project Page]
- WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving. [Paper | Code | Project Page]
- GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF. [Paper | Code | Project Page]
- ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. [Paper | Code | Project Page]
- ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model. [Paper | Code]
- Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning. [Paper]
- GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation. [Paper | Project Page]
- GaussianGrasper: 3D Language Gaussian Splatting for Open-Vocabulary Robotic Grasping. [Paper | Code]
- EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device. [Paper | Project Page]
- IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation. [Paper | Code | Project Page]
- VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion. [Paper | Code | Project Page]
- GS-LTS: 3D Gaussian Splatting-Based Adaptive Modeling for Long-Term Service Robots. [Paper | Project Page]
- UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation. [Paper]
- Visual Geometry Grounded Deep Structure From Motion. [Paper | Code | Project Page]
- Light3R-SfM: Towards Feed-forward Structure-from-Motion. [Paper | Project Page]
- MASt3R-SfM: A Fully-Integrated Solution for Unconstrained Structure-from-Motion. [Paper | Code]
- Regist3R: Incremental Registration with Stereo Foundation Model. [Paper | Code]
- VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences. [Paper | Code]
- Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass. [Paper | Code | Project Page]
- FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views. [Paper | Code | Project Page]
- MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors. [Paper | Code | Project Page]
- SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos. [Paper | Code]
- VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold. [Paper | Code]
- ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation. [Paper]
- EC3R-SLAM: Efficient and Consistent Monocular Dense SLAM with Feed-Forward 3D Reconstruction. [Paper]
- MASt3R-Fusion: Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM. [Paper]
- ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association. [Paper | Code | Project Page]
- SLGaussian: Fast Language Gaussian Splatting in Sparse Views. [Paper]
- GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs. [Paper]
- SegMASt3R: Geometry Grounded Segment Matching. [Paper | Project Page]
- PartField: Learning 3D Feature Fields for Part Segmentation and Beyond. [Paper | Code | Project Page]
- Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting. [Paper | Code | Project Page]
- SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields. [Paper | Code | Project Page]
- UniForward: Unified 3D Scene and Semantic Field Reconstruction via Feed-Forward Gaussian Splatting from Only Sparse-View Images. [Paper]
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D. [Paper | Code | Project Page]
- AlignGS: Aligning Geometry and Semantics for Robust Indoor Reconstruction from Sparse Views. [Paper | Project Page]
- MLLMs Need 3D-Aware Representation Supervision for Scene Understanding. [Paper | Code | Project Page]
- Spatio-Temporal LLM: Reasoning about Environments and Actions. [Paper | Code | Project Page]
- Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence. [Paper | Code | Project Page]
- Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors. [Paper | Code | Project Page]
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction. [Paper | Code | Project Page]
- MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views. [Paper | Code | Project Page]
- JOG3R: Towards 3D-Consistent Video Generators. [Paper | Project Page]
- GenFusion: Closing the Loop between Reconstruction and Generation via Videos. [Paper | Code | Project Page]
- ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction. [Paper | Project Page]
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling. [Paper | Code | Project Page]
- Video World Models with Long-term Spatial Memory. [Paper | Project Page]
- SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering. [Paper | Code | Project Page]
- 4DNeX: Feed-Forward 4D Generative Modeling Made Easy. [Paper | Code | Project Page]
- Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation. [Paper]
- ShapeGen4D: Towards High Quality 4D Shape Generation from Videos. [Paper | Project Page]
- EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory. [Paper]
- WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance. [Paper]
- FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction. [Paper]
- Splatter-360: Generalizable 360° Gaussian Splatting for Wide-baseline Panoramic Images. [Paper | Code | Project Page]
- PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting. [Paper | Code | Project Page]
- PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery. [Paper]
- Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization. [Paper]
- A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features. [Paper]
- Multi-View 3D Point Tracking. [Paper | Project Page]
- SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization. [Paper | Code | Project Page]
- Human3R: Everyone Everywhere All at Once. [Paper | Code | Project Page]
- LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models. [Paper | Project Page]
- BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization.
- InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model. [Paper | Project Page]
- Reflect3r: Single-View 3D Stereo Reconstruction Aided by Mirror Reflections. [Paper | Project Page]



