A curated list of resources including papers, datasets, and relevant links pertaining to generative image composition (object insertion). Generative image composition aims to generate a plausible composite image based on a background image (with an optional bounding box) and one or a few foreground images of a specific object. For more complete resources on general image composition (object insertion), please refer to Awesome-Image-Composition.
Contributions are welcome. If you wish to contribute, feel free to send a pull request. If you have suggestions for new sections to be included, please raise an issue and discuss before sending a pull request.
A brief review on generative image composition is included in the following survey on image composition:
Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang: "Making Images Real Again: A Comprehensive Survey on Deep Image Composition." arXiv:2106.14490 (2021). [arxiv] [slides]
Try this online demo for image composition (object insertion), built upon the libcom toolbox, and have fun!
- MureCom (within-domain, multi-ref): This dataset contains 32 category subfolders. Each category subfolder has (1) 20 background images, each with a bounding box indicating where to insert the foreground object, and (2) 3 foreground sets of 5 images each, with object masks, bounding-box masks, object-free variants, and lighting variants.
- COCOEE (within-domain, single-ref): 500 background images from the MSCOCO validation set. Each background image is paired with a bounding box and a foreground image from the MSCOCO training set.
- TF-ICON test benchmark (cross-domain, single-ref): 332 samples. Each sample consists of a background image, a foreground image, a user mask, and a text prompt.
- DreamEditBench (within-domain, multi-ref): 220 background images and 30 unique foreground objects from 15 categories.
- SAM-FB (within-domain, single-ref): built upon SA-1B (SAM dataset). 3,160,403 images with 3,439 foreground categories.
- Subjects200K (within-domain, double-ref): 200,000 paired images. Each pair depicts the same subject in varied scene contexts.
- ORIDa (within-domain, multi-ref): 200 unique foreground objects. Each object is placed in an average of 50 diverse scenes. In each scene, the object is placed at 1–4 different positions.
- AnyInsertion (within-domain, single-ref): The training set includes 136,385 samples across two prompt types: 58,188 mask-prompt image pairs and 78,197 text-prompt image pairs. The test set includes 158 data pairs: 120 mask-prompt pairs and 38 text-prompt pairs.
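Most of the datasets above specify the insertion location as a bounding box on the background image, which mask-conditioned compositing models then consume as a binary mask over the insertion region. A minimal, dependency-free sketch of that conversion (the function name and the `(x1, y1, x2, y2)` box format are illustrative, not taken from any particular dataset loader):

```python
def bbox_to_mask(height, width, bbox):
    """Build a binary insertion mask from an (x1, y1, x2, y2) bounding box.

    Pixels inside the box are 255 (generate the foreground object here);
    pixels outside are 0 (keep the background). This is the typical mask
    format fed to inpainting-style compositing pipelines.
    """
    x1, y1, x2, y2 = bbox
    return [[255 if (x1 <= x < x2 and y1 <= y < y2) else 0
             for x in range(width)]
            for y in range(height)]

# Hypothetical example: a 256x256 background with a centered insertion box.
mask = bbox_to_mask(256, 256, (64, 64, 192, 192))
white_pixels = sum(row.count(255) for row in mask)
print(len(mask), len(mask[0]), white_pixels)  # 256 256 16384
```

In practice one would build the mask as a NumPy array or PIL image rather than nested lists, but the box-to-mask logic is the same.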
- Shilin Lu, Zhuming Lian, Zihan Zhou, Shaocong Zhang, Chen Zhao, Adams Wai-Kin Kong: "Does FLUX Already Know How to Perform Physically Plausible Image Composition?" ICLR (2026) [arxiv]
- Yu Xu, Fan Tang, You Wu, Lin Gao, Oliver Deussen, Hongbin Yan, Jintao Li, Juan Cao, Tong-Yee Lee: "In-Context Brush: Zero-shot Customized Subject Insertion with Context-Aware Latent Space Manipulation." ACM SIGGRAPH ASIA (2025) [arxiv] [paper] [code]
- Haowen Li, Zhenfeng Fan, Zhang Wen, Zhengzhou Zhu, Yunjin Li: "AIComposer: Any Style and Content Image Composition via Feature Integration." (+text) ICCV (2025) [arxiv] [paper] [code]
- Pengzhi Li, Qiang Nie, Ying Chen, Xi Jiang, Kai Wu, Yuhuan Lin, Yong Liu, Jinlong Peng, Chengjie Wang, Feng Zheng: "Tuning-Free Image Customization with Image and Text Guidance." (+text) ECCV (2024) [arxiv] [paper] [code]
- Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin: "PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering." (+text) ACM MM (2024) [arxiv] [paper] [code]
- Shilin Lu, Yanzhu Liu, Adams Wai-Kin Kong: "TF-ICON: Diffusion-based Training-free Cross-domain Image Composition." (+text) ICCV (2023) [arxiv] [paper] [code]
- Roy Hachnochi, Mingrui Zhao, Nadav Orzech, Rinon Gal, Ali Mahdavi-Amiri, Daniel Cohen-Or, Amit Haim Bermano: "Cross-domain Compositing with Pretrained Diffusion Models." arXiv:2302.10167 (2023) [arxiv] [code]
- Wensong Song, Hong Jiang, Zongxing Yang, Ruijie Quan, Yi Yang: "Insert Anything: Image Insertion via In-Context Editing in DiT." AAAI (2026) [arxiv] [code]
- Raghu Vamsi Chittersu, Yuvraj Singh Rathore, Pranav Adlinge, Kunal Swami: "Insert In Style: A Zero-Shot Generative Framework for Harmonious Cross-Domain Object Composition." arXiv:2511.15197 (2025) [arxiv]
- Dong Liang, Jinyuan Jia, Yuhao Liu, Rynson W.H. Lau: "HOComp: Interaction-Aware Human-Object Composition." NeurIPS (2025) [arxiv] [paper] [code]
- Qi Zhang, Guanyu Xing, Mengting Luo, Jianwei Zhang, Yanli Liu: "Inserting Objects into Any Background Images via Implicit Parametric Representation." IEEE Transactions on Visualization and Computer Graphics (2025) [paper]
- Lu Yang, Yuanhao Wang, Yicheng Liu, Enze Wang, Ziyang Zhao, Yanqi He, Zexian Song, Hao Lua: "UNICOM: Unified, foreground-aware, and context-realistic deep image composition with diffusion model." Neurocomputing (2025) [paper]
- Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao: "AnyDoor: Zero-shot Image Customization with Region-to-region Reference." T-PAMI (2025) [paper]
- Jinwoo Kim, Sangmin Han, Jinho Jeong, Jiwoo Choi, Dongyeong Kim, Seon Joo Kim: "ORIDa: Object-centric Real-world Image Composition Dataset." CVPR (2025) [arxiv] [paper]
- Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang, He Zhang, Andrew Gilbert, John Collomosse, Soo Ye Kim: "Multitwine: Multi-Object Compositing with Text and Layout Control." (+text) CVPR (2025) [arxiv] [paper]
- Junjia Huang, Pengxiang Yan, Jiyang Liu, Jie Wu, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li: "DreamFuse: Adaptive Image Fusion with Diffusion Transformer." ICCV (2025) (+text) [arxiv] [paper] [code]
- Haoxuan Wang, Jinlong Peng, Qingdong He, Hao Yang, Ying Jin, Jiafu Wu, Xiaobin Hu, Yanjie Pan, Zhenye Gan, Mingmin Chi, Bo Peng, Yabiao Wang: "UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer." ICCV (2025) [arxiv] [paper] [code]
- Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen: "ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation." ICCV (2025) [arxiv] [paper]
- Yongsheng Yu, Ziyun Zeng, Haitian Zheng, Jiebo Luo: "OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting." ICCV (2025) [arxiv] [paper] [code]
- Zitian Zhang, Frederic Fortier-Chouinard, Mathieu Garon, Anand Bhattad, Jean-Francois Lalonde: "ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion." WACV (2025) [arxiv] [paper] [code]
- Jixuan He, Wanhua Li, Ye Liu, Junsik Kim, Donglai Wei, Hanspeter Pfister: "Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion." arXiv:2412.14462 (2024) [arxiv] [code]
- Weijing Tao, Xiaofeng Yang, Biwen Lei, Miaomiao Cui, Xuansong Xie, Guosheng Lin: "MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior." arXiv:2409.10090 (2024) [arxiv] [code]
- Daniel Winter, Matan Cohen, Shlomi Fruchter, Yael Pritch, Alex Rav-Acha, Yedid Hoshen: "ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion." ECCV (2024) [arxiv] [paper]
- Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang, Jianming Zhang, Yizhi Song, Dan Ruta, Andrew Gilbert, John Collomosse, Soo Ye Kim: "Thinking Outside the BBox: Unconstrained Generative Object Compositing." ECCV (2024) [arxiv] [paper]
- Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Daniel Aliaga: "IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation." CVPR (2024) [arxiv] [paper]
- Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao: "AnyDoor: Zero-shot Object-level Image Customization." CVPR (2024) [arxiv] [paper] [code]
- Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Re, Kayvon Fatahalian: "Collage Diffusion." WACV (2024) [arxiv] [paper] [code]
- Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan: "CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models." ACM MM (2024) [arxiv] [paper] [code]
- Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, Li Niu: "ControlCom: Controllable Image Composition using Diffusion Model." arXiv:2308.10040 (2023) [arxiv] [code]
- Xin Zhang, Jiaxian Guo, Paul Yoo, Yutaka Matsuo, Yusuke Iwasawa: "Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model." arXiv:2306.07596 (2023) [arxiv]
- Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen: "Paint by Example: Exemplar-based Image Editing with Diffusion Models." CVPR (2023) [arxiv] [paper] [code]
- Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, Daniel Aliaga: "ObjectStitch: Generative Object Compositing." CVPR (2023) [arxiv] [paper] [code]
- Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh: "Putting People in Their Place: Affordance-Aware Human Insertion into Scenes." CVPR (2023) [arxiv] [paper] [code]
- Jiaxuan Chen, Bo Zhang, Qingdong He, Jinlong Peng, Li Niu: "CareCom: Generative Image Composition with Calibrated Reference Features." AAAI (2026) [arxiv] [project]
- Jiaxuan Chen, Bo Zhang, Qingdong He, Jinlong Peng, Li Niu: "MureObjectStitch: Multi-reference Image Composition." arXiv:2411.07462 (2025) [arxiv] [code]
- Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa, Yael Pritch, Michael Rubinstein, David E. Jacobs, Shlomi Fruchter: "Magic Insert: Style-Aware Drag-and-Drop." ICCV (2025) [arxiv] [paper]
- Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen, Chunhua Shen: "FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior." ECCV (2024) [arxiv] [paper] [code]
- Lingxiao Lu, Bo Zhang, Li Niu: "DreamCom: Finetuning Text-guided Inpainting Model for Image Composition." arXiv:2309.15508 (2023) [arxiv] [code]
- Tianle Li, Max Ku, Cong Wei, Wenhu Chen: "DreamEdit: Subject-driven Image Editing." TMLR (2023) [arxiv] [paper] [code]
