The Segment Anything Model (SAM) is a foundational model for image segmentation tasks, known for its strong generalization across diverse applications. However, its impressive performance comes with significant computational and resource demands, making it challenging to deploy in resource-limited environments such as mobile devices. To address this, a variety of SAM variants have been proposed to enhance efficiency without sacrificing accuracy. This survey provides the first comprehensive review of these efficient SAM variants. We begin by exploring the motivations driving this research. We then present core techniques used in SAM and model acceleration. This is followed by an in-depth analysis of various acceleration strategies, categorized by approach. Finally, we offer a unified and extensive evaluation of these methods, assessing their efficiency and accuracy on representative benchmarks, and providing a clear comparison of their overall performance.
Segment Anything (SegAny), i.e. the promptable segmentation task, is the foundational task of SAM. Its goal is to return a valid mask for any given prompt (e.g. a point, a box, a mask, or text).
Variants below focus on accelerating SegAny:
| Model | Paper | Code | Description |
|---|---|---|---|
| FastSAM | arXiv | Github | Reformulate SAM’s pipeline with YOLOv8-Seg for all-instance segmentation, followed by prompt-guided selection for SegAny. |
| SqueezeSAM | arXiv | | Substitute SAM’s architecture with a UNet-based encoder-decoder. |
| EfficientSAM | CVPR2024 | Github | Leverage SAMI-pretrained ViT-T/ViT-S as a lightweight image encoder. |
| RMP-SAM | ICLR2025 | Github | Construct with a lite backbone and a unified dynamic convolution decoder, with adapters for multi-purpose segmentation. |
| SAM 2 | arXiv | Github | Apply Hiera as an efficient backbone and introduce a memory mechanism to extend SAM to video tasks. |
| MobileSAM | arXiv | Github | Leverage encoder-only distillation from SAM’s ViT to MobileSAM’s TinyViT. |
| ESAM | ResearchGate | | Replace the image encoder with EfficientFormerV2 and conduct holistic distillation from an expert model. |
| NanoSAM | | Github | Distill from MobileSAM with ResNet18 as backbone and optimize with TensorRT. |
| PicoSAM2 | arXiv | | U-Net-based ultra-lightweight SAM variant that can be deployed on an edge sensor with 8MB memory. |
| RepViT-SAM | arXiv | Github | Substitute the image encoder with pure CNN-based RepViT and leverage MobileSAM’s distillation pipeline. |
| EdgeSAM | arXiv | Github | Substitute SAM’s image encoder with RepViT and adopt prompt-in-the-loop distillation. |
| EfficientViT-SAM | CVPR2024 | Github | Adopt the EfficientViT with ReLU linear attention as backbone and distill it from ViT-H. |
| FastSAM3D | MICCAI2024 | Github | Replace the image encoder with a ViT-Tiny variant and incorporate the Dilated Attention and FlashAttention for efficiency. |
| SAM-Lightening | arXiv | | Leverage dilated flash attention and propose dynamic layer-wise distillation. |
| RWKV-SAM | arXiv | | Adopt the linear-attention model RWKV to build an efficient image encoder. |
| TinySAM | AAAI2025 | Github | Leverage full-stage distillation with TinyViT as backbone, and adopt 8-bit quantization on encoder to get Q-TinySAM. |
| PTQ4SAM | CVPR2024 | Github | Eliminate the detrimental bimodal distribution and apply adaptive quantization to different distributions. |
| PQ-SAM | ECCV2024 | | Transfer the activation distribution into a quantization-friendly distribution by truncating, grouping and learnable transformation. |
| SlimSAM | NeurIPS2024 | Github | Divide the image encoder into two substructures and conduct structured pruning in an alternating manner. |
| SuperSAM | arXiv | Github | Apply the one-shot Neural Architecture Search with pruning-based methods to build up a supernetwork of SAM. |
| SAMfast | PyTorch Blog | Github | A rewritten version of SAM with pure, native PyTorch optimizations. |
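
Several variants in the table above (MobileSAM, for example) train a lightweight image encoder to reproduce the embeddings of SAM’s original encoder, leaving the prompt encoder and mask decoder untouched. The toy sketch below illustrates only that idea in plain Python. It is not taken from any of the listed codebases: `teacher_encoder`, `StudentEncoder`, and `distill_step` are hypothetical names, the "encoders" are one-dimensional stand-ins, and the loss is assumed to be a plain mean squared error between the two embeddings.

```python
"""Toy illustration of encoder-only distillation (not any paper's actual code)."""


def mse(pred, target):
    """Mean squared error between two equally long embedding vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def teacher_encoder(image):
    """Hypothetical frozen teacher: stands in for SAM's heavy ViT encoder."""
    return [2.0 * pixel + 1.0 for pixel in image]


class StudentEncoder:
    """Hypothetical lightweight student with two trainable scalars."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def __call__(self, image):
        return [self.w * pixel + self.b for pixel in image]


def distill_step(student, image, lr=0.1):
    """Run one gradient-descent step on the embedding MSE.

    Returns the loss measured before the parameter update.
    """
    target = teacher_encoder(image)  # the teacher is never updated
    pred = student(image)
    n = len(image)
    # Analytic gradients of the MSE with respect to the student's parameters.
    grad_w = sum(2.0 * (p - t) * x for p, t, x in zip(pred, target, image)) / n
    grad_b = sum(2.0 * (p - t) for p, t in zip(pred, target)) / n
    student.w -= lr * grad_w
    student.b -= lr * grad_b
    return mse(pred, target)


if __name__ == "__main__":
    student = StudentEncoder()
    image = [0.0, 0.5, 1.0]  # assumed toy "image" of three pixels
    losses = [distill_step(student, image) for _ in range(2000)]
    print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.2e}")
```

Because only the encoder output is matched, the student can in principle be paired with the original prompt encoder and mask decoder without retraining them, which is the appeal of this family of methods.
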
Segment Everything (SegEvery), i.e. the all-masks generation task, is an extension of the SegAny task that aims to segment all objects in an image.
Variants below focus on accelerating SegEvery:
| Model | Paper | Code | Description |
|---|---|---|---|
| FastSAM | arXiv | Github | Directly leverage YOLOv8-Seg to segment everything with high efficiency. |
| MobileSAMV2 | arXiv | Github | Object-aware prompt sampling based on the external YOLOv8 detector. |
| TinySAM | AAAI2025 | Github | Hierarchical sampling strategy for efficient prompt selection. |
| Lite-SAM | ECCV2024 | | LiteViT as lightweight backbone and AutoPPN for efficient prompt generation. |
| AoP-SAM | AAAI2025 | Github | Generate prompts iteratively by coarse prediction and fine-grained filtering. |
Note: Variants like FastSAM and TinySAM propose efficient strategies for both tasks, so we put them in both lists.
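
SAM’s default SegEvery mode prompts the model with a regular grid of points (32 per side, i.e. 1024 prompts) and then filters the resulting masks, which is why the variants above concentrate on producing fewer, better prompts. The sketch below is a plain-Python illustration, not any variant’s actual implementation. `build_point_grid` follows the evenly spaced layout used by SAM’s automatic mask generator, while `object_aware_prompts` and the example detections are assumptions introduced here to contrast grid prompting with one box prompt per detected object.

```python
def build_point_grid(points_per_side):
    """Return evenly spaced (x, y) point prompts in normalized [0, 1] coordinates."""
    coords = [(i + 0.5) / points_per_side for i in range(points_per_side)]
    # Row-major order: x varies fastest within each row of constant y.
    return [(x, y) for y in coords for x in coords]


def object_aware_prompts(detections, score_threshold=0.5):
    """Keep one box prompt per confident detection.

    `detections` is a hypothetical list of (x0, y0, x1, y1, score) tuples,
    e.g. the output of an external object detector.
    """
    return [det[:4] for det in detections if det[4] >= score_threshold]


if __name__ == "__main__":
    grid = build_point_grid(32)
    detections = [  # made-up detector output for illustration
        (0.10, 0.20, 0.40, 0.60, 0.92),
        (0.55, 0.15, 0.90, 0.50, 0.81),
        (0.30, 0.70, 0.45, 0.95, 0.12),
    ]
    boxes = object_aware_prompts(detections)
    print(f"grid prompts: {len(grid)}, object-aware prompts: {len(boxes)}")
```

Each prompt requires its own mask-decoder pass, so the number of prompts is a reasonable first-order proxy for SegEvery cost in this sketch.
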
Segment Anything Model 2 (SAM 2), the successor of the Segment Anything Model, achieves higher accuracy and efficiency on image segmentation and extends its capability to video segmentation, yet it still suffers from efficiency issues. Several works focusing on efficient SAM 2 have emerged so far. We organize them in the following list.
| Model | Paper | Code | Description |
|---|---|---|---|
| EfficientTAM | arXiv | Github | Leverage a SAMI-pretrained lightweight ViT as image encoder and propose an efficient memory cross-attention to further improve efficiency. |
| EdgeTAM | CVPR2025 | Github | Substitute the backbone with RepViT and leverage a Global Perceiver and a novel 2D Spatial Perceiver to compress memories. |
| Surgical SAM 2 | arXiv | Github | Efficient frame pruning strategy to only retain the most informative frames. |
@article{sun2024efficientvariantssegmentmodel,
title={On Efficient Variants of Segment Anything Model: A Survey},
author={Xiaorui Sun and Jun Liu and Heng Tao Shen and Xiaofeng Zhu and Ping Hu},
journal={arXiv preprint arXiv:2410.04960},
year={2024}
}
