
On-Efficient-Variants-of-Segment-Anything-Model

[Paper]

The Segment Anything Model (SAM) is a foundational model for image segmentation tasks, known for its strong generalization across diverse applications. However, its impressive performance comes with significant computational and resource demands, making it challenging to deploy in resource-limited environments such as mobile devices. To address this, a variety of SAM variants have been proposed to enhance efficiency without sacrificing accuracy. This survey provides the first comprehensive review of these efficient SAM variants. We begin by exploring the motivations driving this research. We then present core techniques used in SAM and model acceleration. This is followed by an in-depth analysis of various acceleration strategies, categorized by approach. Finally, we offer a unified and extensive evaluation of these methods, assessing their efficiency and accuracy on representative benchmarks, and providing a clear comparison of their overall performance.

Taxonomy

Efficient SAM Variants

Accelerating SegAny

Segment Anything (SegAny), i.e. the promptable segmentation task, is the foundational task of SAM: given any prompt (e.g. a point, a box, a mask, or text), return a valid mask for the object the prompt indicates.
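To make the SegAny pipeline concrete, here is a minimal NumPy sketch of the prompt-guided selection step that variants such as FastSAM use: first segment all instances, then pick the candidate mask matching a point prompt. The helper `select_mask_for_point` and the preference for the smallest covering mask are illustrative assumptions, not code from any repository listed below.

```python
import numpy as np

def select_mask_for_point(masks: np.ndarray, point: tuple) -> int:
    """Pick the candidate mask that covers a point prompt.

    masks: (N, H, W) boolean array of candidate instance masks.
    point: (row, col) coordinates of the point prompt.
    Among all masks covering the point, prefer the smallest
    (most specific) one -- an illustrative tie-breaking rule.
    """
    r, c = point
    covering = [i for i in range(masks.shape[0]) if masks[i, r, c]]
    if not covering:
        raise ValueError("no candidate mask covers the prompt")
    return min(covering, key=lambda i: int(masks[i].sum()))

# Toy example: two nested candidate masks; the point lies in both,
# so the smaller (inner) mask is selected.
h = w = 8
big = np.zeros((h, w), dtype=bool); big[1:7, 1:7] = True
small = np.zeros((h, w), dtype=bool); small[2:4, 2:4] = True
idx = select_mask_for_point(np.stack([big, small]), (3, 3))
```

In the real variants the candidate masks come from a fast all-instance segmenter (e.g. YOLOv8-Seg) and the selection may also score prompt-mask similarity, but the control flow is the same: segment once, then select per prompt.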

Variants below focus on accelerating SegAny:

| Model | Paper | Code | Description |
| --- | --- | --- | --- |
| FastSAM | arXiv | GitHub | Reformulates SAM's pipeline with YOLOv8-Seg for all-instance segmentation, followed by prompt-guided selection for SegAny. |
| SqueezeSAM | arXiv | - | Substitutes SAM's architecture with a UNet-based encoder-decoder. |
| EfficientSAM | CVPR 2024 | GitHub | Leverages SAMI-pretrained ViT-T/ViT-S as a lightweight image encoder. |
| RMP-SAM | ICLR 2025 | GitHub | Built on a lite backbone and a unified dynamic-convolution decoder, with adapters for multi-purpose segmentation. |
| SAM 2 | arXiv | GitHub | Applies Hiera as an efficient backbone and introduces a memory mechanism to extend SAM to video tasks. |
| MobileSAM | arXiv | GitHub | Leverages encoder-only distillation from SAM's ViT to MobileSAM's TinyViT. |
| ESAM | ResearchGate | - | Replaces the image encoder with EfficientFormerV2 and conducts holistic distillation from an expert model. |
| NanoSAM | - | GitHub | Distills from MobileSAM with a ResNet18 backbone and optimizes with TensorRT. |
| PicoSAM2 | arXiv | - | U-Net-based ultra-lightweight SAM variant that can be deployed on edge sensors with 8 MB of memory. |
| RepViT-SAM | arXiv | GitHub | Substitutes the image encoder with the pure CNN-based RepViT and leverages MobileSAM's distillation pipeline. |
| EdgeSAM | arXiv | GitHub | Substitutes SAM's image encoder with RepViT and adopts prompt-in-the-loop distillation. |
| EfficientViT-SAM | CVPR 2024 | GitHub | Adopts EfficientViT with ReLU linear attention as backbone and distills it from ViT-H. |
| FastSAM3D | MICCAI 2024 | GitHub | Replaces the image encoder with a ViT-Tiny variant and incorporates dilated attention and FlashAttention for efficiency. |
| SAM-Lightening | arXiv | - | Leverages dilated flash attention and proposes dynamic layer-wise distillation. |
| RWKV-SAM | arXiv | - | Adopts the linear-attention model RWKV to build an efficient image encoder. |
| TinySAM | AAAI 2025 | GitHub | Leverages full-stage distillation with a TinyViT backbone and adopts 8-bit quantization on the encoder to obtain Q-TinySAM. |
| PTQ4SAM | CVPR 2024 | GitHub | Eliminates the detrimental bimodal distribution and applies adaptive quantization to the different distributions. |
| PQ-SAM | ECCV 2024 | - | Transforms activation distributions into quantization-friendly ones via truncation, grouping, and a learnable transformation. |
| SlimSAM | NeurIPS 2024 | GitHub | Divides the image encoder into two substructures and conducts structured pruning in an alternating manner. |
| SuperSAM | arXiv | GitHub | Applies one-shot Neural Architecture Search with pruning-based methods to build a SAM supernetwork. |
| SAMfast | PyTorch Blog | GitHub | A rewritten version of SAM with pure, native PyTorch optimizations. |

Accelerating SegEvery

Segment Everything (SegEvery), i.e. the all-masks generation task, extends SegAny and aims to segment all objects in an image.
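SegEvery is commonly driven by dense point prompting: the original SAM samples a regular grid of point prompts (32×32 by default) and runs the decoder once per point, which is exactly the costly step the variants below try to cut. A minimal NumPy sketch of that grid sampling, written here for illustration rather than copied from any listed repository:

```python
import numpy as np

def build_point_grid(n_per_side: int) -> np.ndarray:
    """Regular n x n grid of point prompts in normalized [0, 1]
    image coordinates, placed at cell centers. Returns (n*n, 2)
    array of (x, y) points."""
    offset = 1.0 / (2 * n_per_side)          # half a cell from the border
    coords = np.linspace(offset, 1.0 - offset, n_per_side)
    xs, ys = np.meshgrid(coords, coords)     # cell-center lattice
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)

# SAM's default SegEvery setting prompts with a 32x32 = 1024-point grid.
grid = build_point_grid(32)
```

Efficient SegEvery variants replace this exhaustive grid with smarter sampling, e.g. object-aware prompts from a detector (MobileSAMV2) or hierarchical/coarse-to-fine selection (TinySAM, AoP-SAM), so far fewer decoder calls are needed.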

Variants below focus on accelerating SegEvery:

| Model | Paper | Code | Description |
| --- | --- | --- | --- |
| FastSAM | arXiv | GitHub | Directly leverages YOLOv8-Seg to segment everything with high efficiency. |
| MobileSAMV2 | arXiv | GitHub | Object-aware prompt sampling based on an external YOLOv8 detector. |
| TinySAM | AAAI 2025 | GitHub | Hierarchical sampling strategy for efficient prompt selection. |
| Lite-SAM | ECCV 2024 | - | LiteViT as a lightweight backbone and AutoPPN for efficient prompt generation. |
| AoP-SAM | AAAI 2025 | GitHub | Generates prompts iteratively via coarse prediction and fine-grained filtering. |

Note: Variants like FastSAM and TinySAM propose efficient strategies for both tasks, so we put them in both lists.

Efficient SAM 2 Variants

Segment Anything Model 2, the successor to the Segment Anything Model, achieves higher accuracy and efficiency on image segmentation and extends its capability to video segmentation tasks, yet it still suffers from inefficiency issues. Several works focusing on efficient SAM 2 have since emerged; we organize them in the following list.

| Model | Paper | Code | Description |
| --- | --- | --- | --- |
| EfficientTAM | arXiv | GitHub | Leverages a SAMI-pretrained lightweight ViT as image encoder and proposes an efficient memory cross-attention to further improve efficiency. |
| EdgeTAM | CVPR 2025 | GitHub | Substitutes the backbone with RepViT and leverages a global Perceiver and a novel 2D Spatial Perceiver to compress memories. |
| Surgical SAM 2 | arXiv | GitHub | Efficient frame-pruning strategy that retains only the most informative frames. |

Citation

  @article{sun2024efficientvariantssegmentmodel,
        title={On Efficient Variants of Segment Anything Model: A Survey}, 
        author={Xiaorui Sun and Jun Liu and Heng Tao Shen and Xiaofeng Zhu and Ping Hu},
        journal={arXiv preprint arXiv:2410.04960},
        year={2024}
  }
