A comprehensive summary table of vision models (HF-based)

| Name | Full Name | Architecture | Base Model | Year | Training Dataset | Library / Framework | Use Cases | HF URL | GitHub URL |
|---|---|---|---|---|---|---|---|---|---|
| BEiT | Bidirectional Encoder representation from Image Transformers | Vision Transformer | ViT | 2021 | ImageNet-21k, ImageNet-1k | PyTorch, Hugging Face Transformers | Image classification, semantic segmentation | https://huggingface.co/microsoft/beit-base-patch16-224 | https://github.com/microsoft/unilm/tree/master/beit |
| BiT | Big Transfer | ResNet | ResNet | 2019 | JFT-300M, ImageNet-21k | TensorFlow, Hugging Face Transformers | Image classification, transfer learning | https://huggingface.co/google/bit-50 | https://github.com/google-research/big_transfer |
| Conditional DETR | Conditional DETR | Transformer | DETR | 2021 | COCO | PyTorch, Hugging Face Transformers | Object detection | https://huggingface.co/microsoft/conditional-detr-resnet-50 | https://github.com/Atten4Vis/ConditionalDETR |
| ConvNeXT | ConvNeXT | Convolutional Neural Network | ResNet | 2022 | ImageNet-1k | PyTorch, Hugging Face Transformers | Image classification | https://huggingface.co/facebook/convnext-tiny-224 | https://github.com/facebookresearch/ConvNeXt |
| ConvNeXTV2 | ConvNeXT V2 | Convolutional Neural Network | ConvNeXT | 2023 | ImageNet-22k | PyTorch, Hugging Face Transformers | Image classification | https://huggingface.co/facebook/convnextv2-tiny-1k-224 | https://github.com/facebookresearch/ConvNeXt-V2 |
| CvT | Convolutional vision Transformer | Vision Transformer | ViT | 2021 | ImageNet-1k | PyTorch, Hugging Face Transformers | Image classification | https://huggingface.co/microsoft/cvt-13 | https://github.com/microsoft/CvT |
| DAB-DETR | Dynamic Anchor Boxes DETR | Transformer | Conditional DETR | 2022 | COCO 2017 | PyTorch, Hugging Face Transformers | Object detection | https://huggingface.co/IDEA-Research/dab-detr-resnet-50 | https://github.com/IDEA-Research/DAB-DETR |
| Deformable DETR | Deformable DETR | Transformer | DETR | 2020 | COCO | PyTorch, Hugging Face Transformers | Object detection | https://huggingface.co/SenseTime/deformable-detr | https://github.com/fundamentalvision/Deformable-DETR |
| DeiT | Data-efficient image Transformers | Vision Transformer | ViT | 2020 | ImageNet-1k | PyTorch, Hugging Face Transformers | Image classification | https://huggingface.co/facebook/deit-base-distilled-patch16-224 | https://github.com/facebookresearch/deit |
| Depth Anything | Depth Anything | Vision Transformer | DPT | 2024 | MiDaS dataset, custom large-scale dataset | PyTorch, Hugging Face Transformers | Monocular depth estimation | https://huggingface.co/LiheYoung/depth-anything-small-hf | https://github.com/LiheYoung/Depth-Anything |
| Depth Anything V2 | Depth Anything V2 | Dense Prediction Transformer (DPT) | DINOv2 | 2024 | 595K synthetic images, 62M+ real unlabeled images | PyTorch, Hugging Face Transformers | Monocular depth estimation | https://huggingface.co/depth-anything/Depth-Anything-V2-Small-hf | https://github.com/DepthAnything/Depth-Anything-V2 |
| DepthPro | Depth Pro | Multi-scale Vision Transformer | - | 2024 | Mix of real and synthetic images | PyTorch | Monocular depth estimation, AR applications | - | - |
| DETA | Detection Transformers with Assignment | Transformer | Swin Transformer | 2023 | COCO | PyTorch, Hugging Face Transformers | Object detection | https://huggingface.co/jozhang97/deta-swin-large | https://github.com/jozhang97/DETA |
| DETR | DEtection TRansformer | Transformer | ResNet | 2020 | COCO | PyTorch, Hugging Face Transformers | Object detection | https://huggingface.co/facebook/detr-resnet-50 | https://github.com/facebookresearch/detr |
| DiNAT | Dilated Neighborhood Attention Transformer | Hierarchical Vision Transformer | NAT | 2022 | ImageNet-1k | PyTorch, NATTEN | Image classification, object detection, segmentation | https://huggingface.co/shi-labs/dinat-mini-in1k-224 | https://github.com/SHI-Labs/Neighborhood-Attention-Transformer |
| DINOv2 | DINO v2 | Vision Transformer | ViT | 2023 | Curated dataset from diverse sources | PyTorch, Hugging Face Transformers | Image classification, visual feature extraction | https://huggingface.co/facebook/dinov2-base | https://github.com/facebookresearch/dinov2 |
| DINOv2 with Registers | DINO v2 with Registers | Vision Transformer | DINOv2 | 2023 | Same as DINOv2 | PyTorch, Hugging Face Transformers | Image classification, visual feature extraction | https://huggingface.co/facebook/dinov2-with-registers-base | https://github.com/facebookresearch/dinov2 |
| DiT | Document Image Transformer | Vision Transformer | BEiT | 2022 | Various document datasets | PyTorch, Hugging Face Transformers | Document image analysis, layout analysis, table detection | https://huggingface.co/microsoft/dit-base | https://github.com/microsoft/unilm/tree/master/dit |
| DPT | Dense Prediction Transformer | Vision Transformer | ViT | 2021 | Various, including NYU Depth V2 | PyTorch, Hugging Face Transformers | Monocular depth estimation, semantic segmentation | https://huggingface.co/Intel/dpt-large | https://github.com/isl-org/DPT |
| EfficientFormer | EfficientFormer | Transformer | - | 2022 | ImageNet-1k | PyTorch | Image classification, object detection, segmentation | https://huggingface.co/docs/transformers/model_doc/efficientformer | https://github.com/snap-research/EfficientFormer |
| EfficientNet | EfficientNet | Convolutional Neural Network | MobileNetV2 | 2019 | ImageNet | TensorFlow, PyTorch | Image classification, transfer learning | https://huggingface.co/google/efficientnet-b0 | https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet |
| FocalNet | Focal Modulation Network | Vision Transformer | - | 2022 | ImageNet-1k, ImageNet-22k | PyTorch | Image classification, object detection, semantic segmentation | https://huggingface.co/microsoft/focalnet-tiny | https://github.com/microsoft/FocalNet |
| GLPN | Global-Local Path Networks | Hierarchical mix-Transformer | SegFormer | 2022 | NYU Depth V2, KITTI | PyTorch, Hugging Face Transformers | Monocular depth estimation | https://huggingface.co/vinvino02/glpn-kitti | https://github.com/vinvino02/GLPDepth |
| Hiera | Hierarchical Vision Transformer | Vision Transformer | - | 2023 | ImageNet-1k | PyTorch | Image and video recognition | https://huggingface.co/facebook/hiera-base-224 | https://github.com/facebookresearch/hiera |
| I-JEPA | Image Joint Embedding Predictive Architecture | Joint Embedding Predictive Architecture | - | 2023 | Large-scale image datasets | PyTorch | Self-supervised image representation learning | - | - |
| ImageGPT | Generative Pretraining from Pixels | GPT-2-like | GPT-2 | 2020 | ImageNet | PyTorch, Hugging Face Transformers | Image generation, image classification | https://huggingface.co/docs/transformers/model_doc/imagegpt | https://github.com/openai/image-gpt |
| LeViT | LeViT | Vision Transformer | - | 2021 | ImageNet | PyTorch, Hugging Face Transformers | Image classification | https://huggingface.co/docs/transformers/model_doc/levit | https://github.com/huggingface/transformers |
| Mask2Former | Masked-attention Mask Transformer | Transformer | Swin Transformer | 2022 | COCO, ADE20K, Cityscapes | PyTorch, Detectron2 | Instance segmentation, panoptic segmentation, semantic segmentation | https://huggingface.co/docs/transformers/model_doc/mask2former | https://github.com/facebookresearch/Mask2Former |
| MaskFormer | MaskFormer | Transformer | - | 2021 | ADE20K, Cityscapes, COCO, Mapillary Vistas | PyTorch, Hugging Face Transformers | Semantic segmentation, instance segmentation, panoptic segmentation | https://huggingface.co/facebook/maskformer-swin-base-ade | https://github.com/facebookresearch/MaskFormer |
| MobileNetV1 | MobileNet Version 1 | Convolutional Neural Network | - | 2017 | ImageNet | TensorFlow, PyTorch | Mobile and embedded vision applications | https://huggingface.co/google/mobilenet_v1_0.75_192 | https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilene |
| MobileNetV2 | MobileNet Version 2 | Convolutional Neural Network | MobileNetV1 | 2018 | ImageNet | TensorFlow, Keras, PyTorch | Mobile and embedded vision applications, image classification, object detection | https://huggingface.co/google/mobilenet_v2_1.0_224 | https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet |
| MobileViT | Mobile Vision Transformer | Vision Transformer | - | 2021 | ImageNet | PyTorch, Hugging Face Transformers | Image classification, object detection | https://huggingface.co/apple/mobilevit-small | https://github.com/apple/ml-cvnets |
| MobileViTV2 | Mobile Vision Transformer Version 2 | Vision Transformer | MobileViT | 2023 | ImageNet | PyTorch, Hugging Face Transformers | Image classification, object detection | https://huggingface.co/apple/mobilevitv2-1.0 | https://github.com/apple/ml-cvnets |
| NAT | Neighborhood Attention Transformer | Vision Transformer | - | 2022 | ImageNet | PyTorch | Image classification, object detection, segmentation | https://huggingface.co/shi-labs/nat-mini-in1k-224 | https://github.com/SHI-Labs/Neighborhood-Attention-Transformer |
| PoolFormer | PoolFormer | Transformer | - | 2022 | ImageNet-1k | PyTorch | Image classification | https://huggingface.co/docs/transformers/model_doc/poolformer | https://github.com/sail-sg/poolformer |
| PVT | Pyramid Vision Transformer | Vision Transformer | - | 2021 | ImageNet | PyTorch, Hugging Face Transformers | Image classification, object detection, segmentation | https://huggingface.co/microsoft/pvt-tiny-224 | https://github.com/whai362/PVT |
| PVTv2 | Pyramid Vision Transformer Version 2 | Vision Transformer | PVT | 2022 | ImageNet | PyTorch, Hugging Face Transformers | Image classification, object detection, segmentation | https://huggingface.co/microsoft/pvt-v2-b0-224 | https://github.com/whai362/PVT |
| RegNet | Designing Network Design Spaces | ConvNet | - | 2020 | ImageNet | PyTorch, FAIR | Image classification, object detection | https://huggingface.co/docs/transformers/model_doc/regnet | https://github.com/facebookresearch/pycls |
| ResNet | Residual Network | Convolutional Neural Network | - | 2015 | ImageNet | PyTorch, TensorFlow, Keras | Image classification, object detection, segmentation | https://huggingface.co/microsoft/resnet-50 | https://github.com/KaimingHe/deep-residual-networks |
| RT-DETR | Real-Time Detection Transformer | Transformer | - | 2024 | COCO | PyTorch, Hugging Face Transformers | Real-time object detection | https://huggingface.co/docs/transformers/model_doc/rt_detr | https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rtdetr |
| RT-DETRv2 | Real-Time Detection Transformer Version 2 | Transformer | RT-DETR | 2024 | COCO | PyTorch, Hugging Face Transformers | Real-time object detection | https://huggingface.co/docs/transformers/model_doc/rt_detr | https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rtdetr |
| SegFormer | Segmentation Transformer | Vision Transformer | - | 2021 | ADE20K, Cityscapes | PyTorch, Hugging Face Transformers | Semantic segmentation | https://huggingface.co/docs/transformers/model_doc/segformer | https://github.com/NVlabs/SegFormer |
| SegGPT | Segmenting Everything In Context | Transformer | GPT | 2023 | SA-1B | PyTorch | Image segmentation, visual grounding | https://huggingface.co/BAAI/SegGPT | https://github.com/baaivision/Painter |
| SuperGlue | SuperGlue | Graph Neural Network | - | 2020 | MegaDepth, COCO | PyTorch | Feature matching, image registration | https://huggingface.co/docs/transformers/model_doc/superglue | https://github.com/magicleap/SuperGluePretrainedNetwork |
| SuperPoint | SuperPoint | ConvNet | - | 2018 | MS-COCO | PyTorch | Feature detection, description | https://huggingface.co/docs/transformers/model_doc/superpoint | https://github.com/magicleap/SuperPointPretrainedNetwork |
| SwiftFormer | SwiftFormer | Transformer-based with efficient additive attention | - | 2023 | ImageNet-1k | PyTorch, Hugging Face Transformers | Image classification, mobile vision applications | https://huggingface.co/MBZUAI/swiftformer-s | https://github.com/huggingface/transformers/blob/main/src/transformers/models/swiftformer/modeling_swiftformer.py |
| Swin Transformer | Swin Transformer | Hierarchical Transformer | - | 2021 | ImageNet-1k, ImageNet-22k | PyTorch, Hugging Face Transformers | Image classification, object detection, semantic segmentation | https://huggingface.co/microsoft/swin-tiny-patch4-window7-224 | https://github.com/microsoft/Swin-Transformer |
| Swin Transformer V2 | Swin Transformer V2 | Hierarchical Transformer with improved training stability | Swin Transformer | 2022 | ImageNet-22k | PyTorch, Hugging Face Transformers | Image classification, object detection, semantic segmentation | https://huggingface.co/microsoft/swinv2-tiny-patch4-window8-256 | https://github.com/microsoft/Swin-Transformer |
| Swin2SR | Swin2SR | Swin Transformer for Super-Resolution | Swin Transformer | 2022 | DIV2K, Flickr2K | PyTorch | Image super-resolution | https://huggingface.co/caidas/swin2SR-classical-sr-x2-64 | https://github.com/mv-lab/swin2sr |
| Table Transformer | Table Transformer | Transformer-based | DETR | 2022 | PubTables-1M | PyTorch, Hugging Face Transformers | Table structure recognition | https://huggingface.co/microsoft/table-transformer-detection | https://github.com/microsoft/table-transformer |
| TextNet | TextNet | CNN-based | - | 2018 | SynthText, Total-Text | PyTorch | Scene text detection and recognition | https://huggingface.co/docs/transformers/model_doc/textnet | https://github.com/tonghe90/textnet |
| Timm Wrapper | PyTorch Image Models Wrapper | Various | - | 2025 | ImageNet | PyTorch, Hugging Face Transformers | Image classification | https://huggingface.co/docs/transformers/en/model_doc/timm_wrapper | https://github.com/huggingface/transformers |
| UperNet | Unified Perceptual Parsing Network | Segmentation framework (FPN + PPM head) | Various (e.g., Swin, ConvNeXt) | 2018 | ADE20K, Cityscapes | PyTorch, Hugging Face Transformers | Semantic segmentation | https://huggingface.co/docs/transformers/model_doc/upernet | https://github.com/huggingface/transformers |
| VAN | Visual Attention Network | Attention-based CNN | - | 2022 | ImageNet-1k | PyTorch, Hugging Face Transformers | Image classification | https://huggingface.co/Visual-Attention-Network/van-base | https://github.com/Visual-Attention-Network/VAN-Classification |
| Vision Transformer (ViT) | Vision Transformer | Transformer | - | 2020 | ImageNet | PyTorch, TensorFlow, Hugging Face Transformers | Image classification | https://huggingface.co/google/vit-base-patch16-224 | https://github.com/google-research/vision_transformer |
| ViT Hybrid | Vision Transformer Hybrid | Hybrid CNN-Transformer | - | 2020 | ImageNet-21k, ImageNet-1k | PyTorch, Hugging Face Transformers | Image classification | https://huggingface.co/google/vit-hybrid-base-bit-384 | https://github.com/google-research/vision_transformer |
| ViTDet | Vision Transformer for Object Detection | Transformer-based | ViT | 2022 | COCO | PyTorch, Detectron2 | Object detection | https://huggingface.co/facebook/vit-det-base | https://github.com/facebookresearch/detectron2 |
| ViTMAE | Vision Transformer with Masked Autoencoders | Transformer-based | ViT | 2021 | ImageNet-1k | PyTorch, Hugging Face Transformers | Self-supervised learning, image classification | https://huggingface.co/facebook/vit-mae-base | https://github.com/facebookresearch/mae |
| ViTMatte | Vision Transformer for Image Matting | Transformer-based | ViT | 2022 | Adobe Image Matting Dataset | PyTorch | Image matting | https://huggingface.co/hustvl/vitmatte-small-composition-1k | https://github.com/hustvl/ViTMatte |
| ViTMSN | Vision Transformer with Masked Siamese Networks | Transformer-based | ViT | 2022 | ImageNet-1k | PyTorch | Self-supervised learning, image classification | https://huggingface.co/facebook/vit-msn-small | https://github.com/facebookresearch/msn |
| ViTPose | Vision Transformer for Human Pose Estimation | Transformer-based | ViT | 2022 | COCO | PyTorch, MMPose | Human pose estimation | https://huggingface.co/open-mmlab/vit-pose-base | https://github.com/open-mmlab/mmpose |
| YOLOS | You Only Look at One Sequence | Transformer-based | DETR | 2021 | COCO | PyTorch, Hugging Face Transformers | Object detection | https://huggingface.co/hustvl/yolos-tiny | https://github.com/hustvl/YOLOS |
| ZoeDepth | ZoeDepth | Transformer-based | DPT | 2023 | NYU Depth V2, KITTI | PyTorch | Monocular depth estimation | https://huggingface.co/shariqfarooq/ZoeDepth | https://github.com/isl-org/ZoeDepth |
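
The HF URL column mixes Hub checkpoint pages (`huggingface.co/<org>/<name>`), documentation pages (`huggingface.co/docs/...`), and `-` placeholders. Only the first kind can be passed to a loader as a model ID. The sketch below shows one way to tell them apart; the helper name `hf_model_id` is hypothetical and not part of any library.

```python
from typing import Optional
from urllib.parse import urlparse


def hf_model_id(url: str) -> Optional[str]:
    """Return 'org/name' for a Hub checkpoint URL, else None.

    Documentation pages, non-Hugging Face links, and '-' placeholders
    yield None because they do not name a loadable checkpoint.
    """
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # A checkpoint page has exactly two path segments: organization and model name.
    if len(parts) != 2 or parts[0] == "docs":
        return None
    return "/".join(parts)


# Sample entries copied from the HF URL column.
for url in (
    "https://huggingface.co/google/vit-base-patch16-224",
    "https://huggingface.co/docs/transformers/model_doc/levit",
    "-",
):
    print(url, "->", hf_model_id(url))
```

Assuming `transformers`, a backend such as PyTorch, and network access are available, a returned ID can then be passed to a loader, for example `pipeline("image-classification", model=hf_model_id(url))`. The task string has to match the row's Use Cases column, and rows with only a documentation link need a checkpoint name looked up on that page.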