Comprehensive summary table of text-type NLP models (HF-based)

Name Full Name Architecture Base Model Developed Training Dataset Lib. & Framework Use Cases HF URL GitHub URL
ALBERT A Lite BERT Transformer (encoder-only) BERT 2019 BookCorpus, English Wikipedia TensorFlow, PyTorch Natural Language Understanding https://huggingface.co/albert-base-v2 https://github.com/google-research/albert
Bamba Bamba-9B Hybrid Mamba2-Transformer (decoder-only) 2024 PyTorch General language tasks https://huggingface.co/ibm-fms/Bamba-9B https://github.com/foundation-model-stack/bamba
BART Bidirectional and Auto-Regressive Transformers Transformer BERT+GPT 2019 BookCorpus, CC-News, OpenWebText, Stories PyTorch Text summarization, translation, question answering https://huggingface.co/facebook/bart-large https://github.com/facebookresearch/fairseq/tree/main/examples/bart
BARThez BART for French Transformer BART 2020 French subset of OSCAR corpus PyTorch French text generation and understanding https://huggingface.co/moussaKam/barthez https://github.com/moussaKam/BARThez
BARTpho BART for Vietnamese Transformer BART 2022 Vietnamese news articles, web content PyTorch Vietnamese text summarization, generation https://huggingface.co/vinai/bartpho-syllable https://github.com/VinAIResearch/BARTpho
BERT Bidirectional Encoder Representations from Transformers Transformer Original Encoder-only 2018 BookCorpus, English Wikipedia TensorFlow, PyTorch Natural Language Understanding https://huggingface.co/bert-base-uncased https://github.com/google-research/bert
BertGeneration BERT for text generation Transformer BERT 2019 Same as BERT PyTorch Text generation https://huggingface.co/google/bert_for_seq_generation_L-24_bbc_encoder https://github.com/google-research/bert/tree/master/generation
BertJapanese BERT for Japanese Transformer BERT 2019 Japanese Wikipedia, web texts TensorFlow, PyTorch Japanese language understanding https://huggingface.co/cl-tohoku/bert-base-japanese https://github.com/cl-tohoku/bert-japanese
Bertweet BERT for Twitter Transformer RoBERTa 2020 850M English tweets PyTorch Twitter text analysis, sentiment analysis https://huggingface.co/vinai/bertweet-base https://github.com/VinAIResearch/BERTweet
BigBird Big Bird: Transformers for Longer Sequences Transformer (sparse attention) BERT 2020 RealNews dataset, English Wikipedia, C4 TensorFlow, PyTorch Long document understanding https://huggingface.co/google/bigbird-roberta-base https://github.com/google-research/bigbird
BigBirdPegasus BigBird-Pegasus Transformer BERT, Pegasus 2020 RealNews, English Wikipedia, C4 PyTorch Long document summarization, question-answering https://huggingface.co/google/bigbird-pegasus-large-bigpatent https://github.com/google-research/bigbird
BioGPT Biomedical Generative Pre-trained Transformer Transformer GPT-2 2022 15M PubMed abstracts PyTorch Biomedical text generation, relation extraction, question answering https://huggingface.co/microsoft/biogpt https://github.com/microsoft/BioGPT
BlenderBot BlenderBot Transformer-based encoder-decoder 2020 Not specified PyTorch, Hugging Face Transformers Open-domain chatbot, conversational AI https://huggingface.co/facebook/blenderbot-3B https://github.com/facebookresearch/ParlAI
BlenderBot Small BlenderBot Small Transformer-based encoder-decoder BlenderBot 2020 Not specified PyTorch, Hugging Face Transformers Open-domain dialogue, conversational AI https://huggingface.co/facebook/blenderbot_small-90M https://github.com/facebookresearch/ParlAI
BLOOM BLOOM Transformer-based Decoder-only 2022 1.5TB of pre-processed text (350B unique tokens) in 45 natural languages and 12 programming languages PyTorch, Hugging Face Transformers Language generation, information extraction, question answering, summarization https://huggingface.co/bigscience/bloom https://github.com/bigscience-workshop/bigscience
BORT BERT Optimized for Resource-constrained Training Transformer BERT 2020 BookCorpus, English Wikipedia PyTorch Efficient natural language understanding https://huggingface.co/alexa/bort https://github.com/alexa/bort
ByT5 Byte-level T5 Transformer T5 2021 C4 dataset TensorFlow, JAX Multilingual text-to-text generation https://huggingface.co/google/byt5-small https://github.com/google-research/byt5
CamemBERT Camembert Transformer RoBERTa 2019 OSCAR French corpus PyTorch French language understanding https://huggingface.co/camembert-base https://github.com/pytorch/fairseq
CANINE Character Architecture with No tokenization In Neural Encoders Transformer BERT 2021 Wikipedia in 100+ languages TensorFlow Multilingual text processing https://huggingface.co/google/canine-s https://github.com/google-research/language
CodeGen CodeGen Transformer GPT 2022 GitHub code repositories PyTorch Code generation, completion https://huggingface.co/Salesforce/codegen-350M-mono https://github.com/salesforce/CodeGen
CodeLlama Code Llama Transformer LLaMA 2023 500B tokens of code and code-related data PyTorch Code generation, completion, debugging https://huggingface.co/codellama https://github.com/facebookresearch/llama
Cohere Cohere's Command Models Transformer Self-developed 2024 Proprietary dataset PyTorch Text generation, summarization https://huggingface.co/docs/transformers/ko/model_doc/cohere https://cohere.com
Cohere2 Cohere's second generation of Command models Transformer Self-developed 2025 Proprietary dataset PyTorch Conversational agents, RAG systems https://cohere.com
ConvBERT Convolutional BERT Transformer + CNN BERT 2020 BookCorpus, English Wikipedia PyTorch Natural language understanding https://huggingface.co/YituTech/conv-bert-base https://github.com/YituTech/ConvBERT
CPM Chinese Pre-trained Models Transformer GPT 2020 Large-scale Chinese corpus TensorFlow Chinese text generation, understanding https://huggingface.co/TsinghuaAI/CPM-Generate
CPMANT CPM Ant Transformer CPM 2021 Large-scale Chinese corpus TensorFlow Chinese language tasks
CTRL Conditional Transformer Language Model Transformer Transformer-XL 2019 140GB of structured text TensorFlow Controlled text generation https://huggingface.co/salesforce/ctrl https://github.com/salesforce/ctrl
DBRX DBRX Transformer (decoder-only, Mixture of Experts) 2024 12T tokens of text and code PyTorch General language tasks, code generation https://huggingface.co/databricks/dbrx-base https://github.com/databricks/dbrx
DeBERTa Decoding-enhanced BERT with Disentangled Attention Transformer BERT 2020 BookCorpus, English Wikipedia PyTorch Natural language understanding https://huggingface.co/microsoft/deberta-base https://github.com/microsoft/DeBERTa
DeBERTa-v2 Decoding-enhanced BERT with Disentangled Attention v2 Transformer DeBERTa 2021 BookCorpus, English Wikipedia, CC-News, OpenWebText, Stories PyTorch Natural language understanding, classification tasks https://huggingface.co/microsoft/deberta-v2-xlarge-mnli https://github.com/microsoft/DeBERTa
DialoGPT Dialogue Generative Pre-trained Transformer Transformer GPT-2 2020 147M conversation-like exchanges from Reddit PyTorch Open-domain dialogue generation https://huggingface.co/microsoft/DialoGPT-small https://github.com/microsoft/DialoGPT
DiffLlama DiffLlama Transformer with differential attention LLaMA 2024 PyTorch Text generation https://huggingface.co/docs/transformers/model_doc/diffllama
DistilBERT Distilled BERT Transformer BERT 2019 Same as BERT (BookCorpus, English Wikipedia) PyTorch Efficient natural language understanding https://huggingface.co/distilbert-base-uncased https://github.com/huggingface/transformers
DPR Dense Passage Retriever Dual Encoder BERT 2020 Wikipedia PyTorch Open-domain question answering https://huggingface.co/facebook/dpr-question_encoder-single-nq-base https://github.com/facebookresearch/DPR
ELECTRA Efficiently Learning an Encoder that Classifies Token Replacements Accurately Transformer BERT 2020 Same as BERT TensorFlow, PyTorch Natural language understanding https://huggingface.co/google/electra-small-discriminator https://github.com/google-research/electra
ERNIE Enhanced Representation through kNowledge IntEgration Transformer BERT 2019 Chinese Wikipedia, Baidu Baike, Baidu news PaddlePaddle, PyTorch Chinese language understanding https://huggingface.co/nghuyong/ernie-1.0-base-zh https://github.com/PaddlePaddle/ERNIE
ErnieM ERNIE-M Transformer ERNIE 2021 Multilingual corpus PaddlePaddle Multilingual language understanding https://huggingface.co/susnato/ernie-m-base_pytorch https://github.com/PaddlePaddle/ERNIE
ESM Evolutionary Scale Modeling Transformer encoder, similar to BERT 2021 Protein sequences PyTorch Protein language modeling https://huggingface.co/facebook/esm2_t33_650M_UR50D https://github.com/facebookresearch/esm
Falcon Falcon Transformer Decoder-only, similar to GPT 2023 RefinedWeb dataset PyTorch General language tasks https://huggingface.co/tiiuae/falcon-7b
Falcon3 Falcon 3 Transformer Falcon model family 2024 RefinedWeb dataset, high-quality data PyTorch General language tasks, reasoning, math https://huggingface.co/tiiuae/falcon3-7b-base
FalconMamba Falcon Mamba Mamba (SSM) Falcon, Mamba, a State Space Model (SSM) 2024 5.8 trillion tokens PyTorch Long-context language tasks https://huggingface.co/tiiuae/Falcon3-Mamba-7B-Base
FLAN-T5 Fine-tuned Language Net T5 Transformer T5 2022 Over 1,000 tasks in multiple languages JAX, PyTorch Multi-task language understanding https://huggingface.co/google/flan-t5-small https://github.com/google-research/t5x
FLAN-UL2 Fine-tuned Language Net UL2 Transformer UL2 2022 Multiple datasets, similar to FLAN-T5 JAX, PyTorch Multi-task language understanding https://huggingface.co/google/flan-ul2
FlauBERT French Language Understanding BERT Transformer BERT 2019 Large heterogeneous French corpus PyTorch French NLP tasks https://huggingface.co/flaubert/flaubert_base_cased
FNet FNet: Mixing Tokens with Fourier Transforms Transformer encoder with Fourier token mixing 2021 Similar to BERT JAX, PyTorch Efficient language understanding https://huggingface.co/google/fnet-base https://github.com/google-research/google-research/tree/master/f_net
FSMT FairSeq Machine Translation Transformer Transformer-based sequence-to-sequence 2020 Various parallel corpora PyTorch Machine translation https://huggingface.co/facebook/wmt19-en-de https://github.com/pytorch/fairseq
Funnel Transformer Funnel Transformer Transformer BERT 2020 Similar to BERT TensorFlow, PyTorch Efficient language understanding https://huggingface.co/funnel-transformer/small https://github.com/laiguokun/Funnel-Transformer
Fuyu Fuyu-8B Transformer multimodal transformer 2023 JAX Multimodal tasks https://huggingface.co/adept/fuyu-8b
Gemma Gemma Transformer Transformer-based Language 2024 JAX General language tasks https://huggingface.co/google/gemma-7b
Gemma2 Gemma 2 Transformer improved version of Gemma 2024 13 trillion tokens (27B), 8 trillion tokens (9B) JAX General language tasks, reasoning https://huggingface.co/google/gemma-2-9b
GLM General Language Model Transformer PyTorch General language tasks https://huggingface.co/THUDM/glm-large-chinese https://github.com/THUDM/GLM
GPT Generative Pre-trained Transformer Transformer 2018 BooksCorpus TensorFlow, PyTorch Text generation, language understanding https://huggingface.co/openai-gpt https://github.com/openai/finetune-transformer-lm
GPT Neo GPT Neo Transformer GPT 2021 The Pile TensorFlow, JAX Text generation, language tasks https://huggingface.co/EleutherAI/gpt-neo-1.3B https://github.com/EleutherAI/gpt-neo
GPT NeoX GPT NeoX Transformer GPT Neo 2022 The Pile PyTorch Large-scale language modeling https://huggingface.co/EleutherAI/gpt-neox-20b https://github.com/EleutherAI/gpt-neox
GPT NeoX Japanese GPT NeoX Japanese Transformer GPT NeoX Japanese web texts PyTorch Japanese language tasks https://huggingface.co/rinna/japanese-gpt-neox-3.6b
GPT-J GPT-J Transformer GPT 2021 The Pile (825 GB) JAX Text generation, language tasks https://huggingface.co/EleutherAI/gpt-j-6B https://github.com/kingoflolz/mesh-transformer-jax
GPT2 Generative Pre-trained Transformer 2 Transformer GPT 2019 WebText TensorFlow, PyTorch Text generation, language understanding https://huggingface.co/gpt2 https://github.com/openai/gpt-2
GPTBigCode GPT BigCode Transformer GPT Code repositories PyTorch Code generation, completion https://huggingface.co/bigcode https://github.com/bigcode-project/bigcode-encoder
GPTSAN Japanese GPT Self-Attention Network Japanese Transformer Japanese web texts PyTorch Japanese language tasks https://huggingface.co/tanreinama/GPTSAN-2.7B-instrct-sft
GPTSw3 GPT Swedish 3 Transformer GPT 2023 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code PyTorch Nordic language generation, coding https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b
Granite Granite 3.0 Transformer 2024 12 trillion tokens PyTorch NLP, coding, reasoning, tool usage https://huggingface.co/ibm-granite https://github.com/ibm-granite/granite-3.0-language-models
GraniteMoe Granite 3.0 Mixture-of-Experts Transformer (MoE) 2024 10 trillion tokens PyTorch Efficient NLP tasks https://huggingface.co/ibm-granite https://github.com/ibm-granite/granite-3.0-language-models
GraniteMoeShared GraniteMoeShared Mixture of Experts (MoE) Granite 2025 PyTorch, Hugging Face Transformers General language tasks https://huggingface.co/ibm-granite/granite-moe-shared
GraniteVision Granite Vision 3.1 2B Hybrid (Vision Transformer + Language Model) LLaVA-NeXT, Granite 2025 Publicly available datasets, internally created synthetic data PyTorch, Hugging Face Transformers Visual document understanding, content extraction from tables, charts, diagrams, sketches, and infographics https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview
Helium Helium-1 Transformer 2025 2.5T tokens PyTorch Multilingual language tasks, edge computing https://huggingface.co/kyutai/helium-1-preview-2b
HerBERT HerBERT Transformer BERT 2020 6 Polish corpora, 8B tokens PyTorch Polish NLP tasks https://huggingface.co/allegro/herbert-base-cased
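I-BERT Integer-only BERT Transformer RoBERTa 2021 PyTorch Efficient integer-only inference https://huggingface.co/kssteven/ibert-roberta-base https://github.com/kssteven418/I-BERT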
Jamba Jamba Hybrid Transformer-Mamba MoE 2024 PyTorch Long-context language tasks, reasoning https://huggingface.co/ai21labs/Jamba-v0.1
JetMoe JetMoe Mixture-of-Experts (MoE) 2024 1.25T tokens from public datasets PyTorch Text generation, code completion, conversational dialogue https://huggingface.co/docs/transformers/model_doc/jetmoe
Jukebox Jukebox VQ-VAE 2020 1.2 million songs PyTorch Music generation with lyrics https://huggingface.co/docs/transformers/model_doc/jukebox https://github.com/openai/jukebox
LED Longformer Encoder-Decoder Transformer BART 2020 Various long-form datasets PyTorch, Transformers Long document summarization, question answering https://huggingface.co/docs/transformers/model_doc/led https://github.com/allenai/longformer
LLaMA Large Language Model Meta AI Transformer 2023 Up to 1.4 trillion tokens PyTorch General language tasks, text generation https://huggingface.co/docs/transformers/model_doc/llama https://github.com/facebookresearch/llama
Llama2 Large Language Model Meta AI 2 Transformer LLaMA 2023 2 trillion tokens PyTorch General language tasks, text generation https://huggingface.co/meta-llama https://github.com/facebookresearch/llama
Llama3 Large Language Model Meta AI 3 Transformer Llama2 2024 15 trillion tokens PyTorch Advanced language tasks, reasoning, code generation https://huggingface.co/meta-llama https://github.com/meta-llama/llama3
Longformer Longformer Transformer RoBERTa 2020 Various long-form datasets PyTorch, Transformers Long document processing, classification https://huggingface.co/docs/transformers/model_doc/longformer https://github.com/allenai/longformer
LongT5 Long Text-To-Text Transfer Transformer Transformer T5 2021 C4, Wikipedia, PubMed JAX, Flax Long-form generation, summarization https://huggingface.co/docs/transformers/model_doc/longt5 https://github.com/google-research/longt5
LUKE Language Understanding with Knowledge-based Embeddings Transformer RoBERTa 2020 Wikipedia PyTorch Entity-aware NLP tasks, named entity recognition https://huggingface.co/docs/transformers/model_doc/luke https://github.com/studio-ousia/luke
M2M100 Many-to-Many Multilingual Translation Transformer 2020 Web-crawled data from 100 languages PyTorch, Fairseq Multilingual machine translation https://huggingface.co/docs/transformers/model_doc/m2m_100 https://github.com/pytorch/fairseq/tree/main/examples/m2m_100
MADLAD-400 Multilingual And Document-Level Large Audited Dataset Transformer T5 2023 2.2T tokens from 419 languages JAX, T5X Multilingual NLP tasks, translation https://huggingface.co/google/madlad-400-3b-mt https://github.com/google-research/t5x
Mamba Mamba Structured State Space Model (SSM) 2023 PyTorch Sequence modeling, language tasks https://huggingface.co/state-spaces/mamba-2.8b https://github.com/state-spaces/mamba
mamba2 Mamba-2 Structured State Space Model (SSM) Mamba 2024 PyTorch Sequence modeling, language tasks
MarianMT Marian Machine Translation Transformer 2018 Various parallel corpora PyTorch, Transformers Neural machine translation https://huggingface.co/docs/transformers/model_doc/marian https://github.com/marian-nmt/marian
MarkupLM Markup Language Model Transformer RoBERTa 2021 HTML/XML documents PyTorch, Transformers Web page understanding, HTML/XML processing https://huggingface.co/docs/transformers/model_doc/markuplm https://github.com/microsoft/unilm/tree/master/markuplm
MBart and MBart-50 Multilingual BART Transformer BART 2020 CC25 corpus, 50 languages PyTorch, Transformers Multilingual translation, text generation https://huggingface.co/docs/transformers/model_doc/mbart https://github.com/pytorch/fairseq/tree/main/examples/mbart
MEGA Moving Average Equipped Gated Attention Transformer 2022 PyTorch Long-range dependency modeling https://huggingface.co/docs/transformers/model_doc/mega https://github.com/facebookresearch/mega
MegatronBERT Megatron BERT Transformer BERT 2020 PyTorch Large-scale language understanding https://huggingface.co/docs/transformers/model_doc/megatron-bert https://github.com/NVIDIA/Megatron-LM
MegatronGPT2 Megatron GPT-2 Transformer GPT-2 2020 PyTorch Large-scale language generation https://huggingface.co/docs/transformers/model_doc/megatron_gpt2 https://github.com/NVIDIA/Megatron-LM
Mistral Mistral Transformer 2023 PyTorch General language tasks, text generation https://huggingface.co/mistralai/Mistral-7B-v0.1 https://github.com/mistralai/mistral-src
Mixtral Mixtral Mixture of Experts (MoE) Mistral 2023 PyTorch General language tasks, text generation https://huggingface.co/mistralai/Mixtral-8x7B-v0.1 Not publicly available
mLUKE Multilingual Language Understanding with Knowledge-based Embeddings Transformer XLM-RoBERTa 2022 Wikipedia in 24 languages PyTorch, Transformers Cross-lingual NLP tasks, entity-based reasoning https://huggingface.co/docs/transformers/model_doc/mluke https://github.com/studio-ousia/luke
MobileBERT MobileBERT Transformer BERT 2020 Same as BERT TensorFlow, PyTorch On-device NLP tasks https://huggingface.co/docs/transformers/model_doc/mobilebert https://github.com/google-research/google-research/tree/master/mobilebert
ModernBERT Modern BERT Transformer BERT 2024 PyTorch General language understanding https://huggingface.co/answerdotai/ModernBERT-base https://github.com/AnswerDotAI/ModernBERT
MPNet Masked and Permuted Pre-training for Language Understanding Transformer 2020 BookCorpus, English Wikipedia PyTorch, Transformers General language understanding https://huggingface.co/docs/transformers/model_doc/mpnet https://github.com/microsoft/MPNet
MPT MPT (MosaicML Pretrained Transformer) Transformer GPT 2023 PyTorch General language tasks https://huggingface.co/mosaicml/mpt-7b https://github.com/mosaicml/llm-foundry
MRA Multi Resolution Analysis (for approximate self-attention) Transformer 2022 PyTorch Efficient long-range modeling https://huggingface.co/docs/transformers/model_doc/mra https://github.com/mlpen/mra-attention
MT5 Multilingual T5 Transformer T5 2021 mC4 dataset JAX, Flax Multilingual text-to-text tasks https://huggingface.co/docs/transformers/model_doc/mt5 https://github.com/google-research/multilingual-t5
MVP Multi-task Supervised Pre-training Transformer BART 2022 Labeled data from 77 datasets over 11 natural language generation tasks PyTorch Natural language generation tasks https://huggingface.co/docs/transformers/model_doc/mvp https://github.com/RUCAIBox/MVP
myt5 MyT5 (Myte T5) Transformer ByT5 2024 mC4 dataset PyTorch Multilingual text-to-text tasks https://huggingface.co/Tomlim/myt5-base https://github.com/tomlimi/MYTE
Nemotron Nemotron Transformer 2023 PyTorch General language tasks
NEZHA NEural contextualiZed representation for CHinese lAnguage understanding Transformer BERT 2019 Chinese corpora PyTorch Chinese NLU tasks https://huggingface.co/docs/transformers/model_doc/nezha https://github.com/huawei-noah/Pretrained-Language-Model
NLLB No Language Left Behind Transformer 2022 Parallel data across 200 languages PyTorch Multilingual translation https://huggingface.co/facebook/nllb-200-3.3B https://github.com/facebookresearch/fairseq/tree/nllb
NLLB-MoE No Language Left Behind - Mixture of Experts Transformer with MoE NLLB 2022 PyTorch Efficient multilingual translation https://huggingface.co/facebook/nllb-moe-54b
Nystromformer Nystromformer Transformer with Nystrom method 2021 PyTorch Efficient long-sequence modeling https://huggingface.co/docs/transformers/model_doc/nystromformer https://github.com/mlpen/Nystromformer
OLMo Open Language Model Transformer 2024 PyTorch General language tasks https://huggingface.co/allenai/olmo-7b https://github.com/allenai/OLMo
OLMo2 Open Language Model 2 Transformer OLMo 2025 OLMo-Mix-1124, Dolmino-Mix-1124 PyTorch General language tasks
OLMoE Open Language Model of Experts Mixture of Experts OLMo 2024 PyTorch Efficient language modeling
Open-Llama Open-Llama Transformer LLaMA 2023 PyTorch General language tasks https://huggingface.co/openlm-research/open_llama_3b https://github.com/openlm-research/open_llama
OPT Open Pretrained Transformer Transformer GPT 2022 PyTorch General language tasks https://huggingface.co/facebook/opt-350m https://github.com/facebookresearch/metaseq
Pegasus Pre-training with Extracted Gap-sentences for Abstractive Summarization Transformer 2019 C4, HugeNews TensorFlow Text summarization https://huggingface.co/google/pegasus-xsum https://github.com/google-research/pegasus
PEGASUS-X PEGASUS-X Transformer PEGASUS 2022 TensorFlow, JAX Long-form text summarization https://huggingface.co/google/pegasus-x-base https://github.com/google-research/pegasus
Persimmon Persimmon Transformer 2023 PyTorch General language tasks https://huggingface.co/adept/persimmon-8b-base
Phi Phi Transformer 2023 Filtered web data, synthetic data PyTorch General language tasks https://huggingface.co/microsoft/phi-1_5
Phi-3 Phi-3 Transformer 2024 3.3 trillion tokens PyTorch General language tasks https://huggingface.co/docs/transformers/model_doc/phi3
PhiMoE Phi-3.5 Mixture of Experts Transformer with MoE Phi-3 2024 4.9T tokens PyTorch Multilingual and long-context tasks https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
PhoBERT PhoBERT Transformer RoBERTa 2020 Vietnamese texts PyTorch Vietnamese NLP tasks https://huggingface.co/vinai/phobert-base https://github.com/VinAIResearch/PhoBERT
PLBart Programming Language BART Transformer BART 2021 GitHub code corpus PyTorch Code-related tasks https://huggingface.co/docs/transformers/model_doc/plbart https://github.com/wasiahmad/PLBART
ProphetNet ProphetNet Transformer 2020 PyTorch Sequence-to-sequence tasks https://huggingface.co/docs/transformers/model_doc/prophetnet https://github.com/microsoft/ProphetNet
QDQBert Quantized-Dequantized BERT Transformer BERT 2020 Same as BERT PyTorch Efficient BERT inference https://huggingface.co/docs/transformers/model_doc/qdqbert
Qwen2 Qwen2 Transformer 2024 PyTorch General language tasks https://huggingface.co/Qwen/Qwen2-7B-Instruct https://github.com/QwenLM/Qwen2
Qwen2MoE Qwen2 Mixture of Experts Transformer with MoE Qwen2 2025 Up to 18 trillion tokens PyTorch General language tasks https://huggingface.co/docs/transformers/model_doc/qwen2_moe https://github.com/QwenLM/Qwen2.5
RAG Retrieval-Augmented Generation Transformer BART 2020 Wikipedia PyTorch Open-domain question answering https://huggingface.co/docs/transformers/model_doc/rag https://github.com/huggingface/transformers/tree/main/examples/research_projects/rag
REALM Retrieval-Augmented Language Model Pre-Training Transformer BERT 2020 Wikipedia, CC-News TensorFlow Open-domain question answering https://huggingface.co/docs/transformers/model_doc/realm https://github.com/google-research/language/tree/master/language/realm
RecurrentGemma RecurrentGemma Transformer with linear recurrences 2024 JAX, PyTorch General language tasks, long sequence processing https://huggingface.co/google/recurrentgemma-2b-it
Reformer Reformer Transformer with LSH attention 2020 PyTorch, JAX Long sequence tasks https://huggingface.co/docs/transformers/model_doc/reformer https://github.com/google/trax/tree/master/trax/models/reformer
RemBERT Rembert Transformer BERT 2021 TensorFlow Multilingual NLP tasks https://huggingface.co/docs/transformers/model_doc/rembert https://github.com/google-research/rembert
RetriBERT RetriBERT Transformer BERT 2020 MS MARCO PyTorch Information retrieval https://huggingface.co/docs/transformers/model_doc/retribert https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation
RoBERTa Robustly Optimized BERT Approach Transformer BERT 2019 CC-News, WebText, Stories PyTorch General language understanding https://huggingface.co/docs/transformers/model_doc/roberta https://github.com/pytorch/fairseq/tree/main/examples/roberta
RoBERTa-PreLayerNorm RoBERTa with Pre-Layer Normalization Transformer RoBERTa 2020 Same as RoBERTa PyTorch General language understanding https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm
RoCBert Robust Chinese BERT Transformer BERT 2021 Chinese corpora PyTorch Chinese NLP tasks https://huggingface.co/docs/transformers/model_doc/rocbert https://github.com/FudanNLP/ROCBERT
RoFormer RoFormer Transformer with rotary position embeddings BERT 2021 PyTorch Long text classification tasks https://huggingface.co/docs/transformers/model_doc/roformer https://github.com/ZhuiyiTechnology/roformer
RWKV RWKV Recurrent neural network with attention 2022 PyTorch General language tasks https://huggingface.co/docs/transformers/model_doc/rwkv https://github.com/BlinkDL/RWKV-LM
Splinter Splinter Transformer BERT 2021 PyTorch Question answering https://huggingface.co/docs/transformers/model_doc/splinter
SqueezeBERT SqueezeBERT Transformer BERT 2020 Same as BERT PyTorch Efficient NLP tasks https://huggingface.co/docs/transformers/model_doc/squeezebert https://github.com/huggingface/transformers/tree/main/examples/research_projects/squeezebert
StableLm Stable Language Model Transformer 2023 1 trillion tokens, multilingual PyTorch General language tasks https://huggingface.co/stabilityai/stablelm-base-alpha-3b https://github.com/Stability-AI/StableLM
Starcoder2 Starcoder2 Transformer 2024 PyTorch Code generation https://huggingface.co/bigcode/starcoder2-15b
SwitchTransformers Switch Transformers Transformer with sparse routing T5 2021 C4 dataset JAX, Flax Efficient language modeling https://huggingface.co/docs/transformers/model_doc/switch_transformers https://github.com/google-research/switch-transformer
T5 Text-to-Text Transfer Transformer Transformer 2019 C4 dataset TensorFlow, PyTorch Text-to-text tasks https://huggingface.co/docs/transformers/model_doc/t5 https://github.com/google-research/text-to-text-transfer-transformer
T5v1.1 T5 version 1.1 Transformer T5 2020 C4 dataset JAX, Flax Text-to-text tasks https://huggingface.co/docs/transformers/model_doc/t5v1.1 https://github.com/google-research/text-to-text-transfer-transformer
TAPEX Table Pre-training via Execution Transformer BART 2022 WikiTables, WikiSQL PyTorch Table-based question answering https://huggingface.co/docs/transformers/model_doc/tapex https://github.com/microsoft/Table-Pretraining
Transformer XL Transformer-XL Transformer with relative positional encoding 2019 PyTorch Long-range dependency modeling https://huggingface.co/docs/transformers/model_doc/transfo-xl https://github.com/kimiyoung/transformer-xl
UL2 Unifying Language Learning Paradigms Transformer 2022 JAX, Flax General language tasks https://huggingface.co/google/ul2
UMT5 Universal Multilingual T5 Transformer T5 2023 PyTorch Multilingual text-to-text tasks https://huggingface.co/google/umt5-small
X-MOD Cross-lingual Modular Transformer with language adapters XLM-R 2022 Filtered CommonCrawl, 81 languages PyTorch Multilingual NLP tasks https://huggingface.co/facebook/xmod-base
XGLM Cross-lingual Generative Language Model Transformer 2022 PyTorch Multilingual language tasks https://huggingface.co/facebook/xglm-564M
XLM Cross-lingual Language Model Transformer 2019 Wikipedia PyTorch Multilingual NLP tasks https://huggingface.co/xlm-mlm-en-2048 https://github.com/facebookresearch/XLM
XLM-ProphetNet Cross-lingual ProphetNet Transformer ProphetNet 2021 PyTorch Multilingual sequence-to-sequence tasks https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet
XLM-RoBERTa Cross-lingual RoBERTa Transformer RoBERTa 2019 2.5TB filtered CommonCrawl, 100 languages PyTorch Multilingual NLP tasks https://huggingface.co/xlm-roberta-base https://github.com/pytorch/fairseq/tree/main/examples/xlmr
XLM-RoBERTa-XL Cross-lingual RoBERTa Extra Large Transformer RoBERTa 2022 2.5TB filtered CommonCrawl, 100 languages PyTorch Multilingual NLP tasks https://huggingface.co/facebook/xlm-roberta-xl
XLM-V XLM-V (XLM-RoBERTa with a large multilingual vocabulary) Transformer XLM-RoBERTa 2023 PyTorch Multilingual NLP tasks https://huggingface.co/facebook/xlm-v-base
XLNet XLNet: Generalized Autoregressive Pretraining for Language Understanding Transformer with permutation-based training 2019 PyTorch General language understanding tasks https://huggingface.co/xlnet-base-cased https://github.com/zihangdai/xlnet
YOSO You Only Sample (Almost) Once Transformer with Bernoulli sampling attention 2021 PyTorch Efficient self-attention for long sequences https://huggingface.co/docs/transformers/model_doc/yoso https://github.com/mlpen/YOSO
Zamba Zamba-7B-v1 Hybrid Mamba-Transformer 2024 1T tokens of text and code data, 50B high-quality tokens PyTorch, Hugging Face Transformers General language tasks, next-token prediction https://huggingface.co/Zyphra/Zamba-7B-v1 https://github.com/Zyphra/Zamba
Zamba2 Zamba2-2.7B Hybrid SSM-Transformer Zamba 2024 3T tokens of text and code data, 100B high-quality tokens PyTorch, Hugging Face Transformers General language tasks, on-device applications https://huggingface.co/Zyphra/Zamba2-2.7B https://github.com/Zyphra/Zamba2
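
The HF URL column mixes three kinds of links: checkpoint pages, organization pages, and documentation pages. The sketch below is one possible way to pull a Hub repository id out of a cell so that it could be passed to a loader such as `AutoModel.from_pretrained` in the `transformers` library. The helper name `hf_repo_id` is hypothetical and not part of any library, and the rule that `/docs/...` paths are not checkpoints is an assumption drawn from the URLs in this table.

```python
from typing import Optional
from urllib.parse import urlparse


def hf_repo_id(url: str) -> Optional[str]:
    """Return the repository id encoded in an 'HF URL' cell, or None.

    Hypothetical helper for this table only; it does not contact the Hub.
    """
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        return None  # e.g. https://cohere.com is not a Hub link
    parts = [p for p in parsed.path.split("/") if p]
    # Documentation pages such as /docs/transformers/model_doc/led
    # describe a model family and do not name a checkpoint.
    if not parts or parts[0] == "docs":
        return None
    # Keep at most "owner/name". A single segment may be a legacy id
    # (gpt2) or an organization page (meta-llama); this sketch cannot
    # tell the two apart, so the caller has to check.
    return "/".join(parts[:2])


if __name__ == "__main__":
    for cell in (
        "https://huggingface.co/microsoft/biogpt",
        "https://huggingface.co/docs/transformers/model_doc/led",
        "https://huggingface.co/gpt2",
    ):
        print(cell, "->", hf_repo_id(cell))
```

Rows whose cell yields `None` point at documentation rather than a downloadable checkpoint, so a concrete checkpoint name has to be looked up on the linked page.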