
Comprehensive summary table of text NLP models (based on Hugging Face Transformers)

Name	Full Name	Architecture	Base Model	Year	Training Dataset	Libraries & Frameworks	Use Cases	HF URL	GitHub URL
ALBERT	A Lite BERT	Transformer encoder	BERT	2019	BookCorpus, English Wikipedia	TensorFlow, PyTorch	Natural language understanding	https://huggingface.co/albert-base-v2	https://github.com/google-research/albert
Bamba	Bamba-9B	Hybrid Mamba2-Transformer (SSM + attention)	Mamba2	2024	2.2T tokens of open datasets	PyTorch	Text generation, general language tasks	https://huggingface.co/ibm-fms/Bamba-9B	https://github.com/foundation-model-stack/bamba
BART	Bidirectional and Auto-Regressive Transformers	Transformer encoder-decoder	BERT-style encoder + GPT-style decoder	2019	BookCorpus, English Wikipedia, CC-News, OpenWebText, Stories	PyTorch	Text summarization, translation, question answering	https://huggingface.co/facebook/bart-large	https://github.com/facebookresearch/fairseq/tree/main/examples/bart
BARThez	BART for French	Transformer	BART	2020	French subset of OSCAR corpus	PyTorch	French text generation and understanding	https://huggingface.co/moussaKam/barthez	https://github.com/moussaKam/BARThez
BARTpho	BART for Vietnamese	Transformer	BART	2022	Vietnamese news articles, web content	PyTorch	Vietnamese text summarization, generation	https://huggingface.co/vinai/bartpho-syllable	https://github.com/VinAIResearch/BARTpho
BERT	Bidirectional Encoder Representations from Transformers	Transformer	Original Encoder-only	2018	BookCorpus, English Wikipedia	TensorFlow, PyTorch	Natural Language Understanding	https://huggingface.co/bert-base-uncased	https://github.com/google-research/bert
BertGeneration	BERT for text generation	Transformer	BERT	2019	Same as BERT	PyTorch	Text generation	https://huggingface.co/google/bert_for_seq_generation_L-24_bbc_encoder	https://github.com/google-research/bert/tree/master/generation
BertJapanese	BERT for Japanese	Transformer	BERT	2019	Japanese Wikipedia, web texts	TensorFlow, PyTorch	Japanese language understanding	https://huggingface.co/cl-tohoku/bert-base-japanese	https://github.com/cl-tohoku/bert-japanese
Bertweet	BERT for Twitter	Transformer	RoBERTa	2020	850M English tweets	PyTorch	Twitter text analysis, sentiment analysis	https://huggingface.co/vinai/bertweet-base	https://github.com/VinAIResearch/BERTweet
BigBird	Big Bird (Transformers for Longer Sequences)	Transformer with sparse attention	BERT / RoBERTa	2020	BookCorpus, CC-News, Stories, English Wikipedia	TensorFlow, PyTorch	Long document understanding	https://huggingface.co/google/bigbird-roberta-base	https://github.com/google-research/bigbird
BigBirdPegasus	BigBird-Pegasus	Transformer encoder-decoder with sparse attention	BigBird, Pegasus	2020	RealNews, English Wikipedia, C4	PyTorch	Long document summarization, question answering	https://huggingface.co/google/bigbird-pegasus-large-bigpatent	https://github.com/google-research/bigbird
BioGPT	Biomedical Generative Pre-trained Transformer	Transformer	GPT-2	2022	15M PubMed abstracts	PyTorch	Biomedical text generation, relation extraction, question answering	https://huggingface.co/microsoft/biogpt	https://github.com/microsoft/BioGPT
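BlenderBot	BlenderBot (Recipes for Building an Open-Domain Chatbot)	Transformer encoder-decoder		2020	Reddit discussions (pre-training); ConvAI2, Empathetic Dialogues, Wizard of Wikipedia, Blended Skill Talk (fine-tuning)	PyTorch, Hugging Face Transformers	Open-domain chatbot, conversational AI	https://huggingface.co/facebook/blenderbot-3B	https://github.com/facebookresearch/ParlAI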
BlenderBot Small	BlenderBot Small	Transformer-based encoder-decoder	BlenderBot	2020	Not specified	PyTorch, Hugging Face Transformers	Open-domain dialogue, conversational AI	https://huggingface.co/facebook/blenderbot_small-90M	https://github.com/facebookresearch/ParlAI
BLOOM	BigScience Large Open-science Open-access Multilingual Language Model	Transformer-based	Decoder-only	2022	1.5TB of pre-processed text (350B unique tokens) in 46 natural languages and 13 programming languages	PyTorch, Hugging Face Transformers	Language generation, information extraction, question answering, summarization	https://huggingface.co/bigscience/bloom	https://github.com/bigscience-workshop/bigscience
BORT	Bort (optimal subarchitecture extraction for BERT)	Transformer	BERT	2020	BookCorpus, English Wikipedia	PyTorch	Efficient natural language understanding	https://huggingface.co/amazon/bort	https://github.com/alexa/bort
ByT5	Byte-level T5	Transformer	T5	2021	mC4 dataset	TensorFlow, JAX	Multilingual text-to-text generation	https://huggingface.co/google/byt5-small	https://github.com/google-research/byt5
CamemBERT	Camembert	Transformer	RoBERTa	2019	OSCAR French corpus	PyTorch	French language understanding	https://huggingface.co/camembert-base	https://github.com/pytorch/fairseq
CANINE	Character Architecture with No tokenization In Neural Encoders	Transformer	BERT	2021	Wikipedia in 100+ languages	TensorFlow	Multilingual text processing	https://huggingface.co/google/canine-s	https://github.com/google-research/language
CodeGen	CodeGen	Transformer	GPT	2022	GitHub code repositories	PyTorch	Code generation, completion	https://huggingface.co/Salesforce/codegen-350M-mono	https://github.com/salesforce/CodeGen
CodeLlama	Code Llama	Transformer	Llama 2	2023	500B tokens of code and code-related data	PyTorch	Code generation, completion, debugging	https://huggingface.co/codellama	https://github.com/facebookresearch/codellama
Cohere	Cohere's Command models (Command-R)	Transformer	Self-developed	2024	Proprietary dataset	PyTorch	Text generation, summarization	https://huggingface.co/docs/transformers/model_doc/cohere	https://cohere.com
Cohere2	Cohere's second generation of Command models (Command R7B)	Transformer	Self-developed	2024	Proprietary dataset	PyTorch	Conversational agents, RAG systems	https://huggingface.co/docs/transformers/model_doc/cohere2	https://cohere.com
ConvBERT	Convolutional BERT	Transformer + CNN	BERT	2020	BookCorpus, English Wikipedia	PyTorch	Natural language understanding	https://huggingface.co/YituTech/conv-bert-base	https://github.com/YituTech/ConvBERT
CPM	Chinese Pre-trained Language Model	Transformer	GPT	2020	Large-scale Chinese corpus	PyTorch	Chinese text generation, understanding	https://huggingface.co/TsinghuaAI/CPM-Generate
CPMAnt	CPM-Ant	Transformer	CPM	2022	Large-scale Chinese corpus	PyTorch	Chinese language tasks	https://huggingface.co/openbmb/cpm-ant-10b
CTRL	Conditional Transformer Language Model	Transformer		2019	140GB of structured text	TensorFlow	Controlled text generation	https://huggingface.co/Salesforce/ctrl	https://github.com/salesforce/ctrl
DBRX	DBRX	Mixture-of-Experts (MoE) decoder-only Transformer	Self-developed	2024	12 trillion tokens of text and code	PyTorch	General text generation, code generation	https://huggingface.co/databricks/dbrx-base	https://github.com/databricks/dbrx
DeBERTa	Decoding-enhanced BERT with Disentangled Attention	Transformer	BERT	2020	BookCorpus, English Wikipedia	PyTorch	Natural language understanding	https://huggingface.co/microsoft/deberta-base	https://github.com/microsoft/DeBERTa
DeBERTa-v2	Decoding-enhanced BERT with Disentangled Attention v2	Transformer	DeBERTa	2021	BookCorpus, English Wikipedia, CC-News, OpenWebText, Stories	PyTorch	Natural language understanding, classification tasks	https://huggingface.co/microsoft/deberta-v2-xlarge-mnli	https://github.com/microsoft/DeBERTa
DialoGPT	Dialogue Generative Pre-trained Transformer	Transformer	GPT-2	2020	147M conversation-like exchanges from Reddit	PyTorch	Open-domain dialogue generation	https://huggingface.co/microsoft/DialoGPT-small	https://github.com/microsoft/DialoGPT
DiffLlama	Differential Llama (Llama with Differential Transformer attention)	Transformer with differential attention	LLaMA	2024		PyTorch	Text generation	https://huggingface.co/docs/transformers/model_doc/diffllama
DistilBERT	Distilled BERT	Transformer	BERT	2019	Same as BERT (BookCorpus, English Wikipedia)	PyTorch	Efficient natural language understanding	https://huggingface.co/distilbert-base-uncased	https://github.com/huggingface/transformers
DPR	Dense Passage Retriever	Dual Encoder	BERT	2020	Wikipedia	PyTorch	Open-domain question answering	https://huggingface.co/facebook/dpr-question_encoder-single-nq-base	https://github.com/facebookresearch/DPR
ELECTRA	Efficiently Learning an Encoder that Classifies Token Replacements Accurately	Transformer	BERT	2020	Same as BERT	TensorFlow, PyTorch	Natural language understanding	https://huggingface.co/google/electra-small-discriminator	https://github.com/google-research/electra
ERNIE	Enhanced Representation through kNowledge IntEgration	Transformer	BERT	2019	Chinese Wikipedia, Baidu Baike, Baidu news	PaddlePaddle, PyTorch	Chinese language understanding	https://huggingface.co/nghuyong/ernie-1.0-base-zh	https://github.com/PaddlePaddle/ERNIE
ErnieM	ERNIE-M	Transformer	ERNIE	2021	Multilingual corpus	PaddlePaddle	Multilingual language understanding	https://huggingface.co/susnato/ernie-m-base_pytorch	https://github.com/PaddlePaddle/ERNIE
ESM	Evolutionary Scale Modeling	Transformer	encoder, similar to BERT	2021	Protein sequences	PyTorch	Protein language modeling	https://huggingface.co/facebook/esm2_t33_650M_UR50D	https://github.com/facebookresearch/esm
Falcon	Falcon	Transformer	Decoder-only, similar to GPT	2023	RefinedWeb dataset	PyTorch	General language tasks	https://huggingface.co/tiiuae/falcon-7b
Falcon3	Falcon 3	Transformer	Falcon model family	2024	RefinedWeb dataset, high-quality data	PyTorch	General language tasks, reasoning, math	https://huggingface.co/tiiuae/falcon3-7b-base
FalconMamba	Falcon Mamba	Mamba (SSM)	Mamba	2024	5.8 trillion tokens	PyTorch	Long-context language tasks	https://huggingface.co/tiiuae/falcon-mamba-7b
FLAN-T5	Fine-tuned Language Net T5	Transformer	T5	2022	Over 1,000 tasks in multiple languages	JAX, PyTorch	Multi-task language understanding	https://huggingface.co/google/flan-t5-small	https://github.com/google-research/t5x
FLAN-UL2	Fine-tuned Language Net UL2	Transformer	UL2	2023	Multiple datasets, similar to FLAN-T5	JAX, PyTorch	Multi-task language understanding	https://huggingface.co/google/flan-ul2
FlauBERT	French Language Understanding BERT	Transformer	BERT	2019	Large heterogeneous French corpus	PyTorch	French NLP tasks	https://huggingface.co/flaubert/flaubert_base_cased
FNet	FNet (Mixing Tokens with Fourier Transforms)	Transformer encoder with Fourier-transform token mixing (no self-attention)	BERT	2021	C4	JAX, PyTorch	Efficient language understanding	https://huggingface.co/google/fnet-base	https://github.com/google-research/google-research/tree/master/f_net
FSMT	FairSeq Machine Translation	Transformer encoder-decoder	Transformer-based sequence-to-sequence	2019	WMT19 parallel corpora	PyTorch	Machine translation	https://huggingface.co/facebook/wmt19-en-de	https://github.com/pytorch/fairseq
Funnel Transformer	Funnel Transformer	Transformer	BERT	2020	Similar to BERT	TensorFlow, PyTorch	Efficient language understanding	https://huggingface.co/funnel-transformer/small	https://github.com/laiguokun/Funnel-Transformer
Fuyu	Fuyu-8B	Transformer	Decoder-only multimodal Transformer	2023		PyTorch	Multimodal tasks	https://huggingface.co/adept/fuyu-8b
Gemma	Gemma	Transformer	Decoder-only	2024		JAX	General language tasks	https://huggingface.co/google/gemma-7b
Gemma2	Gemma 2	Transformer	Gemma	2024	13 trillion tokens (27B), 8 trillion tokens (9B)	JAX	General language tasks, reasoning	https://huggingface.co/google/gemma-2-9b
GLM	General Language Model	Transformer				PyTorch	General language tasks	https://huggingface.co/THUDM/glm-large-chinese	https://github.com/THUDM/GLM
GPT	Generative Pre-trained Transformer	Transformer (decoder-only)		2018	BookCorpus	TensorFlow, PyTorch	Text generation, language understanding	https://huggingface.co/openai-gpt	https://github.com/openai/finetune-transformer-lm
GPT Neo	GPT Neo	Transformer	GPT	2021	The Pile	TensorFlow, JAX	Text generation, language tasks	https://huggingface.co/EleutherAI/gpt-neo-1.3B	https://github.com/EleutherAI/gpt-neo
GPT NeoX	GPT NeoX	Transformer	GPT Neo	2022	The Pile	PyTorch	Large-scale language modeling	https://huggingface.co/EleutherAI/gpt-neox-20b	https://github.com/EleutherAI/gpt-neox
GPT NeoX Japanese	GPT NeoX Japanese	Transformer	GPT NeoX	2022	Japanese web texts	PyTorch	Japanese language tasks	https://huggingface.co/abeja/gpt-neox-japanese-2.7b
GPT-J	GPT-J	Transformer	GPT	2021	The Pile (825 GB)	JAX	Text generation, language tasks	https://huggingface.co/EleutherAI/gpt-j-6B	https://github.com/kingoflolz/mesh-transformer-jax
GPT2	Generative Pre-trained Transformer 2	Transformer	GPT	2019	WebText	TensorFlow, PyTorch	Text generation, language understanding	https://huggingface.co/gpt2	https://github.com/openai/gpt-2
GPTBigCode	GPT BigCode	Transformer	GPT-2 (with multi-query attention)	2023	Code repositories	PyTorch	Code generation, completion	https://huggingface.co/bigcode	https://github.com/bigcode-project/bigcode-encoder
GPTSAN Japanese	GPTSAN-japanese	Switch Transformer (MoE) prefix LM		2023	Japanese web texts	PyTorch	Japanese language tasks	https://huggingface.co/Tanrei/GPTSAN-japanese
GPTSw3	GPT Swedish 3	Transformer	GPT	2023	320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code	PyTorch	Nordic language generation, coding	https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b
Granite	Granite 3.0	Transformer		2024	12 trillion tokens	PyTorch	NLP, coding, reasoning, tool usage	https://huggingface.co/ibm-granite	https://github.com/ibm-granite/granite-3.0-language-models
GraniteMoe	Granite 3.0 Mixture-of-Experts	Transformer (MoE)		2024	10 trillion tokens	PyTorch	Efficient NLP tasks	https://huggingface.co/ibm-granite	https://github.com/ibm-granite/granite-3.0-language-models
GraniteMoeShared	GraniteMoeShared	Mixture of Experts (MoE)	Granite	2025		PyTorch, Hugging Face Transformers	General language tasks	https://huggingface.co/ibm-granite/granite-moe-shared
GraniteVision	Granite Vision 3.1 2B	Hybrid (Vision Transformer + Language Model)	LLaVA-NeXT, Granite	2025	Publicly available datasets, internally created synthetic data	PyTorch, Hugging Face Transformers	Visual document understanding, content extraction from tables, charts, diagrams, sketches, and infographics	https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview
Helium	Helium-1	Transformer		2025	2.5T tokens	PyTorch	Multilingual language tasks, edge computing	https://huggingface.co/kyutai/helium-1-preview-2b
HerBERT	HerBERT	Transformer	BERT	2020	6 Polish corpora, 8B tokens	PyTorch	Polish NLP tasks	https://huggingface.co/allegro/herbert-base-cased
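I-BERT	Integer-only BERT	Transformer encoder (integer-only arithmetic)	RoBERTa	2021		PyTorch	Efficient integer-only inference	https://huggingface.co/kssteven/ibert-roberta-base	https://github.com/kssteven418/I-BERT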
Jamba	Jamba	Hybrid Transformer-Mamba MoE		2024		PyTorch	Long-context language tasks, reasoning	https://huggingface.co/ai21labs/Jamba-v0.1
JetMoe	JetMoe	Mixture-of-Experts (MoE)		2024	1.25T tokens from public datasets	PyTorch	Text generation, code completion, conversational dialogue	https://huggingface.co/docs/transformers/model_doc/jetmoe
Jukebox	Jukebox	VQ-VAE		2020	1.2 million songs	PyTorch	Music generation with lyrics	https://huggingface.co/docs/transformers/model_doc/jukebox	https://github.com/openai/jukebox
LED	Longformer Encoder-Decoder	Transformer	BART	2020	Various long-form datasets	PyTorch, Transformers	Long document summarization, question answering	https://huggingface.co/docs/transformers/model_doc/led	https://github.com/allenai/longformer
LLaMA	Large Language Model Meta AI	Transformer		2023	1.0 trillion tokens (7B, 13B) and 1.4 trillion tokens (33B, 65B)	PyTorch	General language tasks, text generation	https://huggingface.co/docs/transformers/model_doc/llama	https://github.com/facebookresearch/llama
Llama2	Large Language Model Meta AI 2	Transformer	LLaMA	2023	2 trillion tokens	PyTorch	General language tasks, text generation	https://huggingface.co/meta-llama	https://github.com/facebookresearch/llama
Llama3	Large Language Model Meta AI 3	Transformer	Llama2	2024	15 trillion tokens	PyTorch	Advanced language tasks, reasoning, code generation	https://huggingface.co/meta-llama	https://github.com/meta-llama/llama3
Longformer	Longformer	Transformer	RoBERTa	2020	Various long-form datasets	PyTorch, Transformers	Long document processing, classification	https://huggingface.co/docs/transformers/model_doc/longformer	https://github.com/allenai/longformer
LongT5	Long Text-To-Text Transfer Transformer	Transformer	T5	2021	C4	JAX, Flax	Long-form generation, summarization	https://huggingface.co/docs/transformers/model_doc/longt5	https://github.com/google-research/longt5
LUKE	Language Understanding with Knowledge-based Embeddings	Transformer	RoBERTa	2020	Wikipedia	PyTorch	Entity-aware NLP tasks, named entity recognition	https://huggingface.co/docs/transformers/model_doc/luke	https://github.com/studio-ousia/luke
M2M100	Many-to-Many Multilingual Translation	Transformer		2020	Web-crawled data from 100 languages	PyTorch, Fairseq	Multilingual machine translation	https://huggingface.co/docs/transformers/model_doc/m2m_100	https://github.com/pytorch/fairseq/tree/main/examples/m2m_100
MADLAD-400	Multilingual And Document-Level Large Audited Dataset (400+ languages)	Transformer encoder-decoder	T5	2023	1 trillion tokens covering 450+ languages	JAX, T5X	Multilingual NLP tasks, translation	https://huggingface.co/google/madlad-400-3b-mt	https://github.com/google-research/t5x
Mamba	Mamba	Structured State Space Model (SSM)		2023		PyTorch	Sequence modeling, language tasks	https://huggingface.co/state-spaces/mamba-2.8b	https://github.com/state-spaces/mamba
mamba2	Mamba-2	Structured State Space Model (SSM)	Mamba	2024		PyTorch	Sequence modeling, language tasks
MarianMT	Marian Machine Translation	Transformer		2018	Various parallel corpora	PyTorch, Transformers	Neural machine translation	https://huggingface.co/docs/transformers/model_doc/marian	https://github.com/marian-nmt/marian
MarkupLM	Markup Language Model	Transformer	RoBERTa	2021	HTML/XML documents	PyTorch, Transformers	Web page understanding, HTML/XML processing	https://huggingface.co/docs/transformers/model_doc/markuplm	https://github.com/microsoft/unilm/tree/master/markuplm
MBart and MBart-50	Multilingual BART	Transformer	BART	2020	CC25 corpus (25 languages); MBart-50 extends to 50 languages	PyTorch, Transformers	Multilingual translation, text generation	https://huggingface.co/docs/transformers/model_doc/mbart	https://github.com/pytorch/fairseq/tree/main/examples/mbart
MEGA	Moving Average Equipped Gated Attention	Transformer		2022		PyTorch	Long-range dependency modeling	https://huggingface.co/docs/transformers/model_doc/mega	https://github.com/facebookresearch/mega
MegatronBERT	Megatron BERT	Transformer	BERT	2020		PyTorch	Large-scale language understanding	https://huggingface.co/docs/transformers/model_doc/megatron-bert	https://github.com/NVIDIA/Megatron-LM
MegatronGPT2	Megatron GPT-2	Transformer	GPT-2	2020		PyTorch	Large-scale language generation	https://huggingface.co/docs/transformers/model_doc/megatron_gpt2	https://github.com/NVIDIA/Megatron-LM
Mistral	Mistral	Transformer		2023		PyTorch	General language tasks, text generation	https://huggingface.co/mistralai/Mistral-7B-v0.1	https://github.com/mistralai/mistral-src
Mixtral	Mixtral	Mixture of Experts (MoE)	Mistral	2023		PyTorch	General language tasks, text generation	https://huggingface.co/mistralai/Mixtral-8x7B-v0.1	https://github.com/mistralai/mistral-src
mLUKE	Multilingual Language Understanding with Knowledge-based Embeddings	Transformer	XLM-RoBERTa	2022	Wikipedia in 24 languages	PyTorch, Transformers	Cross-lingual NLP tasks, entity-based reasoning	https://huggingface.co/docs/transformers/model_doc/mluke	https://github.com/studio-ousia/luke
MobileBERT	MobileBERT	Transformer	BERT	2020	Same as BERT	TensorFlow, PyTorch	On-device NLP tasks	https://huggingface.co/docs/transformers/model_doc/mobilebert	https://github.com/google-research/google-research/tree/master/mobilebert
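ModernBERT	Modern BERT	Transformer encoder	BERT	2024	2 trillion tokens (English text and code)	PyTorch	General language understanding	https://huggingface.co/answerdotai/ModernBERT-base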
MPNet	Masked and Permuted Pre-training for Language Understanding	Transformer		2020	BookCorpus, English Wikipedia	PyTorch, Transformers	General language understanding	https://huggingface.co/docs/transformers/model_doc/mpnet	https://github.com/microsoft/MPNet
MPT	MPT (MosaicML Pretrained Transformer)	Transformer	GPT	2023		PyTorch	General language tasks	https://huggingface.co/mosaicml/mpt-7b	https://github.com/mosaicml/llm-foundry
MRA	Multi-Resolution Analysis (MRA) attention	Transformer with MRA-based approximate self-attention		2022		PyTorch	Efficient long-range modeling	https://huggingface.co/docs/transformers/model_doc/mra	https://github.com/mlpen/mra-attention
MT5	Multilingual T5	Transformer	T5	2021	mC4 dataset	JAX, Flax	Multilingual text-to-text tasks	https://huggingface.co/docs/transformers/model_doc/mt5	https://github.com/google-research/multilingual-t5
MVP	Multi-task superVised Pre-training	Transformer encoder-decoder	BART	2022	MVPCorpus (77 labeled datasets across 11 text generation tasks)	PyTorch	Natural language generation tasks	https://huggingface.co/docs/transformers/model_doc/mvp	https://github.com/RUCAIBox/MVP
myt5	MyT5 (morphology-driven byte-level T5)	Transformer encoder-decoder (byte-level, MYTE encoding)	ByT5	2024	mC4	PyTorch	Multilingual text-to-text tasks	https://huggingface.co/docs/transformers/model_doc/myt5
Nemotron	Nemotron	Transformer		2023		PyTorch	General language tasks
NEZHA	Neural Contextualized Representation for Chinese Language Understanding	Transformer	BERT	2019	Chinese corpora	PyTorch	Chinese NLU tasks	https://huggingface.co/docs/transformers/model_doc/nezha	https://github.com/huawei-noah/Pretrained-Language-Model
NLLB	No Language Left Behind	Transformer		2022	Parallel data across 200 languages	PyTorch	Multilingual translation	https://huggingface.co/facebook/nllb-200-3.3B	https://github.com/facebookresearch/fairseq/tree/nllb
NLLB-MoE	No Language Left Behind - Mixture of Experts	Transformer with MoE	NLLB	2022		PyTorch	Efficient multilingual translation	https://huggingface.co/facebook/nllb-moe-54b
Nystromformer	Nystromformer	Transformer with Nystrom method		2021		PyTorch	Efficient long-sequence modeling	https://huggingface.co/docs/transformers/model_doc/nystromformer	https://github.com/mlpen/Nystromformer
OLMo	Open Language Model	Transformer		2024		PyTorch	General language tasks	https://huggingface.co/allenai/olmo-7b	https://github.com/allenai/OLMo
OLMo2	Open Language Model 2	Transformer	OLMo	2024	OLMo-Mix-1124, Dolmino-Mix-1124	PyTorch	General language tasks
OLMoE	Open Language Model of Experts	Mixture of Experts	OLMo	2024		PyTorch	Efficient language modeling
Open-Llama	Open-Llama	Transformer	LLaMA	2023		PyTorch	General language tasks	https://huggingface.co/openlm-research/open_llama_3b	https://github.com/openlm-research/open_llama
OPT	Open Pretrained Transformer	Transformer	GPT	2022		PyTorch	General language tasks	https://huggingface.co/facebook/opt-350m	https://github.com/facebookresearch/metaseq
Pegasus	Pre-training with Extracted Gap-sentences for Abstractive Summarization	Transformer		2019	C4, HugeNews	TensorFlow	Text summarization	https://huggingface.co/google/pegasus-xsum	https://github.com/google-research/pegasus
PEGASUS-X	PEGASUS-X	Transformer	PEGASUS	2022		TensorFlow, JAX	Long-form text summarization	https://huggingface.co/google/pegasus-x-base	https://github.com/google-research/pegasus
Persimmon	Persimmon	Transformer		2023		PyTorch	General language tasks	https://huggingface.co/adept/persimmon-8b-base
Phi	Phi	Transformer		2023	Filtered web data, synthetic data	PyTorch	General language tasks	https://huggingface.co/microsoft/phi-1_5
Phi-3	Phi-3	Transformer		2024	3.3 trillion tokens	PyTorch	General language tasks	https://huggingface.co/docs/transformers/model_doc/phi3
PhiMoE	Phi-3.5 Mixture of Experts	Transformer with MoE	Phi-3	2024	4.9T tokens	PyTorch	Multilingual and long-context tasks	https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
PhoBERT	PhoBERT	Transformer	RoBERTa	2020	Vietnamese texts	PyTorch	Vietnamese NLP tasks	https://huggingface.co/vinai/phobert-base	https://github.com/VinAIResearch/PhoBERT
PLBart	Programming Language BART	Transformer	BART	2021	GitHub code corpus	PyTorch	Code-related tasks	https://huggingface.co/docs/transformers/model_doc/plbart	https://github.com/wasiahmad/PLBART
ProphetNet	ProphetNet	Transformer		2020		PyTorch	Sequence-to-sequence tasks	https://huggingface.co/docs/transformers/model_doc/prophetnet	https://github.com/microsoft/ProphetNet
QDQBert	Quantized-Dequantized BERT	Transformer	BERT	2020	Same as BERT	PyTorch	Efficient BERT inference	https://huggingface.co/docs/transformers/model_doc/qdqbert
Qwen2	Qwen2	Transformer		2024		PyTorch	General language tasks	https://huggingface.co/Qwen/Qwen2-7B-Instruct	https://github.com/QwenLM/Qwen2
Qwen2MoE	Qwen2 Mixture of Experts	Transformer with MoE	Qwen2	2024	Up to 18 trillion tokens	PyTorch	General language tasks	https://huggingface.co/docs/transformers/model_doc/qwen2_moe	https://github.com/QwenLM/Qwen2.5
RAG	Retrieval-Augmented Generation	Transformer	BART (generator), DPR (retriever)	2020	Wikipedia	PyTorch	Open-domain question answering	https://huggingface.co/docs/transformers/model_doc/rag	https://github.com/huggingface/transformers/tree/main/examples/research_projects/rag
REALM	Retrieval-Augmented Language Model Pre-Training	Transformer	BERT	2020	Wikipedia, CC-News	TensorFlow	Open-domain question answering	https://huggingface.co/docs/transformers/model_doc/realm	https://github.com/google-research/language/tree/master/language/realm
RecurrentGemma	RecurrentGemma	Griffin (linear recurrences + local attention)		2024		JAX, PyTorch	General language tasks, long sequence processing	https://huggingface.co/google/recurrentgemma-2b-it
Reformer	Reformer	Transformer with LSH attention		2020		PyTorch, JAX	Long sequence tasks	https://huggingface.co/docs/transformers/model_doc/reformer	https://github.com/google/trax/tree/master/trax/models/reformer
RemBERT	Rembert	Transformer	BERT	2021		TensorFlow	Multilingual NLP tasks	https://huggingface.co/docs/transformers/model_doc/rembert	https://github.com/google-research/rembert
RetriBERT	RetriBERT	Transformer	BERT	2020	ELI5	PyTorch	Information retrieval	https://huggingface.co/docs/transformers/model_doc/retribert	https://github.com/huggingface/transformers/tree/main/examples/research_projects/longform-qa
RoBERTa	Robustly Optimized BERT Approach	Transformer	BERT	2019	BookCorpus, English Wikipedia, CC-News, OpenWebText, Stories	PyTorch	General language understanding	https://huggingface.co/docs/transformers/model_doc/roberta	https://github.com/pytorch/fairseq/tree/main/examples/roberta
RoBERTa-PreLayerNorm	RoBERTa with Pre-Layer Normalization	Transformer	RoBERTa	2020	Same as RoBERTa	PyTorch	General language understanding	https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm
RoCBert	Robust Chinese BERT	Transformer	BERT	2021	Chinese corpora	PyTorch	Chinese NLP tasks	https://huggingface.co/docs/transformers/model_doc/rocbert	https://github.com/FudanNLP/ROCBERT
RoFormer	RoFormer	Transformer with rotary position embeddings	BERT	2021		PyTorch	Long text classification tasks	https://huggingface.co/docs/transformers/model_doc/roformer	https://github.com/ZhuiyiTechnology/roformer
RWKV	RWKV	Recurrent neural network with attention		2022		PyTorch	General language tasks	https://huggingface.co/docs/transformers/model_doc/rwkv	https://github.com/BlinkDL/RWKV-LM
Splinter	Splinter	Transformer	BERT	2021		PyTorch	Question answering	https://huggingface.co/docs/transformers/model_doc/splinter
SqueezeBERT	SqueezeBERT	Transformer	BERT	2020	Same as BERT	PyTorch	Efficient NLP tasks	https://huggingface.co/docs/transformers/model_doc/squeezebert	https://github.com/huggingface/transformers/tree/main/examples/research_projects/squeezebert
StableLm	Stable Language Model	Transformer		2023	1 trillion tokens, multilingual	PyTorch	General language tasks	https://huggingface.co/stabilityai/stablelm-base-alpha-3b	https://github.com/Stability-AI/StableLM
Starcoder2	Starcoder2	Transformer		2024		PyTorch	Code generation	https://huggingface.co/bigcode/starcoder2-15b
SwitchTransformers	Switch Transformers	Transformer with sparse routing	T5	2021	C4 dataset	JAX, Flax	Efficient language modeling	https://huggingface.co/docs/transformers/model_doc/switch_transformers	https://github.com/google-research/switch-transformer
T5	Text-to-Text Transfer Transformer	Transformer		2019	C4 dataset	TensorFlow, PyTorch	Text-to-text tasks	https://huggingface.co/docs/transformers/model_doc/t5	https://github.com/google-research/text-to-text-transfer-transformer
T5v1.1	T5 version 1.1	Transformer	T5	2020	C4 dataset	JAX, Flax	Text-to-text tasks	https://huggingface.co/docs/transformers/model_doc/t5v1.1	https://github.com/google-research/text-to-text-transfer-transformer
TAPEX	Table Pre-training via Execution	Transformer	BART	2022	WikiTables, WikiSQL	PyTorch	Table-based question answering	https://huggingface.co/docs/transformers/model_doc/tapex	https://github.com/microsoft/Table-Pretraining
Transformer XL	Transformer-XL	Transformer with relative positional encoding		2019		PyTorch	Long-range dependency modeling	https://huggingface.co/docs/transformers/model_doc/transfo-xl	https://github.com/kimiyoung/transformer-xl
UL2	Unified Language Learner	Transformer		2022		JAX, Flax	General language tasks	https://huggingface.co/google/ul2
UMT5	Universal Multilingual T5	Transformer	T5	2023		PyTorch	Multilingual text-to-text tasks	https://huggingface.co/google/umt5-small
X-MOD	Cross-lingual Modular	Transformer with language adapters	XLM-R	2022	Filtered CommonCrawl, 81 languages	PyTorch	Multilingual NLP tasks	https://huggingface.co/facebook/xmod-base
XGLM	Cross-lingual Language Model	Transformer		2022		PyTorch	Multilingual language tasks	https://huggingface.co/facebook/xglm-564M
XLM	Cross-lingual Language Model	Transformer		2019	Wikipedia	PyTorch	Multilingual NLP tasks	https://huggingface.co/xlm-mlm-en-2048	https://github.com/facebookresearch/XLM
XLM-ProphetNet	Cross-lingual ProphetNet	Transformer	ProphetNet	2021		PyTorch	Multilingual sequence-to-sequence tasks	https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet
XLM-RoBERTa	Cross-lingual RoBERTa	Transformer	RoBERTa	2019	2.5TB filtered CommonCrawl, 100 languages	PyTorch	Multilingual NLP tasks	https://huggingface.co/xlm-roberta-base	https://github.com/pytorch/fairseq/tree/main/examples/xlmr
XLM-RoBERTa-XL	Cross-lingual RoBERTa Extra Large	Transformer	RoBERTa	2022	2.5TB filtered CommonCrawl, 100 languages	PyTorch	Multilingual NLP tasks	https://huggingface.co/facebook/xlm-roberta-xl
XLM-V	Cross-lingual Language Model V	Transformer encoder	XLM-RoBERTa	2023	CC100	PyTorch	Multilingual NLP tasks (one-million-token vocabulary)	https://huggingface.co/facebook/xlm-v-base
XLNet	XLNet: Generalized Autoregressive Pretraining for Language Understanding	Transformer with permutation-based training		2019		PyTorch	General language understanding tasks	https://huggingface.co/xlnet-base-cased	https://github.com/zihangdai/xlnet
YOSO	You Only Sample (Almost) Once	Transformer with Bernoulli sampling attention		2021		PyTorch	Efficient self-attention for long sequences	https://huggingface.co/docs/transformers/model_doc/yoso	https://github.com/mlpen/YOSO
Zamba	Zamba-7B-v1	Hybrid Mamba-Transformer		2024	1T tokens of text and code data, 50B high-quality tokens	PyTorch, Hugging Face Transformers	General language tasks, next-token prediction	https://huggingface.co/Zyphra/Zamba-7B-v1	https://github.com/Zyphra/Zamba
Zamba2	Zamba2-2.7B	Hybrid SSM-Transformer	Zamba	2024	3T tokens of text and code data, 100B high-quality tokens	PyTorch, Hugging Face Transformers	General language tasks, on-device applications	https://huggingface.co/Zyphra/Zamba2-2.7B	https://github.com/Zyphra/Zamba2
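The HF URL column mixes three kinds of links. Some are model repositories (e.g. https://huggingface.co/facebook/bart-large). Some are documentation pages (e.g. https://huggingface.co/docs/transformers/model_doc/led). Some are single-segment paths that can be either a legacy model ID such as albert-base-v2 or an organization page such as meta-llama. Only repository IDs can be passed directly to `transformers.AutoModel.from_pretrained` or `AutoTokenizer.from_pretrained`.

Below is a minimal Python sketch for sorting these links, assuming the table is saved as tab-separated text. The helper names `parse_row` and `classify_hf_url`, and the `HEADER` column list, are hypothetical and not part of any library. A single-segment path cannot be resolved from the URL alone, so it is reported as ambiguous rather than guessed.

```python
from urllib.parse import urlparse

# Hypothetical column names for this sketch; adjust to match the table header.
HEADER = [
    "Name", "Full Name", "Architecture", "Base Model", "Year",
    "Training Dataset", "Libraries & Frameworks", "Use Cases",
    "HF URL", "GitHub URL",
]


def parse_row(line: str) -> dict[str, str]:
    """Split one tab-separated table row into a dict keyed by HEADER."""
    fields = line.rstrip("\n").split("\t")
    # Some rows omit trailing empty cells, so pad them with empty strings.
    fields += [""] * (len(HEADER) - len(fields))
    return dict(zip(HEADER, fields))


def classify_hf_url(url: str) -> tuple[str, str]:
    """Classify a huggingface.co link from the table.

    Returns one of:
      ("repo", "owner/name")  two-segment model repository path
      ("docs", "name")        .../model_doc/<name> documentation page
      ("single", "name")      one-segment path: legacy model id OR organization
      ("other", "")           empty cell, other host, datasets/spaces, etc.
    """
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in ("huggingface.co", "www.huggingface.co"):
        return ("other", "")
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        return ("other", "")
    if parts[0] == "docs":
        # Handles both /docs/transformers/model_doc/x and localized
        # variants such as /docs/transformers/ko/model_doc/x.
        if "model_doc" in parts:
            i = parts.index("model_doc")
            if i + 1 < len(parts):
                return ("docs", parts[i + 1])
        return ("other", "")
    if parts[0] in ("datasets", "spaces"):
        return ("other", "")
    if len(parts) == 1:
        return ("single", parts[0])
    return ("repo", f"{parts[0]}/{parts[1]}")


if __name__ == "__main__":
    row = parse_row(
        "CPM\tChinese Pre-trained Language Model\tTransformer\tGPT\t2020\t"
        "Large-scale Chinese corpus\tPyTorch\tChinese text generation\t"
        "https://huggingface.co/TsinghuaAI/CPM-Generate"
    )
    kind, value = classify_hf_url(row["HF URL"])
    print(kind, value)  # repo TsinghuaAI/CPM-Generate
```

For a "repo" result, the value is the string you would hand to `from_pretrained`. A "docs" result names the Transformers model class page, and a concrete checkpoint still has to be picked from that page.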