Name | Full Name | Architecture | Base Model | Developed | Training Dataset | Lib. & Framework | Use Cases | HF URL | GitHub URL |
ALBERT | A Lite BERT | Transformer (encoder-only) | BERT | 2019 | BookCorpus, English Wikipedia | TensorFlow, PyTorch | Natural Language Understanding | https://huggingface.co/albert-base-v2 | https://github.com/google-research/albert |
Bamba | Bamba | Hybrid Mamba2-Transformer | Mamba2 | 2024 | | PyTorch | General language tasks, efficient inference | https://huggingface.co/ibm-fms/Bamba-9B | https://github.com/foundation-model-stack/bamba |
BART | Bidirectional and Auto-Regressive Transformers | Transformer | BERT+GPT | 2019 | BookCorpus, CC-News, OpenWebText, Stories | PyTorch | Text summarization, translation, question answering | https://huggingface.co/facebook/bart-large | https://github.com/facebookresearch/fairseq/tree/main/examples/bart |
BARThez | BART for French | Transformer | BART | 2020 | French subset of OSCAR corpus | PyTorch | French text generation and understanding | https://huggingface.co/moussaKam/barthez | https://github.com/moussaKam/BARThez |
BARTpho | BART for Vietnamese | Transformer | BART | 2022 | Vietnamese news articles, web content | PyTorch | Vietnamese text summarization, generation | https://huggingface.co/vinai/bartpho-syllable | https://github.com/VinAIResearch/BARTpho |
BERT | Bidirectional Encoder Representations from Transformers | Transformer | Original Encoder-only | 2018 | BookCorpus, English Wikipedia | TensorFlow, PyTorch | Natural Language Understanding | https://huggingface.co/bert-base-uncased | https://github.com/google-research/bert |
BertGeneration | BERT for text generation | Transformer | BERT | 2019 | Same as BERT | PyTorch | Text generation | https://huggingface.co/google/bert_for_seq_generation_L-24_bbc_encoder | https://github.com/google-research/bert/tree/master/generation |
BertJapanese | BERT for Japanese | Transformer | BERT | 2019 | Japanese Wikipedia, web texts | TensorFlow, PyTorch | Japanese language understanding | https://huggingface.co/cl-tohoku/bert-base-japanese | https://github.com/cl-tohoku/bert-japanese |
Bertweet | BERT for Twitter | Transformer | RoBERTa | 2020 | 850M English tweets | PyTorch | Twitter text analysis, sentiment analysis | https://huggingface.co/vinai/bertweet-base | https://github.com/VinAIResearch/BERTweet |
BigBird | Big Bird | Transformer with sparse attention | BERT | 2020 | RealNews dataset, English Wikipedia, C4 | TensorFlow, PyTorch | Long document understanding | https://huggingface.co/google/bigbird-roberta-base | https://github.com/google-research/bigbird |
BigBirdPegasus | BigBird-Pegasus | Transformer | BERT, Pegasus | 2020 | RealNews, English Wikipedia, C4 | PyTorch | Long document summarization, question-answering | https://huggingface.co/google/bigbird-pegasus-large-bigpatent | https://github.com/google-research/bigbird |
BioGPT | Biomedical Generative Pre-trained Transformer | Transformer | GPT-2 | 2022 | 15M PubMed abstracts | PyTorch | Biomedical text generation, relation extraction, question answering | https://huggingface.co/microsoft/biogpt | https://github.com/microsoft/BioGPT |
BlenderBot | BlenderBot | Transformer-based encoder-decoder | | 2020 | Not specified | PyTorch, Hugging Face Transformers | Open-domain chatbot, conversational AI | https://huggingface.co/facebook/blenderbot-3B | https://github.com/facebookresearch/ParlAI |
BlenderBot Small | BlenderBot Small | Transformer-based encoder-decoder | BlenderBot | 2020 | Not specified | PyTorch, Hugging Face Transformers | Open-domain dialogue, conversational AI | https://huggingface.co/facebook/blenderbot_small-90M | https://github.com/facebookresearch/ParlAI |
BLOOM | BLOOM | Transformer-based | Decoder-only | 2022 | 1.5TB of pre-processed text (350B unique tokens) in 45 natural languages and 12 programming languages | PyTorch, Hugging Face Transformers | Language generation, information extraction, question answering, summarization | https://huggingface.co/bigscience/bloom | https://github.com/bigscience-workshop/bigscience |
BORT | BERT Optimized for Resource-constrained Training | Transformer | BERT | 2020 | BookCorpus, English Wikipedia | PyTorch | Efficient natural language understanding | https://huggingface.co/alexa/bort | https://github.com/alexa/bort |
ByT5 | Byte-level T5 | Transformer | T5 | 2021 | C4 dataset | TensorFlow, JAX | Multilingual text-to-text generation | https://huggingface.co/google/byt5-small | https://github.com/google-research/byt5 |
CamemBERT | Camembert | Transformer | RoBERTa | 2019 | OSCAR French corpus | PyTorch | French language understanding | https://huggingface.co/camembert-base | https://github.com/pytorch/fairseq |
CANINE | Character Architecture with No tokenization In Neural Encoders | Transformer | BERT | 2021 | Wikipedia in 100+ languages | TensorFlow | Multilingual text processing | https://huggingface.co/google/canine-s | https://github.com/google-research/language |
CodeGen | CodeGen | Transformer | GPT | 2022 | GitHub code repositories | PyTorch | Code generation, completion | https://huggingface.co/Salesforce/codegen-350M-mono | https://github.com/salesforce/CodeGen |
CodeLlama | Code Llama | Transformer | Llama 2 | 2023 | 500B tokens of code and code-related data | PyTorch | Code generation, completion, debugging | https://huggingface.co/codellama | https://github.com/facebookresearch/codellama |
Cohere | Cohere's Command Models | Transformer | Self-developed | 2024 | Proprietary dataset | PyTorch | Text generation, summarization | https://huggingface.co/docs/transformers/model_doc/cohere | https://cohere.com |
Cohere2 | Cohere's second generation of Command models | Transformer | Self-developed | 2025 | Proprietary dataset | PyTorch | Conversational agents, RAG systems | | https://cohere.com |
ConvBERT | Convolutional BERT | Transformer + CNN | BERT | 2020 | BookCorpus, English Wikipedia | PyTorch | Natural language understanding | https://huggingface.co/YituTech/conv-bert-base | https://github.com/YituTech/ConvBERT |
CPM | Chinese Pre-trained Models | Transformer | GPT | 2020 | Large-scale Chinese corpus | TensorFlow | Chinese text generation, understanding | https://huggingface.co/TsinghuaAI/CPM-Generate | |
CPMANT | CPM Ant | Transformer | CPM | 2021 | Large-scale Chinese corpus | TensorFlow | Chinese language tasks | | |
CTRL | Conditional Transformer Language Model | Transformer | Transformer-XL | 2019 | 140GB of structured text | TensorFlow | Controlled text generation | https://huggingface.co/salesforce/ctrl | https://github.com/salesforce/ctrl |
DBRX | DBRX | Transformer (decoder-only Mixture of Experts) | | 2024 | 12 trillion tokens of text and code | PyTorch | General language tasks, code generation | https://huggingface.co/databricks/dbrx-base | https://github.com/databricks/dbrx |
DeBERTa | Decoding-enhanced BERT with Disentangled Attention | Transformer | BERT | 2020 | BookCorpus, English Wikipedia | PyTorch | Natural language understanding | https://huggingface.co/microsoft/deberta-base | https://github.com/microsoft/DeBERTa |
DeBERTa-v2 | Decoding-enhanced BERT with Disentangled Attention v2 | Transformer | DeBERTa | 2021 | BookCorpus, English Wikipedia, CC-News, OpenWebText, Stories | PyTorch | Natural language understanding, classification tasks | https://huggingface.co/microsoft/deberta-v2-xlarge-mnli | https://github.com/microsoft/DeBERTa |
DialoGPT | Dialogue Generative Pre-trained Transformer | Transformer | GPT-2 | 2020 | 147M conversation-like exchanges from Reddit | PyTorch | Open-domain dialogue generation | https://huggingface.co/microsoft/DialoGPT-small | https://github.com/microsoft/DialoGPT |
DiffLlama | Differential Transformer Llama | Transformer with differential attention | LLaMA | 2024 | | PyTorch | Text generation | https://huggingface.co/docs/transformers/model_doc/diffllama | |
DistilBERT | Distilled BERT | Transformer | BERT | 2019 | Same as BERT (BookCorpus, English Wikipedia) | PyTorch | Efficient natural language understanding | https://huggingface.co/distilbert-base-uncased | https://github.com/huggingface/transformers |
DPR | Dense Passage Retriever | Dual Encoder | BERT | 2020 | Wikipedia | PyTorch | Open-domain question answering | https://huggingface.co/facebook/dpr-question_encoder-single-nq-base | https://github.com/facebookresearch/DPR |
ELECTRA | Efficiently Learning an Encoder that Classifies Token Replacements Accurately | Transformer | BERT | 2020 | Same as BERT | TensorFlow, PyTorch | Natural language understanding | https://huggingface.co/google/electra-small-discriminator | https://github.com/google-research/electra |
ERNIE | Enhanced Representation through kNowledge IntEgration | Transformer | BERT | 2019 | Chinese Wikipedia, Baidu Baike, Baidu news | PaddlePaddle, PyTorch | Chinese language understanding | https://huggingface.co/nghuyong/ernie-1.0-base-zh | https://github.com/PaddlePaddle/ERNIE |
ErnieM | ERNIE-M | Transformer | ERNIE | 2021 | Multilingual corpus | PaddlePaddle | Multilingual language understanding | https://huggingface.co/susnato/ernie-m-base_pytorch | https://github.com/PaddlePaddle/ERNIE |
ESM | Evolutionary Scale Modeling | Transformer | encoder, similar to BERT | 2021 | Protein sequences | PyTorch | Protein language modeling | https://huggingface.co/facebook/esm2_t33_650M_UR50D | https://github.com/facebookresearch/esm |
Falcon | Falcon | Transformer | Decoder-only, similar to GPT | 2023 | RefinedWeb dataset | PyTorch | General language tasks | https://huggingface.co/tiiuae/falcon-7b | |
Falcon3 | Falcon 3 | Transformer | Falcon model family | 2024 | RefinedWeb dataset, high-quality data | PyTorch | General language tasks, reasoning, math | https://huggingface.co/tiiuae/falcon3-7b-base | |
FalconMamba | Falcon Mamba | Mamba (SSM) | Mamba | 2024 | 5.8 trillion tokens | PyTorch | Long-context language tasks | https://huggingface.co/tiiuae/falcon-mamba-7b | |
FLAN-T5 | Fine-tuned Language Net T5 | Transformer | T5 | 2022 | Over 1,000 tasks in multiple languages | JAX, PyTorch | Multi-task language understanding | https://huggingface.co/google/flan-t5-small | https://github.com/google-research/t5x |
FLAN-UL2 | Fine-tuned Language Net UL2 | Transformer | UL2 | 2022 | Multiple datasets, similar to FLAN-T5 | JAX, PyTorch | Multi-task language understanding | https://huggingface.co/google/flan-ul2 | |
FlauBERT | French Language Understanding BERT | Transformer | BERT | 2019 | Large heterogeneous French corpus | PyTorch | French NLP tasks | https://huggingface.co/flaubert/flaubert_base_cased | |
FNet | FNet | Transformer encoder with Fourier token mixing | BERT | 2021 | Similar to BERT | JAX, PyTorch | Efficient language understanding | https://huggingface.co/google/fnet-base | https://github.com/google-research/google-research/tree/master/f_net |
FSMT | FairSeq Machine Translation | Transformer | Transformer-based sequence-to-sequence | 2019 | Various parallel corpora | PyTorch | Machine translation | https://huggingface.co/facebook/wmt19-en-de | https://github.com/pytorch/fairseq |
Funnel Transformer | Funnel Transformer | Transformer | BERT | 2020 | Similar to BERT | TensorFlow, PyTorch | Efficient language understanding | https://huggingface.co/funnel-transformer/small | https://github.com/laiguokun/Funnel-Transformer |
Fuyu | Fuyu-8B | Transformer | multimodal transformer | 2023 | | JAX | Multimodal tasks | https://huggingface.co/adept/fuyu-8b | |
Gemma | Gemma | Transformer | Transformer-based Language | 2024 | | JAX | General language tasks | https://huggingface.co/google/gemma-7b | |
Gemma2 | Gemma 2 | Transformer | improved version of Gemma | 2024 | 13 trillion tokens (27B), 8 trillion tokens (9B) | JAX | General language tasks, reasoning | https://huggingface.co/google/gemma-2-9b | |
GLM | General Language Model | Transformer | | | | PyTorch | General language tasks | https://huggingface.co/THUDM/glm-large-chinese | https://github.com/THUDM/GLM |
GPT | Generative Pre-trained Transformer | Transformer | | 2018 | BookCorpus | TensorFlow, PyTorch | Text generation, language understanding | https://huggingface.co/openai-gpt | https://github.com/openai/finetune-transformer-lm |
GPT Neo | GPT Neo | Transformer | GPT | 2021 | The Pile | TensorFlow, JAX | Text generation, language tasks | https://huggingface.co/EleutherAI/gpt-neo-1.3B | https://github.com/EleutherAI/gpt-neo |
GPT NeoX | GPT NeoX | Transformer | GPT Neo | 2022 | The Pile | PyTorch | Large-scale language modeling | https://huggingface.co/EleutherAI/gpt-neox-20b | https://github.com/EleutherAI/gpt-neox |
GPT NeoX Japanese | GPT NeoX Japanese | Transformer | GPT NeoX | | Japanese web texts | PyTorch | Japanese language tasks | https://huggingface.co/rinna/japanese-gpt-neox-3.6b | |
GPT-J | GPT-J | Transformer | GPT | 2021 | The Pile (825 GB) | JAX | Text generation, language tasks | https://huggingface.co/EleutherAI/gpt-j-6B | https://github.com/kingoflolz/mesh-transformer-jax |
GPT2 | Generative Pre-trained Transformer 2 | Transformer | GPT | 2019 | WebText | TensorFlow, PyTorch | Text generation, language understanding | https://huggingface.co/gpt2 | https://github.com/openai/gpt-2 |
GPTBigCode | GPT BigCode | Transformer | GPT | | Code repositories | PyTorch | Code generation, completion | https://huggingface.co/bigcode | https://github.com/bigcode-project/bigcode-encoder |
GPTSAN Japanese | GPT Self-Attention Network Japanese | Transformer | | | Japanese web texts | PyTorch | Japanese language tasks | https://huggingface.co/tanreinama/GPTSAN-2.7B-instrct-sft | |
GPTSw3 | GPT Swedish 3 | Transformer | GPT | 2023 | 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code | PyTorch | Nordic language generation, coding | https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b | |
Granite | Granite 3.0 | Transformer | | 2024 | 12 trillion tokens | PyTorch | NLP, coding, reasoning, tool usage | https://huggingface.co/ibm-granite | https://github.com/ibm-granite/granite-3.0-language-models |
GraniteMoe | Granite 3.0 Mixture-of-Experts | Transformer (MoE) | | 2024 | 10 trillion tokens | PyTorch | Efficient NLP tasks | https://huggingface.co/ibm-granite | https://github.com/ibm-granite/granite-3.0-language-models |
GraniteMoeShared | GraniteMoeShared | Mixture of Experts (MoE) | Granite | 2025 | | PyTorch, Hugging Face Transformers | General language tasks | https://huggingface.co/ibm-granite/granite-moe-shared | |
GraniteVision | Granite Vision 3.1 2B | Hybrid (Vision Transformer + Language Model) | LLaVA-NeXT, Granite | 2025 | Publicly available datasets, internally created synthetic data | PyTorch, Hugging Face Transformers | Visual document understanding, content extraction from tables, charts, diagrams, sketches, and infographics | https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview | |
Helium | Helium-1 | Transformer | | 2025 | 2.5T tokens | PyTorch | Multilingual language tasks, edge computing | https://huggingface.co/kyutai/helium-1-preview-2b | |
HerBERT | HerBERT | Transformer | BERT | 2020 | 6 Polish corpora, 8B tokens | PyTorch | Polish NLP tasks | https://huggingface.co/allegro/herbert-base-cased | |
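I-BERT | Integer-only BERT | Transformer | RoBERTa | 2021 | | PyTorch | Efficient integer-only quantized inference | https://huggingface.co/kssteven/ibert-roberta-base | https://github.com/kssteven418/I-BERT |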
Jamba | Jamba | Hybrid Transformer-Mamba MoE | | 2024 | | PyTorch | Long-context language tasks, reasoning | https://huggingface.co/ai21labs/Jamba-v0.1 | |
JetMoe | JetMoe | Mixture-of-Experts (MoE) | | 2024 | 1.25T tokens from public datasets | PyTorch | Text generation, code completion, conversational dialogue | https://huggingface.co/docs/transformers/model_doc/jetmoe | |
Jukebox | Jukebox | VQ-VAE | | 2020 | 1.2 million songs | PyTorch | Music generation with lyrics | https://huggingface.co/docs/transformers/model_doc/jukebox | https://github.com/openai/jukebox |
LED | Longformer Encoder-Decoder | Transformer | BART | 2020 | Various long-form datasets | PyTorch, Transformers | Long document summarization, question answering | https://huggingface.co/docs/transformers/model_doc/led | https://github.com/allenai/longformer |
LLaMA | Large Language Model Meta AI | Transformer | | 2023 | 1 to 1.4 trillion tokens | PyTorch | General language tasks, text generation | https://huggingface.co/docs/transformers/model_doc/llama | https://github.com/facebookresearch/llama |
Llama2 | Large Language Model Meta AI 2 | Transformer | LLaMA | 2023 | 2 trillion tokens | PyTorch | General language tasks, text generation | https://huggingface.co/meta-llama | https://github.com/facebookresearch/llama |
Llama3 | Large Language Model Meta AI 3 | Transformer | Llama2 | 2024 | 15 trillion tokens | PyTorch | Advanced language tasks, reasoning, code generation | https://huggingface.co/meta-llama | https://github.com/meta-llama/llama3 |
Longformer | Longformer | Transformer | RoBERTa | 2020 | Various long-form datasets | PyTorch, Transformers | Long document processing, classification | https://huggingface.co/docs/transformers/model_doc/longformer | https://github.com/allenai/longformer |
LongT5 | Long Text-To-Text Transfer Transformer | Transformer | T5 | 2021 | C4, Wikipedia, PubMed | JAX, Flax | Long-form generation, summarization | https://huggingface.co/docs/transformers/model_doc/longt5 | https://github.com/google-research/longt5 |
LUKE | Language Understanding with Knowledge-based Embeddings | Transformer | RoBERTa | 2020 | Wikipedia | PyTorch | Entity-aware NLP tasks, named entity recognition | https://huggingface.co/docs/transformers/model_doc/luke | https://github.com/studio-ousia/luke |
M2M100 | Many-to-Many Multilingual Translation | Transformer | | 2020 | Web-crawled data from 100 languages | PyTorch, Fairseq | Multilingual machine translation | https://huggingface.co/docs/transformers/model_doc/m2m_100 | https://github.com/pytorch/fairseq/tree/main/examples/m2m_100 |
MADLAD-400 | Multilingual And Document-Level Large Audited Dataset | Transformer | T5 | 2023 | 2.2T tokens from 419 languages | JAX, T5X | Multilingual NLP tasks, translation | https://huggingface.co/google/madlad-400-3b-mt | https://github.com/google-research/t5x |
Mamba | Mamba | Structured State Space Model (SSM) | | 2023 | | PyTorch | Sequence modeling, language tasks | https://huggingface.co/state-spaces/mamba-2.8b | https://github.com/state-spaces/mamba |
mamba2 | Mamba-2 | Structured State Space Model (SSM) | Mamba | 2024 | | PyTorch | Sequence modeling, language tasks | | |
MarianMT | Marian Machine Translation | Transformer | | 2018 | Various parallel corpora | PyTorch, Transformers | Neural machine translation | https://huggingface.co/docs/transformers/model_doc/marian | https://github.com/marian-nmt/marian |
MarkupLM | Markup Language Model | Transformer | RoBERTa | 2021 | HTML/XML documents | PyTorch, Transformers | Web page understanding, HTML/XML processing | https://huggingface.co/docs/transformers/model_doc/markuplm | https://github.com/microsoft/unilm/tree/master/markuplm |
MBart and MBart-50 | Multilingual BART | Transformer | BART | 2020 | CC25 corpus, 50 languages | PyTorch, Transformers | Multilingual translation, text generation | https://huggingface.co/docs/transformers/model_doc/mbart | https://github.com/pytorch/fairseq/tree/main/examples/mbart |
MEGA | Moving Average Equipped Gated Attention | Transformer | | 2022 | | PyTorch | Long-range dependency modeling | https://huggingface.co/docs/transformers/model_doc/mega | https://github.com/facebookresearch/mega |
MegatronBERT | Megatron BERT | Transformer | BERT | 2020 | | PyTorch | Large-scale language understanding | https://huggingface.co/docs/transformers/model_doc/megatron-bert | https://github.com/NVIDIA/Megatron-LM |
MegatronGPT2 | Megatron GPT-2 | Transformer | GPT-2 | 2020 | | PyTorch | Large-scale language generation | https://huggingface.co/docs/transformers/model_doc/megatron_gpt2 | https://github.com/NVIDIA/Megatron-LM |
Mistral | Mistral | Transformer | | 2023 | | PyTorch | General language tasks, text generation | https://huggingface.co/mistralai/Mistral-7B-v0.1 | https://github.com/mistralai/mistral-src |
Mixtral | Mixtral | Mixture of Experts (MoE) | Mistral | 2023 | | PyTorch | General language tasks, text generation | https://huggingface.co/mistralai/Mixtral-8x7B-v0.1 | Not publicly available |
mLUKE | Multilingual Language Understanding with Knowledge-based Embeddings | Transformer | XLM-RoBERTa | 2022 | Wikipedia in 24 languages | PyTorch, Transformers | Cross-lingual NLP tasks, entity-based reasoning | https://huggingface.co/docs/transformers/model_doc/mluke | https://github.com/studio-ousia/luke |
MobileBERT | MobileBERT | Transformer | BERT | 2020 | Same as BERT | TensorFlow, PyTorch | On-device NLP tasks | https://huggingface.co/docs/transformers/model_doc/mobilebert | https://github.com/google-research/google-research/tree/master/mobilebert |
ModernBERT | Modern BERT | Transformer | BERT | 2024 | | PyTorch | General language understanding | https://huggingface.co/answerdotai/ModernBERT-base | https://github.com/AnswerDotAI/ModernBERT |
MPNet | Masked and Permuted Pre-training for Language Understanding | Transformer | | 2020 | BookCorpus, English Wikipedia | PyTorch, Transformers | General language understanding | https://huggingface.co/docs/transformers/model_doc/mpnet | https://github.com/microsoft/MPNet |
MPT | MPT (MosaicML Pretrained Transformer) | Transformer | GPT | 2023 | | PyTorch | General language tasks | https://huggingface.co/mosaicml/mpt-7b | https://github.com/mosaicml/llm-foundry |
MRA | Multi Resolution Analysis | Transformer | | 2022 | | PyTorch | Efficient long-range modeling | https://huggingface.co/docs/transformers/model_doc/mra | https://github.com/mlpen/mra-attention |
MT5 | Multilingual T5 | Transformer | T5 | 2021 | mC4 dataset | JAX, Flax | Multilingual text-to-text tasks | https://huggingface.co/docs/transformers/model_doc/mt5 | https://github.com/google-research/multilingual-t5 |
MVP | Multi-task Supervised Pre-training | Transformer | BART | 2022 | Labeled natural language generation datasets | PyTorch | Natural language generation tasks (summarization, data-to-text, dialogue) | https://huggingface.co/docs/transformers/model_doc/mvp | https://github.com/RUCAIBox/MVP |
myt5 | MyT5 (T5 with Morphology-driven Byte Encoding, MYTE) | Transformer | ByT5 | 2024 | mC4 dataset | PyTorch | Multilingual text-to-text tasks | https://huggingface.co/docs/transformers/model_doc/myt5 | https://github.com/tomlimi/MYTE |
Nemotron | Nemotron | Transformer | | 2023 | | PyTorch | General language tasks | | |
NEZHA | NEural contextualiZed representation for CHinese lAnguage understanding | Transformer | BERT | 2019 | Chinese corpora | PyTorch | Chinese NLU tasks | https://huggingface.co/docs/transformers/model_doc/nezha | https://github.com/huawei-noah/Pretrained-Language-Model |
NLLB | No Language Left Behind | Transformer | | 2022 | Parallel data across 200 languages | PyTorch | Multilingual translation | https://huggingface.co/facebook/nllb-200-3.3B | https://github.com/facebookresearch/fairseq/tree/nllb |
NLLB-MoE | No Language Left Behind - Mixture of Experts | Transformer with MoE | NLLB | 2022 | | PyTorch | Efficient multilingual translation | https://huggingface.co/facebook/nllb-moe-54b | |
Nystromformer | Nystromformer | Transformer with Nystrom method | | 2021 | | PyTorch | Efficient long-sequence modeling | https://huggingface.co/docs/transformers/model_doc/nystromformer | https://github.com/mlpen/Nystromformer |
OLMo | Open Language Model | Transformer | | 2024 | | PyTorch | General language tasks | https://huggingface.co/allenai/olmo-7b | https://github.com/allenai/OLMo |
OLMo2 | Open Language Model 2 | Transformer | OLMo | 2024 | OLMo-Mix-1124, Dolmino-Mix-1124 | PyTorch | General language tasks | | |
OLMoE | Open Language Model of Experts | Mixture of Experts | OLMo | 2024 | | PyTorch | Efficient language modeling | | |
Open-Llama | Open-Llama | Transformer | LLaMA | 2023 | | PyTorch | General language tasks | https://huggingface.co/openlm-research/open_llama_3b | https://github.com/openlm-research/open_llama |
OPT | Open Pretrained Transformer | Transformer | GPT | 2022 | | PyTorch | General language tasks | https://huggingface.co/facebook/opt-350m | https://github.com/facebookresearch/metaseq |
Pegasus | Pre-training with Extracted Gap-sentences for Abstractive Summarization | Transformer | | 2019 | C4, HugeNews | TensorFlow | Text summarization | https://huggingface.co/google/pegasus-xsum | https://github.com/google-research/pegasus |
PEGASUS-X | PEGASUS-X | Transformer | PEGASUS | 2022 | | TensorFlow, JAX | Long-form text summarization | https://huggingface.co/google/pegasus-x-base | https://github.com/google-research/pegasus |
Persimmon | Persimmon | Transformer | | 2023 | | PyTorch | General language tasks | https://huggingface.co/adept/persimmon-8b-base | |
Phi | Phi | Transformer | | 2023 | Filtered web data, synthetic data | PyTorch | General language tasks | https://huggingface.co/microsoft/phi-1_5 | |
Phi-3 | Phi-3 | Transformer | | 2024 | 3.3 trillion tokens | PyTorch | General language tasks | https://huggingface.co/docs/transformers/model_doc/phi3 | |
PhiMoE | Phi-3.5 Mixture of Experts | Transformer with MoE | Phi-3 | 2024 | 4.9T tokens | PyTorch | Multilingual and long-context tasks | https://huggingface.co/microsoft/Phi-3.5-MoE-instruct | |
PhoBERT | PhoBERT | Transformer | RoBERTa | 2020 | Vietnamese texts | PyTorch | Vietnamese NLP tasks | https://huggingface.co/vinai/phobert-base | https://github.com/VinAIResearch/PhoBERT |
PLBart | Programming Language BART | Transformer | BART | 2021 | GitHub code corpus | PyTorch | Code-related tasks | https://huggingface.co/docs/transformers/model_doc/plbart | https://github.com/wasiahmad/PLBART |
ProphetNet | ProphetNet | Transformer | | 2020 | | PyTorch | Sequence-to-sequence tasks | https://huggingface.co/docs/transformers/model_doc/prophetnet | https://github.com/microsoft/ProphetNet |
QDQBert | Quantized-Dequantized BERT | Transformer | BERT | 2020 | Same as BERT | PyTorch | Efficient BERT inference | https://huggingface.co/docs/transformers/model_doc/qdqbert | |
Qwen2 | Qwen2 | Transformer | | 2024 | | PyTorch | General language tasks | https://huggingface.co/Qwen/Qwen2-7B | https://github.com/QwenLM/Qwen2 |
Qwen2MoE | Qwen2 Mixture of Experts | Transformer with MoE | Qwen2 | 2025 | Up to 18 trillion tokens | PyTorch | General language tasks | https://huggingface.co/docs/transformers/model_doc/qwen2_moe | https://github.com/QwenLM/Qwen2.5 |
RAG | Retrieval-Augmented Generation | Transformer | BART | 2020 | Wikipedia | PyTorch | Open-domain question answering | https://huggingface.co/docs/transformers/model_doc/rag | https://github.com/huggingface/transformers/tree/main/examples/research_projects/rag |
REALM | Retrieval-Augmented Language Model Pre-Training | Transformer | BERT | 2020 | Wikipedia, CC-News | TensorFlow | Open-domain question answering | https://huggingface.co/docs/transformers/model_doc/realm | https://github.com/google-research/language/tree/master/language/realm |
RecurrentGemma | RecurrentGemma | Linear recurrences with local attention (Griffin) | | 2024 | | JAX, PyTorch | General language tasks, long sequence processing | https://huggingface.co/google/recurrentgemma-2b-it | |
Reformer | Reformer | Transformer with LSH attention | | 2020 | | PyTorch, JAX | Long sequence tasks | https://huggingface.co/docs/transformers/model_doc/reformer | https://github.com/google/trax/tree/master/trax/models/reformer |
RemBERT | Rembert | Transformer | BERT | 2021 | | TensorFlow | Multilingual NLP tasks | https://huggingface.co/docs/transformers/model_doc/rembert | https://github.com/google-research/rembert |
RetriBERT | RetriBERT | Transformer | BERT | 2020 | MS MARCO | PyTorch | Information retrieval | https://huggingface.co/docs/transformers/model_doc/retribert | https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation |
RoBERTa | Robustly Optimized BERT Approach | Transformer | BERT | 2019 | BookCorpus, English Wikipedia, CC-News, OpenWebText, Stories | PyTorch | General language understanding | https://huggingface.co/docs/transformers/model_doc/roberta | https://github.com/pytorch/fairseq/tree/main/examples/roberta |
RoBERTa-PreLayerNorm | RoBERTa with Pre-Layer Normalization | Transformer | RoBERTa | 2020 | Same as RoBERTa | PyTorch | General language understanding | https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm | |
RoCBert | Robust Chinese BERT | Transformer | BERT | 2021 | Chinese corpora | PyTorch | Chinese NLP tasks | https://huggingface.co/docs/transformers/model_doc/rocbert | https://github.com/FudanNLP/ROCBERT |
RoFormer | RoFormer | Transformer with rotary position embeddings | BERT | 2021 | | PyTorch | Long text classification tasks | https://huggingface.co/docs/transformers/model_doc/roformer | https://github.com/ZhuiyiTechnology/roformer |
RWKV | RWKV | Recurrent neural network with attention | | 2022 | | PyTorch | General language tasks | https://huggingface.co/docs/transformers/model_doc/rwkv | https://github.com/BlinkDL/RWKV-LM |
Splinter | Splinter | Transformer | BERT | 2021 | | PyTorch | Question answering | https://huggingface.co/docs/transformers/model_doc/splinter | |
SqueezeBERT | SqueezeBERT | Transformer | BERT | 2020 | Same as BERT | PyTorch | Efficient NLP tasks | https://huggingface.co/docs/transformers/model_doc/squeezebert | https://github.com/huggingface/transformers/tree/main/examples/research_projects/squeezebert |
StableLm | Stable Language Model | Transformer | | 2023 | 1 trillion tokens, multilingual | PyTorch | General language tasks | https://huggingface.co/stabilityai/stablelm-base-alpha-3b | https://github.com/Stability-AI/StableLM |
Starcoder2 | Starcoder2 | Transformer | | 2024 | | PyTorch | Code generation | https://huggingface.co/bigcode/starcoder2-15b | |
SwitchTransformers | Switch Transformers | Transformer with sparse routing | T5 | 2021 | C4 dataset | JAX, Flax | Efficient language modeling | https://huggingface.co/docs/transformers/model_doc/switch_transformers | https://github.com/google-research/switch-transformer |
T5 | Text-to-Text Transfer Transformer | Transformer | | 2019 | C4 dataset | TensorFlow, PyTorch | Text-to-text tasks | https://huggingface.co/docs/transformers/model_doc/t5 | https://github.com/google-research/text-to-text-transfer-transformer |
T5v1.1 | T5 version 1.1 | Transformer | T5 | 2020 | C4 dataset | JAX, Flax | Text-to-text tasks | https://huggingface.co/docs/transformers/model_doc/t5v1.1 | https://github.com/google-research/text-to-text-transfer-transformer |
TAPEX | Table Pre-training via Execution | Transformer | BART | 2022 | WikiTables, WikiSQL | PyTorch | Table-based question answering | https://huggingface.co/docs/transformers/model_doc/tapex | https://github.com/microsoft/Table-Pretraining |
Transformer XL | Transformer-XL | Transformer with relative positional encoding | | 2019 | | PyTorch | Long-range dependency modeling | https://huggingface.co/docs/transformers/model_doc/transfo-xl | https://github.com/kimiyoung/transformer-xl |
UL2 | Unified Language Learner | Transformer | | 2022 | | JAX, Flax | General language tasks | https://huggingface.co/google/ul2 | |
UMT5 | Universal Multilingual T5 | Transformer | T5 | 2023 | | PyTorch | Multilingual text-to-text tasks | https://huggingface.co/google/umt5-small | |
X-MOD | Cross-lingual Modular | Transformer with language adapters | XLM-R | 2022 | Filtered CommonCrawl, 81 languages | PyTorch | Multilingual NLP tasks | https://huggingface.co/facebook/xmod-base | |
XGLM | Cross-lingual Generative Language Model | Transformer | | 2022 | | PyTorch | Multilingual language tasks | https://huggingface.co/facebook/xglm-564M | |
XLM | Cross-lingual Language Model | Transformer | | 2019 | Wikipedia | PyTorch | Multilingual NLP tasks | https://huggingface.co/xlm-mlm-en-2048 | https://github.com/facebookresearch/XLM |
XLM-ProphetNet | Cross-lingual ProphetNet | Transformer | ProphetNet | 2021 | | PyTorch | Multilingual sequence-to-sequence tasks | https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet | |
XLM-RoBERTa | Cross-lingual RoBERTa | Transformer | RoBERTa | 2019 | 2.5TB filtered CommonCrawl, 100 languages | PyTorch | Multilingual NLP tasks | https://huggingface.co/xlm-roberta-base | https://github.com/pytorch/fairseq/tree/main/examples/xlmr |
XLM-RoBERTa-XL | Cross-lingual RoBERTa Extra Large | Transformer | RoBERTa | 2022 | 2.5TB filtered CommonCrawl, 100 languages | PyTorch | Multilingual NLP tasks | https://huggingface.co/facebook/xlm-roberta-xl | |
XLM-V | Cross-lingual Language Model V | Transformer | XLM-RoBERTa | 2023 | | PyTorch | Multilingual NLP tasks | https://huggingface.co/facebook/xlm-v-base | |
XLNet | XLNet | Transformer with permutation-based training | Transformer-XL | 2019 | | PyTorch | General language understanding tasks | https://huggingface.co/xlnet-base-cased | https://github.com/zihangdai/xlnet |
YOSO | You Only Sample (Almost) Once | Transformer with Bernoulli sampling attention | | 2021 | | PyTorch | Efficient self-attention for long sequences | https://huggingface.co/docs/transformers/model_doc/yoso | https://github.com/mlpen/YOSO |
Zamba | Zamba-7B-v1 | Hybrid Mamba-Transformer | | 2024 | 1T tokens of text and code data, 50B high-quality tokens | PyTorch, Hugging Face Transformers | General language tasks, next-token prediction | https://huggingface.co/Zyphra/Zamba-7B-v1 | https://github.com/Zyphra/Zamba |
Zamba2 | Zamba2-2.7B | Hybrid SSM-Transformer | Zamba | 2024 | 3T tokens of text and code data, 100B high-quality tokens | PyTorch, Hugging Face Transformers | General language tasks, on-device applications | https://huggingface.co/Zyphra/Zamba2-2.7B | https://github.com/Zyphra/Zamba2 |
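
The rows above are plain pipe-delimited text, so they can be read without a dedicated parser. The following is a minimal sketch, not part of the original table: the names `COLUMNS`, `parse_row`, and `hub_model_id` are hypothetical helpers introduced for illustration. It assumes that no cell contains a literal `|` and that an HF URL of the form `https://huggingface.co/<owner>/<model>` maps directly to a Hub model id.

```python
from urllib.parse import urlparse

# Column names copied from the table header (hypothetical constant name).
COLUMNS = [
    "Name", "Full Name", "Architecture", "Base Model", "Developed",
    "Training Dataset", "Lib. & Framework", "Use Cases", "HF URL", "GitHub URL",
]


def parse_row(line):
    """Split one pipe-delimited row into a dict keyed by column name."""
    cells = [cell.strip() for cell in line.strip().split("|")]
    # Every row ends with a trailing pipe, which yields one empty final cell.
    if cells and cells[-1] == "":
        cells.pop()
    # Pad short rows so that every column is present in the result.
    cells += [""] * (len(COLUMNS) - len(cells))
    return dict(zip(COLUMNS, cells))


def hub_model_id(hf_url):
    """Return the path part of a huggingface.co URL, or None.

    Documentation links (paths starting with "docs/") and empty cells
    give None. A single-segment path such as "gpt2" may be a legacy
    model id or an organization page; this sketch cannot tell them apart.
    """
    parsed = urlparse(hf_url)
    if parsed.netloc != "huggingface.co":
        return None
    path = parsed.path.strip("/")
    if not path or path.startswith("docs/"):
        return None
    return path


row = parse_row(
    "BioGPT | Biomedical Generative Pre-trained Transformer | Transformer | "
    "GPT-2 | 2022 | 15M PubMed abstracts | PyTorch | "
    "Biomedical text generation, relation extraction, question answering | "
    "https://huggingface.co/microsoft/biogpt | https://github.com/microsoft/BioGPT |"
)
print(hub_model_id(row["HF URL"]))  # prints: microsoft/biogpt
```

An id extracted this way is the kind of string that `from_pretrained` in Hugging Face Transformers accepts, although whether a given checkpoint still exists on the Hub has to be checked separately.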