A comprehensive summary table of video-type NLP models (Hugging Face-based)

| Name | Full Name | Architecture | Base Model | Year Developed | Training Dataset | Library & Framework | Use Cases | HF URL | GitHub URL |
|---|---|---|---|---|---|---|---|---|---|
| TimeSformer | TimeSformer (Time-Space Transformer) | Transformer | Vision Transformer (ViT) | 2021 | Evaluated on datasets such as Kinetics-400 and Kinetics-600 | PyTorch | Video classification and action recognition | | |
| VideoMAE | Video Masked Autoencoders | Masked autoencoder | Vision Transformer (ViT) | 2022 | Pre-trained on large-scale video datasets; specifics vary by implementation | PyTorch | Video classification, action recognition, and efficient video representation learning | | |
| ViViT | Video Vision Transformer | Pure transformer-based model | Vision Transformer (ViT) | 2021 | Trained and evaluated on datasets such as Kinetics-400, Kinetics-600, Epic Kitchens, Something-Something V2, and Moments in Time | TensorFlow and JAX | Video classification and action recognition | | |
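
As a rough illustration of how the three models above can be reached through Hugging Face, the sketch below maps each table name to the configuration and video-classification classes that the `transformers` library provides for it. The mapping `HF_CLASSES` and the helper `build_untrained_classifier` are hypothetical names introduced here, not part of the original table. The helper assumes `transformers` and PyTorch are installed, and it builds a randomly initialised model rather than loading any pretrained checkpoint, so nothing is downloaded. The default of 400 labels is only an assumption that mirrors Kinetics-400.

```python
import importlib

# Table name -> (config class, video-classification class) in the Hugging Face
# `transformers` library. The dictionary itself is an illustrative helper.
HF_CLASSES = {
    "TimeSformer": ("TimesformerConfig", "TimesformerForVideoClassification"),
    "VideoMAE": ("VideoMAEConfig", "VideoMAEForVideoClassification"),
    "ViViT": ("VivitConfig", "VivitForVideoClassification"),
}


def build_untrained_classifier(name, num_labels=400):
    """Return a randomly initialised video classifier for a model in the table.

    No pretrained weights are loaded; the result is only useful for inspecting
    the architecture or as a starting point for training.
    """
    config_name, model_name = HF_CLASSES[name]
    # Import lazily so the mapping can be used without transformers installed.
    transformers = importlib.import_module("transformers")
    config = getattr(transformers, config_name)(num_labels=num_labels)
    return getattr(transformers, model_name)(config)
```

Calling `build_untrained_classifier("VideoMAE")` would return a `VideoMAEForVideoClassification` instance with default hyperparameters. For real use, the same classes expose `from_pretrained`, which takes a checkpoint name from the Hugging Face Hub.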