MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification

Liu, Rex; Zhang, Huanle; Pirsiavash, Hamed; Liu, Xin

dc.contributor.author	Liu, Rex
dc.contributor.author	Zhang, Huanle
dc.contributor.author	Pirsiavash, Hamed
dc.contributor.author	Liu, Xin
dc.date.accessioned	2022-11-14T15:49:56Z
dc.date.available	2022-11-14T15:49:56Z
dc.date.issued	2023-02-06
dc.description	2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); Waikoloa, HI, USA; 02-07 January 2023
dc.description.abstract	We propose MASTAF, a Model-Agnostic Spatio-Temporal Attention Fusion network for few-shot video classification. MASTAF takes input from a general video spatial and temporal representation, e.g., using 2D CNN, 3D CNN, and Video Transformer. Then, to make the most of such representations, we use self- and cross-attention models to highlight the critical spatio-temporal region to increase the inter-class variations and decrease the intra-class variations. Last, MASTAF applies a lightweight fusion network and a nearest neighbor classifier to classify each query video. We demonstrate that MASTAF improves the state-of-the-art performance on three few-shot video classification benchmarks (UCF101, HMDB51, and Something-Something-V2), e.g., by up to 91.6%, 69.5%, and 60.7% for five-way one-shot video classification, respectively.	en
dc.description.uri	https://ieeexplore.ieee.org/abstract/document/10030894	en
dc.format.extent	10 pages	en
dc.genre	conference papers and proceedings	en
dc.genre	postprints	en
dc.identifier	doi:10.13016/m2j31x-akic
dc.identifier.citation	Liu, Rex, Huanle Zhang, Hamed Pirsiavash, and Xin Liu. “MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-Shot Video Classification.” In 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2507–16, 2023. https://doi.org/10.1109/WACV56688.2023.00254.
dc.identifier.uri	https://doi.org/10.1109/WACV56688.2023.00254
dc.identifier.uri	http://hdl.handle.net/11603/26320
dc.language.iso	en	en
dc.publisher	IEEE
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Faculty Collection
dc.rights	© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en
dc.title	MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification	en
dc.title.alternative	STAF: A Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
dc.type	Text	en

Files

Original bundle

Name: Liu_MASTAF_A_Model-Agnostic_Spatio-Temporal_Attention_Fusion_Network_for_Few-Shot_Video_WACV_2023_paper.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format

Name: Liu_MASTAF_A_Model-Agnostic_WACV_2023_supplemental.pdf
Size: 831.74 KB
Format: Adobe Portable Document Format
Description: Supplement

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission

Collections

UMBC Computer Science and Electrical Engineering Department
UMBC Faculty Collection