MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
dc.contributor.author | Liu, Rex | |
dc.contributor.author | Zhang, Huanle | |
dc.contributor.author | Pirsiavash, Hamed | |
dc.contributor.author | Liu, Xin | |
dc.date.accessioned | 2022-11-14T15:49:56Z | |
dc.date.available | 2022-11-14T15:49:56Z | |
dc.date.issued | 2023-02-06 | |
dc.description | 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); Waikoloa, HI, USA; 02-07 January 2023 | |
dc.description.abstract | We propose MASTAF, a Model-Agnostic Spatio-Temporal Attention Fusion network for few-shot video classification. MASTAF takes input from a general video spatial and temporal representation, e.g., using 2D CNN, 3D CNN, and Video Transformer. Then, to make the most of such representations, we use self- and cross-attention models to highlight the critical spatio-temporal region to increase the inter-class variations and decrease the intra-class variations. Last, MASTAF applies a lightweight fusion network and a nearest neighbor classifier to classify each query video. We demonstrate that MASTAF improves the state-of-the-art performance on three few-shot video classification benchmarks (UCF101, HMDB51, and Something-Something-V2), e.g., reaching up to 91.6%, 69.5%, and 60.7% accuracy for five-way one-shot video classification, respectively. | en_US |
dc.description.uri | https://ieeexplore.ieee.org/abstract/document/10030894 | en_US |
dc.format.extent | 10 pages | en_US |
dc.genre | conference papers and proceedings | en_US |
dc.genre | postprints | en_US |
dc.identifier | doi:10.13016/m2j31x-akic | |
dc.identifier.citation | Liu, Rex, Huanle Zhang, Hamed Pirsiavash, and Xin Liu. “MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-Shot Video Classification.” In 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2507–16, 2023. https://doi.org/10.1109/WACV56688.2023.00254. | |
dc.identifier.uri | https://doi.org/10.1109/WACV56688.2023.00254 | |
dc.identifier.uri | http://hdl.handle.net/11603/26320 | |
dc.language.iso | en_US | en_US |
dc.publisher | IEEE | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department Collection | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.rights | © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US |
dc.title | MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification | en_US |
dc.title.alternative | STAF: A Spatio-Temporal Attention Fusion Network for Few-shot Video Classification | |
dc.type | Text | en_US |
Files
Original bundle
- Name: Liu_MASTAF_A_Model-Agnostic_Spatio-Temporal_Attention_Fusion_Network_for_Few-Shot_Video_WACV_2023_paper.pdf
- Size: 1.23 MB
- Format: Adobe Portable Document Format
- Description:
- Name: Liu_MASTAF_A_Model-Agnostic_WACV_2023_supplemental.pdf
- Size: 831.74 KB
- Format: Adobe Portable Document Format
- Description: Supplement
License bundle
- Name: license.txt
- Size: 2.56 KB
- Format: Item-specific license agreed upon to submission
- Description: