Person Re-Identification using Vision Transformer with Auxiliary Tokens
dc.contributor.advisor | Chapman, David | |
dc.contributor.author | Sharma, Charu | |
dc.contributor.department | Computer Science and Electrical Engineering | |
dc.contributor.program | Computer Science | |
dc.date.accessioned | 2022-09-29T15:37:44Z | |
dc.date.available | 2022-09-29T15:37:44Z | |
dc.date.issued | 2021-01-01 | |
dc.description.abstract | Person Re-Identification (re-ID) is an object re-ID problem that aims to re-identify a person by associating images of that person captured by multiple cameras. Because re-ID plays a foundational role in computer-vision-based video surveillance applications, it is vital to generate a robust feature embedding to represent a person. CNN-based methods are known for their feature-learning abilities, and for many years were a prime choice for person re-ID. In this thesis, we explore a method that takes advantage of the auxiliary local tokens and the global tokens of the Vision Transformer to generate the final feature embedding. We also propose a novel blockwise fine-tuning technique that improves the performance of the Vision Transformer. Our model trained with blockwise fine-tuning achieves $96.6$ rank-1 accuracy and a $90.3$ mAP score on the Market-1501 dataset. On the CUHK-03 dataset, it achieves $97.5$ rank-1 accuracy and a $95.03$ mAP score. These results are comparable to those of many recently published methods for this problem. | |
dc.format | application:pdf | |
dc.genre | theses | |
dc.identifier | doi:10.13016/m2m12t-wohl | |
dc.identifier.other | 12389 | |
dc.identifier.uri | http://hdl.handle.net/11603/25960 | |
dc.language | en | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department Collection | |
dc.relation.ispartof | UMBC Theses and Dissertations Collection | |
dc.relation.ispartof | UMBC Graduate School Collection | |
dc.relation.ispartof | UMBC Student Collection | |
dc.rights | This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu | |
dc.source | Original File Name: Sharma_umbc_0434M_12389.pdf | |
dc.subject | Computer Vision | |
dc.subject | Multi-camera tracking | |
dc.subject | Pattern Recognition | |
dc.subject | Person Re-identification | |
dc.title | Person Re-Identification using Vision Transformer with Auxiliary Tokens | |
dc.type | Text | |
dcterms.accessRights | Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission. | |