Title: Person Re-Identification using Vision Transformer with Auxiliary Tokens
Contributors: Chapman, David; Sharma, Charu
Dates: 2022-09-29; 2022-09-29; 2021-01-01
Identifier: 12389
URI: http://hdl.handle.net/11603/25960
Type: Text
Format: application/pdf
Subjects: Computer Vision; Multi-camera tracking; Pattern Recognition; Person Re-identification

Abstract: Person Re-Identification (re-ID) is an object re-ID problem that aims to re-identify a person by finding an association between the images of that person captured by multiple cameras. Because of its foundational role in computer-vision-based video surveillance applications, it is vital to generate a robust feature embedding to represent a person. CNN-based methods are known for their feature-learning abilities and were for many years the prime choice for person re-ID. In this thesis, we explore a method that combines auxiliary local tokens with the global token of a Vision Transformer to generate the final feature embedding. We also propose a novel blockwise fine-tuning technique that improves the performance of the Vision Transformer. Our model trained with blockwise fine-tuning achieves 96.6% rank-1 accuracy and a 90.3% mAP score on the Market-1501 dataset. On the CUHK-03 dataset, it achieves 97.5% rank-1 accuracy and a 95.03% mAP score. These results are comparable to many recently published methods for this problem.

Rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu