Contributor: Oates, Tim
Author: Vartak, Akash Alok
Date available: 2022-09-29
Date issued: 2022-01-01
Identifiers: 12504; http://hdl.handle.net/11603/25991
Abstract: People identify man-made objects by their visual appearance and by the text on them, e.g., does a bottle say water or shampoo? We use text as an important visual cue to help distinguish between similar-looking objects. This thesis explores a novel joint model of visual appearance and textual cues for image classification. We perform this in three steps: (a) isolating an object in an input image; (b) extracting text from the image; (c) training a joint vision/text model. We simplify the task by extracting the text separately and presenting it to the model in machine-readable form. Such a joint model has utility in many real-world challenges where language is interpreted through a sensory perception such as vision or sound. The aim of the research is to understand whether visual percepts, when understood in the context of extracted language, yield better classification of image objects than vision alone. In conclusion, we show that joint classifier models can successfully make use of text present in images to classify objects, provided that the text extracted from the images is of high quality and the number of images is proportional to the number of classification classes.
Format: application/pdf
Rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu.
Subjects: Classification; Computer Vision; Deep Learning; Machine Learning; Natural Language Processing; Text Extraction
Title: Using Text to Improve Classification of Man-Made Objects
Type: Text
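The abstract describes a classifier that combines visual appearance with separately extracted, machine-readable text. As a rough illustration of what such a joint vision/text model can look like, the following is a minimal PyTorch sketch; the architecture, class names, and dimensions (JointVisionTextClassifier, vocab_size, text_dim, image_dim) are assumptions made for illustration and are not taken from the thesis.

# Minimal sketch of a joint vision/text classifier (illustrative only, not the
# thesis's architecture): image features from a small CNN are concatenated with
# a pooled embedding of OCR-extracted text tokens, and a linear head predicts
# the object class.
import torch
import torch.nn as nn


class JointVisionTextClassifier(nn.Module):
    def __init__(self, num_classes: int, vocab_size: int,
                 text_dim: int = 64, image_dim: int = 128):
        super().__init__()
        # Vision branch: a small convolutional encoder producing image_dim features.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, image_dim), nn.ReLU(),
        )
        # Text branch: mean-pooled embeddings of the token ids produced by the
        # separate text-extraction (OCR) step.
        self.embed = nn.EmbeddingBag(vocab_size, text_dim, mode="mean")
        # Joint head: classify from the concatenated vision + text features.
        self.head = nn.Linear(image_dim + text_dim, num_classes)

    def forward(self, images: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        img_feat = self.vision(images)    # (batch, image_dim)
        txt_feat = self.embed(token_ids)  # (batch, text_dim)
        return self.head(torch.cat([img_feat, txt_feat], dim=1))


# Toy usage: a batch of 4 images plus up to 8 OCR token ids per image.
model = JointVisionTextClassifier(num_classes=10, vocab_size=1000)
images = torch.randn(4, 3, 64, 64)
token_ids = torch.randint(0, 1000, (4, 8))
logits = model(images, token_ids)
print(logits.shape)  # torch.Size([4, 10])

Concatenation of the two feature vectors is the simplest possible fusion choice and is used here only to make the idea concrete; the thesis may combine the visual and textual signals differently.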