Authors: Elhenawy, Mohammed; Abdelhay, Ahmed; Alhadidi, Taqwa I.; Ashqar, Huthaifa; Jaradat, Shadi; Jaber, Ahmed; Glaser, Sebastien; Rakotonirainy, Andry
Date accessioned: 2024-10-28
Date available: 2024-10-28
Date issued: 2024-06-11
DOI: https://doi.org/10.48550/arXiv.2406.06865
Handle: http://hdl.handle.net/11603/36802
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated proficiency in processing diverse modalities, including text, images, and audio. These models leverage extensive pre-existing knowledge, enabling them to address complex problems with minimal to no specific training examples, as evidenced in few-shot and zero-shot in-context learning scenarios. This paper investigates the use of MLLMs' visual capabilities to 'eyeball' solutions for the Traveling Salesman Problem (TSP) by analyzing images of point distributions on a two-dimensional plane. Our experiments aimed to validate the hypothesis that MLLMs can effectively 'eyeball' viable TSP routes. The results from zero-shot, few-shot, self-ensemble, and self-refine zero-shot evaluations show promising outcomes. We anticipate that these findings will inspire further exploration into MLLMs' visual reasoning abilities to tackle other combinatorial problems.
Extent: 19 pages
Language: en-US
Rights: Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/)
Title: Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems
Type: Text
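
As context for the abstract above: evaluating an 'eyeballed' TSP route amounts to measuring the length of a closed tour over 2D points and comparing it against a simple baseline. The sketch below is illustrative only and is not taken from the paper; `tour_length` and `nearest_neighbor_route` are hypothetical helper names, with the greedy nearest-neighbor heuristic standing in as one plausible reference baseline.

```python
import math
import random

def tour_length(points, route):
    # Total Euclidean length of the closed tour that visits `points`
    # in the order given by `route` and returns to the start.
    return sum(
        math.dist(points[route[i]], points[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )

def nearest_neighbor_route(points, start=0):
    # Greedy baseline (hypothetical helper, not the paper's method):
    # repeatedly hop to the closest unvisited point.
    unvisited = set(range(len(points))) - {start}
    route = [start]
    while unvisited:
        last = points[route[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        route.append(nxt)
        unvisited.remove(nxt)
    return route

# Random point distribution on a 2D plane, as in the paper's problem setup.
random.seed(0)
pts = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(10)]

baseline = nearest_neighbor_route(pts)
print("greedy baseline tour length:", tour_length(pts, baseline))
print("arbitrary-order tour length:", tour_length(pts, list(range(len(pts)))))
```

An MLLM-proposed route (a permutation of point indices read off an image) could be scored with the same `tour_length` function and compared against such a baseline.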