Fusion of Vision Transformer and Convolutional Neural Network for Explainable and Efficient Histopathological Image Classification in Cyber-Physical Healthcare Systems

dc.contributor.author: Rahman, Mohammad Ishtiaque
dc.date.accessioned: 2025-11-21T00:30:01Z
dc.date.issued: 2025-10-06
dc.description.abstract: Accurate and interpretable classification of breast cancer histopathology images is critical for early diagnosis and treatment planning. This study proposes a hybrid deep learning model that integrates convolutional neural networks (CNNs) with a Vision Transformer (ViT) to jointly capture local texture patterns and global contextual features. The fusion architecture is evaluated on two publicly available datasets: BreakHis and the invasive ductal carcinoma (IDC) dataset. Results demonstrate that the ViT+CNN model consistently outperforms standalone CNN and ViT models, achieving state-of-the-art accuracy while maintaining robustness across datasets. To assess the feasibility of deployment in real-world clinical scenarios, we benchmark inference latency and memory usage under both standard and edge-constrained environments. Although the fusion model has higher computational cost, its latency remains within acceptable thresholds for real-time diagnostic workflows. Furthermore, we enhance interpretability by combining Grad-CAM with attention rollout, allowing for transparent visual explanation of the model’s decisions. The findings support the clinical potential of hybrid transformer-convolutional models for scalable, reliable, and explainable medical image analysis.
dc.description.uri: https://link.springer.com/article/10.1007/s41314-025-00079-0
dc.format.extent: 13 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2zlz4-w01w
dc.identifier.citation: Rahman, Mohammad Ishtiaque. “Fusion of Vision Transformer and Convolutional Neural Network for Explainable and Efficient Histopathological Image Classification in Cyber-Physical Healthcare Systems.” Journal of Transformative Technologies and Sustainable Development 9, no. 1 (2025): 8. https://doi.org/10.1007/s41314-025-00079-0.
dc.identifier.uri: https://doi.org/10.1007/s41314-025-00079-0
dc.identifier.uri: http://hdl.handle.net/11603/40823
dc.language.iso: en
dc.publisher: Springer Nature
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department
dc.relation.ispartof: UMBC Student Collection
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CNN-ViT
dc.subject: Histopathology
dc.subject: ViT
dc.subject: XAI
dc.subject: CNN
dc.title: Fusion of Vision Transformer and Convolutional Neural Network for Explainable and Efficient Histopathological Image Classification in Cyber-Physical Healthcare Systems
dc.type: Text
dcterms.creator: https://orcid.org/0009-0003-2392-7028
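
The abstract's interpretability step combines Grad-CAM with attention rollout. This record does not say how the two maps are merged, so what follows is only a minimal NumPy sketch under stated assumptions. It assumes a ViT with one CLS token at index 0 followed by a square grid of patch tokens. Rollout is computed in its common form: head-averaged attention, mixed 50/50 with the identity for the residual path, then multiplied across layers. The sketch also uses a hypothetical `fuse_heatmaps` helper that blends min-max-normalized maps by weighted averaging. The Grad-CAM map is assumed to be already computed and resized to the patch grid. All function names here are illustrative and do not come from the article.

```python
import numpy as np


def attention_rollout(attentions):
    """Roll attention out across layers.

    attentions: list of arrays shaped (heads, tokens, tokens), ordered from the
    first to the last transformer layer. Returns a (tokens, tokens) matrix whose
    row i estimates how much each input token flows into output token i.
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for layer_attn in attentions:
        # Average the heads (mean fusion; max over heads is a common variant).
        attn = layer_attn.mean(axis=0)
        # Mix in the identity to account for the residual connection.
        attn = 0.5 * attn + 0.5 * identity
        # Re-normalize so every row is again a probability distribution.
        attn = attn / attn.sum(axis=-1, keepdims=True)
        # Later layers multiply on the left: A_L @ ... @ A_1.
        rollout = attn @ rollout
    return rollout


def min_max(x, eps=1e-8):
    """Scale a map into [0, 1] so maps from different sources are comparable."""
    return (x - x.min()) / (x.max() - x.min() + eps)


def cls_patch_map(rollout, grid_size):
    """Read the CLS-token row (assumed index 0) and reshape the patch scores."""
    patch_scores = rollout[0, 1:]
    if patch_scores.size != grid_size * grid_size:
        raise ValueError("expected a square grid of patch tokens after the CLS token")
    return min_max(patch_scores.reshape(grid_size, grid_size))


def fuse_heatmaps(gradcam_map, rollout_map, alpha=0.5):
    """Hypothetical fusion rule: weighted average of two normalized heatmaps.

    Both maps must already share a shape (e.g. Grad-CAM resized to the patch
    grid). alpha weights the CNN-side Grad-CAM map.
    """
    return alpha * min_max(gradcam_map) + (1.0 - alpha) * min_max(rollout_map)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: 3 layers, 4 heads, 1 CLS token + a 4x4 patch grid (17 tokens).
    layers = []
    for _ in range(3):
        a = rng.random((4, 17, 17))
        layers.append(a / a.sum(axis=-1, keepdims=True))
    vit_map = cls_patch_map(attention_rollout(layers), grid_size=4)
    gradcam = rng.random((4, 4))  # placeholder for a real Grad-CAM map
    fused = fuse_heatmaps(gradcam, vit_map)
    print(fused.shape)  # (4, 4)
```

In a real pipeline, the per-layer attention tensors would come from the ViT branch and the Grad-CAM map from the CNN branch's last convolutional block. The 50/50 identity mix and the equal-weight fusion (`alpha=0.5`) are illustrative choices made here, not values reported in the article.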

Files

Original bundle

Name: s41314025000790.pdf
Size: 1019.66 KB
Format: Adobe Portable Document Format