Fusion of Vision Transformer and Convolutional Neural Network for Explainable and Efficient Histopathological Image Classification in Cyber-Physical Healthcare Systems
| dc.contributor.author | Rahman, Mohammad Ishtiaque |
| dc.date.accessioned | 2025-11-21T00:30:01Z |
| dc.date.issued | 2025-10-06 |
| dc.description.abstract | Accurate and interpretable classification of breast cancer histopathology images is critical for early diagnosis and treatment planning. This study proposes a hybrid deep learning model that integrates convolutional neural networks (CNNs) with a Vision Transformer (ViT) to jointly capture local texture patterns and global contextual features. The fusion architecture is evaluated on two publicly available datasets: BreakHis and the invasive ductal carcinoma (IDC) dataset. Results demonstrate that the ViT+CNN model consistently outperforms standalone CNN and ViT models, achieving state-of-the-art accuracy while maintaining robustness across datasets. To assess the feasibility of deployment in real-world clinical scenarios, we benchmark inference latency and memory usage under both standard and edge-constrained environments. Although the fusion model has higher computational cost, its latency remains within acceptable thresholds for real-time diagnostic workflows. Furthermore, we enhance interpretability by combining Grad-CAM with attention rollout, allowing for transparent visual explanation of the model’s decisions. The findings support the clinical potential of hybrid transformer-convolutional models for scalable, reliable, and explainable medical image analysis. |
| dc.description.uri | https://link.springer.com/article/10.1007/s41314-025-00079-0 |
| dc.format.extent | 13 pages |
| dc.genre | journal articles |
| dc.identifier | doi:10.13016/m2zlz4-w01w |
| dc.identifier.citation | Rahman, Mohammad Ishtiaque. “Fusion of Vision Transformer and Convolutional Neural Network for Explainable and Efficient Histopathological Image Classification in Cyber-Physical Healthcare Systems.” Journal of Transformative Technologies and Sustainable Development 9, no. 1 (2025): 8. https://doi.org/10.1007/s41314-025-00079-0. |
| dc.identifier.uri | https://doi.org/10.1007/s41314-025-00079-0 |
| dc.identifier.uri | http://hdl.handle.net/11603/40823 |
| dc.language.iso | en |
| dc.publisher | Springer Nature |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) |
| dc.relation.ispartof | UMBC Information Systems Department |
| dc.relation.ispartof | UMBC Student Collection |
| dc.rights | Attribution 4.0 International |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
| dc.subject | CNN-ViT |
| dc.subject | Histopathology |
| dc.subject | ViT |
| dc.subject | XAI |
| dc.subject | CNN |
| dc.title | Fusion of Vision Transformer and Convolutional Neural Network for Explainable and Efficient Histopathological Image Classification in Cyber-Physical Healthcare Systems |
| dc.type | Text |
| dcterms.creator | https://orcid.org/0009-0003-2392-7028 |
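
The abstract mentions attention rollout as one of the two interpretability techniques. This record does not describe how the paper implements it or how it is combined with Grad-CAM, so the following is only a minimal sketch of attention rollout as it is commonly defined: average the attention heads in each layer, mix in the identity matrix to account for residual connections, renormalize the rows, and multiply the per-layer matrices together. The function names and the toy input are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def attention_rollout(layers: List[List[Matrix]]) -> Matrix:
    """Illustrative attention rollout (assumed formulation, not the paper's code).

    layers[l][h] is the token-by-token attention matrix of head h in layer l,
    with each row summing to 1.
    """
    n = len(layers[0][0])
    rollout = identity(n)
    for heads in layers:
        # Average the attention matrices over all heads in this layer.
        avg = [[sum(h[i][j] for h in heads) / len(heads) for j in range(n)]
               for i in range(n)]
        # Mix in the identity to model the residual connection.
        aug = [[0.5 * avg[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
        # Renormalize each row so it sums to 1.
        aug = [[v / sum(row) for v in row] for row in aug]
        # Accumulate: this layer's matrix times the rollout of the layers below.
        rollout = matmul(aug, rollout)
    return rollout


if __name__ == "__main__":
    # Toy example: 2 layers, 1 head each, 3 tokens, uniform attention.
    uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
    for row in attention_rollout([[uniform], [uniform]]):
        print([round(v, 3) for v in row])
```

In a ViT with a class token, the row of the rollout matrix that belongs to that token is typically reshaped into the patch grid and displayed as a heat map over the input image.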
