Deep Bayesian Active Semi Supervised Learning for Lung Cancer Detection from Computerized Tomography (CT) images

Author/Creator

Author/Creator ORCID

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

Distribution Rights granted to UMBC by the author.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Subjects

Abstract

Early lung cancer screening using low-dose Computed Tomography (CT) scans is a challenging task because of its high false-positive prediction rate. In addition, annotating labels for supervised learning is a monotonous task. This thesis uses Deep Bayesian Active Semi-Supervised Learning models built on multi-layer 3D Convolutional Neural Network (CNN) architectures to achieve high area under the curve (AUC) performance with less labeled data. The proposed training algorithm starts with a small subset of labeled data in the initial phase. It then follows an Expectation-Maximization (EM) iterative step in which the model produces pseudo labels for the unlabeled data set and is retrained to improve the outcome. In addition, an active learning component in the EM step simulates a human in the loop who adds ground-truth labeled data to better train the models. We trained, evaluated, and tested the models on two publicly available low-dose CT lung cancer datasets, the National Lung Screening Trial (NLST) and the Kaggle Data Science Bowl 2017 (Kaggle). Starting from 50% of the labeled Kaggle data, the results show an AUC of 0.94 using the 3D-Resnet34 architecture. Experiments on the NLST dataset achieve an AUC of 0.95 using 3D-Resnet34, also starting from 50% of the labeled data. In conclusion, we show that our models give high true positive rates and lower false positive rates using fewer labeled images.
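
The training loop described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: a NumPy logistic regression stands in for the 3D CNN, synthetic 2D points stand in for CT volumes, distance of the predicted probability from 0.5 stands in for a Bayesian uncertainty estimate, and the function names, the confidence threshold, and the query size are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Stand-in classifier (assumption): logistic regression via gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Probability of the positive class for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def active_semi_supervised(X, y_oracle, labeled_idx, rounds=3,
                           query_size=10, confidence=0.9):
    """Pseudo-labeling loop with a simulated human in the loop.

    y_oracle holds all true labels, but only entries in the labeled set
    are ever read; this simulates an annotator answering queries.
    """
    labeled = set(int(i) for i in labeled_idx)
    w = None
    for _ in range(rounds):
        lab = np.array(sorted(labeled))
        unl = np.array([i for i in range(len(X)) if i not in labeled], dtype=int)
        w = fit_logistic(X[lab], y_oracle[lab])
        if len(unl) == 0:
            break
        # E-like step: pseudo-label the unlabeled points the model is confident about.
        p = predict_proba(w, X[unl])
        confident = np.maximum(p, 1.0 - p) >= confidence
        pseudo_X = X[unl][confident]
        pseudo_y = (p[confident] >= 0.5).astype(float)
        # M-like step: retrain on true labels plus pseudo labels.
        w = fit_logistic(np.vstack([X[lab], pseudo_X]),
                         np.concatenate([y_oracle[lab], pseudo_y]))
        # Active learning step: query true labels for the most uncertain points.
        p = predict_proba(w, X[unl])
        most_uncertain = unl[np.argsort(np.abs(p - 0.5))[:query_size]]
        labeled.update(int(i) for i in most_uncertain)
    return w, sorted(labeled)


# Toy data (assumption): two Gaussian blobs, 100 points per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Start from 50% labeled data, mirroring the setting reported in the abstract.
initial = rng.choice(len(X), size=100, replace=False)
w, labeled = active_semi_supervised(X, y, initial)
accuracy = float(np.mean((predict_proba(w, X) >= 0.5) == (y == 1)))
print(f"labeled points: {len(labeled)}, accuracy: {accuracy:.3f}")
```

In this sketch the labeled pool grows by `query_size` points per round, while pseudo labels are recomputed from scratch each round so that an early wrong pseudo label is not carried forward. Whether the thesis keeps or discards pseudo labels between iterations is not stated in the abstract.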