Research on fine-tuning CNN for cancer diagnosis with gene expression data

Date

2022-06-21

Citation of Original Publication

Liu, Zhen, et al. "Research on fine-tuning CNN for cancer diagnosis with gene expression data." ICMLC 2022: 2022 14th International Conference on Machine Learning and Computing (ICMLC) (ACM), 2022, pp. 140-145. https://doi.org/10.1145/3529836.3529844.

Rights

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Abstract

Convolutional neural networks (CNNs) have been used for cancer type prediction with gene expression data. However, their success is impeded by the lack of large labeled gene expression datasets. In addition, class imbalance causes the model to neglect its performance on the minority class. To handle the small sample size problem, a fine-tuned CNN is used to transfer the knowledge of a pre-trained model to cancer type prediction. A dataset for one cancer type is used to train a model. This pre-trained model is then fine-tuned with the training set of a new cancer type, and the fine-tuned model can be used to identify the new cancer type. The SMOTE resampling method is used to handle the class imbalance problem. We carried out experiments on the TCGA datasets with 1D-CNN and 2D-CNN models. On average, the fine-tuned 1D-CNN obtains 97.5% accuracy, a 98.6% F-score for the cancer type, and a 78.1% F-score for the normal type; the fine-tuned 2D-CNN obtains 97.4% accuracy, a 98.5% F-score for the cancer type, and a 77.4% F-score for the normal type. Using the fine-tuned CNN with SMOTE, the accuracy, the F-score for the cancer type, and the F-score for the normal type increase by about 1.5%, 0.5%, and 21.5% on average, respectively.
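
The SMOTE resampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy four-feature samples are all assumptions made for illustration. The sketch shows only the core idea of SMOTE, which is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from the minority class.

    Hypothetical helper for illustration only: each synthetic sample lies
    on the line segment between a randomly chosen minority sample and one
    of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (assumed, not from the paper): 3 "normal" samples must be
# brought up to the size of a "tumour" class with 8 samples.
normal = [[0.1, 0.2, 0.0, 0.3], [0.2, 0.1, 0.1, 0.4], [0.0, 0.3, 0.2, 0.2]]
tumour_count = 8
new_normal = smote(normal, n_new=tumour_count - len(normal), k=2)
balanced_normal = normal + new_normal
```

In practice, oversampling of this kind would typically be applied to the training split only, so that synthetic samples do not leak into the evaluation data; whether the paper follows this protocol is not stated in the abstract.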