Improving the generalization of unsupervised feature learning by using data from different sources on gene expression data for cancer diagnosis
dc.contributor.author | Liu, Zhen | |
dc.contributor.author | Wang, Ruoyu | |
dc.contributor.author | Zhang, Wenbin | |
dc.date.accessioned | 2023-03-06T18:13:54Z | |
dc.date.available | 2023-03-06T18:13:54Z | |
dc.date.issued | 2022-02-24 | |
dc.description.abstract | Machine learning techniques have been utilized on gene expression profiling for cancer diagnosis. However, the gene expression data suffer from the curse of high dimensionality. Different kinds of feature reduction methods have been proposed to decrease the features for specific cancer diagnosis. However, with the difficulty of obtaining the samples of a particular tumor, the lack of training samples may lead to the overfitting problem. In addition, the feature reduction model on a specific tumor may lead to the problem that the model is not scalable and cannot be generalized to new cancer types. To handle these problems, this paper proposes an unsupervised feature learning method to reduce the data dimensionality of gene expression data. This method amplifies the training samples of feature learning by utilizing the unlabeled samples from different sources. Two heuristic rules are devised to check if the unlabeled samples could be used for amplifying the training set. The amplified training set is used to train the feature learning model based on sparse autoencoder. Since the method leverages the knowledge among the expression data from different sources, it improves the generalization of unsupervised feature learning and further boosts the cancer diagnosis performance. A series of experiments are carried out on the gene expression datasets from TCGA and other sources. Experimental results prove that our method improves the generalization of cancer diagnosis when unlabeled data are used for latent feature learning. | en_US |
dc.description.sponsorship | This work is supported by the Key Research Platforms and Projects of Colleges and Universities in Guangdong Province [Grant Nos. 2020ZDZX3060 and 2019KZDZX1020], National Natural Science Foundation of China [Grant No. 61501128], financial support from China Scholarship Council, and Natural Science Foundation of Guangdong Province [Grant No. 2017A030313345]. Key Laboratory of Microbial Resources and Drug Development in Guizhou Province, 2020ZDZX3060, Zhen Liu, National Natural Science Foundation of China, 61501128, Zhen Liu, Natural Science Foundation of Guangdong Province, 2017A030313345, Ruoyu Wang | en_US |
dc.description.uri | https://link.springer.com/article/10.1007/s11517-022-02522-2 | en_US |
dc.format.extent | 25 pages | en_US |
dc.genre | journal articles | en_US |
dc.genre | postprints | en_US |
dc.identifier | doi:10.13016/m2ej8c-j5nd | |
dc.identifier.citation | Liu, Z., Wang, R. & Zhang, W. Improving the generalization of unsupervised feature learning by using data from different sources on gene expression data for cancer diagnosis. Med Biol Eng Comput 60, 1055–1073 (2022). https://doi.org/10.1007/s11517-022-02522-2 | en_US |
dc.identifier.uri | https://doi.org/10.1007/s11517-022-02522-2 | |
dc.identifier.uri | http://hdl.handle.net/11603/26952 | |
dc.language.iso | en_US | en_US |
dc.publisher | Springer Nature | en_US |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Information Systems Department Collection | |
dc.relation.ispartof | UMBC Student Collection | |
dc.rights | This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s11517-022-02522-2 | en_US |
dc.rights | Access to this item will begin on 02/24/2023 | |
dc.title | Improving the generalization of unsupervised feature learning by using data from different sources on gene expression data for cancer diagnosis | en_US |
dc.type | Text | en_US |