Improving the generalization of unsupervised feature learning by using data from different sources on gene expression data for cancer diagnosis

dc.contributor.author: Liu, Zhen
dc.contributor.author: Wang, Ruoyu
dc.contributor.author: Zhang, Wenbin
dc.date.accessioned: 2023-03-06T18:13:54Z
dc.date.available: 2023-03-06T18:13:54Z
dc.date.issued: 2022-02-24
dc.description.abstract: Machine learning techniques have been applied to gene expression profiling for cancer diagnosis. However, gene expression data suffer from the curse of dimensionality. Various feature reduction methods have been proposed to reduce the number of features for the diagnosis of specific cancers. However, because samples of a particular tumor are difficult to obtain, the lack of training samples may lead to overfitting. In addition, a feature reduction model built for a specific tumor may not scale and cannot be generalized to new cancer types. To address these problems, this paper proposes an unsupervised feature learning method that reduces the dimensionality of gene expression data. The method amplifies the training samples for feature learning by utilizing unlabeled samples from different sources. Two heuristic rules are devised to check whether the unlabeled samples can be used to amplify the training set. The amplified training set is used to train a feature learning model based on a sparse autoencoder. Because the method leverages knowledge across expression data from different sources, it improves the generalization of unsupervised feature learning and further boosts cancer diagnosis performance. A series of experiments is carried out on gene expression datasets from TCGA and other sources. Experimental results demonstrate that our method improves the generalization of cancer diagnosis when unlabeled data are used for latent feature learning. [en_US]
dc.description.sponsorship: This work is supported by the Key Research Platforms and Projects of Colleges and Universities in Guangdong Province [Grant Nos. 2020ZDZX3060 and 2019KZDZX1020], the National Natural Science Foundation of China [Grant No. 61501128], financial support from the China Scholarship Council, and the Natural Science Foundation of Guangdong Province [Grant No. 2017A030313345]. Funding records: Key Laboratory of Microbial Resources and Drug Development in Guizhou Province (2020ZDZX3060, Zhen Liu); National Natural Science Foundation of China (61501128, Zhen Liu); Natural Science Foundation of Guangdong Province (2017A030313345, Ruoyu Wang). [en_US]
dc.description.uri: https://link.springer.com/article/10.1007/s11517-022-02522-2 [en_US]
dc.format.extent: 25 pages [en_US]
dc.genre: journal articles [en_US]
dc.genre: postprints [en_US]
dc.identifier: doi:10.13016/m2ej8c-j5nd
dc.identifier.citation: Liu, Z., Wang, R. & Zhang, W. Improving the generalization of unsupervised feature learning by using data from different sources on gene expression data for cancer diagnosis. Med Biol Eng Comput 60, 1055–1073 (2022). https://doi.org/10.1007/s11517-022-02522-2 [en_US]
dc.identifier.uri: https://doi.org/10.1007/s11517-022-02522-2
dc.identifier.uri: http://hdl.handle.net/11603/26952
dc.language.iso: en_US [en_US]
dc.publisher: Springer Nature [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s11517-022-02522-2 [en_US]
dc.rights: Access to this item will begin on 02/24/2023
dc.title: Improving the generalization of unsupervised feature learning by using data from different sources on gene expression data for cancer diagnosis [en_US]
dc.type: Text [en_US]
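
The abstract names a sparse autoencoder as the feature learner and describes pooling unlabeled samples from other sources, but it does not give the model's objective or the two heuristic rules. The sketch below is therefore only an illustration under stated assumptions: it uses the textbook sparse-autoencoder loss (mean reconstruction error, L2 weight decay, and a KL-divergence penalty pushing each hidden unit's mean activation toward a target rho), and a hypothetical `accept` predicate standing in for the paper's heuristic rules. The names `amplify_training_set` and `sparse_ae_loss`, the linear output layer, and the hyperparameters `rho`, `beta`, and `lam` are illustrative choices, not taken from the paper.

```python
import math
import random


def amplify_training_set(target_samples, other_sources, accept):
    """Pool target-tumor samples with unlabeled samples from other sources.

    `accept` is a hypothetical predicate standing in for the paper's two
    heuristic rules, which the abstract does not spell out. All samples are
    assumed to be aligned on the same ordered gene list.
    """
    pooled = [list(x) for x in target_samples]
    for source in other_sources:
        pooled.extend(list(x) for x in source if accept(x))
    return pooled


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode(x, W1, b1):
    """Hidden activations h = sigmoid(W1 x + b1): the learned low-dimensional features."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]


def decode(h, W2, b2):
    """Linear reconstruction x_hat = W2 h + b2 (a linear output layer is an assumption)."""
    return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]


def kl_divergence(rho, rho_hat):
    """KL(rho || rho_hat) between Bernoulli distributions with means rho and rho_hat."""
    return rho * math.log(rho / rho_hat) + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))


def sparse_ae_loss(X, W1, b1, W2, b2, rho=0.05, beta=3.0, lam=1e-4):
    """Textbook sparse-autoencoder objective on a batch X (list of expression vectors)."""
    m = len(X)
    hidden = [encode(x, W1, b1) for x in X]
    # Average squared reconstruction error, with the conventional 1/2 factor.
    recon = sum(
        0.5 * sum((a - b) ** 2 for a, b in zip(decode(h, W2, b2), x))
        for x, h in zip(X, hidden)
    ) / m
    # L2 weight decay on both weight matrices (biases are not penalized).
    decay = 0.5 * lam * (sum(w * w for row in W1 for w in row) + sum(w * w for row in W2 for w in row))
    # Mean activation of each hidden unit over the batch, pushed toward rho.
    rho_hat = [sum(h[j] for h in hidden) / m for j in range(len(b1))]
    sparsity = beta * sum(kl_divergence(rho, r) for r in rho_hat)
    return recon + decay + sparsity


if __name__ == "__main__":
    random.seed(0)
    n_genes, n_hidden = 6, 2
    target = [[random.random() for _ in range(n_genes)] for _ in range(3)]
    other = [[[random.random() for _ in range(n_genes)] for _ in range(4)]]
    # Placeholder rule (illustrative only): keep samples whose mean expression is mid-range.
    X = amplify_training_set(target, other, accept=lambda x: 0.2 < sum(x) / len(x) < 0.8)
    W1 = [[random.gauss(0, 0.1) for _ in range(n_genes)] for _ in range(n_hidden)]
    W2 = [[random.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_genes)]
    b1, b2 = [0.0] * n_hidden, [0.0] * n_genes
    print(len(X), "training samples; loss =", sparse_ae_loss(X, W1, b1, W2, b2))
```

In this sketch the activations returned by `encode` play the role of the reduced features that a downstream classifier would consume. A working model would also minimize `sparse_ae_loss` with a gradient-based optimizer, which is omitted here to keep the example short.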

Files

Original bundle

Name: MBEC.pdf
Size: 7.17 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission