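QIU Chun, MA Qiao-rong, ZHAO Man-man, SU Qiang, ZHONG Mei-zuo. Screening of glioma-related genes and establishment of a prediction model based on the CFS-mRMR feature selection method and the Adaboost algorithm[J]. Progress in Modern Biomedicine (English edition), 2019, 19(1): 26-30.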
Screening of Glioma-Related Genes and Establishment of a Prediction Model Based on the CFS-mRMR Feature Selection Method and the Adaboost Algorithm
Study of Classification of Gliomas Prediction Based on Machine Learning Method
Received: May 14, 2018 Revised: June 12, 2018
DOI: 10.13241/j.cnki.pmb.2019.01.006
Chinese keywords: glioma; feature selection; differentially expressed genes; Adaboost
English keywords: Gliomas; Feature selection; Adaboost; Differentially expressed genes
Funding:
Author Name | Affiliation | E-mail
QIU Chun | 1. Xiangya Hospital of Central South University, Changsha, Hunan, 410008, China; 2. Hainan Provincial People's Hospital, Haikou, Hainan, 570311, China | 139762421217@139.com
MA Qiao-rong | Clinical Laboratory, Affiliated Minzu Hospital of Guangxi Medical University, Nanning, Guangxi, 530001, China |
ZHAO Man-man | Shanghai Key Laboratory of Bio-Energy Crops, College of Life Science, Shanghai University, Shanghai, 200444, China |
SU Qiang | Shanghai Key Laboratory of Bio-Energy Crops, College of Life Science, Shanghai University, Shanghai, 200445, China |
ZHONG Mei-zuo | Xiangya Hospital of Central South University, Changsha, Hunan, 410008, China |
Chinese abstract:
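Abstract Objective: To identify the group of genes involved in the pathogenesis of glioma and, on that basis, to build a model for predicting the occurrence of glioma. Methods: Glioma microarray data were collected from GEO. Differentially expressed genes were screened with the Correlation-based Feature Subset (CFS) and Minimum Redundancy Maximum Relevance (mRMR) feature selection methods, and their functions were analyzed. A glioma prediction model was then built with the Adaboost algorithm, and its predictive ability was evaluated. Results: Feature selection yielded 19 genes related to glioma. Using these 19 genes as the feature subset, a glioma prediction model was built with the AdaBoost algorithm; on validation, its prediction accuracy reached 95.59%. GO and KEGG analysis of the 19 differentially expressed genes indicated that they play a role in tumor initiation and progression. Conclusion: The CFS-mRMR feature selection method can effectively identify genes related to glioma, the 19 selected differentially expressed genes are biologically meaningful, and the prediction model built from them can effectively predict the occurrence of glioma.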
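The mRMR step named in the Methods can be illustrated with a minimal sketch. It assumes expression values have already been discretised, uses the difference form of the criterion (relevance to the class label minus mean mutual information with genes already chosen), and runs on hypothetical toy data. The abstract does not state the discretisation, the criterion variant, or how the CFS and mRMR filters were combined, so the function names and data here are illustrative only, not the authors' code.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical helper names; a sketch of greedy mRMR, not the authors' implementation.


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features: List[List[int]], labels: List[int], k: int) -> List[int]:
    """Greedy mRMR, difference (MID) form.

    features[j] holds the (assumed pre-discretised) values of gene j across samples.
    Each step picks the gene maximising I(gene; label) - mean_s I(gene; selected s).
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected: List[int] = []
    remaining = list(range(len(features)))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]  # first pick: relevance only
        redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): label = a AND b; gene 1 is an exact copy of gene 0.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
y = [ai & bi for ai, bi in zip(a, b)]
print(mrmr_select([a, a[:], b], y, k=2))  # → [0, 2]
```

All three genes are equally relevant to the label, but once gene 0 is chosen its duplicate (gene 1) is heavily penalised for redundancy, so the independent gene 2 is selected instead. That redundancy penalty is what distinguishes mRMR from ranking genes by relevance alone.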
English abstract:
ABSTRACT Objective: This study aims to search for genes related to the mechanisms of glioma occurrence and to build a prediction model for glioma. Methods: Data were collected from the GEO database, and a glioma prediction model was studied using the mRMR and correlation-based feature subset (CfsSubset) feature selection methods combined with Adaboost. Results: After feature selection, 19 genes related to the mechanisms of glioma occurrence were obtained. Based on these 19 genes, an Adaboost prediction model was built that can be applied to predict the occurrence of glioma. The model yields an accuracy of 95.59% in a 10-fold cross-validation test. GO and KEGG analysis indicated that EGFR and MAD2L1 are related to gliomas. Conclusion: CFS-mRMR is an efficient feature selection method for finding key genes correlated with gliomas, and it can also be employed to build a prediction model.
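The Adaboost model and its 10-fold cross-validation accuracy can be sketched as follows. This is a minimal, assumption-laden illustration: discrete AdaBoost with decision stumps as the weak learner, labels coded as -1/+1, unstratified folds, and pooled accuracy over all held-out predictions. The abstract does not state the base learner, the number of boosting rounds, or whether folds were stratified, and every name below (`fit_adaboost`, `cross_val_accuracy`, etc.) is hypothetical. The toy data do not reproduce the reported 95.59%.

```python
import math
import random
from typing import List, Tuple

# Hypothetical names throughout; a sketch, not the authors' implementation.
Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Return `polarity` if x[j] <= threshold, else -polarity (labels are -1/+1)."""
    j, threshold, polarity = stump
    return polarity if x[j] <= threshold else -polarity


def fit_adaboost(X: List[List[float]], y: List[int], rounds: int = 20) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with decision stumps as the (assumed) weak learner."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted training error.
        best: Stump = (0, 0.0, 1)
        best_err = float("inf")
        for j in range(len(X[0])):
            for threshold in sorted({row[j] for row in X}):
                for polarity in (1, -1):
                    stump = (j, threshold, polarity)
                    err = sum(wi for wi, xi, yi in zip(w, X, y) if stump_predict(stump, xi) != yi)
                    if err < best_err:
                        best, best_err = stump, err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        if best_err == 0:
            break  # perfect stump: further rounds would only repeat it
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def cross_val_accuracy(X, y, k: int = 10, rounds: int = 20, seed: int = 0) -> float:
    """Pooled accuracy over k unstratified folds: each sample is predicted once,
    by a model trained on the other k-1 folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_adaboost([X[i] for i in train], [y[i] for i in train], rounds)
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.1 + 0.1 * (i % 3), (i * 7 % 10) / 10] for i in range(30)]
X += [[0.7 + 0.1 * (i % 3), (i * 3 % 10) / 10] for i in range(30)]
y = [-1] * 30 + [1] * 30
print(cross_val_accuracy(X, y, k=10))  # → 1.0
```

In a real analysis the columns of `X` would be the expression values of the selected genes (19 in this study). Libraries such as scikit-learn or Weka provide tuned AdaBoost and stratified cross-validation, which would normally be preferred over a hand-rolled version like this one.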