Classification prediction of triple-negative breast cancer based on weighted random forest and microRNA omics data
- Document code:
- A
- Abstract:
-
Objective To explore the application of weighted random forest to the classification prediction of triple-negative breast cancer based on microRNA omics data, and to provide methodological support for disease diagnosis. Methods Using TCGA breast cancer data as an example, a weighted random forest was used to build a classification prediction model for triple-negative breast cancer, and the model was compared with five others: random forest, logistic regression, support vector machine, LASSO, and ridge regression. Results Across the five evaluation metrics compared for the six models, the weighted random forest model clearly outperformed the other five machine learning algorithms, with a sensitivity of 0.852, a specificity of 0.873, an accuracy of 0.871, an AUC of 0.862, and a G-means of 0.861. Conclusion The classification prediction model built with weighted random forest identified patients with triple-negative breast cancer well and can serve as a methodological reference for the diagnosis of triple-negative breast cancer.
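To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how they could be computed for a binary classifier. It is not the authors' code. Several points are assumptions, since the abstract does not state them: G-means is taken as the geometric mean of sensitivity and specificity (a common definition for imbalanced data), the class weights use a simple inverse-frequency heuristic, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count).

    Assumption: the abstract does not specify its weighting scheme; this is
    one common heuristic for giving the minority class more influence.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and G-means for labels in {0, 1}.

    Label 1 stands for the positive class (here, hypothetically, TNBC).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        # Assumed definition: geometric mean of sensitivity and specificity.
        "g_means": math.sqrt(sensitivity * specificity),
    }


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 2 positives among 10 samples (imbalanced, like TNBC cohorts).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(balanced_class_weights(y_true))
print(confusion_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

On imbalanced data, accuracy alone can look high even when most positives are missed, which is why sensitivity, specificity, and G-means are reported alongside it. Class weights like those above are one way a weighted forest can be made to pay more attention to the minority class.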