Activity prediction of anti-cancer drug candidate ERα inhibitor
Abstract
|
Breast cancer is the most common malignant tumor threatening women's health worldwide. Studies have shown that the estrogen receptor alpha subtype (ERα) plays an important role in breast development and is regarded as an important target for breast cancer treatment; compounds that antagonize ERα activity are candidate drugs for breast cancer treatment. To predict the bioactivity of compounds against the ERα target under small-sample, many-feature conditions, an ensemble machine learning prediction model for the quantitative structure-activity relationship of anti-breast cancer drugs is proposed, named Mul-BHO-Bi-LSTM (multivariate-Bayesian hyperparametric optimization bi-directional long short-term memory). First, descriptive statistics and multicollinearity diagnosis are performed on 729 molecular descriptors of 1,974 compounds, and the random forest method is used to screen 20 significant variables whose importance scores are greater than 0.01. Then, a two-dimensional feature matrix based on a convolutional neural network (CNN) is constructed, and Bayesian hyperparameter optimization (BHO) is used to tune the hyperparameters of the bi-directional long short-term memory (Bi-LSTM) model. Finally, the predictive performance of the model is analyzed and evaluated. The results show that, compared with the gradient boosting decision tree (GBDT) ensemble learning method, the Mul-BHO-Bi-LSTM model predicts better: the error indicators (mean squared error, normalized root mean squared error, error mean, and error standard deviation) are all below 0.15, and the correlation indicators R² and r are above 0.99, indicating that the Mul-BHO-Bi-LSTM ensemble machine learning prediction model has good robustness and generalization. The model can provide a method for the screening and design of anti-breast cancer drugs. |
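The abstract names six evaluation indicators without defining them. As a minimal sketch, assuming the usual textbook definitions, they could be computed as below. The function name `regression_metrics`, the normalization of NRMSE by the range of the observed values, and the use of the population standard deviation are assumptions for illustration; the record does not say which variants the authors used.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, NRMSE, error mean, error std, R^2 and Pearson r.

    Assumptions (not stated in the record): NRMSE is RMSE divided by the
    range of the observed values, and the error standard deviation is the
    population standard deviation.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n

    value_range = max(y_true) - min(y_true)
    t_bar = sum(y_true) / n
    p_bar = sum(y_pred) / n
    ss_tot = sum((t - t_bar) ** 2 for t in y_true)   # total sum of squares
    ss_pred = sum((p - p_bar) ** 2 for p in y_pred)
    if value_range == 0 or ss_pred == 0:
        raise ValueError("observed and predicted values must not be constant")

    err_mean = sum(errors) / n
    err_std = math.sqrt(sum((e - err_mean) ** 2 for e in errors) / n)

    # Pearson correlation between observed and predicted values
    cov = sum((t - t_bar) * (p - p_bar) for t, p in zip(y_true, y_pred))
    r = cov / math.sqrt(ss_tot * ss_pred)

    return {
        "mse": mse,
        "nrmse": math.sqrt(mse) / value_range,
        "error_mean": err_mean,
        "error_std": err_std,
        "r2": 1.0 - ss_res / ss_tot,
        "r": r,
    }


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the paper.
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.1, 1.9, 3.1, 3.9, 5.1]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

Note that R² and r measure different things: r only captures linear association, so a model with a constant bias can still have r close to 1 while its R² drops.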
Source
|
Journal of Shenzhen University Science and Engineering (深圳大学学报. 理工版), 2022, 39(5): 529-537 [Core Library]
|
DOI
|
10.3724/SP.J.1249.2022.05529
|
Keywords
|
computer application; ensemble learning; bioactivity prediction; feature selection; hyperparameter optimization; random forest
|
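Address
|
1. Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming, Yunnan 650504
2. The First School of Clinical Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325006
|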
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
1000-2618 |
Subject
|
Pharmacy; automation and computer technology |
Funding
|
National Natural Science Foundation of China
|
Accession number
|
CSCD:7302259
|