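Cite this article: Liu Youquan, Li Kun, Tang Yongfan, Wu Wengang, Wang Daocheng, Zhang Yan, et al. Relationship modeling on quantitative structure-inhibitive efficiency of imidazoline inhibitors by combining random forest and multiple linear regression [J]. 石油与天然气化工 (Chemical Engineering of Oil & Gas), 2019, 48(1): 62-67.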
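1. Research Institute of Natural Gas Technology, PetroChina Southwest Oil & Gasfield Company; 2. College of Chemistry, Sichuan University
Taking 15 different undecyl imidazoline derivative corrosion inhibitors as the study objects, random forest was combined with multiple linear regression to examine how molecular structure affects inhibition efficiency (IE). First, the 15 imidazoline inhibitors were characterized in six respects (energy, charge, molecular surface, information content, steric conformation, and topological features), giving 55 molecular structural parameters in total. Random forest (RF) and multiple linear regression (MLR) were then each used to screen the parameters, and the 8 parameters that overlapped between the two top-10 rankings were retained. Three of these 8 parameters were randomly selected at a time to build MLR models with the leave-one-out (LOO) method, and the best combination was found to be molecular total energy (Te), information content (Ic), and molecular refractive index (Mr). The resulting optimal quantitative structure-inhibitive efficiency model has a correlation coefficient R² of 0.843 and is expressed as IE = -5.517 - 0.0101×Te + 15.6017×Ic + 0.222×Mr. Inspection of the samples identified one outlier with a relative error of 18.9%; after it was removed, LOO modeling on the remaining 14 samples improved the model markedly, giving an R² of 0.911. The results show that Te, Ic, and Mr are highly positively correlated with IE: the more stable the molecular structure, the better its symmetry, and the higher its refractive index, the higher the IE value. This provides theoretical guidance for designing new high-efficiency corrosion inhibitors.
Key words:  imidazoline derivative corrosion inhibitor  quantitative structure-inhibitive efficiency relationship  steric structural parameters  random forest  multiple linear regression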
Relationship modeling on quantitative structure-inhibitive efficiency of imidazoline inhibitors by combining random forest and multiple linear regression
Liu Youquan¹, Li Kun², Tang Yongfan¹, Wu Wengang¹, Wang Daocheng¹, Zhang Yan¹, Sun Chuan¹
1. Research Institute of Natural Gas Technology, PetroChina Southwest Oil & Gasfield Company, Chengdu, Sichuan, China; 2. College of Chemistry, Sichuan University, Chengdu, Sichuan, China
Focusing on 15 different undecyl imidazoline corrosion inhibitors, a method combining random forest (RF) and multiple linear regression (MLR) was used to investigate the quantitative structure-inhibitive efficiency (IE) relationship. First, the 15 corrosion inhibitors were characterized in six respects (energy, charge, molecular surface, information content, spatial conformation, and topological features), yielding 55 molecular structural features. RF and MLR were then each used to screen these 55 features, and the 8 features that overlapped between the two top-ten rankings were retained. Three of these 8 features were randomly selected at a time to build MLR models of the structure-IE relationship with the leave-one-out (LOO) method. The optimal combination of features was molecular total energy (Te), information content (Ic), and molecular refractive index (Mr). The resulting optimal quantitative structure-inhibitive efficiency model has a correlation coefficient (R²) of 0.843 and is expressed as IE = -5.517 - 0.0101×Te + 15.6017×Ic + 0.222×Mr. Inspection of the samples identified one outlier with a relative error of 18.9%. After it was removed, the remaining 14 samples were modeled with LOO, and the model improved markedly, with an R² of 0.911. The results indicate that Te, Ic, and Mr all show a high positive correlation with IE: the more stable the molecular structure, the better its symmetry, and the higher its refractive index, the higher the IE value. The model may serve as a theoretical reference for the design of new corrosion inhibitors.
Key words:  imidazoline derivative corrosion inhibitor  quantitative structure-inhibitive efficiency relationship  3-D structural features  random forest  multiple linear regression
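The three-feature search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions: the abstract does not say how candidate combinations were scored, so leave-one-out R² is used here; it enumerates all C(8,3) = 56 three-feature subsets rather than sampling them randomly; the RF ranking step is omitted; and the names `predict_ie`, `fit_mlr`, `loo_r2`, and `best_subset` are hypothetical. No data from the paper are included. Only the coefficients in `predict_ie` come from the reported equation.

```python
from itertools import combinations

def predict_ie(te, ic, mr):
    """Evaluate the regression equation reported in the abstract."""
    return -5.517 - 0.0101 * te + 15.6017 * ic + 0.222 * mr

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)  # [intercept, coef_1, ..., coef_k]

def loo_r2(rows, y):
    """Leave-one-out: refit without sample i, predict sample i, then score R^2."""
    preds = []
    for i in range(len(y)):
        beta = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        preds.append(beta[0] + sum(b * v for b, v in zip(beta[1:], rows[i])))
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def best_subset(features, y, k=3):
    """Score every k-feature subset by LOO R^2 (assumed criterion); return the best."""
    best = None
    for combo in combinations(sorted(features), k):
        rows = [[features[name][i] for name in combo] for i in range(len(y))]
        score = loo_r2(rows, y)
        if best is None or score > best[1]:
            best = (combo, score)
    return best
```

Here `features` is expected to be a dict mapping each feature name to a list of per-sample values, and `y` a list of measured IE values of the same length. The sketch assumes the selected features are not collinear; otherwise the normal equations become singular and `solve` fails with a division by zero.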