Feature Selection Algorithm Based on Multiple Correlation Measures
Original title: 基于多种相关性度量的特征选择方法研究
Abstract: Data mining and machine learning techniques currently face the challenges of large-sample, high-dimensional data, and feature selection has attracted considerable attention as an important means of dimensionality reduction. However, many existing filter feature selection methods use only a single correlation measure to eliminate redundant and irrelevant features, and they ignore interactions between features. This paper therefore proposes a filter feature selection algorithm based on multiple correlation measures that also takes feature interaction into account. The algorithm fuses two correlation measures, each converted to a standard 0-1 form, and introduces a complementary correlation factor between the candidate feature and the already-selected features to handle feature interaction. Experiments on eight UCI datasets with three common classifiers verify the effectiveness of the algorithm, and the proposed method achieves better classification results than five representative filter feature selection methods.
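The abstract does not name the two correlation measures or give the form of the complementary correlation factor, so the following is only a minimal sketch of the general idea under stated assumptions. It fuses two measures that already lie in [0, 1] (absolute Pearson and absolute Spearman correlation, chosen here purely for illustration) by averaging, and it replaces the paper's complementary factor with an mRMR-style stand-in: the mean fused correlation between the candidate and the already-selected features, subtracted as a penalty. The names `fused` and `select_features` are hypothetical.

```python
import math


def abs_pearson(a, b):
    """Absolute Pearson correlation of two equal-length sequences, in [0, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if den == 0 else abs(num) / den


def ranks(a):
    """Ordinal ranks; ties are broken by position, which is adequate for a sketch."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    r = [0.0] * len(a)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r


def abs_spearman(a, b):
    """Absolute Spearman correlation: Pearson correlation of the ranks."""
    return abs_pearson(ranks(a), ranks(b))


def fused(a, b):
    """Assumed fusion rule: plain average of the two [0, 1] measures."""
    return 0.5 * (abs_pearson(a, b) + abs_spearman(a, b))


def select_features(columns, y, k):
    """Greedy forward selection over feature columns.

    Score = fused relevance to y minus mean fused correlation to the features
    already selected (a stand-in for the paper's complementary factor).
    """
    relevance = [fused(col, y) for col in columns]
    selected = []
    while len(selected) < min(k, len(columns)):
        best_j, best_score = None, -math.inf
        for j, col in enumerate(columns):
            if j in selected:
                continue
            if selected:
                penalty = sum(fused(col, columns[s]) for s in selected) / len(selected)
            else:
                penalty = 0.0
            score = relevance[j] - penalty
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Calling `select_features(columns, y, k)` with `columns` as a list of per-feature value lists returns the indices of the chosen features in selection order. A duplicated feature scores poorly once its twin is selected, because its penalty term is close to 1.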
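Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2017, 38(4): 696-700 [Extended database]
Keywords: feature selection; filter; correlation; feature interaction
Address: Shenyang Institute of Automation, Chinese Academy of Sciences; Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016
Language: Chinese
Document type: Research article
ISSN: 1000-1220
Subject: Automation technology, computer technology
Funding: Liaoning Province Science and Technology Plan Project
Accession number: CSCD:5964404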
References (22 in total, 2 pages)
1. Galelli S. An evaluation framework for input variable selection algorithms for environmental data-driven models. Environmental Modelling & Software, 2014, 62: 33-51. Cited 4 times.
2. Zhang Xiangzhou. A causal feature selection algorithm for stock prediction modeling. Neurocomputing, 2014, 142(1): 48-69. Cited 2 times.
3. Morgado P M. Minimal neighborhood redundancy maximal relevance: application to the diagnosis of Alzheimer's disease. Neurocomputing, 2015, 155: 295-308. Cited 2 times.
4. Zhang Huaguang. A comprehensive review of stability analysis of continuous-time recurrent neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(7): 1229-1262. Cited 1 time.
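5. Wu Jia. A feature selection method for flotation froth images based on unsupervised reduction, and its application (in Chinese). Information and Control (信息与控制), 2014, 43(3): 314-317. Cited 6 times.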
6. Isabelle Guyon. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003, 3(6): 1157-1182. Cited 13 times.
7. Yao Xu. A survey of feature selection methods (in Chinese). Control and Decision (控制与决策), 2012, 27(2): 161-167. Cited 83 times.
8. Deisy C. A novel information theoretic-interact algorithm (IT-IN) for feature selection using three machine learning algorithms. Expert Systems with Applications, 2010, 37(12): 7589-7597. Cited 1 time.
9. Veronica B C. Data classification using an ensemble of filters. Neurocomputing, 2014, 135(135): 13-20. Cited 4 times.
10. Veronica B C. An ensemble of filters and classifiers for microarray data classification. Pattern Recognition, 2012, 45(1): 531-539. Cited 4 times.
11. Jouni Pohjalainen. Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits. Computer Speech and Language, 2015, 29(1): 145-171. Cited 6 times.
12. Pradipta Maji. Rough set based maximum relevance-maximum significance criterion and gene selection from microarray data. International Journal of Approximate Reasoning, 2011, 52(3): 408-426. Cited 7 times.
13. Aleks Jakulin. Analyzing attribute dependencies. Computational Statistics and Data Analysis, 1999, 32(1): 1-12. Cited 1 time.
14. Gabor J S. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 2007, 35(6): 2769-2794. Cited 63 times.
15. Gabor J S. Brownian distance covariance. The Annals of Applied Statistics, 2009, 3(4): 1236-1265. Cited 1 time.
16. Gavin Brown. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. Journal of Machine Learning Research, 2012, 13: 27-66. Cited 1 time.
17. Usama M F. Multi-interval discretization of continuous-valued attributes for classification learning. Machine Learning, 1993, 5(9): 1022-1027. Cited 1 time.
18. Yu Lei. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 2004, 5(10): 1205-1224. Cited 138 times.
19. Peng Hanchuan. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(8): 1226-1238. Cited 79 times.
20. Marko R S. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 2003, 53(3): 23-69. Cited 4 times.