Predicting Missing Values with KNN Based on the Elimination of Neighbor Noise (基于近邻噪声处理的KNN缺失数据填补算法)
Abstract
KNN imputation of missing data is severely degraded by noise in the original data. To address this problem, this paper proposes a new imputation algorithm, ENN-KNN (Eliminate Neighbor Noise k-Nearest Neighbor), based on the neighbor relationships of the nearest neighbors of the record to be imputed. By comparing the true degree of nearness of each nearest neighbor of that record, the algorithm identifies potential noisy nearest neighbors. It then imputes the missing value using only the non-noisy nearest neighbors, which eliminates the influence of noisy nearest neighbors on the result. Simulation results on four UCI data sets show that the imputation accuracy of ENN-KNN is, overall, better than that of KNN.
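The abstract does not say how the "true degree of nearness" of a neighbor is measured, so the Python sketch below is only an illustration of the general idea, not the paper's method. It assumes a mutual k-nearest-neighbor test as a stand-in: a neighbor is treated as noise when the record to be imputed would not rank among that neighbor's own k nearest rows. The function names (`knn_impute`, `enn_knn_impute`, `knn_indices`), the mean-based fill, the Euclidean distance, and the toy data are all assumptions made for this example.

```python
import math

def distance(a, b, cols):
    """Euclidean distance between two rows over the given column indices."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))

def knn_indices(point, rows, k, cols, exclude=None):
    """Indices of the k rows closest to `point`, skipping index `exclude`."""
    order = sorted((i for i in range(len(rows)) if i != exclude),
                   key=lambda i: distance(point, rows[i], cols))
    return order[:k]

def knn_impute(rows, query, missing_col, k):
    """Baseline: mean of the missing column over the k nearest complete rows."""
    cols = [c for c in range(len(query)) if c != missing_col]
    neighbors = knn_indices(query, rows, k, cols)
    return sum(rows[i][missing_col] for i in neighbors) / len(neighbors)

def enn_knn_impute(rows, query, missing_col, k):
    """Like knn_impute, but drops neighbors that fail a mutual-kNN test.

    Assumption (not taken from the paper): neighbor n is noise when the query
    is farther from n than n's own k-th nearest complete row.
    """
    cols = [c for c in range(len(query)) if c != missing_col]
    neighbors = knn_indices(query, rows, k, cols)
    kept = []
    for n in neighbors:
        own = knn_indices(rows[n], rows, k, cols, exclude=n)
        if not own:
            kept.append(n)  # no other rows to compare against
            continue
        kth_dist = distance(rows[n], rows[own[-1]], cols)
        if distance(rows[n], query, cols) <= kth_dist:
            kept.append(n)
    if not kept:
        kept = neighbors  # every neighbor was flagged: fall back to plain KNN
    return sum(rows[i][missing_col] for i in kept) / len(kept)

# Toy data: column 0 is a feature, column 1 is the value to impute.
rows = [
    [0.1, 10.0], [-0.1, 12.0],                            # sparse rows near the query
    [1.0, 100.0], [1.01, 100.0], [1.02, 100.0], [1.03, 100.0],  # dense distant cluster
]
query = [0.0, None]

baseline = knn_impute(rows, query, missing_col=1, k=3)
filtered = enn_knn_impute(rows, query, missing_col=1, k=3)
print(baseline, filtered)
```

In this toy case the third nearest neighbor belongs to a dense, distant cluster; its own nearest rows are all much closer to it than the query is, so the mutual test discards it and the fill uses only the two nearby rows.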
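Source: Computer Simulation (计算机仿真), 2014, 31(7): 264-268 [Extended database]
Keywords: missing data imputation; nearest neighbor; noisy nearest neighbor
Address: Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, Liaoning 110016
Language: Chinese
Document type: Research article
ISSN: 1006-9348
Subject: Automation technology, computer technology
Funding: Beijing Natural Science Foundation
Accession number: CSCD:5206116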
References (10 in total)
1. Guo Zhimao (郭志懋). A survey of research on data quality and data cleaning. Journal of Software (软件学报), 2002, 13(11): 2076-2082. Cited 71 times.
2. Li Wenjie (李文杰). Design and simulation of a localization method based on the k-nearest neighbor algorithm. Computer Simulation (计算机仿真), 2009, 26(4): 194-196. Cited 6 times.
3. Keerin Phimmarin. Cluster-based KNN missing value imputation for DNA microarray data. Proceedings of IEEE International Conference on Systems, Man and Cybernetics, 2012: 445-450. Cited 1 time.
4. Liu Huawen. Noisy Data Elimination Using Mutual K-Nearest Neighbor for Classification Mining. The Journal of Systems and Software, 2012, 85(5): 1067-1074. Cited 7 times.
5. Zhang Shichao. Nearest neighbor selection for iteratively KNN imputation. The Journal of Systems and Software, 2012, 85(11): 2541-2552. Cited 7 times.
6. Pan Zhangming (潘章明). A local outlier detection algorithm based on shared reverse k-nearest neighbors. Computer Simulation (计算机仿真), 2012, 30(2): 269-273. Cited 1 time.
7. Li Zhiying (李稚楹). A survey of research on the PageRank algorithm. Computer Science (计算机科学), 2011, 38(10): 185-188. Cited 8 times.
8. Yuan Wei. An improved KNN method and its application to tumor diagnosis. Proceedings of the Third International Conference on Machine Learning and Cybernetics, 2004: 2836-2841. Cited 1 time.
9. de Franca F O. Predicting missing values with biclustering: A coherence-based approach. Pattern Recognition, 2013, 46: 1255-1266. Cited 1 time.
10. Wang Gaitang (王改堂). A soft sensor model based on a multiple k-nearest neighbor regression algorithm. Information and Control (信息与控制), 2011, 40(5): 639-645. Cited 4 times.