A Linear Programming Framework for Outlier Detection
(Original Chinese title: 一种基于线性规划的孤立点检测方法)
Abstract
Outlier detection is an important problem in data mining. It identifies data that do not share the general characteristics of the rest of a data set, and these nonconforming data can reveal potentially useful information. When outliers form a small cluster, existing outlier detection algorithms generally fail to label them correctly. To address this problem, this paper proposes a new outlier detection method based on linear programming. The method rests on a simple observation: two closely neighboring data points must be either both outliers or both normal points. The method first builds a graph model of the data points to be examined. It then constructs an energy model for the vertices and a model for the edges, which together formulate outlier detection as a Markov model. Solving a linear programming problem yields the optimal solution of this model, and from it the detection result. Finally, experiments on one synthetic data set and three real data sets show that the proposed algorithm correctly detects outliers both in ordinary data sets and in data sets where outliers form small clusters, with high detection accuracy.
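The abstract describes the pipeline only at a high level (graph model, vertex energies, edge model, Markov model, linear program). The paper's actual energy functions and LP formulation are not given in this record. The Python sketch below illustrates the general idea under stated assumptions, all of them illustrative:

- The graph is a k-nearest-neighbour graph.
- The vertex energy is a hypothetical score: each point's mean distance to all other points, divided by the median score, then compared against a threshold `tau`.
- Each edge carries a constant Potts penalty `weight`, paid when two neighbours receive different labels. This encodes the observation that neighbours share a label.

For binary labels with such attractive edge terms, the linear-programming relaxation of the MAP problem is known to be tight, and its optimum can be found by an s-t minimum cut. The sketch therefore substitutes a small Edmonds-Karp max-flow for a general LP solver. All function names and parameter values (`knn_edges`, `unary_costs`, `energy`, `min_cut_labels`, `detect_outliers`, `k`, `tau`, `weight`) are hypothetical and do not come from the paper.

```python
import math
import statistics
from collections import deque

# Hypothetical sketch, not the paper's method. Labels: 0 = normal point, 1 = outlier.


def knn_edges(points, k):
    """Undirected k-nearest-neighbour graph as sorted (i, j) pairs with i < j."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in nearest[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def unary_costs(points, tau):
    """Hypothetical vertex energies: (cost if labeled normal, cost if labeled outlier).

    Score = mean distance to all other points, normalised by the median score.
    Points scoring above tau pay to be called normal; points below pay to be called outliers.
    """
    n = len(points)
    scores = [sum(math.dist(p, q) for q in points) / (n - 1) for p in points]
    med = statistics.median(scores)
    return [(max(0.0, s / med - tau), max(0.0, tau - s / med)) for s in scores]


def energy(unary, edges, weight, labels):
    """Vertex energies plus a Potts penalty for every edge whose endpoints disagree."""
    vertex = sum(unary[i][y] for i, y in enumerate(labels))
    pair = sum(weight for i, j in edges if labels[i] != labels[j])
    return vertex + pair


def min_cut_labels(unary, edges, weight):
    """Minimum-energy labels via an s-t minimum cut (source side = normal)."""
    n = len(unary)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, (cost_normal, cost_outlier) in enumerate(unary):
        cap[s][i] += cost_outlier  # cut when i lands on the sink (outlier) side
        cap[i][t] += cost_normal   # cut when i stays on the source (normal) side
    for i, j in edges:
        cap[i][j] += weight        # exactly one direction is cut when labels differ
        cap[j][i] += weight
    while True:
        # Edmonds-Karp: BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximum
        bottleneck, v = math.inf, t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
    # The last BFS visited exactly the nodes reachable from s; those are labeled normal.
    return [0 if parent[i] != -1 else 1 for i in range(n)]


def detect_outliers(points, k=2, tau=1.5, weight=0.5):
    """Indices of points labeled as outliers (all parameter values are illustrative)."""
    unary = unary_costs(points, tau)
    labels = min_cut_labels(unary, knn_edges(points, k), weight)
    return [i for i, y in enumerate(labels) if y == 1]
```

As a usage example, `detect_outliers([(x, y) for x in range(3) for y in range(3)] + [(20, 20), (20, 21), (21, 20)])` returns `[9, 10, 11]`. The three points forming a small, distant cluster are labeled outliers together, and the 3 × 3 grid stays normal. The score is deliberately crude. The cut step returns a minimum-energy labeling for whatever nonnegative vertex costs and edge weight it is given, so a better vertex energy can be substituted without changing it.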
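Source: Control Engineering of China (控制工程), 2013, 20(6): 1123-1126, 1130 [CSCD core collection]
Keywords: linear programming; outlier detection; Markov model
Address: Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, Liaoning 110016
Language: Chinese
Document type: Research paper
ISSN: 1671-7848
Subject: Automation technology; computer technology
Funding: National 973 Program (National Basic Research Program of China)
Collection number: CSCD:5010213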
References (8)
1. Chandola V. Anomaly detection: a survey. ACM Computing Surveys, 2009, 41(3): 1-58. [Cited 204 times]
2. Hido S. Statistical outlier detection using direct density ratio estimation. Knowledge and Information Systems, 2011, 26(2): 309-336. [Cited 6 times]
3. Solberg H E. Detection of outliers in reference distributions: performance of Horn's algorithm. Clinical Chemistry, 2005, 51(12): 2326-2332. [Cited 11 times]
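4. Xie Wenge (谢文阁). Research on an improved distance-based outlier mining algorithm [in Chinese]. Journal of Bohai University (Natural Science Edition), 2011, 32(2): 157-161. [Cited 1 time]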
5. Breunig M. LOF: identifying density-based local outliers. SIGMOD Record, 2000, 29(2): 93-104. [Cited 104 times]
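6. Zhang Xiao (张晓). Anomalous data detection method based on clustering and the LOF algorithm [in Chinese]. Journal of Yili Normal College (Natural Science Edition), 2011, 6(2): 48-50. [Cited 1 time]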
7. Wainwright M. MAP estimation via agreement on trees: message-passing and linear programming. IEEE Transactions on Information Theory, 2005, 51(11): 3697-3717. [Cited 7 times]
8. Frank A. UCI Machine Learning Repository, 2010. [Cited 84 times]