Abstract
|
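Traditional association rule mining methods usually produce a massive number of disorderly rules, many of which are redundant for users. To address this problem, this paper proposes an information-entropy-based algorithm for mining interesting rules. Correlation analysis of variables is used to remove spurious and erroneous rules from the original rule set, and a framework for measuring the interestingness of association rules is built on information entropy. The algorithm does not rely on users' prior knowledge and can express the information contained in the data without bias. Experiments on real and synthetic datasets verify that the algorithm effectively mines interesting rules and outperforms traditional algorithms. |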
Abstract in Other Language
|
With the development of data collection and storage techniques, traditional association rule mining generates excessive and disordered rules that cannot meet users' interests. To solve this problem, an interestingness measure for association rules based on information entropy is proposed to mine interesting association rules. Correlation analysis for categorical variables is adopted to eliminate false and erroneous rules from the original rule set, and a framework for evaluating the interestingness of rules based on information entropy is proposed. Since the method does not depend on users' prior knowledge, it can represent the information hidden in the data accurately. Experimental results on both real and synthetic datasets show that the proposed algorithm performs better than traditional algorithms and efficiently discovers interesting rules from large databases. |
Source
|
Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
, 2014, 27(6): 524-532 [Core Collection]
|
Keywords
|
knowledge discovery
;
association rules
;
interestingness measure
;
information entropy
|
Address
|
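Laboratory of Bionic Computing and Intelligent Decision, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, 230031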
|
Language
|
Chinese |
Document Type
|
Research article |
ISSN
|
1003-6059 |
Subject
|
Automation technology, computer technology |
Funding
|
National Natural Science Foundation of China
|
Accession Number
|
CSCD:5184596
|