基于Hadoop 平台的相关性权重算法设计与实现
Design and Implementation of Correlation Weight Algorithm Based on Hadoop Platform

Authors	Gao Jun (高军); Huang Xiance (黄献策)

Abstract	The traditional TF-IDF algorithm computes the correlation weight between a keyword and a document from term frequency and inverse document frequency alone, and so ignores the influence of user interest on the weight. To better serve the user's retrieval goals, a correlation weight algorithm based on log association is proposed. Taking a user-oriented view of correlation, the algorithm builds a user interest model by analyzing the user's search logs and, drawing on distributed computing, uses the MapReduce programming framework to parallelize the computation. Experimental results show that, when processing massive data, the algorithm not only runs more efficiently but also adjusts the weights of query terms dynamically according to the user's historical search records, improving the interaction between users and the system.
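The abstract does not state the weighting formula, so the following is only a minimal single-machine sketch of the general idea, not the authors' algorithm. It computes plain TF-IDF with map and reduce style functions and then scales each weight by a term's share of the user's past queries. The function names, the `alpha` parameter, and the multiplicative boost `1 + alpha * interest` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def map_term_counts(doc_id, text):
    """Map step: emit ((term, doc_id), 1) for every whitespace token."""
    for term in text.lower().split():
        yield (term, doc_id), 1


def reduce_sum(pairs):
    """Reduce step: sum the values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def tf_idf(docs):
    """Plain TF-IDF: (count / doc length) * ln(N / document frequency)."""
    counts = reduce_sum(
        pair for doc_id, text in docs.items()
        for pair in map_term_counts(doc_id, text)
    )
    doc_len = Counter()
    doc_freq = Counter()
    for (term, doc_id), n in counts.items():
        doc_len[doc_id] += n   # total tokens in the document
        doc_freq[term] += 1    # number of documents containing the term
    n_docs = len(docs)
    return {
        (term, doc_id): (n / doc_len[doc_id]) * math.log(n_docs / doc_freq[term])
        for (term, doc_id), n in counts.items()
    }


def interest_weights(search_log):
    """Hypothetical interest model: each term's share of all logged query terms."""
    counts = Counter(t for query in search_log for t in query.lower().split())
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


def personalized_weights(docs, search_log, alpha=1.0):
    """Assumed combination rule: boost TF-IDF by 1 + alpha * interest(term)."""
    base = tf_idf(docs)
    interest = interest_weights(search_log)
    return {
        key: w * (1.0 + alpha * interest.get(key[0], 0.0))
        for key, w in base.items()
    }


docs = {"d1": "hadoop mapreduce hadoop", "d2": "ship port"}
search_log = ["hadoop cluster", "hadoop"]
base = tf_idf(docs)
boosted = personalized_weights(docs, search_log)
```

In this toy run, the weight of `hadoop` in `d1` rises because the term appears in the logged queries, while `mapreduce` keeps its plain TF-IDF weight. On a real Hadoop cluster the map and reduce steps would run as separate distributed jobs rather than as in-process function calls.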
Source	Computer Engineering (计算机工程), 2019, 45(3): 26-31 [Extended Database]
DOI	10.19678/j.issn.1000-3428.0049976
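Keywords	distributed computing; TF-IDF algorithm; log; interest model; information retrieval
Affiliation	College of Information Engineering, Shanghai Maritime University, Shanghai 201306
Language	Chinese
Document type	Research article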
ISSN	1000-3428
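Subject	Automation technology, computer technology
Funding	National Natural Science Foundation of China; Shanghai Maritime University Graduate Innovation Fund
Accession number	CSCD:6504271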
