A Distributed User Browse Click Model Algorithm

Authors: Zhang Haoshenglun (张浩盛伦) 1,2; Li Chong (李翀) 1,*; Ke Yong (柯勇) 1; Zhang Shibo (张士波) 1
Abstract: To mine user behavior quickly from massive search click logs, a distributed User Browse Click Model (UBM) algorithm is proposed. The examination parameter E obtained by the original UBM algorithm depends only on the rank position of a result document and on the position of the previously clicked document, and it is very stable. Based on this property, the EM iterative solution is replaced by a distributed UBM algorithm that estimates examination from a sample and then solves for attractiveness. Simulations on the Spark data platform show that, compared with the original UBM algorithm, the proposed algorithm resolves the severe data skew present in click logs and runs with higher efficiency.
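The idea in the abstract (hold the examination parameter fixed, then solve for attractiveness per query-document pair) can be illustrated with a minimal sketch. It is not the paper's implementation: the record does not give the authors' estimator, so the code below assumes a simple moment estimator, attractiveness ≈ clicks / summed examination probability. The `gamma` table, the function name, and the session layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def estimate_attractiveness(sessions, gamma):
    """Estimate attractiveness per (query, doc) with examination held fixed.

    sessions: iterable of (query, [(doc, clicked), ...]) in rank order.
    gamma:    dict mapping (rank, prev_click_rank) -> examination probability,
              where prev_click_rank == 0 means no earlier click in the session.
              ASSUMPTION: gamma was already estimated from a sample of the log.
    """
    clicks = defaultdict(float)       # observed clicks per (query, doc)
    examination = defaultdict(float)  # summed examination prob. per (query, doc)

    for query, results in sessions:
        prev_click_rank = 0
        for rank, (doc, clicked) in enumerate(results, start=1):
            key = (query, doc)
            examination[key] += gamma[(rank, prev_click_rank)]
            if clicked:
                clicks[key] += 1.0
                prev_click_rank = rank

    # Moment estimator (an assumption, not the paper's formula):
    # E[clicks] = alpha * sum(gamma), so alpha ~= clicks / sum(gamma), capped at 1.
    return {
        key: min(1.0, clicks[key] / total)
        for key, total in examination.items()
        if total > 0
    }
```

Because the estimate needs only two per-key sums, no EM iteration over the full log is required once `gamma` is fixed. In a Spark job the same sums could be accumulated with a key-based aggregation such as `reduceByKey`; how the paper itself distributes the work and handles skewed keys is not stated in this record.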
Source: Computer Engineering (计算机工程), 2019, 45(3): 1-6 [CSCD extended database]
DOI 10.19678/j.issn.1000-3428.0050119
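Keywords: click log; click model; User Browse Click Model algorithm; search engine; Spark platform
Affiliations:

1. Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190

2. University of Chinese Academy of Sciences, Beijing, 100190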

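Language: Chinese
Document type: Research article
ISSN 1000-3428
Subject: Automation technology, computer technology
Funding: Chinese Academy of Sciences special project on informatization
Accession number: CSCD:6504267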

References (22 in total)

1.  Chuklin A. Click models for Web search. Synthesis Lectures on Information Concepts, Retrieval, and Services, 2015, 7(3): 110-115. CSCD citations: 2
2.  Craswell N. An experimental comparison of click position-bias models. Proceedings of 2008 International Conference on Web Search and Data Mining, 2008: 87-94. CSCD citations: 1
3.  Dupret G E. A user browsing model to predict search engine click data from past observations. Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008: 331-338. CSCD citations: 1
4.  Guo F. Efficient multiple-click models in Web search. Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, 2009: 124-131. CSCD citations: 4
5.  Guo F. Click chain model in Web search. Proceedings of International Conference on World Wide Web, 2009: 11-20. CSCD citations: 2
6.  Chapelle O. A dynamic Bayesian network click model for Web search ranking. Proceedings of International Conference on World Wide Web, 2009: 1-10. CSCD citations: 2
7.  Skiera B. An analysis of the importance of the long tail in search engine marketing. Electronic Commerce Research and Applications, 2010, 9(6): 488-494. CSCD citations: 1
8.  Dempster A P. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 1977, 39(1): 1-38. CSCD citations: 975
9.  Zaharia M. Spark: cluster computing with working sets. Proceedings of USENIX Conference on Hot Topics in Cloud Computing, 2010: 5-10. CSCD citations: 1
10.  Wang Chao (王超). A survey of click models for search engines. CAAI Transactions on Intelligent Systems (智能系统学报), 2016, 11(6): 711-718. CSCD citations: 1
11.  Richardson M. Predicting clicks: estimating the click-through rate for new ads. Proceedings of International Conference on World Wide Web, 2007: 521-530. CSCD citations: 1
12.  Wang Aiping (王爱平). Research and application of the EM algorithm. Computer Technology and Development (计算机技术与发展), 2009, 19(9): 108-110. CSCD citations: 14
13.  Shvachko K. The Hadoop distributed file system. Proceedings of 2010 IEEE Symposium on Mass Storage Systems and Technologies, 2010: 1-10. CSCD citations: 2
14.  Karau H. Learning Spark: lightning-fast big data analytics, 2015. CSCD citations: 1
15.  Farook S. Spark is superior to map reduce over big data. International Journal of Computer Applications, 2016, 133: 13-16. CSCD citations: 1
16.  Kwon Y. A study of skew in MapReduce applications, 2018. CSCD citations: 1
17.  Rana N. Shuffle performance in Apache Spark. International Journal of Engineering Research and Technology, 2015, 4(2): 177-180. CSCD citations: 2
18.  Akbarinia R. An efficient solution for processing skewed MapReduce jobs. Database and Expert Systems Applications, 2015, 9262: 417-429. CSCD citations: 1
19.  Tang Z. A data skew oriented reduce placement algorithm based on sampling. IEEE Transactions on Cloud Computing, 2016, 15(6): 13-16. CSCD citations: 1
20.  Liu G. SP-partitioner: a novel partition method to handle intermediate data skew in Spark streaming. Future Generation Computer Systems, 2017, 86: 1054-1063. CSCD citations: 2
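Citing documents: 1

1.  Song Kuangshi (宋匡时). Design and implementation of a lightweight distributed machine learning system. Computer Engineering (计算机工程), 2020, 46(1): 201-207. CSCD citations: 0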
