A Distributed User Browse Click Model Algorithm
Abstract
|
To mine user behavior quickly from massive search click logs, a distributed User Browse Click Model (UBM) algorithm is proposed. The examination parameter E obtained by the original UBM algorithm depends only on the rank position of a search result document and the position of the previously clicked document, and it is very stable. Based on this property, the EM iterative solution is replaced by a distributed UBM algorithm that estimates the examination parameter by sampling and then solves for attractiveness. Simulation results on the Spark data platform show that, compared with the original UBM algorithm, the proposed algorithm resolves the severe data skew problem present in click logs and runs more efficiently. |
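The second stage described in the abstract, solving for attractiveness once the examination parameter is held fixed, can be illustrated with a minimal single-machine sketch. It is not the paper's implementation. It assumes the examination table `gamma`, keyed by (rank, distance to the previous click), has already been estimated from a sample, and it uses a simple ratio estimator (clicks divided by accumulated examination probability) in place of whatever update the authors use. The names `estimate_attractiveness`, `sessions`, and `gamma` are hypothetical.

```python
from collections import defaultdict


def estimate_attractiveness(sessions, gamma):
    """Estimate attractiveness per (query, doc) with examination held fixed.

    sessions: iterable of (query, [(doc, clicked), ...]) in rank order.
    gamma:    dict mapping (rank, distance_to_previous_click) to an
              examination probability (assumed to come from a sample).
    """
    clicks = defaultdict(float)       # observed clicks per (query, doc)
    examination = defaultdict(float)  # accumulated examination probability
    for query, results in sessions:
        last_click = 0  # rank of the previous click; 0 means none yet
        for rank, (doc, clicked) in enumerate(results, start=1):
            key = (query, doc)
            examination[key] += gamma[(rank, rank - last_click)]
            if clicked:
                clicks[key] += 1.0
                last_click = rank
    # Ratio estimator (an assumption of this sketch), clamped to [0, 1].
    return {
        key: min(1.0, clicks[key] / total)
        for key, total in examination.items()
        if total > 0
    }


# Toy data: two sessions for one query, with a hand-written gamma table.
gamma = {(1, 1): 1.0, (2, 1): 0.75, (2, 2): 0.5}
sessions = [
    ("q", [("a", True), ("b", False)]),
    ("q", [("a", False), ("b", True)]),
]
alpha = estimate_attractiveness(sessions, gamma)
```

Because each (query, doc) key only needs two running sums, this step maps naturally onto a per-key aggregation in a distributed setting, with no iteration over the full log.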
Source
|
Computer Engineering (计算机工程), 2019, 45(3): 1-6 [Extended Database]
|
DOI
|
10.19678/j.issn.1000-3428.0050119
|
Keywords
|
click log; click model; User Browse Click Model (UBM) algorithm; search engine; Spark platform
|
Address
|
1. Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190
2. University of Chinese Academy of Sciences, Beijing, 100190
|
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
1000-3428 |
Subject
|
Automation technology, computer technology |
Funding
|
Informatization Special Project of the Chinese Academy of Sciences
|
Accession number
|
CSCD:6504267
|