Sub-Sampling Model Averaging Theory for Large Scale Data
(Original title: 大规模数据下子抽样模型平均估计理论)

Zong Xianpeng (宗先鹏) ¹, Wang Tongtong (王彤彤) ²

Abstract	With the arrival of the information age, extracting useful information from massive data quickly and effectively has become a new challenge. Sub-sampling, an effective tool for large-scale data analysis, has attracted wide attention from researchers in China and abroad. However, traditional sub-sampling methods usually ignore model uncertainty. When the assumed model is incorrect, subsequent statistical inference is biased and may even lead to wrong conclusions. To address this problem, the paper uses frequentist model averaging to construct a sub-sampling model averaging estimator (SSMA estimator) from the sampled data. Theoretically, the paper proves that the SSMA estimator is an asymptotically unbiased and consistent estimator of the model averaging estimator based on the full data. In addition, based on the Mallows model averaging method of Hansen (2007), we propose a weight choice criterion for the SSMA estimator and prove the asymptotic optimality of the weight estimator both when the variance is known and when it is unknown. The proofs of these theoretical properties account for the double randomness introduced by the model and by the sampling design. Finally, numerical analysis further illustrates the effectiveness of the proposed method.
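The abstract describes the SSMA construction only in words, so the following is a minimal sketch of the general idea, not the paper's procedure. It assumes uniform sub-sampling without replacement, nested linear candidate models fitted by least squares, a variance estimate taken from the largest candidate model, and a coarse grid search over the weight simplex in place of quadratic programming. The names `ssma`, `ols`, `solve`, and `simplex_grid`, the simulated data, and all parameter values are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

def simplex_grid(m, total):
    """Yield all tuples of m non-negative integers that sum to `total`."""
    if m == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in simplex_grid(m - 1, total - first):
            yield (first,) + rest

def ssma(X, y, sizes, r, steps=20, seed=0):
    """Illustrative sub-sampling Mallows model averaging (assumptions in the lead-in).

    Candidate model i uses the first sizes[i] columns of X; sizes must be increasing.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(y)), r)  # assumed design: uniform, without replacement
    Xs = [X[i] for i in idx]
    ys = [y[i] for i in idx]
    betas, resid = [], []
    for k in sizes:
        Xk = [row[:k] for row in Xs]
        b = ols(Xk, ys)
        betas.append(b + [0.0] * (len(X[0]) - k))  # pad excluded coefficients with 0
        resid.append([yi - sum(bj * xj for bj, xj in zip(b, row))
                      for row, yi in zip(Xk, ys)])
    # Variance estimated from the largest candidate model (an assumption here).
    sigma2 = sum(e * e for e in resid[-1]) / (r - sizes[-1])
    m = len(sizes)
    # Because the weights sum to 1, the averaged residual is the weighted residual sum.
    E = [[sum(a * c for a, c in zip(resid[i], resid[j])) for j in range(m)]
         for i in range(m)]

    def mallows(w):
        fit = sum(w[i] * w[j] * E[i][j] for i in range(m) for j in range(m))
        return fit + 2.0 * sigma2 * sum(wi * k for wi, k in zip(w, sizes))

    best = min(([c / steps for c in counts] for counts in simplex_grid(m, steps)),
               key=mallows)
    beta_avg = [sum(best[i] * betas[i][j] for i in range(m)) for j in range(len(X[0]))]
    return best, beta_avg

# Simulated example: the third regressor is irrelevant by construction.
rng = random.Random(1)
X = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20000)]
y = [1.0 + 2.0 * x[1] + 0.5 * x[2] + rng.gauss(0, 1) for x in X]
weights, beta = ssma(X, y, sizes=[2, 3, 4], r=500)
print(weights, beta)
```

Only the 500 sampled rows are used for fitting, which is where the computational saving over a full-data model averaging fit would come from. A non-uniform sampling design would generally call for inverse-probability weighting of the sampled rows, which this sketch omits.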
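Source	Journal of Systems Science and Mathematical Sciences (系统科学与数学), 2022, 42(1): 109-132 [Core collection]
Keywords	big data analysis; sub-sampling method; model averaging; Mallows criterion; asymptotic optimality
Address	1. Faculty of Science, Beijing University of Technology, Beijing, 100124 2. School of Mathematical Sciences, Capital Normal University, Beijing, 100048
Language	Chinese
Document type	Research article
ISSN	1000-0577
Subject	Mathematics
Funding	Beijing Natural Science Foundation key research project; National Natural Science Foundation of China; project funded by the Academy for Multidisciplinary Studies, Capital Normal University, and the biostatistics interdisciplinary research program
Accession number	CSCD:7130543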

References: 29 in total

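Citing documents: 2

1. Huang Miaoqi (黄淼淇). Averaging estimation method for multiple change-point problems in mean models. Journal of Systems Science and Mathematical Sciences, 2023, 43(9): 2373-2387
CSCD citations: 0

2. Chang Baoqun (常宝群). Optimal model averaging prediction based on semiparametric mixed effects models. Journal of Systems Science and Mathematical Sciences, 2023, 43(9): 2429-2450
CSCD citations: 1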
