Sampling Survey in the Context of Big Data
Original title: 大数据背景下的抽样调查

Authors: Jin Yongjin (金勇进) 1,2,3   Liu Xiaoyu (刘晓宇) 2 *
Abstract: Big data is characterized by large volume, rich variety, and rapid growth, but it also suffers from low value density and poor representativeness, which brings both opportunities and challenges to sampling surveys. How should sampling adapt to these changes in the context of big data, and how is it developing and being applied? This paper discusses the question from three perspectives. First, new and highly adaptable sampling methods have emerged for data stream environments; they obtain representative samples efficiently and accurately while balancing storage space, processing time, and processing capacity. Second, non-probability sampling methods that require no sampling frame have been developed for web surveys and social network data collection; they yield large analysis samples quickly and at low cost. Third, the strengths of big data and sampling surveys can be combined by integrating online and offline survey data. For the case in which the online sample is a non-probability sample and the offline sample is a probability sample, the paper proposes a basic approach to integration: on the one hand, the probability sample is used to carry out a "probability test" on the non-probability sample; on the other hand, information extracted from the probability sample is used to make inferences about the population, based either on a model or on pseudo-randomization.
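The abstract does not name a specific data stream sampling algorithm. As an illustration only, the sketch below shows classic reservoir sampling (the technique of Vitter, 1985, which appears in this record's reference list): it keeps a uniform random sample of fixed size k from a stream of unknown length using O(k) memory and a single pass. The function name `reservoir_sample` and its parameters are assumptions made for this example, not notation from the paper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a uniform random sample of up to k items from a stream.

    Hypothetical helper for illustration: one pass, O(k) memory,
    and the stream length does not need to be known in advance.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number i+1 is kept with probability k / (i + 1);
            # if kept, it replaces a uniformly chosen reservoir slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 records from a simulated stream of 10,000 records.
    print(reservoir_sample(iter(range(10_000)), 5, random.Random(0)))
```

Because each arriving item is handled once and then discarded unless it enters the reservoir, storage stays constant as the stream grows, which is the trade-off between representativeness and storage or processing cost that the abstract describes.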
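Source: Journal of Systems Science and Mathematical Sciences (系统科学与数学), 2022, 42(1): 2-16 [Core collection]
Keywords: big data; sampling survey; data stream; non-probability sampling; data integration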
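Addresses

1. Center for Applied Statistics, Renmin University of China, Beijing, 100872

2. School of Statistics, Renmin University of China, Beijing, 100872

3. Institute of Survey Technology, Renmin University of China, Beijing, 100872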

Language: Chinese
Document type: Research article
ISSN: 1000-0577
Subject: Mathematics
Accession number: CSCD:7130535

References (67 in total; entries 1-20 shown)

1. Viktor M S. Big Data: A Revolution That Will Transform How We Live, Work, and Think, 2013. CSCD citations: 1
2. Harford T. Big data: A big mistake? Significance, 2014, 11(5): 14-19. CSCD citations: 3
3. Tufekci Z. Big questions for social media big data: Representativeness, validity and other methodological pitfalls. International AAAI Conference on Weblogs and Social Media, 2014: 505-514. CSCD citations: 1
4. Japec L. Big data in survey research: AAPOR task force report. Public Opinion Quarterly, 2015, 79(4): 839-880. CSCD citations: 1
5. Jin Yongjin (金勇进). Statistical inference problems for non-probability sampling in the context of big data (in Chinese). Statistical Research (统计研究), 2016, 33(3): 11-17. CSCD citations: 4
6. Nagler J. Drawing inferences and testing theories with big data. Political Science & Politics, 2015, 48(1): 84-88. CSCD citations: 1
7. Bifet A. Mining big data in real time. Informatica, 2013, 37(1): 15-20. CSCD citations: 6
8. Fan W. Mining big data: Current status, and forecast to the future. ACM SIGKDD Explorations Newsletter, 2013, 14(2): 1-5. CSCD citations: 6
9. Geng Zhi (耿直). Opportunities and challenges facing statistics in the big data era (in Chinese). Statistical Research (统计研究), 2014, 31(1): 5-9. CSCD citations: 6
10. McLeod A. A convenient algorithm for drawing a simple random sample. Journal of the Royal Statistical Society Series C Applied Statistics, 1983, 32: 182-184. CSCD citations: 1
11. Vitter J S. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 1985, 11(1): 37-57. CSCD citations: 48
12. Park B H. Reservoir-based random sampling with replacement from data stream. SIAM International Conference on Data Mining, 2004. CSCD citations: 1
13. Efraimidis P. Weighted random sampling with a reservoir. Information Processing Letters, 2006, 97: 181-185. CSCD citations: 16
14. Al-Kateb M. Stratified reservoir sampling over heterogeneous data streams. Information Systems, 2010, 39: 621-639. CSCD citations: 1
15. Mohammad M S. A survey of data partitioning and sampling methods to support big data analysis. Big Data Mining and Analytics, 2020, 3(2): 3-19. CSCD citations: 16
16. Yan T. Bayesian network structure learning from big data: A reservoir sampling based ensemble method. International Conference on Database Systems for Advanced Applications, 2016. CSCD citations: 1
17. Chris K. Imbalanced continual learning with partitioning reservoir sampling. 16th European Conference on Computer Vision, 2020: 411-428. CSCD citations: 1
18. Cheng K. Hot spot tracking by time-decaying bloom filters and reservoir sampling. 33rd International Conference on Advanced Information Networking and Applications, 2020: 1147-1156. CSCD citations: 1
19. Schonlau M. Options for conducting web surveys. Statistical Science, 2017, 32(2): 279-292. CSCD citations: 1
20. Elliott M R. Inference for nonprobability samples. Statistical Science, 2017, 32(2): 249-264. CSCD citations: 3
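Citing documents: 2

1. Zhao Bingyue (赵冰悦). Chinese adaptation, revision, and reliability and validity testing of the Digital Health Literacy Scale (in Chinese). Chinese Journal of Nursing Education (中华护理教育), 2024, 21(1): 89-95. CSCD citations: 2

2. 陈茜儒. Research on Bayesian model averaging assisted sampling estimation methods in the context of big data (in Chinese). Journal of Systems Science and Mathematical Sciences (系统科学与数学), 2025, 45(4): 1255-1278. CSCD citations: 0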
