Abstract
Subsampling, or subdata selection, is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods, which depend heavily on the model assumption. In this paper, we consider a model-free subsampling strategy for generating subdata from the original full data. To measure how well subdata represent the original data, we propose a criterion, the generalized empirical F-discrepancy (GEFD), and study its theoretical properties in connection with the classical generalized L2-discrepancy from the theory of uniform designs. These properties allow us to develop a low-GEFD, data-driven subsampling method based on existing uniform designs. Through simulation examples and a real case study, we show that the proposed subsampling method enjoys the model-free property and is superior to random sampling. In practice, a model-free method is more appealing than model-based subsampling methods, which may perform poorly when the model is misspecified, as demonstrated in our simulation studies.
About the Speaker
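Yongdao Zhou (周永道), male, is a professor and doctoral supervisor at the School of Statistics and Data Science, Nankai University. He has been named a Tianjin Leading Talent in Innovation, a Tianjin 131 Innovative Talent, and one of Nankai University's One Hundred Young Academic Leaders. His research interests are experimental design and data mining. He has led four projects funded by the National Natural Science Foundation of China, one key project of the Tianjin Natural Science Foundation, and a number of other government- and industry-funded projects. He has published more than 40 papers in leading journals in China and abroad, including the top statistics journals JASA and Biometrika as well as Science China, and has co-authored two monographs (in Chinese and English) and two statistics textbooks. He received the First Prize of the National Bureau of Statistics Outstanding Achievement Award for Statistical Science Research. He has visited the University of California, Los Angeles, Simon Fraser University, the University of Manchester, and the University of Hong Kong, among other institutions. He is currently Secretary-General of the Uniform Design Association of the Chinese Mathematical Society, a permanent member of the International Chinese Statistical Association, and a reviewer for Mathematical Reviews (USA).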
Poster