Statistics and Data Science Seminar

Cross-Validation for Optimal and Reproducible Statistical Learning

Speaker: Yuhong Yang (University of Minnesota)

Time: 2019-05-31, 10:30-11:30

Venue: Lecture Hall 415, Huiyuan Building 3

In data mining and statistical learning, we frequently need to compare different methods or algorithms in order to make a final choice, either for pure prediction or for a scientific understanding and interpretation of a regression relationship. Cross-validation is a powerful tool for this task. Unfortunately, there appear to be widespread misconceptions about its use, which can lead to unreliable conclusions. In this talk, we will address the subtle issues involved and present results on minimax-optimal regression learning and on consistent selection of the best method for the data at hand. In addition, we will propose proper cross-validation tools for model selection diagnostics that will cry foul at an impressive-looking but not truly reproducible outcome from a sparse-pattern-hunting method in the wild west of learning with a huge number of covariates.
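For readers new to the topic, the basic use of cross-validation to compare methods can be sketched as follows. This is a generic K-fold illustration under stated assumptions, not the speaker's procedure or proposed diagnostic. The two candidate "methods" are hypothetical polynomial fits (via `numpy.polyfit`), and the helper names `kfold_cv_mse`, `select_by_cv`, and `selection_votes` are made up for illustration. The last helper simply reruns the comparison over different random fold assignments and counts how often each method wins, as a crude way to see whether the selection is stable.

```python
import numpy as np

def kfold_cv_mse(x, y, fit, predict, k=5, seed=0):
    """Average held-out squared error of one method over k random folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for test in folds:
        train = np.setdiff1d(idx, test)  # all indices not in the test fold
        model = fit(x[train], y[train])
        errors.append(np.mean((y[test] - predict(model, x[test])) ** 2))
    return float(np.mean(errors))

def poly_method(degree):
    """Hypothetical candidate method: least-squares polynomial of given degree."""
    fit = lambda x, y: np.polyfit(x, y, degree)
    predict = lambda coef, x: np.polyval(coef, x)
    return fit, predict

def select_by_cv(x, y, methods, k=5, seed=0):
    """Return the method with the smallest CV error, plus all CV scores."""
    scores = {name: kfold_cv_mse(x, y, *m, k=k, seed=seed)
              for name, m in methods.items()}
    return min(scores, key=scores.get), scores

def selection_votes(x, y, methods, k=5, n_repeats=20):
    """Repeat the CV comparison over different random splits and count
    how often each method is selected (a crude stability check)."""
    votes = {name: 0 for name in methods}
    for seed in range(n_repeats):
        best, _ = select_by_cv(x, y, methods, k=k, seed=seed)
        votes[best] += 1
    return votes
```

On simulated data with a quadratic trend, one would expect the quadratic fit to win on every split; if the votes were instead spread across methods, the "selected" method should be treated with caution, since a different random split could easily have chosen another one.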