We will discuss the problem of finding principal components for multivariate datasets that lie on a nonlinear Riemannian manifold embedded in a higher-dimensional space. Our aim is to extend the geometric interpretation of PCA while capturing non-geodesic forms of variation in the data. We introduce the concept of a principal sub-manifold: a manifold passing through the center of the data which, at any point, moves in the direction of highest curvature within the space spanned by the eigenvectors of the local tangent space PCA. We show that the principal sub-manifold recovers the usual principal components in Euclidean space. We illustrate how to find, use, and interpret the principal sub-manifold, and how it can be used to define a classification boundary for data sets on manifolds.
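To make the "local tangent space PCA" step concrete, here is a minimal sketch (not the speaker's algorithm) of one such step on the unit sphere: data points near a base point are mapped to the tangent space via the Riemannian log map, and PCA is run on the resulting tangent vectors. The base point, data generation, and helper function `sphere_log` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere_log(p, x):
    """Riemannian log map on the unit sphere: tangent vector at p pointing to x."""
    c = np.clip(x @ p, -1.0, 1.0)
    theta = np.arccos(c)
    v = x - c * p                      # component of x orthogonal to p
    n = np.linalg.norm(v)
    if n < 1e-12:                      # x coincides with p
        return np.zeros_like(p)
    return theta * v / n               # length = geodesic distance, direction in T_p

# Synthetic data clustered around the north pole (an assumed base point),
# with more spread along the x-axis than the y-axis.
p = np.array([0.0, 0.0, 1.0])
pts = p + rng.normal(scale=[0.5, 0.1, 0.0], size=(200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # project onto the sphere

# Map the data into the tangent space at p and run PCA there.
V = np.stack([sphere_log(p, x) for x in pts])
V -= V.mean(axis=0)
cov = V.T @ V / len(V)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
e1 = eigvecs[:, -1]                      # leading local principal direction
```

The leading eigenvector `e1` lies in the tangent plane at `p` and points along the dominant direction of local variation; a principal sub-manifold construction would iterate such steps, moving along the manifold rather than along a single straight line.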
About the Speaker
Zhigang Yao is an assistant professor in the Department of Statistics and Applied Probability at the National University of Singapore (NUS). He is also a Faculty Affiliate of the Institute of Data Science (IDS) at NUS. His main research area is statistical inference for complex data. Dr. Yao is an elected member of the ISI. His recent work focuses primarily on manifold learning, at the interaction between statistics and geometry. This work includes developing new statistical methods for analyzing data with respect to their relevant geometry. Traditionally, statistical methodology has been deeply rooted in linearity, not necessarily in the sense that the methods themselves are linear, but in that they exploit the structure of the ambient sample space in a fundamental way. He has published his results in several leading statistics journals, including AOS and JASA.