Jian HUANG: Statistical Deep Learning with Applications to Biomedical Data Analysis

2022-10-25

On Oct 20, 2022, Prof. Jian HUANG from the Department of Applied Mathematics, The Hong Kong Polytechnic University, was invited to give the 87th Science Lecture of the College of Science, SUSTech. His lecture, titled “Statistical Deep Learning with Applications to Biomedical Data Analysis”, was chaired by Prof. Qi-man SHAO and Prof. Guoliang TIAN of the Department of Statistics and Data Science, SUSTech. More than 200 people attended the lecture online.

**About the speaker:**

Jian Huang is Chair Professor of Data Science and Analytics in the Department of Applied Mathematics at The Hong Kong Polytechnic University. He obtained his Ph.D. degree in Statistics from the University of Washington in Seattle. His research interests include machine learning, high-dimensional statistics, computational statistics, biostatistics, and bioinformatics. He has published widely in the fields of Statistics, Biostatistics, Machine Learning, Bioinformatics, and Econometrics. He was named a Highly Cited Researcher in Mathematics from 2015 to 2019 by the Web of Science Group at Clarivate, and he was included in the list of the world's top 2% most cited scientists compiled by Elsevier BV and Stanford University (2021). Professor Huang is a fellow of the American Statistical Association and a fellow of the Institute of Mathematical Statistics.

**Lecture review:**

In this lecture, Prof. HUANG introduced the remarkable achievements of deep learning over the past ten years. Using the MNIST dataset as a simple example, he described the problem that traditional nonparametric statistical methods face when dealing with high-dimensional data: the curse of dimensionality. He then presented the results of applying a deep neural network to the same data, which showed the images being classified correctly into their groups. Drawing on the structure of the model, he explained why deep neural networks can overcome the curse of dimensionality and approximate complex target functions very well. Next, he focused on applying deep learning methods to three types of data: single-cell data, digital health data, and medical imaging data. He noted that once deep neural network models reach a certain stage of development, their further progress depends on statistical thinking. At the end of the lecture, Prof. HUANG encouraged teachers and students to keep up with cutting-edge developments in computer science while studying traditional statistical methods.

**Interactive session:**

During the interactive session, audience members asked about how to prepare for studying data science, the applications of image integration for medical imaging data, and other topics. Prof. HUANG answered each question in turn.

**1. In the era of data science, how can a statistics student move into data science? What should they prepare?**

You should learn the mainstream programming languages, such as Python and C++. It is also best to learn to build neural networks with tools such as TensorFlow and PyTorch. These usually take a month or two to master.

**2. With a supervisor's guidance, can students achieve meaningful results in about two years?**

The barrier to entry for deep learning theory is much higher than for learning programming languages. There are no shortcuts: it takes time to study and to read many papers. However, once you have a solid foundation in statistics, learning the theory of deep learning is relatively straightforward.

**3. How can we judge the credibility of images generated by auto-GAN?**

The model itself currently offers no solution to this problem. However, we can still use numerical metrics to measure the quality of the generated data, or ask domain experts to judge the quality of the results.

**4. What are the applications of image integration for medical imaging data?**

It can assist doctors in making a diagnosis.

**5. Does a conditional GAN make any assumptions or impose any requirements on the conditional distribution of Y|X?**

Basically, no. The distribution can be continuous, discrete, or mixed. However, it cannot be too irregular, and it needs to be smooth.

**6. How can a statistics department train students' data processing skills so that they are better prepared to meet the needs of society?**

Learning programming languages must be combined with practical problems. For example, learning to program by completing a project develops students' programming skills faster and more effectively. At the same time, students also need some basic knowledge of how computers work.