Advanced Statistics Summer Workshop

The Department of Biostatistics and Health Data Science is offering a comprehensive two-week summer workshop featuring three courses on advanced statistical topics commonly encountered in medical research. The workshop is tailored for physicians, scientists and researchers actively engaged in biomedical research and is taught by biostatistics and data science faculty who bring real-world experience.

Registration fees

Non-IU registrant
One course: $400
Two courses: $300 each

IU registrant
One course: $300
Two courses: $200 each

IU student registrant
$200 each

June 2–13, 2025

This workshop will be held on Zoom.

Registration is now open!

Course Objectives

At the end of the workshop, participants should be able to:

  • Understand and apply advanced study designs across all phases of clinical trials and observational studies.
  • Gain proficiency in advanced statistical models, particularly those used for categorical, longitudinal and survival outcomes.
  • Understand methods for assessing reliability and validity, and the study designs used for such assessments.
  • Understand key statistical concepts related to causal inference and mediation analyses, and apply these methods effectively in analyzing research data.
  • Acquire knowledge of advanced statistical learning methods for handling high-dimensional data, such as those arising in microarray, sequencing and neuroimaging analyses.
  • Develop critical statistical thinking through case studies from real-world medical research data.

Course 1: Advanced Study Design

Meet the Instructors


Sujuan Gao, PhD

Professor of Biostatistics & Health Data Science

Dr. Gao is the director of the Biostatistics and Data Management Core of the Indiana Alzheimer’s Disease Research Center. Dr. Gao’s research interest is in developing statistical methods for the analysis of longitudinal and mixed type data. Her major collaborative areas are in Alzheimer’s disease and aging research.


Yong Zang, PhD

Associate Professor of Biostatistics & Health Data Science

Dr. Zang is the associate director of the Biostatistics and Data Management Core of the IU Simon Comprehensive Cancer Center. His research focuses on theory, algorithm and software development for adaptive clinical trial design and analysis, and statistical genetics.


Wanzhu Tu, PhD

Professor of Biostatistics & Health Data Science

Dr. Tu is an applied statistician who has designed and led multiple clinical trials and observational studies. He has hands-on experience in conducting pragmatic trials. His statistical methodological research is mostly in causal inference, Bayesian inference and nonparametric regression.


Xiaochun Li, PhD

Professor of Biostatistics & Health Data Science

Dr. Li specializes in clinical trials, leveraging longitudinal health care data to assess safety and effectiveness of treatments, identification of patient subpopulations for personalized medicine, and risk prediction models for clinical decision support. Dr. Li leads research collaborations in Biostatistical Cores in Cardiology, Emergency Medicine, and Nephrology.


Patrick O. Monahan, PhD

Professor of Biostatistics & Health Data Science

Dr. Monahan is the director of the Biostatistics and Data Management Core of the IU Simon Comprehensive Cancer Center. Dr. Monahan’s independent research is in quantitative psychometrics, including latent dimensionality assessment, item response theory, item bias detection, and generalizability theory for reliability assessment. His collaborative research focuses on behavioral medicine and mental health, with applications involving analyses of patient-reported symptoms and quality of life across a wide range of disciplines such as primary care, cancer prevention, aging, dementia, psychiatry, diabetes and juvenile justice.


Susan M. Perkins, PhD

Professor of Biostatistics & Health Data Science

Dr. Perkins is the director of the Biostatistics, Epidemiology, and Research Design Program of the Indiana Clinical and Translational Science Institute. Dr. Perkins’ collaborative interests include all aspects of health services research and quality of life research. Her research interests are in the area of categorical data analysis.

Special Guest

Christine Caldwell, JD

Regulatory Knowledge and Support Program Manager
Indiana Clinical and Translational Science Institute

Ms. Caldwell advises researchers on regulatory matters including IRB and FDA submissions and manages the CTSI clinical research monitoring service, single IRB project management service, and Scientific Review Committee.

Workshop Program

June 2–13, 9 a.m.–12 p.m.

Week 1

June 2

9–9:30 a.m.

Welcome and introduction

Kun Huang, PhD

June 2

9:30 a.m.–12 p.m.

Review
Statistical hypothesis testing
Types of study design
Sujuan Gao, PhD

June 3

9 a.m.–12 p.m.

Randomized Trial Design
Study phases
Comparative treatment study design
Sujuan Gao, PhD

June 4

9 a.m.–12 p.m.

Randomized Trial Design
Study monitoring
Case studies
Analysis of clinical trial results

Sujuan Gao, PhD

June 5

9 a.m.–12 p.m.

Early Phase Trials
Adaptive Design
Bayesian method
Dose optimization
Targeted agents and immunotherapies

Yong Zang, PhD

June 6

9 a.m.–12 p.m.

Early Phase Trials
Biomarker guided clinical trials
Master protocol trials
Basket trial
Umbrella trial
Platform trial

Yong Zang, PhD

 

Week 2

June 9

9 a.m.–12 p.m.

Pragmatic trials
Commonly used pragmatic trial designs
Analyzing data in pragmatic trials
Implementation and reporting
Case studies

Wanzhu Tu, PhD

June 10

9 a.m.–12 p.m.

Causal inference
Counterfactuals
Directed acyclic graphs (DAG)

Xiaochun Li, PhD

June 11

9–10:30 a.m.

Causal inference
Mendelian randomization
Mediation Analysis

Xiaochun Li, PhD

June 11

10:30 a.m.–12 p.m.

Psychometric methods
Reliability: domain-sampling, test-retest, interrater

Patrick Monahan, PhD

June 12

9 a.m.–12 p.m.

Psychometric methods
Validity: content, construct, factorial, predictive, known-groups, and sensitivity-to-change
Design and analysis for assessing reliability and validity of scales that measure psychosocial constructs

Patrick Monahan, PhD

June 13

9–10 a.m.

Regulatory issues in research studies
Trial protocol development
Institutional Review Board (IRB)
ClinicalTrials.gov Registration
FDA IND application

Christine Caldwell, JD

June 13

10 a.m.–12 p.m.

Study implementation
Data quality, reproducibility of results
DSMB, interim monitoring
Publication of study results
Collaborating with biostatistics team

Susan Perkins, PhD

Course 2: Advanced Statistical Models

Meet the Instructors


Sujuan Gao, PhD

Professor of Biostatistics & Health Data Science

Dr. Gao is the director of the Biostatistics and Data Management Core of the Indiana Alzheimer’s Disease Research Center. Dr. Gao’s research interest is in developing statistical methods for the analysis of longitudinal and mixed type data. Her major collaborative areas are in Alzheimer’s disease and aging research.


Joanne K. Daggy, PhD

Associate Professor of Biostatistics & Health Data Science

Dr. Daggy’s methodological research areas include multivariate modeling of semi-continuous data, latent class models with conditional dependence as applied to the area of record linkage, and joint modeling of medical costs and survival with data from complex surveys. Dr. Daggy’s collaborative research areas include non-pharmacological intervention studies, cancer and areas in health service research.


Jie Ren, PhD

Assistant Professor of Biostatistics & Health Data Science

Dr. Ren’s research focuses on Bayesian sparse learning, variable selection for high-dimensional data and Bayesian integrative model for multi-omics data. Her major collaborative areas are traumatic brain injury, infectious diseases, genomics and bioinformatics studies.


Ziyue Liu, M.D.

Associate Professor of Biostatistics & Health Data Science

Dr. Liu’s collaboration areas include oncology, anesthesiology, kidney stones, HIV and musculoskeletal diseases. His methodological research focuses on state space models, functional data analysis, and time series data analysis. He is mostly interested in understanding complex dynamics over time and utilizing them for classification and forecasting.


Giorgos Bakoyannis, PhD

Associate Professor, School of Public Health

Dr. Bakoyannis’ methodological research is focused on the nonparametric and semiparametric analysis of complex survival, competing risks, and multistate process data, with a special emphasis on issues commonly arising in biomedical and clinical research, such as missing data, misclassification, and interval censoring. His research interests also include methodology development for precision medicine, and in particular, methods for the estimation of optimal individualized treatment rules. His major collaborative research areas are in HIV/AIDS and cancer.


Yi Zhao, PhD

Associate Professor of Biostatistics & Health Data Science

Dr. Zhao’s research focuses on causal mediation analysis, decomposition methods, multi-view data integration, density object analysis, high-dimensional data analysis, neuroimaging data analysis, and proteomics and metabolomics studies. Dr. Zhao is a faculty member of the Indiana Alzheimer’s Disease Research Center (IADRC) and of the Indiana University School of Medicine Alzheimer’s Disease Drug Discovery Center (ADDDC), and has been involved in research projects in neuroscience and pulmonology. Dr. Zhao collaborates on studies of Alzheimer’s disease (AD), the neurodevelopmental impact of substance exposure and abuse, and neurodevelopmental disorders such as attention-deficit hyperactivity disorder (ADHD) and autism. Dr. Zhao also leads the Pulmonary Biostatistics Core of a large P01 grant and has been involved in research on asthma, cystic fibrosis (CF), and primary ciliary dyskinesia (PCD).

Workshop Program

June 2–13, 1–4 p.m.

Week 1

June 2

1–2:30 p.m.

Review
Course overview
Basic statistical concepts
Linear regression models

Sujuan Gao, PhD

June 2

2:30–4 p.m.

Categorical Data Analysis
Binomial and multinomial Inference
Two-way tables
Statistical tests

Joanne Daggy, PhD

June 3

1–4 p.m.

Categorical Data Analysis
Generalized linear models
Logistic regression
Loglinear models
Mixture models for count data

Joanne Daggy, PhD

June 4

1–4 p.m.

Mixed effects models with application to longitudinal and clustered data
Mixed effects models

Jie Ren, PhD

June 5

1–2:30 p.m.

Longitudinal and clustered data
Multilevel models
GEE

Jie Ren, PhD

June 5

2:30–4 p.m.

Functional Data Analysis

Ziyue Liu, PhD

June 6

1–2:30 p.m.

Time Series Data Analysis

Ziyue Liu, PhD

June 6

2:30–4 p.m.

Random effects models for meta-analysis

Sujuan Gao, PhD

 

Week 2

June 9

1–4 p.m.

Survival Models
Survival data, censoring
Regression models for survival data

Giorgos Bakoyannis, PhD

June 10

1–4 p.m.

Survival Models
Competing risk
Multi-state models
Predictive accuracy

Giorgos Bakoyannis, PhD

June 11

1–2:30 p.m.

Joint Models

Sujuan Gao, PhD

June 11

2:30–4 p.m.

Statistical Methods for Missing Data

Sujuan Gao, PhD

June 12

1–4 p.m.

Causal inference
Randomization inference
Regression Adjustment
Propensity scores and matching
Instrumental variables
Sensitivity analysis and double robustness

Yi Zhao, PhD

June 13

1–4 p.m.

Mediation Analysis
Introduction & Mediation with single variable
Multilevel mediation
Mediation with multiple variables
Longitudinal mediations
Mediation in other data types
Introducing Course 3

Yi Zhao, PhD

Course 3: High-Dimensional Data Analysis

Meet the Instructors

Sean D. McCabe, PhD

Assistant Professor of Biostatistics & Health Data Science

Prior to joining Indiana University, Dr. McCabe was a postdoctoral research fellow in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. Dr. McCabe's research focuses on developing statistical methods for high-dimensional genomic data, with a specialization in integrative analyses. His collaborative interests include bioinformatics, cancer research and clinical trials.


Yi Zhao, PhD

Associate Professor of Biostatistics & Health Data Science

Dr. Zhao’s research focuses on causal mediation analysis, decomposition methods, multi-view data integration, density object analysis, high-dimensional data analysis, neuroimaging data analysis, and proteomics and metabolomics studies. Dr. Zhao is a faculty member of the Indiana Alzheimer’s Disease Research Center (IADRC) and of the Indiana University School of Medicine Alzheimer’s Disease Drug Discovery Center (ADDDC), and has been involved in research projects in neuroscience and pulmonology. Dr. Zhao collaborates on studies of Alzheimer’s disease (AD), the neurodevelopmental impact of substance exposure and abuse, and neurodevelopmental disorders such as attention-deficit hyperactivity disorder (ADHD) and autism. Dr. Zhao also leads the Pulmonary Biostatistics Core of a large P01 grant and has been involved in research on asthma, cystic fibrosis (CF), and primary ciliary dyskinesia (PCD).


Dayu Sun, PhD

Assistant Professor of Biostatistics & Health Data Science

Dr. Dayu Sun’s research focuses on developing novel methods for high-dimensional data analysis, particularly in neuroimaging applications such as functional MRI and brain connectivity studies related to Alzheimer’s disease and post-traumatic stress disorder. His work also involves the joint analysis of intermittently observed longitudinal data and complex time-to-event data, aiming to capture the time-dynamic nature of underlying biological mechanisms, with applications to HIV, COVID-19, and nutrition studies. Dr. Sun is an affiliated scientist of the Center for Aging Research at the Regenstrief Institute, focusing on improving patient care for Alzheimer's disease. Additionally, he collaborates extensively with researchers in pediatrics, psychiatry, mental health, and criminology. He has significant experience analyzing clinical and observational data on juvenile prisoners, focusing on mental health, substance use, suicide trends and recidivism records.


Travis S. Johnson, PhD

Agnes Beaudry Investigator in Myeloma Research

Dr. Johnson works at the intersection of data science, medicine and genetics. His research is focused both on methods development for omics data and on collaborative applied omics data analysis. His group develops and applies novel machine learning techniques to single cell RNA sequencing, spatial transcriptomics, bulk RNA sequencing and multi-omic data. His methodological interests center on deep transfer learning and risk inference. In addition, Dr. Johnson collaborates with research groups at Indiana University School of Medicine, Indiana Biosciences Research Institute and Eli Lilly and Company by providing bioinformatics and data science support.


Kun Huang, PhD

Chair, Department of Biostatistics & Health Data Science

Dr. Huang leads the Precision Health Initiative Data Science and Informatics group, and is the director of the Bioinformatics and Computational Biology Core for the TREAT-AD program for Alzheimer’s therapy development. He also serves as associate director for data science at the IU Simon Comprehensive Cancer Center and leads the data science and informatics service for the National Cancer Institute Pediatric Cancer SPORE grant-funded research program.


Laura Y. Zhou, PhD

Assistant Professor of Biostatistics & Health Data Science

Before joining Indiana University, Dr. Zhou was a postdoctoral research fellow in the Department of Genetics at the University of North Carolina at Chapel Hill. Her research focuses on developing statistical methods to address key challenges in biomedical research, particularly in immunomics, genomics and precision medicine. This includes work on neural networks and polygenic risk scores. Her collaborative interests span cancer research, chronic kidney disease and studies involving underrepresented populations.

Workshop Program

June 2–13, 9 a.m.–12 p.m.

Week 1

June 2

9 a.m.–12 p.m.

Introduction
Types of high-dimensional data
Review of statistical concepts
Multivariate normal distribution
Defining distances
Regression models

Sean McCabe, PhD
Yi Zhao, PhD

June 3

9:30 a.m.–12 p.m.

Dimension Reduction
Principal component analysis (PCA)
Factor Analysis
Canonical correlation
Partial least squares
Independent component analysis (ICA)
Nonlinear dimension reduction (uniform manifold approximation and projection)

Dayu Sun, PhD

June 4

9 a.m.–12 p.m.

Regularization in regression models
Ridge regression
LASSO and extensions
Non-convex penalties
LASSO based inference

Yi Zhao, PhD

June 5

9 a.m.–12 p.m.

Classification
Logistic regression
Linear discriminant analysis
Support vector machine
K-nearest neighbors

Sean McCabe, PhD

June 6

9 a.m.–12 p.m.

Clustering
K-Means
Hierarchical clustering
Density Models
Centroid Models
Distribution Models

Sean McCabe, PhD
Kun Huang, PhD

 

Week 2

June 9

9 a.m.–12 p.m.

Machine Learning
Decision Trees
Random Forests
Gradient Boosting
Neural networks

Travis Johnson, PhD
Kun Huang, PhD

June 10

9–10:30 a.m.

Machine Learning continued
DL/TL/RNN/MTL/CNN survey
Large language models

Travis Johnson, PhD
Kun Huang, PhD

June 10

10:30 a.m.–12 p.m.

Model Assessment
Cross-validation
Bootstrap

Laura Zhou, PhD

June 11

9 a.m.–12 p.m.

Model Inference
Bootstrap
MCMC
Bagging
Model averaging
Multiple Testing

Laura Zhou, PhD
Yi Zhao, PhD

June 12

9 a.m.–12 p.m.

Data Integration
Data driven
Hypothesis driven
Meta-analysis
Multi-view
Data harmonization

Laura Zhou, PhD
Yi Zhao, PhD

June 13

9 a.m.–12 p.m.

Special Topics
Graphical models
Network analysis
Optimal transport
Generative models

Yi Zhao, PhD