Statistical Methods in Imaging Conference Program

Wednesday, May 29

8:30–10:00 a.m. Tutorial: Complex-time representation of spatiotemporal processes and spacekime analytics
Instructor: Ivo Dinov
Location: Room 101
Tutorial: Deep learning and generative AI
Instructor: Haoda Fu
Location: Room 102
10:00–10:10 a.m. Break
10:10–11:40 a.m. Tutorial: NiChart: a software tool for building machine learning oriented neuroimaging brain chart
Instructors: George Aidinis, Haochang Shou and Ren Zheng
Location: Room 101
Tutorial: Deep learning and generative AI
Instructor: Haoda Fu
Location: Room 102
11:40 a.m.–1:10 p.m. Lunch break
1:10–2:40 p.m.
Student Paper Competition Winners
Location: Room 101/102
2:40–2:50 p.m. Break
2:50–3:50 p.m. Keynote: Dr. Bin Yu
Location: Room 101/102
3:50–6:00 p.m. Poster Presentations/Mixer
Location: Room 103/104


Thursday, May 30

8:00–8:30 a.m. Breakfast 
8:30–10:10 a.m. Longitudinal imaging and biostatistical methods
Organizer: Ivo Dinov
Speakers: Sharmistha Guha, Hossein Moradi, Ranjan Maitra, Dan Rowe
Location: Room 103/104
Expanding neuroimaging research: integrating insights from biomedical sciences
Organizer: Jun Young Park
Speakers: Sarah M. Weinstein, Andrew An Chen, Bingxin Zhao, Jun Young Park
Location: Room 102
10:10–10:20 a.m. Break
10:20–11:20 a.m. Keynote: Dr. Andrew J. Saykin
Location: Room 103/104
11:20 a.m.–1:00 p.m. Lunch break
1:00–2:15 p.m. Statistical learning methods for neuroscience
Organizer: Shuheng Zhou
Speakers: Jian Kang, Chunming Zhang, Shuheng Zhou
Location: Room 103/104
Recent developments in statistical methodology for neuroimaging data analysis
Organizer: Dayu Sun
Speakers: Xin Ma, Shuo Chen, Joshua Lukemire
Location: Room 102
2:15–2:25 p.m. Break
2:25–4:05 p.m. Statistical inference in neuroimaging
Organizer: Eardi Lila
Speakers: Benjamin Risk, Raphiel Murden, Daniel Kessler, Simon Vandekar
Location: Room 103/104
New developments for harmonization, processing and modeling for imaging data
Organizer: Yize Zhao
Speakers: Dana Tudorascu, Selena Wang, Zhengwu Zhang, Tsung-Hung Yao
Location: Room 102
4:05–4:15 p.m. Break
4:15–5:30 p.m. Invariance and distribution/density objects in neuroimaging studies
Organizer: Yi Zhao
Speakers: Bonnie Smith, Changbo Zhu, Yi Zhao
Location: Room 103/104
Advances in statistical methods for neuroimaging data
Organizer: Selena Wang
Speakers: Dayu Sun, Yaotian Wang, Zhiling Gu
Location: Room 102


Friday, May 31

8:00–8:30 a.m. Breakfast
8:30–10:10 a.m. Graph-based network connectomes analysis
Organizer: Simon Vandekar
Speakers: Eardi Lila, Sean L. Simpson, Panpan Zhang, Tingting Zhang
Location: Room 103/104
Statistical methods for dissecting tumor microenvironment based on spatial proteomics datasets
Organizer: Souvik Seal
Speakers: Thao Vu, Jiangmei Xiong, Julia Wrobel, Junsouk Choi
Location: Room 102
10:10–10:20 a.m. Break
10:20–11:20 a.m. Keynote: Dr. Robert E. Kass
Location: Room 103/104
11:20 a.m.–1:00 p.m. Lunch break
1:00–2:40 p.m. When machine learning and generative models meet imaging, network and point cloud data
Organizer: Zhengwu Zhang
Speakers: Mingxia Liu, Maoran Xu, Yuexuan Wu, Xinyi Li
Location: Room 103/104
Novel statistical inference methods with applications
Organizer: Julia Fisher
Speakers: Fatma Parlak, Daniel Adrian, Yueyang Shen, Jose Rodriguez-Acosta
Location: Room 102
2:40–2:50 p.m. Break
2:50–4:30 p.m. Frontiers in medical imaging: harnessing artificial intelligence and statistical analysis for breakthrough insights
Organizer: Lei Liu
Speakers: Yize Zhao, Yifan Peng, Lei Liu, Haoda Fu
Location: Room 103/104
4:30–4:40 p.m. Closing remarks
Location: Room 103/104


Tutorial Information

Complex-time representation of spatiotemporal processes and spacekime analytics

Ivo D. Dinov, University of Michigan

This tutorial will describe the novel complex-time (kime) representation of repeated measurement longitudinal processes, which underlies advanced space-kime statistical inference and space-kime artificial intelligence (AI) applications. By translating fundamental quantum mechanics principles into statistical inference models of time-varying processes, we generalize the classical 4D spatiotemporal sampling to a 5D space-kime manifold, where the phase of complex-time encodes repeated random drawings at fixed spatiotemporal locations. Many AI applications and statistical inference techniques involving temporal data can be formulated in a Bayesian space-kime analytics framework. We explore alternative strategies for translating time-series observations into kime-surfaces, which are richer, computationally tractable, objects amenable to tensor-based linear modeling and model-free inference. Simulated and observed neuroimaging and macroeconomics data will be used to demonstrate applications of space-kime analytics. As time permits, we may discuss the space-kime analytic duality between theoretical model inference, based on generalized functions (distributions), and experimental data inference, based on replicated finite samples as proxy measures of the underlying probability distributions. Several theoretical, experimental, computational, and data-analytic open problems will be presented.

NiChart: A software tool for building machine learning oriented neuroimaging brain chart

Haochang Shou, Ren Zhang and George Aidinis, University of Pennsylvania

Brain magnetic resonance imaging (MRI) has been widely adopted by studies of brain aging, neurologic disorders, and neurodegenerative diseases, which have collectively generated a tremendous data resource for understanding and quantitatively describing normal and pathologic brain aging. However, modest diversity and sample sizes of individual studies, as well as variations of MRI scanners and imaging protocols across studies, often limit the power and generalizability of results and derived models. We have developed the neuroimaging brain aging chart (NiChart), a set of modular but integrated software tools for neuroimaging research including processing, harmonization, visualization and derivation and application of machine learning models for individualized multi-variate imaging signatures, based on a harmonized dataset of 50,000+ diverse participants across 23 studies. To enhance accessibility, we also provide cloud access and software tools with a point-and-click graphical user interface. A plugin system allows addition of community-derived models of imaging signatures, enabling the dynamic enhancement and enrichment of NiChart with new dimensions of brain structure that are the focus of other studies. In this tutorial, we will introduce and showcase different components of NiChart including multimodal imaging processing and harmonization, and discuss the methods behind the software.

Tutorial on Deep Learning and Generative AI

Haoda Fu, Eli Lilly

Designed specifically for individuals possessing a strong foundation in statistics and biostatistics, this course seeks to bridge the gap into the realm of deep learning and generative AI. Beginning with fundamental knowledge of deep learning, participants will be guided through hands-on implementations using the PyTorch framework. As we delve deeper, the course will unpack popular architectures that have reshaped the landscape of artificial intelligence, including CNN, GNN, ResNet, U-net, attention mechanisms, and transformers. Given the increasing importance of AI in healthcare, special emphasis will be laid on techniques tailor-made for medical imagery and drug discovery, such as SE(3) equivariant machine learning. As a culmination, participants will be introduced to the various facets of generative AI, encompassing GANs, VAEs, DDPM, and score-based generative models. Whether you're seeking to apply these technologies in healthcare, research, or any other domain, this tutorial promises a comprehensive insight into the world of generative AI and deep learning. This short course uses Python, and necessary packages such as PyTorch and NumPy are needed. All the software and packages used in this short course are free.

Keynote Abstracts

Sparse dictionary learning and deep learning in practice and theory

Bin Yu, PhD
Chancellor's Distinguished Professor
Class of 1936 Second Chair
Department of Statistics and Electrical Engineering and Computer Sciences
University of California, Berkeley

Chair: Tingting Zhang, University of Pittsburgh

Sparse dictionary learning has a long history and produces wavelet-like filters when fed with natural image patches, corresponding to the V1 primary visual cortex of the human brain. Wavelets as local Fourier Transforms are interpretable in physical sciences and beyond. In this talk, we will first describe adaptive wavelet distillation (AWD) to make black-box deep learning models interpretable in cosmology, cellular biology and climate science problems while improving predictive performance. Then we present theoretical results that, under simple sparse dictionary models, gradient descent in auto-encoder fitting converges to one point on a manifold of global minima, and which minimum depends on the batch size. In particular, we show that when using a small batch-size as in stochastic gradient descent (SGD) a qualitatively different type of "feature selection" occurs.

Identification of interacting neural populations: ideas, issues, and personal experience

Robert E. Kass, PhD
Maurice Falk Professor of Statistics and Computational Neuroscience
Department of Statistics & Data Science, Machine Learning Department, and Neuroscience Institute
Carnegie Mellon University

Chair: Daniel B. Rowe, Marquette University

I will discuss the primary statistics-in-neuroscience topic my trainees and I have worked on since the time of my 2017 Fisher Lecture (renamed in 2020 to Distinguished Achievement Award and Lectureship), where I articulated the problem in general terms: it is the problem of identifying interactions among neural populations from large-scale electrophysiological recordings. I will describe statistically natural approaches we have used to document interactions among brain areas based on neural spike trains (using latent variable point process models) and oscillating field potentials (by defining an exponential family on a 24-dimensional torus). I will also try to articulate what I think are strategies for working in neuroscience effectively, as well as some of the big lessons I've learned about the nature of statistical inference in science, and the ways we as statisticians can continue to improve the scientific process.

Decoding Alzheimer's disease: a journey from imaging genetics to systems biology en route to precision medicine

Andrew J. Saykin, PsyD
Raymond C. Beeler Professor of Radiology
Director, Center for Neuroimaging and Indiana Alzheimer's Disease Research Center (IADRC)
Departments of Radiology and Imaging Sciences, Psychiatry, Neurology, and Medical and Molecular Genetics
Indiana University School of Medicine
Adjunct Professor of Psychology, College of Arts and Sciences, Indiana University Bloomington

Chair: Sujuan Gao, Indiana University School of Medicine

In the pursuit of understanding Alzheimer's Disease (AD), imaging genetics has emerged as a pivotal field, connecting imaging of brain structure, function, and pathology, with genetic information to elucidate the biological pathways implicated in AD. This talk highlights the transformative journey of imaging genetics from its inception to its current state and explores future possibilities powered by advanced computational techniques including AI and network science. Neuroimaging and other quantitative endophenotypes add statistical power, richness, and mechanistic potential to genetic association studies. Statistical methods have evolved and facilitated discoveries in the genetic underpinnings of AD, often leveraging data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), generated by the Genetics and Systems Biology Core and many partners and collaborators. ADNI's open science approach has helped enable a global effort by data scientists and biomedical researchers to extract informative disease-relevant features from high dimensional data that can provide insights into AD pathophysiology. Rapidly evolving technologies including high throughput multi-omics including cell specific transcriptomics, epigenetics, proteomics and metabolomics/lipidomics provide opportunities to relate imaging phenotypes to basic biological processes. CLEAR-AD is a new NIA-sponsored program designed to identify molecular signatures linking brain and peripheral biomarkers, with a multiethnic framework for broader generalization. The grand challenge for the field is how to optimally employ advances in data science, such as deep learning and biological network analysis, to identify actionable molecular signatures enabling earlier detection and personally tailored therapeutic strategies needed for a robust precision medicine approach to AD and other neurodegenerative diseases.

Student Paper Competition Winners

Chair: Yi Zhao, Indiana University School of Medicine

Presenter: Yueyang Shen, University of Michigan

Title: Statistical foundations of invariance and equivariance in deep artificial neural network learning

Abstract: This article describes a mathematical-statistics framework for representing, modeling, and utilizing invariance and equivariance properties of deep neural networks. By drawing direct parallels between topological characterizations of invariance and equivariance principles, probabilistic symmetry, and statistical inference, we explore the foundational properties underpinning reliability in deep learning models. We examine the group-theoretic invariance in a number of deep neural networks including multilayer perceptrons, convolutional networks, transformers, variational autoencoders, and steerable neural networks. A number of examples of each type of exact and approximate invariance and equivariance are presented throughout the manuscript. Several biomedical and imaging applications are discussed at the end. Understanding the theoretical foundation underpinning deep neural network invariance is critical for reliable estimation of prior-predictive distributions, accurate calculations of posterior inference, and consistent AI prediction, classification, and forecasting.

Presenter: James Buenfil, University of Washington

Title: Asymmetric canonical correlation analysis of Riemannian and high-dimensional data

Abstract: In this paper, we introduce a novel statistical model for the integrative analysis of Riemannian-valued functional data and high-dimensional data. We apply this model to explore the dependence structure between each subject's dynamic functional connectivity — represented by a temporally indexed collection of positive definite covariance matrices — and high dimensional data representing lifestyle, demographic, and psychometric measures. Specifically, we employ a reformulation of canonical correlation analysis that enables efficient control of the complexity of the functional canonical directions within a Riemannian framework, using tangent space sieve approximations, and that of the high-dimensional canonical directions via a sparsity-promoting penalty. The proposed method shows improved empirical performance over alternative approaches and comes with theoretical guarantees. Its application to data from the Human Connectome Project reveals a dominant mode of covariation between dynamic functional connectivity and lifestyle, demographic, and psychometric measures. This mode aligns with results from static connectivity studies but reveals a unique temporal non-stationary pattern that such studies fail to capture.

Presenter: Yixin Chen, Virginia Tech

Title: Disentangled adversarial flow for multi-source learning

Abstract: Diffusion magnetic resonance imaging has advanced our understanding of the brain's structural connectome and its cognitive function roles. However, the heterogeneity across different neuroimaging studies, combined with limited labeled samples in specialized cohorts, poses challenges in developing accurate predictive models for cognitive abilities. This paper introduces a novel approach to address these challenges by leveraging information from large-scale, multi-source datasets to enhance predictive accuracy in smaller-scale neuroimaging studies. We propose the Disentangled Adversarial Flow (DAF), a flow-based generative model that generates domain-invariant representations of brain connectomes while preserving their essential features. DAF employs a bidirectional-generative architecture and a kernel-based dependence measure to quantify and minimize the dependence between brain networks and their associated domain labels. Furthermore, we introduce an ensemble-based DAF regression framework that utilizes a weighted, data-adaptive approach to integrate information from multiple large-scale source datasets, effectively mitigating information loss when dealing with multi-domain data. The proposed method is validated on three brain imaging studies: the Adolescent Brain Cognitive Development (ABCD) study, the Human Connectome Project (HCP), and the Alzheimer's Disease Neuroimaging Initiative (ADNI). Results show DAF reduces discrepancies in brain connectomes across domains and improves prediction performance, especially with limited labeled samples in the target domain. Our findings highlight the potential of transfer learning techniques in enhancing the understanding of brain-behavior relationships and improving predictive modeling in neuroscience research.

Presenter: Yi Tang Chen, The Ohio State University

Title: Assessment of glioblastoma multiforme tumor heterogeneity via MRI-derived shape and intensity features

Abstract: In this work, we use a geometric approach to jointly characterize tumor shape and intensity along the tumor contour, as captured in magnetic resonance images, in the context of glioblastoma multiforme. Key properties of the proposed shape+intensity representation include invariance to translation, scale, rotation and reparameterization, which enable objective characterization and comparison of these crucial image-derived tumor features. The representation further allows the user to tune the emphasis of the shape and intensity components during registration, comparison and statistical summarization (averaging, computation of overall variance and exploration of variability via principal component analysis). In addition, we define a composite distance that allows us to integrate shape and intensity information from two imaging modalities. The proposed framework can be easily integrated with distance-based clustering for the purpose of discovering groups of subjects with distinct survival prognosis. When applied to a cohort of subjects with glioblastoma multiforme, we discover groups with large median survival differences. We further relate the subjects' cluster memberships to tumor heterogeneity. Our results suggest that tumor shape variation plays an important role in disease prognosis.

Invited Session Abstracts

Longitudinal imaging and biostatistical methods

Organizer: Ivo Dinov, University of Michigan
Chair: Julia Fisher, University of Arizona, BIO5 Institute, Statistics Consulting Laboratory

Presenter: Sharmistha Guha, Texas A&M University
Title: Supervised modeling of heterogeneous networks: investigating functional connectivity across various cognitive control tasks
Abstract: We present a novel Bayesian approach to address limitations in current methods for studying the relationship between functional connectivity across cognitive control domains and cognitive phenotypes. Our integrated framework jointly learns heterogeneous networks with vector-valued predictors, overcoming the constraints of treating each network independently in regression analysis. By assuming shared nodes across networks with varying interconnections, our method captures complex relationships while offering uncertainty quantification. Theoretical analysis demonstrates convergence to the true data-generating density, supported by empirical studies showcasing superior performance over existing approaches.

Presenter: Hossein Moradi, South Dakota State University
Title: Tensor regression for brain imaging data
Abstract: Multidimensional array data, also called tensors, are used in neuroimaging and other big data applications. In this paper, we propose a parsimonious Bayesian Tensor linear model for neuroimaging study with brain image as a response and a vector of predictors. Our method provides estimates for the parameters of interest by using an Envelope method. The proposed method characterizes different sources of uncertainty and the inference is performed using Markov Chain Monte Carlo (MCMC). We demonstrate posterior consistency and develop a computationally efficient MCMC algorithm for posterior computation using Gibbs sampling. The effectiveness of our approach is illustrated through simulation studies and analysis of alcohol addiction's effect on brain connectivity.

Presenter: Ranjan Maitra, Iowa State University
Title: Reduced-Rank Tensor-on-Tensor Regression and Tensor-Variate Analysis of Variance
Abstract: Fitting regression models with many multivariate responses and covariates can be challenging, but such responses and covariates sometimes have tensor-variate structure. We extend the classical multivariate regression model to exploit such structure in two ways: first, we impose four types of low-rank tensor formats on the regression coefficients. Second, we model the errors using the tensor-variate normal distribution that imposes a Kronecker separable format on the covariance matrix. We obtain maximum likelihood estimators via block-relaxation algorithms and derive their computational complexity and asymptotic distributions. Our regression framework enables us to formulate tensor-variate analysis of variance (TANOVA) methodology. This methodology, when applied in a one-way TANOVA layout, enables us to identify cerebral regions significantly associated with the interaction of suicide attempters or non-attemptor ideators and positive-, negative- or death-connoting words in a functional Magnetic Resonance Imaging study. Another application uses three-way TANOVA on the Labeled Faces in the Wild image dataset to distinguish facial characteristics related to ethnic origin, age group and gender. An R package totr implements the methodology.
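
Illustrative note (not part of the abstract): the Kronecker separable covariance format can be pictured with a small sketch. The code below is not taken from the totr package; the helper names `kron` and `sym_params` and the two example covariance matrices are hypothetical, and the column-stacking vec convention is an assumption. It only shows how a separable covariance for a 2 x 3 matrix-valued error is assembled from two small factors and how many free parameters that saves.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def sym_params(d):
    """Number of free entries in a symmetric d x d matrix."""
    return d * (d + 1) // 2


# Hypothetical covariance factors for a 2 x 3 matrix-valued error term.
sigma_rows = [[1.0, 0.5],
              [0.5, 2.0]]
sigma_cols = [[1.0, 0.25, 0.0],
              [0.25, 1.0, 0.25],
              [0.0, 0.25, 1.0]]

# Assumed convention: Cov(vec(E)) = sigma_cols (x) sigma_rows for column-stacked vec.
full_cov = kron(sigma_cols, sigma_rows)

unstructured = sym_params(6)                  # free entries in a general 6 x 6 covariance
separable = sym_params(2) + sym_params(3)     # free entries across the two factors
print(len(full_cov), unstructured, separable)
```

The separable form needs 9 entries instead of 21 here (one of which is redundant, because a scale constant can be moved between the two factors), and the gap widens quickly as tensor dimensions grow.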

Presenter: Daniel Rowe, Marquette University
Title: Bayesian k-space estimation for fMRI
Abstract: In fMRI, as voxel sizes decrease, there is less tissue to produce a signal, resulting in a decrease in the signal-to-noise ratio and contrast-to-noise ratio. In fMRI, there have been many attempts to decrease the noise in an image in order to increase activation, but most lead to blurrier images. An alternative is to develop methods in spatial frequency space, which have unique benefits. This work proposes a Bayesian approach that quantifies available a priori information about measured complex-valued frequency coefficients. This prior information is incorporated with observed spatial frequency coefficients, and the spatial frequency coefficients are estimated a posteriori. The posterior estimated spatial frequency coefficients are inverse Fourier transform reconstructed into images with reduced noise and increased detection power.

Expanding neuroimaging research: integrating insights from biomedical sciences

Organizer: Jun Young Park, University of Toronto
Chair: Haochang Shou, University of Pennsylvania

Presenter: Sarah M. Weinstein, Temple University
Title: Testing network specificity of brain-phenotype associations
Abstract: Evaluating topological similarities between canonical functional networks and maps of brain-phenotype associations can add to our understanding of mechanisms underlying psychopathology. However, methods for integrating information about functional network topology with spatial maps of brain-phenotype associations have varied in terms of scientific rigor and underlying assumptions. While some approaches have relied on subjective interpretations, others have made unrealistic assumptions about spatial properties of imaging data, leading to inflated false positive rates. We seek to address this gap in existing methodology by borrowing insight from a method widely used in genomics research. We propose Network Enrichment Significance Testing (NEST), a flexible framework for testing the specificity of brain-phenotype associations to functional networks (or other subregions) of interest. We apply NEST to study associations with structural and functional brain imaging data from a large-scale neurodevelopmental cohort study.

Presenter: Andrew An Chen, Medical University of South Carolina
Title: Batch adjustments in location, scale, and shape for complex multi-site neuroimaging studies
Abstract: Neuroimaging studies increasingly collect complex measurements across multiple study sites to diagnose and assess neurological disorders. These multisite studies can acquire a larger and generalizable sample; however, they are also well-known to be biased by differences across scanners. Previous approaches, including the widely-used ComBat method, address batch effects in the location and scale of measurements while assuming normality. While effective for certain neuroimaging measures, these methods are unable to handle zero-inflation, skewness, and non-negativity which are observed in neurological studies. Here, we introduce Batch adjustments in Location, Scale, and Shape (BatLSS) which removes batch effects from any parameters in a large class of distributions, while flexibly modeling covariates. We first show that BatLSS adjusts for batch in data simulated from distributions relevant to neuroimaging including beta, generalized gamma, and several skewed distributions. We then demonstrate that BatLSS effectively harmonizes zero-inflated and right-skewed white matter lesion volumes in a large multi-site multi-study dataset from the imaging-based SysTem for AGing and NeurodeGenerative diseases (iSTAGING) consortium.
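
Illustrative note (not part of the abstract): a location and scale batch adjustment can be sketched as below. This is a deliberately simplified moment-matching example, not ComBat (which adds empirical Bayes shrinkage and covariate modeling) and not BatLSS. The function name `location_scale_adjust` and the toy site values are hypothetical, and the sketch assumes every batch has at least two distinct values so its standard deviation is nonzero.

```python
from statistics import mean, pstdev


def location_scale_adjust(values, batches):
    """Shift and rescale each batch so its mean and SD match the pooled ones."""
    pooled_mean = mean(values)
    pooled_sd = pstdev(values)
    adjusted = [0.0] * len(values)
    for label in set(batches):
        idx = [i for i, b in enumerate(batches) if b == label]
        batch_vals = [values[i] for i in idx]
        b_mean = mean(batch_vals)
        b_sd = pstdev(batch_vals)  # assumed nonzero for this sketch
        for i in idx:
            # Standardize within the batch, then map onto the pooled scale.
            adjusted[i] = pooled_mean + pooled_sd * (values[i] - b_mean) / b_sd
    return adjusted


# Hypothetical measurements from two scanners with different offset and spread.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 14.0, 18.0, 22.0]
batches = ["A", "A", "A", "A", "B", "B", "B", "B"]
adjusted = location_scale_adjust(values, batches)
print([round(v, 3) for v in adjusted])
```

Because each batch is only shifted and rescaled, any difference in skewness or zero-inflation between batches survives the adjustment, which is the limitation of location and scale methods that the abstract points to.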

Presenter: Bingxin Zhao, University of Pennsylvania
Title: Multi-organ imaging-derived polygenic indexes for brain and body health
Abstract: The UK Biobank (UKB) imaging project is a crucial resource for biomedical research, but is limited to 100,000 participants due to cost and accessibility barriers. One solution is to use genetic data to predict heritable imaging-derived phenotypes (IDPs) for a larger cohort. Here we developed and evaluated 4,375 IDP genetic scores (IGS) derived from UKB brain and body images. When applied to non-imaging UKB participants, IGS revealed links to numerous phenotypes and stratified subjects at increased risk for both brain and body diseases. For example, IGS burden scores identified individuals at higher risk for Alzheimer's disease (AD) and neuropsychiatric disorders (e.g., bipolar and schizophrenia), offering additional insights beyond traditional polygenic risk scores of these diseases. When applied to non-UKB subjects in the Alzheimer's Disease Neuroimaging Initiative study, IGS also stratified those at high risk for dementia. Our results demonstrate that the UKB imaging study, with its largely healthy participant base, holds immense potential for stratifying the risk of various brain and body diseases in broader external genetic cohorts.

Presenter: Jun Young Park, University of Toronto
Title: Integrating multimodal neuroimaging with GWAS for identifying modality-level causal pathways to Alzheimer's disease
Abstract: The UK Biobank has produced thousands of (brain) imaging-driven phenotypes (IDPs) collected from more than 40,000 genotyped individuals, which facilitated the investigation of genetic and imaging biomarkers for brain disorders. Motivated by the efforts in genetics to integrate gene expression levels with genome-wide association studies (GWASs), recent methods in imaging genetics adopted an instrumental variable approach to identify causal IDPs for brain disorders. In this talk, we first discuss several methodological challenges of existing methods in achieving causality in imaging genetics, including horizontal pleiotropy and high dimensionality of candidate instrumental variables. We then propose testing the causality of each brain modality (structural, functional, and diffusion MRI) for each gene as a useful alternative, which offers flexibility in interpretation while maintaining reasonable statistical power and controlling for the pleiotropic effects of IDPs from other imaging modalities. We demonstrate the utility of the proposed method by using summary statistics data from the UK Biobank and the International Genomics of Alzheimer's Project (IGAP) study.

Statistical learning methods for neuroscience

Organizer: Shuheng Zhou, University of California, Riverside
Chair: Yize Zhao, Yale University

Presenter: Jian Kang, University of Michigan, Ann Arbor
Title: Deep kernel learning based Gaussian processes for Bayesian image regression analysis
Abstract: Regression models are widely used in neuroimaging studies to learn complex associations between clinical variables and image data. Gaussian process (GP) is one of the most popular Bayesian nonparametric methods and has been widely used as prior models for the unknown functions in those models. However, many existing GP methods need to pre-specify the functional form of the kernels, which often leads to limited flexibility in model fitting and computational bottlenecks in large-scale datasets. To address these challenges, we develop a scalable Bayesian kernel learning framework for GP priors in various image regression models. Our approach leverages deep neural networks (DNNs) to perform low-rank approximations of GP kernel functions via spectral decomposition. With Bayesian kernel learning techniques, we achieve improved accuracy in parameter estimation and variable selection in image regression models. We establish large prior support and posterior consistency of the kernel estimations. Through extensive simulations, we demonstrate our model outperforms other competitive methods. We illustrate the proposed method by analyzing multiple neuroimaging datasets in different medical studies.

Presenter: Chunming Zhang, University of Wisconsin-Madison
Title: Learning network-structured dependence from non-stationary multivariate point process data
Abstract: Understanding sparse network dependencies among nodes from multivariate point process data has broad applications in information transmission, social science, and computational neuroscience. This paper introduces new continuous-time stochastic models for conditional intensity processes, revealing network structures within non-stationary multivariate counting processes. Our model's stochastic mechanism is crucial for inferring graph parameters relevant to structure recovery, distinct from commonly used processes like the Poisson, Hawkes, queuing, and piecewise deterministic Markov processes. This leads to proposing a novel marked point process for intensity discontinuities. We derive concise representations of their conditional distributions and demonstrate cyclicity of the counting processes driven by recurrence time points. These theoretical properties enable us to establish statistical consistency and convergence properties for proposed penalized M-estimators in graph parameters under mild regularity conditions. Simulation evaluations showcase the method's computational simplicity and improved estimation accuracy compared to existing approaches. Real neuron spike train recordings are analyzed to infer interconnectivity in neuronal networks.

Presenter: Shuheng Zhou, University of California, Riverside
Title: Concentration of measure bounds for matrix-variate data with missing values
Abstract: We consider the following data perturbation model, where the covariates incur multiplicative errors. For two random matrices $U$, $X$, we denote by $U \circ X$ the Hadamard or Schur product, which is defined as $(U \circ X)_{i,j} = U_{i,j} X_{i,j}$. In this paper, we study the subgaussian matrix variate model, where we observe the matrix variate data $X$ through a random mask $U$: $\mathcal{X} = U \circ X$, where $X = B^{1/2} Z A^{1/2}$, where $Z$ is a random matrix with independent subgaussian entries, and $U$ is a mask matrix with either zero or positive entries, where $\mathbb{E}[U_{ij}] \in [0, 1]$ and all entries are mutually independent. Under the assumption of independence between $X$ and $U$, we introduce componentwise unbiased estimators for estimating covariance $A$ and $B$, and prove the concentration of measure bounds in the sense of guaranteeing the restricted eigenvalue (RE) conditions to hold on the unbiased estimator for $B$, when columns of the data matrix are sampled with different rates. We further develop multiple regression methods for estimating the inverse of $B$ and show statistical rates of convergence. Our results provide insight for sparse recovery for relationships among entities (samples, locations, items) when features (variables, time points, user ratings) are present in the observed data matrix $\mathcal{X}$ with heterogeneous rates. Our proof techniques can certainly be extended to other scenarios. We provide simulation evidence illuminating the theoretical predictions.
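
Illustrative note (not part of the abstract): the effect of a multiplicative mask with column-specific sampling rates can be seen in a toy simulation. The sketch below is not the estimator developed in the talk. It assumes a 0/1 Bernoulli mask with known sampling rates, two correlated Gaussian columns, and hypothetical names (`hadamard`, `rates`, `naive`, `corrected`). It shows only that the cross-moment of the masked data is shrunk by the product of the two rates, and that dividing by that product removes the shrinkage.

```python
import random


def hadamard(u, x):
    """Entrywise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(row_u, row_x)] for row_u, row_x in zip(u, x)]


rng = random.Random(1)
n = 20000
rates = [0.8, 0.5]  # assumed known sampling rate for each column

# Rows are samples; the two columns have unit variance and covariance 0.5.
x = []
for _ in range(n):
    x1 = rng.gauss(0.0, 1.0)
    x2 = 0.5 * x1 + rng.gauss(0.0, 0.75 ** 0.5)
    x.append([x1, x2])

# Independent 0/1 mask: entry (i, j) is observed with probability rates[j].
u = [[1.0 if rng.random() < p else 0.0 for p in rates] for _ in range(n)]
observed = hadamard(u, x)

# The naive cross-moment of the masked columns is biased toward zero.
naive = sum(row[0] * row[1] for row in observed) / n
# Dividing by the product of the sampling rates removes that bias.
corrected = naive / (rates[0] * rates[1])
print(round(naive, 3), round(corrected, 3))
```

The corrected value should land near the true covariance of 0.5, while the naive value sits near 0.5 x 0.8 x 0.5 = 0.2. Diagonal entries need a different correction (a single rate rather than a product), which is one reason componentwise treatment matters.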

Recent developments in statistical methodology for neuroimaging data analysis

Organizer: Dayu Sun, Indiana University School of Medicine
Chair: Xinyi Li, Clemson University

Presenter: Xin Ma, Columbia University Irving Medical Center
Title: High-dimensional measurement error models with application to brain functional connectivity
Abstract: Recently emerging large-scale biomedical data pose exciting opportunities for scientific discoveries. However, the ultrahigh dimensionality and nonnegligible measurement errors in the data create difficulties in estimation. There are limited methods for high-dimensional covariates with measurement errors, which usually require moments of the noise distribution to fit the working model and are restricted to generalized linear models (GLM). In this work, we develop measurement error models involving high-dimensional covariates with correlated sub-Gaussian measurement errors for a class of Lipschitz loss functions that go beyond the GLM family and encompass logistic regression, hinge loss and quantile regression. Our estimator is designed to minimize the L1 norm among all estimators in suitable feasible sets, without requiring any knowledge of the noise distribution. Subsequently, we generalize these estimators to a lasso analog version that is computationally scalable to higher dimensions. We derive theoretical guarantees of finite sample statistical error bounds and sign consistency, even when the dimensionality increases exponentially with the sample size. Extensive simulation studies demonstrate superior performance compared to existing methods in classification and quantile regression problems. We apply the approach to a gender classification task based on functional connectivity and identify significant network edges that reveal gender differences.

Presenter: Shuo Chen, University of Maryland School of Medicine
Title: "Machine learning" to the mean and its correction: an application to imaging-based brain age prediction
Abstract: Machine learning models for continuous outcomes are more likely to yield biased predictions for outcomes with very large and small values. The predicted biases for large-valued outcomes are negative, while for small-valued outcomes, they are positive. We refer to this phenomenon as "machine learning to the mean." We first demonstrate this scenario across multiple applications and then attempt to explain the phenomenon from a theoretical perspective. We propose a general constrained optimization strategy to correct the bias and develop a computationally efficient algorithm for implementing the proposed method. The simulation results show that the predicted outcomes are unbiased by our correction method. We apply this new approach to predicting brain age using neuroimaging data, specifically addressing the issue of predicted age being highly correlated with chronological age which is the "machine learning to the mean" phenomenon in brain age prediction.
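
The direction of the bias described in this abstract can be reproduced with a very small simulation. The sketch below uses ordinary least squares as a stand-in for a generic machine learning model and a single noisy feature; the simulated data, the noise levels, and the 10% tail cutoffs are assumptions chosen for illustration and do not come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.normal(50.0, 15.0, size=n)        # outcome, e.g. an age-like variable
x = y + rng.normal(0.0, 15.0, size=n)     # noisy feature carrying partial signal

# Ordinary least squares with an intercept, standing in for a generic ML model
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coef

error = y_hat - y                         # positive means overprediction
order = np.argsort(y)
low, high = order[: n // 10], order[-(n // 10):]

print("mean error, lowest 10% of outcomes:", error[low].mean())
print("mean error, highest 10% of outcomes:", error[high].mean())
print("cov(error, y):", np.cov(error, y)[0, 1])
```

For an in-sample least squares fit with an intercept, the covariance between the prediction error and the outcome equals minus the residual variance, so small outcomes tend to be overpredicted and large outcomes underpredicted whenever the fit is imperfect.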

Presenter: Joshua Lukemire, Emory University
Title: Bayesian non-parametric factor models for estimating covariances across multiple subjects with repeated imaging runs
Abstract: Many fMRI studies require estimation of brain functional networks across multiple subjects with repeated measures of either the same task condition or multiple different task conditions. However, most approaches to this problem either estimate the functional networks for each subject/session individually, or perform some form of group-level estimation. In this work we propose a Bayesian latent factor model that pools information across subjects and sessions to estimate subject/session specific connectivity matrices. The approach is based on a product of Dirichlet process mixtures (PDPM) prior that clusters latent factor loadings separately for each node in the brain, but that restricts sessions within subject to share the same cluster. Through simulations, we show that this approach is highly effective for both clustering subjects with similar connectivity patterns and estimating the overall brain network. An application is provided to the Human Connectome Project fMRI data.

Statistical inference in neuroimaging

Organizer: Eardi Lila, University of Washington
Chair: Shuo Chen, University of Maryland School of Medicine

Presenter: Benjamin Risk, Emory University
Title: Nonparametric motion adjustment in functional connectivity studies in children with autism spectrum disorder
Abstract: Autism Spectrum Disorder (ASD) is a neurodevelopmental condition associated with difficulties with social interactions, communication, and restricted or repetitive behaviors. To characterize ASD, investigators often use functional connectivity derived from resting-state functional magnetic resonance imaging of the brain. However, participants' head motion during the scanning session can induce motion artifacts. Many studies remove scans with excessive motion, which can lead to drastic reductions in sample size and introduce selection bias. To avoid such exclusions, we propose an estimand using causal inference methods that quantifies the difference in average functional connectivity in autistic and non-ASD children while standardizing motion relative to the low motion distribution in scans that pass motion quality control. We introduce a nonparametric estimator for motion control, called MoCo, that uses all participants and flexibly models the impacts of motion and other relevant features using an ensemble of machine learning methods. We establish large-sample efficiency and multiple robustness of our proposed estimator. The framework is applied to estimate the difference in functional connectivity between 132 autistic and 245 non-ASD children, of which 34 and 126 pass motion quality control. MoCo appears to dramatically reduce motion artifacts relative to no participant removal, while more efficiently utilizing participant data and accounting for possible selection biases relative to the naive approach with participant removal.

Presenter: Raphiel Murden, Emory University
Title: Probabilistic JIVE for brain morphometry and cognition
Abstract: Collecting multiple types of data on the same set of subjects is common in modern scientific applications including genomics, metabolomics, and neuroimaging. Joint and Individual Variation Explained (JIVE) seeks a low-rank approximation of the joint variation between two or more sets of features captured on common subjects and isolates this variation from that unique to each set of features. We propose a probabilistic model for the JIVE framework with subject random effects and develop an expectation-maximization (EM) algorithm to estimate the parameters of interest. Our model extends probabilistic PCA to the setting of multiple data sets, simultaneously estimating joint and individual components, which can lead to greater accuracy compared to other methods. We apply ProJIVE to measures of brain morphometry and cognition from the Alzheimer's Disease Neuroimaging Initiative. ProJIVE learns biologically meaningful sources of variation in brain morphometry and cognition. The joint morphometry and cognition subject scores are strongly related to expensive existing biomarkers.

Presenter: Daniel Kessler, University of Washington
Title: Computational Inference for Directions in Canonical Correlation Analysis
Abstract: Canonical Correlation Analysis (CCA) is a method for analyzing pairs of random vectors; it learns a sequence of paired linear transformations such that the resultant canonical variates are maximally correlated within pairs while uncorrelated across pairs. CCA outputs both canonical correlations as well as the canonical directions which define the transformations. While inference for canonical correlations is well developed, conducting inference for canonical directions is more challenging and not well-studied, but is key to interpretability. We propose a computational bootstrap method (combootcca) for inference on CCA directions. We conduct thorough simulation studies that range from simple and well-controlled to complex but realistic and validate the statistical properties of combootcca while comparing it to several competitors. We also apply the combootcca method to a brain imaging dataset and discover linked patterns in brain connectivity and behavioral scores.

Presenter: Simon Vandekar, Vanderbilt University
Title: Scalable FDR controlled functional confidence sets for arbitrary effect size images
Abstract: The field of neuroimaging research has acknowledged the limitations of hypothesis testing-based inference. As a solution, colleagues in biostatistics have developed procedures to construct spatial confidence sets for images that can be used to identify regions with target effect sizes above a given threshold with a specified probability. These confidence sets represent a paradigm shift in group-level inference for neuroimaging data; however, there is no generalized approach to estimate and construct confidence regions on a unitless scale. We derive the asymptotic distribution of the robust effect size index and use recently developed approaches to construct confidence sets from simultaneous confidence intervals to establish a confidence set procedure for effect sizes of arbitrary model parameters. Commonly used reliable inference procedures rely on bootstrapping or permutations, so they can be slow in large samples. In contrast, our approach uses closed-form procedures and so is scalable to large datasets. We evaluate their finite-sample performance and use the methods to identify regions associated with diagnostic differences in the ABIDE dataset.

New developments for harmonization, processing and modeling for imaging data

Organizer: Yize Zhao, Yale University
Chair: Jun Young Park, University of Toronto

Presenter: Dana Tudorascu, University of Pittsburgh
Title: Data harmonization methods and analysis for Positron Emission Tomography (PET) imaging studies of Alzheimer's disease
Abstract: Multisite imaging studies increase statistical power and enable the generalization of research outcomes; however, the variety of imaging acquisition, different PET tracer properties, and inter-scanner variability hinder the direct comparability of multi-scanner PET data. The PET imaging field is lagging behind in terms of harmonization methods due to the complexity associated with the combination of different tracers and different scanners. In this study we investigate samples of cognitively normal participants, participants with mild cognitive impairment, and Alzheimer's disease subjects in two major multisite studies of Alzheimer's disease. We present challenges and solutions associated with different PET tracer analyses and harmonization techniques, including simple imaging standardization, ComBat, and deep learning methods. We show region-of-interest differences in PET outcome measures before and after harmonization in multisite studies of Alzheimer's disease.

Presenter: Selena Wang, Yale University
Title: Sex-specific topological structure associated with dementia and MCI via latent space estimation
Abstract: Statistical network analysis has transformed neuroimaging research in recent years by enabling flexible and intuitive integration of multiple data types and preserving the topological brain connectivity structure while uncovering mechanisms of degenerative aging. In this study, we apply a novel latent space joint network model to perform a case-control comparison using the functional connectivity data together with region-specific cortical volume, cortical thickness, surface area and PET information from the third release of the ADNI study. By preserving complex network structures during imaging biomarker detection, we find sex-specific topological structures associated with dementia. For female subjects, areas of connectivity edges that are impacted by dementia and MCI tend to follow the organizational topological structure of the brain. In contrast, areas of connectivity edges that are impacted by dementia and MCI for the male subjects do not follow such structures. For female subjects, the core brain regions with connectivity across the whole brain are most impacted by the development of dementia, which is not true for male subjects.

Presenter: Zhengwu Zhang, University of North Carolina Chapel Hill
Title: CoCoNest: a continuous structural connectivity-based nested parcellation of the human cerebral cortex
Abstract: Despite the widespread exploration and availability of parcellations for the functional connectome, parcellations designed for the structural connectome are comparatively limited. Current research suggests that there may be no single 'correct' parcellation and that the human brain is intrinsically a multi-resolution entity. In this work, we propose the CoCoNest family of parcellations — a fully data-driven, multi-resolution family of parcellations constructed from structural connectome data. The CoCoNest family is constructed using agglomerative (bottom-up) clustering and error-complexity pruning, which strikes a balance between the complexity of the parcellation and how well it preserves patterns in vertex-level high-resolution connectivity data. We draw on an intensive battery of internal and external evaluation metrics to show that the CoCoNest family is competitive with or outperforms widely used parcellations in the literature. Additionally, we show how the CoCoNest family can serve as an exploratory framework for researchers to investigate the organization of the structural connectome across various resolutions.

Presenter: Tsung-Hung Yao, MD Anderson
Title: Bayesian nonparametric product mixtures for multi-resolution clustering of functions
Abstract: There is a rich literature on clustering functional data with applications to time-series modeling, trajectory data, and even spatio-temporal applications. However, existing methods assume replicated clusters that enforce identical atom values for all members allocated to the same cluster. While such an assumption may be acceptable for clustering scalar or low-dimensional vectors, it may not be meaningful when clustering high-dimensional functions observed at thousands of instances for each sample. A prominent example of this type of problem pertains to the clustering of high-dimensional images derived from neuroimaging applications or even spatial transcriptomics problems involving a large number of spots in the tissue. For such problems, units are expected to cluster based on a subset of informative regions in the image only, with the remaining imaging regions not being instrumental in the clustering process. In order to tackle such problems, we propose a non-parametric Bayesian approach for multi-resolution clustering of high-dimensional functions. In particular, we express the random functions in terms of a wavelet basis expansion coupled with an additive noise term and impose independent Dirichlet process priors on coefficients corresponding to varying wavelet resolutions. The proposed model results in a product of DPM priors imposed on the wavelet coefficients and is shown to result in posterior consistency in recovering the true density of the random functions, as the number of samples grows to infinity while keeping the number of observed instances for each function fixed. We apply the proposed approach to clustering high-dimensional images in neuroimaging applications in order to infer heterogeneous subsets of subjects, as well as spatial transcriptomics applications where the goal is to infer clusters of genes with distinct transcriptomics mechanisms. The operating characteristics of the model are also evaluated via extensive simulations that reveal the considerable advantages in performance under the proposed methods over classical clustering methods.

Invariance and distribution/density objects in neuroimaging studies

Organizer: Yi Zhao, Indiana University School of Medicine
Chair: Eardi Lila, University of Washington

Presenter: Bonnie Smith, Johns Hopkins Bloomberg School of Public Health
Title: Regression models for partially localized fMRI connectivity analyses
Abstract: We propose the use of subject-level regression models for brain functional connectivity. Covariates can include characteristics such as geographic distance between two brain regions, symmetry between the regions, and functional networks to which the two regions belong. Connectivity regression models can be used either with data that have been normalized to a common template, or in settings where each subject's data is left in its own geometry. This style of analysis allows us to characterize the relative importance of each type of predictor, and also provides a parsimonious way of summarizing each subject's connectivity that can be used in group-level comparisons. We apply our approach to Human Connectome Project data, and we investigate data repeatability using our model versus using two alternative approaches.

Presenter: Changbo Zhu, University of Notre Dame
Title: Geodesic optimal transport regression
Abstract: Classical regression models do not cover non-Euclidean data that reside in a general metric space, while the current literature on non-Euclidean regression by and large has focused on scenarios where either predictors or responses are random objects, i.e., non-Euclidean, but not both. In this paper we propose geodesic optimal transport regression models for the case where both predictors and responses lie in a common geodesic metric space and predictors may include not only one but also several random objects. This provides an extension of classical multiple regression to the case where both predictors and responses reside in non-Euclidean metric spaces, a scenario that has not been considered before. It is based on the concept of optimal geodesic transports, which we define as an extension of the notion of optimal transports in distribution spaces to more general geodesic metric spaces, where we characterize optimal transports as transports along geodesics. The proposed regression models cover the relation between non-Euclidean responses and vectors of non-Euclidean predictors in many spaces of practical statistical interest. These include one-dimensional distributions viewed as elements of the 2-Wasserstein space and multidimensional distributions with the Fisher-Rao metric that are represented as data on the Hilbert sphere. Also included are data on finite-dimensional Riemannian manifolds, with an emphasis on spheres, covering directional and compositional data, as well as data that consist of symmetric positive definite matrices. We illustrate the utility of geodesic optimal transport regression with data on summer temperature distributions and human mortality.

Presenter: Yi Zhao, Indiana University School of Medicine
Title: Density-on-density regression
Abstract: In this study, a density-on-density regression model is introduced, where the association between densities is elucidated via a warping function. The proposed model has the advantage of being a straightforward demonstration of how one density transforms into another. Using the Riemannian representation of density functions, which is the square-root function (or half density), the model is defined in the correspondingly constructed Riemannian manifold. To estimate the warping function, it is proposed to minimize the average Hellinger distance, which is equivalent to minimizing the average Fisher-Rao distance between densities. An optimization algorithm is introduced by estimating the smooth monotone transformation of the warping function. Asymptotic properties of the proposed estimator are discussed. Simulation studies demonstrate the superior performance of the proposed approach over competing approaches in predicting outcome density functions. Applied to a proteomic-imaging study from the Alzheimer's Disease Neuroimaging Initiative, the proposed approach illustrates the connection between the distribution of protein abundance in the cerebrospinal fluid and the distribution of brain regional volume. Discrepancies among cognitively normal subjects, patients with mild cognitive impairment, and patients with Alzheimer's disease (AD) are identified, and the findings are in line with existing knowledge about AD.
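
To make the square-root representation mentioned in this abstract concrete, the sketch below computes the Hellinger and Fisher-Rao distances between two densities evaluated on a common grid. It assumes the conventions H^2 = 1 - BC and d_FR = arccos(BC), where BC is the integral of sqrt(f g); other references scale these by constant factors. The function names, the grid, and the two normal densities are illustrative assumptions, not part of the talk.

```python
import numpy as np

def sqrt_density_distances(f, g, dx):
    """Hellinger and Fisher-Rao distances between two densities on a common grid."""
    f = f / (f.sum() * dx)                    # renormalize on the grid
    g = g / (g.sum() * dx)
    # Inner product of the two square-root densities (Bhattacharyya coefficient)
    bc = np.clip(np.sum(np.sqrt(f * g)) * dx, 0.0, 1.0)
    hellinger = np.sqrt(1.0 - bc)
    fisher_rao = np.arccos(bc)                # arc length on the unit Hilbert sphere
    return hellinger, fisher_rao

def normal_pdf(t, mu, sigma):
    """Normal density, used here only to build example inputs."""
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

grid = np.linspace(-10.0, 10.0, 4001)
dx = grid[1] - grid[0]
h, fr = sqrt_density_distances(
    normal_pdf(grid, 0.0, 1.0), normal_pdf(grid, 1.0, 1.5), dx
)
print("Hellinger:", h, "Fisher-Rao:", fr)
```

Under these conventions the two distances are linked pairwise by d_FR = arccos(1 - H^2), an increasing function of H, so both rank any set of density pairs in the same order.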

Advances in statistical method for neuroimaging data

Organizer: Selena Wang, Yale University
Chair: Xin Ma, Columbia University Irving Medical Center

Presenter: Dayu Sun, Indiana University School of Medicine
Title: Sparse partial generalized tensor regression
Abstract: Tensor data, often characterized as multidimensional arrays, have become increasingly prevalent in biomedical studies, particularly in neuroimaging applications. Analyzing these complex datasets can be challenging due to the high-dimensionality and inherent structures within tensors. In this work, we propose the Sparse Partial Generalized Tensor Regression (SPGTR) method for modeling general types of outcomes involving both tensor and vector/scalar predictors. Our novel mode-wise penalized manifold optimization techniques enable us to achieve dimension reduction and sparsity in tensor coefficient estimation, improving the overall prediction performance. We establish the asymptotic behavior of the proposed estimation. We demonstrate the effectiveness of the SPGTR through extensive simulation studies and showcase its application in investigating the association between posttraumatic stress disorder (PTSD) and brain connectivity matrices derived from functional magnetic resonance imaging (fMRI) data.

Presenter: Yaotian Wang, Emory University
Title: An empirical-topology-informed Bayesian blind source separation for investigating whole-brain functional connectivity
Abstract: Blind source separation (BSS) is one of the major methods for functional magnetic resonance imaging (fMRI) analysis. From voxel-level fMRI data to region-of-interest (ROI)-level functional connectivity (FC) matrices (e.g., Pearson correlations of fMRI data), various BSS methods have been developed to decompose these data into scientifically meaningful and insightful latent sources. These methods generally do not utilize any empirical topology information, such as the spatial information between ROIs. However, existing studies and theories suggest that spatial distance is an important factor that influences the property of FC. For example, the compensatory theory suggests that aging has different effects on long- and short-distance connections. Results from an aging brain study that neglects spatial information may fail to capture scientifically important nuances in the brain. Furthermore, without taking into account the brain's empirical topology, ROIs are typically treated as exchangeable, leading to less reliable findings. Therefore, to produce scientifically meaningful and reliable blind source separation, an empirical-topology-informed method is called for. In this talk, I will present a novel BSS method that integrates empirical topology information in a unified Bayesian framework and the identified latent sources underlying the functional connectome in fMRI data.

Presenter: Zhiling Gu, Iowa State University
Title: Statistical learning and inference of surface-based functional data with applications in neuroimaging analysis
Abstract: Surface-based neuroimaging analysis has gained significant attention in recent years due to its ability to capture fine-grained spatial information and provide insights into brain structure and function. In this paper, we present an advanced nonparametric method for learning and inference for surface-based functional data, facilitating accurate estimation of underlying signals and efficient detection and localization of significant effects. We propose a framework that leverages advanced statistical modeling approaches, including spherical splines on triangulations and next-generation functional data analysis, to handle the challenges associated with surface-based data, such as irregular sampling and spatial dependencies. Furthermore, we propose a novel approach for constructing simultaneous confidence corridors (SCCs), which effectively quantify estimation uncertainty. These SCCs provide a comprehensive representation of the uncertainty in the estimated functional patterns and facilitate reliable inference. In addition, the procedure is extended to accommodate comparisons between groups of samples, enabling the analysis of group differences or treatment effects. We establish the asymptotic properties of the proposed estimators and SCCs, and provide a computationally efficient procedure for constructing the SCCs. To evaluate the finite-sample performance, we conduct numerical experiments and apply the methods to real-data analysis using the cs-fMRI data provided by the Human Connectome Project Consortium (HCP).

Graph-based network connectomes analysis

Organizer: Simon Vandekar, Vanderbilt University
Chair: Zhengwu Zhang, University of North Carolina Chapel Hill

Presenter: Eardi Lila, University of Washington
Title: Integrative analysis of dynamic functional connectomes and high-dimensional data
Abstract: We introduce a novel statistical method for the integrative analysis of neuroimaging and high-dimensional data. The motivating application is the exploration of the dependence structure between each subject's dynamic functional connectivity — represented by a temporally indexed collection of positive definite covariance matrices — and high-dimensional data representing lifestyle, demographic, and psychometric measures. To this purpose, we employ a regression-based reformulation of canonical correlation analysis that allows us to control the complexity of the connectivity canonical directions within a Riemannian framework, using tangent space principal components analysis, and that of the high-dimensional canonical directions via a sparsity-promoting penalty. The proposed method shows improved empirical performance over alternative approaches. Its application to data from the Human Connectome Project reveals a dominant mode of covariation between dynamic functional connectivity and lifestyle, demographic, and psychometric measures. This mode aligns with results from static connectivity studies but reveals a unique temporal non-stationary pattern that such studies fail to capture.

Presenter: Sean L. Simpson, Wake Forest University
Title: Regression Frameworks for Brain Network Distance Metrics
Abstract: Brain network analyses have exploded in recent years, and hold great potential in helping us understand normal and abnormal brain function. Network science approaches have facilitated these analyses and our understanding of how the brain is structurally and functionally organized. However, the development of statistical methods that allow relating this organization to health outcomes has lagged behind. We have attempted to address this need by developing regression frameworks for brain network distance metrics that allow relating system-level properties of brain networks to outcomes of interest. These frameworks serve as synergistic fusions of statistical approaches with network science methods, providing needed analytic foundations for whole-brain network data. Here we delineate these approaches that have been developed for single-task, multi-task/multi-session, and multilevel brain network data. These tools help expand the suite of analytical tools for whole-brain networks and aid in providing complementary insight into brain function.

Presenter: Panpan Zhang, Vanderbilt University Medical Center
Title: Graph-based methods for functional brain network analysis
Abstract: Functional magnetic resonance imaging (fMRI) has been widely used to discover the neural underpinnings of cognitive decline caused by neurological disorders. Graph-based methods are prevalent for the analysis of brain networks constructed from fMRI data. The precise construction of functional brain networks is critical when using network-based measures as predictors in downstream analyses. This talk will discuss popular approaches to functional brain network construction. The assessment is done through both simulations and an application to a longitudinal Alzheimer's Disease study.

Presenter: Tingting Zhang, University of Pittsburgh
Title: Analysis of functional brain network changes from childhood to old age: a study using HCP-D, HCP-YA, and HCP-A datasets
Abstract: We present a new clustering-enabled regression approach designed to investigate how whole-brain functional connectivity (FC) in healthy subjects changes from childhood to old age. By applying this method to aggregated fMRI data from three Human Connectome Projects, we identify clusters of brain regions that share similar trajectories of FC changes with age. Our findings reveal that age affects FC in a varied manner across different brain regions. Most brain connections experience minimal yet statistically significant FC changes with age. Only a tiny proportion of connections exhibit substantial age-related changes in FC. Among these connections, FC between brain regions in the same functional network tends to decrease over time, while FC between regions in different networks demonstrates diverse patterns of age-related changes, underscoring the intricate nature of brain aging processes. Moreover, our research uncovers sex-specific trends in FC changes; while average FC is comparable in childhood for both sexes, it becomes increasingly different with aging. Elderly females show much higher FC within the default mode network and in certain between-network connections of the somatomotor network, whereas elderly males display higher FC across multiple brain networks. Furthermore, our study suggests that the relationship between cognitive behavior and FC is nuanced, being most influenced by age and sex during childhood, less influenced in older adults, and to the least extent in young adults.

Collaborative case study: Statistical methods for dissecting tumor microenvironment based on spatial proteomics datasets

Organizer: Souvik Seal, Medical University of South Carolina
Chair: Selena Wang, Yale University

Presenter: Thao Vu, University of Colorado Anschutz Medical Campus
Title: FunSpace: A functional and spatial analytic approach to cell imaging data using entropy measures
Abstract: Spatial heterogeneity in the tumor microenvironment (TME) plays a critical role in gaining insights into tumor development and progression. Conventional metrics typically capture the spatial differential between TME cellular patterns by either exploring the cell distributions in a pairwise fashion or aggregating the heterogeneity across multiple cell distributions without considering the spatial contribution. As such, none of the existing approaches has fully accounted for the simultaneous heterogeneity caused by both cellular diversity and spatial configurations of multiple cell categories. In this article, we propose an approach to leverage spatial entropy measures at multiple distance ranges to account for the spatial heterogeneity across different cellular organizations. Functional principal component analysis (FPCA) is applied to estimate FPC scores, which then serve as predictors in a Cox regression model to investigate the impact of spatial heterogeneity in the TME on survival outcome, potentially adjusting for other confounders. Using a non-small cell lung cancer dataset (n = 153) as a case study, we found that the spatial heterogeneity in the TME cellular composition of CD14+ cells, CD19+ B cells, CD4+ and CD8+ T cells, and CK+ tumor cells, had a significant non-zero effect on the overall survival (p = 0.027). Furthermore, using a publicly available multiplexed ion beam imaging (MIBI) triple-negative breast cancer dataset (n = 33), our proposed method identified a significant impact of cellular interactions between tumor and immune cells on the overall survival (p = 0.046). In simulation studies under different spatial configurations, the proposed method demonstrated a high predictive power by accounting for both clinical effect and the impact of spatial heterogeneity.

Presenter: Jiangmei Xiong, Vanderbilt University
Title: GammaGateR: semi-automated marker gating for single-cell multiplexed imaging
Abstract: Multiplexed immunofluorescence (mIF) is an emerging assay for multichannel protein imaging that can decipher cell-level spatial features in tissues. However, existing automated cell phenotyping methods, such as clustering, face challenges in achieving consistency across experiments and often require subjective evaluation. As a result, mIF analyses often revert to marker gating based on manual thresholding of raw imaging data. To address the need for an evaluable semi-automated algorithm, we developed GammaGateR, an R package for interactive marker gating designed specifically for segmented cell-level data from mIF images. Based on a novel closed-form gamma mixture model, GammaGateR provides estimates of marker-positive cell proportions and soft clustering of marker-positive cells. The model incorporates user-specified constraints that provide a consistent but slide-specific model fit. We compared GammaGateR against the newest unsupervised approach for annotating mIF data, employing two colon datasets and one ovarian cancer dataset for the evaluation. We showed that GammaGateR produces highly similar results to a silver standard established through manual annotation. Furthermore, we demonstrated its effectiveness in identifying biological signals, achieved by mapping known spatial interactions between CD68 and MUC5AC cells in the colon and by accurately predicting survival in ovarian cancer patients using the phenotype probabilities as input for machine learning methods. GammaGateR is a highly efficient tool that can improve the replicability of marker gating results, while reducing the time of manual segmentation.

Presenter: Julia Wrobel, Emory University
Title: A scalable robust K-statistic for quantifying immune-cell clustering in multiplex imaging data
Abstract: The tumor microenvironment (TME), which characterizes the tumor and its surroundings, plays a critical role in understanding cancer development and progression. Recent advances in imaging techniques, including multiplex immunofluorescence (mIF) imaging, enable researchers to study the spatial structure of the TME at a single-cell level. The most relevant approaches for analyzing spatial relationships between cell types in mIF data are based on point process theory, and among these Ripley's K statistic and its derivatives are both extremely popular and highly effective. In this framework, the locations of cells in mIF data are treated as following a point process, realizations of a point process are called "point patterns", and these models seek to understand correlations in the spatial distributions of cells. Under the assumption that the rate of a cell is constant over an entire region of interest, a point pattern will exhibit complete spatial randomness (CSR), and it is often of interest to model whether cells deviate from CSR either through clustering or repulsion. In mIF data, estimation issues can arise when the sample has holes due to the shape of the tissue, folds, or tears, resulting in patches of areas on the slide where no cells are present. This can bias the estimation of Ripley's K due to violation of the CSR assumption of spatial homogeneity. One correction of this violation accounts for regions where no cells were present by permuting an empirical value of complete spatial randomness, and then comparing observed spatial summary statistic values to those obtained from this empirical null distribution. This fix works well in small samples, but is computationally infeasible as the number of cells per image increases. To improve on this, we derived a closed-form representation of the permuted null distribution for Ripley's K, which is fast and easy to implement using existing software. We examine the performance of this statistic in simulations and open-source mIF data.
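
For readers unfamiliar with the statistic, a naive Ripley's K estimate at a single radius can be sketched as below. This is an illustration only and is not the presenters' closed-form method: it applies no edge correction, and the function name `ripley_k` and the four-point unit-square example are assumptions made here.

```python
import numpy as np

def ripley_k(points, radius, area):
    """Naive Ripley's K at one radius, with no edge correction (illustrative only)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances between all cell locations.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Count ordered pairs (i != j) that lie within the radius.
    close = dist <= radius
    np.fill_diagonal(close, False)
    return area * close.sum() / (n * (n - 1))

# Hypothetical point pattern: the four corners of a unit square.
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
k_hat = ripley_k(corners, radius=1.0, area=1.0)

# Under complete spatial randomness the theoretical value is pi * r^2.
csr_value = np.pi * 1.0 ** 2
```

Estimates above pi * r^2 are usually read as clustering and estimates below it as repulsion; holes in the tissue distort this comparison, which is the problem the abstract addresses.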

Presenter: Junsouk Choi, University of Michigan
Title: Gaussian process spatial topic modeling for unsupervised discovery of spatial tissue architecture in multiplexed imaging
Abstract: Recent development of technologies such as multiplexed imaging and spatial transcriptomics allows for direct observation of cellular phenotypes and cellular interactions in intact tissues, enabling highly resolved spatial characterization of cellular phenotypes. A common research question in analyzing such data is identifying higher-order patterns of tissue organization, which holds systematic implications for disease pathology and clinical outcomes. To address this, we propose a novel topic modeling approach to identify the higher-order architecture of tissues and recover signatures of characteristic cellular microenvironments that are potential determinants of patient outcomes. Our method infers the local distribution of cell types as a representation of cellular microenvironment and incorporates spatial information through Gaussian processes to ensure spatial coherence among neighboring microenvironments. By applying the proposed topic model to publicly available multiplexed imaging data, we uncover higher-order architectures within lung cancer tissues and identify tertiary lymphoid structures, which are closely linked to the patient survival.

When machine learning and generative models meet imaging, network and point cloud data

Organizer: Zhengwu Zhang, University of North Carolina Chapel Hill
Chair: Benjamin Risk, Emory University

Presenter: Mingxia Liu, University of North Carolina Chapel Hill
Title: Enhancing multi-site multi-modal neuroimage analysis through advanced AI techniques
Abstract: Multi-site multi-modal neuroimaging data, such as magnetic resonance imaging (MRI) and positron emission tomography (PET), are critical to expanding the diversity of subject populations and enhancing the statistical robustness of predictive models in neuroscience research. Despite their potential, the field faces substantial challenges, notably the heterogeneity of data across imaging sites and modalities. Addressing these complexities, my research focuses on creating machine learning and deep learning methodologies to analyze multi-modal imaging data from multiple sites, with the goal of uncovering imaging biomarkers associated with neurodegenerative disorders. This talk will delineate our progress in addressing three long-standing challenges: neuroimage representation learning, multimodality neuroimage fusion, and multi-site data adaptation. Key highlights will include our latest advances in the representation learning of MRI, capturing both structural and functional dimensions. Subsequently, I will elucidate our strategies for the effective integration of multi-modal neuroimaging data, which promises the accurate synthesis of MRI and PET scans, particularly beneficial in cases plagued by missing or incomplete data modalities. Concluding the talk, I will introduce our comprehensive suite of multi-site neuroimage harmonization techniques and unveil DomainATM, our open-source toolbox specifically designed for medical data adaptation.

Presenter: Maoran Xu, Duke University
Title: Identifiable and interpretable nonparametric factor analysis
Abstract: Factor models have been widely used to summarize the variability of high-dimensional data through a set of factors with much lower dimensionality. Gaussian linear factor models have been particularly popular due to their interpretability and ease of computation. However, in practice, data often violate the multivariate Gaussian assumption. To characterize higher-order dependence and nonlinearity, models that include factors as predictors in flexible multivariate regression are popular, with GP-LVMs using Gaussian process (GP) priors for the regression function and VAEs using deep neural networks. Unfortunately, such approaches lack identifiability and interpretability and tend to produce brittle and nonreproducible results. To address these problems by simplifying the nonparametric factor model while maintaining flexibility, we propose the NIFTY framework, which parsimoniously transforms uniform latent variables using one dimensional nonlinear mappings and then applies a linear generative model. The induced multivariate distribution falls into a flexible class while maintaining simple computation and interpretation. We prove that this model is identifiable and empirically study NIFTY using simulated data, observing good performance in density estimation and data visualization. We then apply NIFTY to bird song data in an environmental monitoring application.

Presenter: Yuexuan Wu, University of Washington
Title: Topological network analysis of protein aggregates in Alzheimer's disease using PET imaging data
Abstract: Alzheimer's disease (AD) is characterized by the accumulation of beta-amyloid (Aβ) and tau proteins in the brain. Understanding the interplay between these proteins and their spatial distribution could provide insights into disease progression. In this study, we introduce a novel approach to investigate the topological features of Aβ and tau networks across different cognitive groups using positron emission tomography (PET) images. We construct networks via partial correlation matrices between the standardized uptake value ratios in specific regions of interest (ROIs). We employ bi-filtered persistent homology to explore these networks' topological characteristics for Aβ and tau modalities. We further examine the networks' hierarchical tree structures, focusing on comparing consistent pairs of regions positive for Aβ and tau presence across different cognitive groups. The results unveil complex structures in PET images by pinpointing consistent patterns in ROIs associated with Aβ and tau localization, which serve as potential biomarkers for AD progression. The study also highlights tau's more complex aggregate behavior and its stronger association with AD.
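
As background on the network construction step, a partial correlation matrix can be obtained from the inverse covariance (precision) matrix. The sketch below is an illustration under assumptions made here: the function name `partial_correlation` and the three-region covariance are hypothetical, and the abstract does not say how the presenters estimate these matrices.

```python
import numpy as np

def partial_correlation(cov):
    """Partial correlations from a covariance matrix via its inverse."""
    precision = np.linalg.inv(np.asarray(cov, dtype=float))
    scale = np.sqrt(np.diag(precision))
    # rho_ij given all other variables = -P_ij / sqrt(P_ii * P_jj).
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical covariance for three regions with chain dependence
# (region 1 relates to region 3 only through region 2).
rho = 0.5
cov = np.array([[1.0, rho, rho ** 2],
                [rho, 1.0, rho],
                [rho ** 2, rho, 1.0]])
pcor = partial_correlation(cov)
```

In this example regions 1 and 3 are marginally correlated (0.25), but their partial correlation is zero once region 2 is accounted for, which is why partial correlations yield sparser networks than plain correlations.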

Presenter: Xinyi Li, Clemson University
Title: Nonparametric Learning from 3D Point Clouds
Abstract: In recent years, there has been an exponentially increased amount of point clouds collected with irregular shapes in various areas. Motivated by the importance of solid modeling for point clouds, we develop a novel and efficient smoothing tool based on multivariate splines over the triangulation to extract the underlying signal and build up a 3D solid model from the point cloud. The proposed method can denoise or deblur the point cloud effectively, provide a multi-resolution reconstruction of the actual signal, and handle sparse and irregularly distributed point clouds to recover the underlying trajectory. In addition, our method provides a natural way of numerosity data reduction. We establish the theoretical guarantees of the proposed method, including the convergence rate and asymptotic normality of the estimator, and show that the convergence rate achieves optimal nonparametric convergence. We also introduce a bootstrap method to quantify the uncertainty of the estimators. Through extensive simulation studies and a real data example, we demonstrate the superiority of the proposed method over traditional smoothing methods in terms of estimation accuracy and efficiency of data reduction.

Novel statistical inference methods with applications

Organizer: Julia Fisher, University of Arizona, BIO5 Institute, Statistics Consulting Laboratory
Chair: Simon Vandekar, Vanderbilt University

Presenter: Fatma Parlak, Indiana University Bloomington
Title: A robust multivariate, non-parametric outlier identification method for scrubbing in fMRI
Abstract: FMRI data are prone to noise and artifacts, requiring their removal for reliable analysis. Traditional scrubbing methods rely on head motion or ad hoc signal changes, but these may be insufficient. Our innovative approach treats scrubbing as outlier detection, viewing volumes with artifacts as multidimensional outliers. Existing methods assume Gaussianity, but fMRI data violate these assumptions. We present a robust outlier detection method applicable to non-Gaussian data, aiming to establish thresholds based on robust distances. Two threshold options cater to researchers' preferences for data retention or sensitivity. Our procedure involves dimension reduction, robust univariate outlier imputation, and threshold estimation based on upper quantiles. Threshold choices include the empirical distribution of robust distances and nonparametric bootstrap estimate. Comparative analysis with existing scrubbing methods highlights the efficacy and versatility of our approach in addressing non-Gaussian data and improving outlier detection in fMRI studies.

Presenter: Daniel Adrian, Grand Valley State University
Title: Improved activation detection from magnitude and phase functional MRI data
Abstract: Functional MRI is a popular noninvasive technique for mapping brain regions activated by specific brain functions. FMRI data consist of both magnitude and phase components (i.e., it is complex-valued), but in the vast majority of statistical analyses, only the magnitude data is utilized and modeled based on a Gaussian approximation. We show that using the correct Ricean distribution for the magnitudes, as well as the entire complex-valued data, results in improved activation detection — for activation in the magnitude component. Further, as fMRI measures brain activity indirectly through blood flow, the so-called "brain or vein" problem refers to the difficulty in determining whether measured activation corresponds to (desired) brain tissue or (undesired) large veins, which may be draining blood from neighboring regions. Previous work has demonstrated that activation in the phase component "discriminates" between the two: phase activation occurs in voxels with large, oriented vessels but not in voxels with small, randomly oriented vessels immediately adjacent to brain tissue. Following this motivation, we have developed a model that allows for activation in the phase and magnitude components.

Presenter: Yueyang Shen, University of Michigan
Title: Imaging statistics, invariance and spacekime analytics
Abstract: This talk will present the theoretical foundations of symmetries with an emphasis on neural network modeling and statistical inference in imaging. We will demonstrate roto-translational, scaling, and reparametrization symmetries, along with invariant and equivariant computational statistical metrics using imaging data. Neural networks realize such symmetries through weight-sharing or emergent data augmentation invariance. Information compression and (minimal) sufficient statistics are dual to identifying symmetries and quotienting out irrelevances.

We plan to show empirical results from two fMRI datasets: finger tapping and the music stimuli. The former neuroimaging data is collected from patients switching between resting and finger tapping tasks and the latter examines the emerging neural network activation responding to different music genres. Complex-time (kime) representation of longitudinal data leads to novel spacekime analytics, which enables peering into repeated-measurement information contained in low signal-to-noise ratio fMRI data. Specifically, the kime phases are coupled to the random variability in the repeated sampling. The observed time-courses are transformed as kimesurfaces encoding the distribution of the temporal information into richer computable data objects (manifolds). This allows us to characterize and analyze fMRI data using voxel-based or ROI-based approaches, as well as to synthetically generate realistic neuroimaging data.

Presenter: Jose Rodriguez-Acosta, Texas A&M University
Title: A novel classification framework using a multilayer network predictor
Abstract: We introduce a novel statistical framework for exploring the correlation between brain stimulation and regional brain activation. Using a generalized linear modeling framework, we predict binary outcomes, such as regional brain activation during external stimuli. Our predictive model utilizes multilayer networks to capture interactions among brain network nodes. Traditional regression methods with multilayer network predictors often struggle to effectively utilize information across graph layers, leading to less accurate inference, especially with smaller sample sizes. To address this, our method models edge coefficients at each network layer using bilinear interactions between latent effects associated with connected nodes. We also employ a variable selection framework to identify influential nodes linked to observed outcomes. Importantly, our framework is computationally efficient and provides uncertainty quantification in node identification, coefficient estimation, and binary outcome prediction. Simulation studies demonstrate the superior performance of our approach in inference and prediction.

Frontiers in medical imaging: harnessing artificial intelligence and statistical analysis for breakthrough insights

Organizer: Lei Liu, Washington University in St. Louis
Chair: Dayu Sun, Indiana University School of Medicine

Presenter: Yize Zhao, Yale University
Title: Bayesian mixed model inference for genetic association under related samples with brain network phenotype
Abstract: Genetic association studies for brain connectivity phenotypes have gained prominence due to advances in non-invasive imaging techniques and quantitative genetics. Brain connectivity traits, characterized by network configurations and unique biological structures, present distinct challenges compared to other quantitative phenotypes. Furthermore, the presence of sample relatedness in most imaging genetics studies limits the feasibility of adopting existing network-response modeling. Here, we fill this gap by proposing Bayesian network-response mixed-effect models that consider a network-variate phenotype and incorporate either population structures or sample relatedness. To accommodate the inherent topological architecture associated with the genetic contributions to the phenotype, we model the effect components via a set of effect network configurations and impose an inter-network sparsity and intra-network shrinkage to dissect the phenotypic network configurations affected by the risk genetic variant. We evaluate the performance of our model through extensive simulations. We also study the genetic bases for brain structural connectivity using data from the Human Connectome Project and Adolescent Brain Cognitive Development studies, and obtain plausible and interpretable results.

Presenter: Yifan Peng, Cornell University
Title: Image-based primary open-angle glaucoma diagnosis and prognosis
Abstract: Primary open-angle glaucoma (POAG) is one of the leading causes of blindness globally and in the US, potentially affecting an estimated 111.8 million people by 2040. Among these patients, 5.3 million may be bilaterally blind. POAG remains asymptomatic until it reaches an advanced stage, leading to visual field loss. However, early diagnosis and treatment can avoid most blindness caused by POAG. Therefore, accurately identifying individuals with glaucoma is critical to clinical decision-making. In recent years, developments in artificial intelligence have offered the potential for automatic POAG diagnosis and prognosis using fundus photographs. In this talk, I will review our research on image-based POAG diagnosis and prognosis. I will also discuss how we are working to ensure model fairness across protected groups in deep learning models. Our proposed approach aims to alleviate concerns about the fairness and reliability of image-based computer-aided diagnosis.

Presenter: Lei Liu, Washington University in St. Louis
Title: Deep learning models to predict primary open-angle glaucoma
Abstract: Glaucoma is a major cause of blindness and vision impairment worldwide, and visual field (VF) tests are essential for monitoring the conversion of glaucoma. While previous studies have primarily focused on using VF data at a single time point for glaucoma prediction, there has been limited exploration of longitudinal trajectories. Additionally, many deep learning techniques treat the time-to-glaucoma prediction as a binary classification problem (glaucoma Yes/No), resulting in the misclassification of some censored subjects into the nonglaucoma category and decreased power. To tackle these challenges, we propose and implement several deep-learning approaches that naturally incorporate temporal and spatial information from longitudinal VF data to predict time-to-glaucoma. When evaluated on the Ocular Hypertension Treatment Study (OHTS) dataset, our proposed convolutional neural network (CNN)-long short term memory (LSTM) emerged as the top-performing model among all those examined. The implementation code can be found on GitHub.

Presenter: Haoda Fu, Eli Lilly
Title: LLM is not all you need. Generative AI on smooth manifolds
Abstract: Generative AI is a rapidly evolving technology that has garnered significant interest lately. In this presentation, we'll discuss the latest approaches, organizing them within a cohesive framework using stochastic differential equations to understand complex, high-dimensional data distributions. We'll highlight the necessity of studying generative models beyond Euclidean spaces, considering smooth manifolds essential in areas like robotics and medical imagery, and for leveraging symmetries in the de novo design of molecular structures. Our team's recent advancements in this blossoming field, ripe with opportunities for academic and industrial collaborations, will also be showcased.

Poster Abstracts

Title: Phase-amplitude modeling of functional data from gold nanoparticle tomography images and model evaluation
Presenter: Chen Mu
Abstract: The goal of this poster is to develop a probabilistic model for functional data based on tomographic images. We identify two functional data sets of interest (from gold nanoparticle tomography images), apply techniques for phase-amplitude separation, model the resulting components using individual functional principal component analysis (FPCA), analyze these components, and evaluate the goodness of fit of the model.

Title: Mediation analysis with graph mediator
Presenter: Yixi Xu
Co-authors: Yi Zhao
Abstract: This study introduces a mediation analysis framework when the mediator is a graph. A Gaussian covariance graph model is assumed for graph representation. Causal estimands and assumptions are discussed under this representation. With a covariance matrix as the mediator, parametric mediation models are imposed based on matrix decomposition. Assuming Gaussian random errors, likelihood-based estimators are introduced to simultaneously identify the decomposition and causal parameters. An efficient computational algorithm is proposed and the asymptotic consistency of the estimators is investigated. Via simulation studies, the performance of the proposed approach is evaluated. Applying to a resting-state fMRI study, a brain network is identified within which functional connectivity mediates the sex difference in the performance of a motor task.

Title: Inter-/intra-patient variability in multiplex imaging
Presenter: Grant Carr
Co-authors: Maria Masotti, Veera Baladandayuthapani, Timothy Frankel
Abstract: Multiplex imaging has become increasingly used to examine the tumor microenvironment (TME). Multiplex imaging allows researchers to measure protein expressions in a biological sample at the cell level, while also preserving the location of cells in the tissue. One important decision when handling multiplex imaging data is choosing how many regions of interest (ROIs) to analyze for each patient. There is little guidance available in choosing these ROIs to ensure a properly powered study. Here, we use two lung cancer datasets to quantify between- versus within-patient variability in the prevalence of cellular phenotypes. We show that most of the variability in phenotypic prevalence is found within rather than between patients. Sample size calculations based on existing formulas incorporating the estimated intraclass correlation coefficient (ICC) are also discussed.
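
To make the between- versus within-patient comparison concrete, the sketch below computes a one-way ICC for equally sized groups and the standard design effect 1 + (m - 1) * ICC. It is an illustration under assumptions made here: the abstract does not state which ICC definition or sample size formula the authors use, and the function names and toy data are hypothetical.

```python
import numpy as np

def icc_oneway(groups):
    """One-way random-effects ICC; rows are patients, columns are ROIs."""
    data = np.asarray(groups, dtype=float)
    n_groups, k = data.shape
    grand_mean = data.mean()
    group_means = data.mean(axis=1)
    # Mean squares between and within patients from one-way ANOVA.
    ms_between = k * ((group_means - grand_mean) ** 2).sum() / (n_groups - 1)
    ms_within = ((data - group_means[:, None]) ** 2).sum() / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def design_effect(rois_per_patient, icc):
    """Variance inflation from analyzing several correlated ROIs per patient."""
    return 1 + (rois_per_patient - 1) * icc

# Toy data: two patients with two ROIs each (hypothetical phenotype prevalences).
icc = icc_oneway([[1.0, 2.0], [3.0, 4.0]])
deff = design_effect(5, 0.5)
```

A low ICC, as the abstract reports, means the design effect stays close to 1, so additional ROIs per patient contribute nearly independent information.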

Title: Multi-view multivariate mediation analysis
Presenter: Wei Wang
Co-authors: Sandra E. Safo
Abstract: Many biomedical studies generate data from multiple sources or views with a main goal of integrating these diverse but complementary data for deeper biological insights. Most existing integrative analysis methods only consider associations among the views and an outcome without inferring potential causal relationships. Mediation analysis explores causal relationships between exposures and an outcome through including a mediator as an intermediate variable. Existing mediation analysis methods consider only single variate and single view exposures, and none incorporates multi-view exposures. We propose Multi-view Multivariate Mediation Analysis (MMM), which considers both multivariate exposures and mediators and incorporates multi-view exposures. MMM integrates multi-view exposures by identifying disentangled common drivers accounting for indirect effects via a multivariate mediator, and direct effects to be estimated separately. Simulation studies are used to demonstrate the effectiveness of MMM. MMM is applied to data from the ADNI study to explore underlying mechanisms of Alzheimer's Disease.

Title: A CAIPI approach for simultaneous multi-slice technique to increase activation detection in fMRI
Presenter: Ke Xu
Co-authors: Daniel Rowe
Abstract: FMRI has been a powerful and safe medical imaging tool to study the function of the brain by demonstrating the spatial and temporal changes in brain metabolism in recent decades. To capture brain functionality more efficiently, efforts have been made to accelerate the number of images acquired per unit of time that create each volume image without losing full anatomical structure. The Simultaneous Multi-Slice (SMS) technique provides a reconstruction method where multiple slices are acquired and aliased concurrently. Traditional imaging techniques such as SENSE and GRAPPA can reconstruct an image from less measured data but have their drawbacks. The Controlled Aliasing in Parallel Imaging (CAIPI) technique achieves slice-wise image shift to decrease the influence of the geometry factor (g-factor) of coil sensitivities and prevents the singular problem of the design matrix. In this project, we present a CAIPI approach for multi-coil separation of parallel encoded complex-valued slices (mSPECS-CAIPI), a novel SMS approach combined with two slice-wise imaging shift techniques. Our proposed approach was applied to a simulation study with preliminary results showing a decrease in the influence of the g-factor while increasing the brain activation detection rate. The signal-to-noise ratio and the contrast-to-noise ratio are also improved by our approach.

Title: Using intrinsic properties of data to improve image classifier learning curves
Presenter: Zenas Huang
Co-authors: Adam Alessio
Abstract: Image classification in medical imaging has high costs associated with training data acquisition. This work introduces an approach to modeling classifier learning curves that uses intrinsic properties of data to yield more accurate estimates of minimal effective sample sizes and the expected marginal benefit of additional training data, to help guide more efficient data collection efforts. In medical imaging, a critical concern is determining the amount of training data needed for an image classifier to achieve a desired target performance level. This is because the annotation of new samples is often time consuming and requires expert-level medical domain knowledge. Although it is known that classifier performance follows an inverse power law as a function of the amount of training data, this law often misestimates actual performance. Moreover, generating sufficiently precise and complete empirical learning curves to estimate effective sample sizes in many medical image classification problems remains computationally intensive and often infeasible. Recent work suggests that a classifier's generalization error can be predicted by considering not only the quantity of training data but also certain intrinsic properties of the data, such as the data's complexity as measured by a dataset's intrinsic dimensionality, which is computationally efficient to estimate. This work leverages these insights to enhance a learning curve model for a ResNet18 classifier applied to a binary classification task in the MedMNIST dataset. It demonstrates how the incorporation of these factors leads to improved robustness and predictive performance of the learning curve model, as evidenced by lower out-of-sample mean squared error when comparing models fitted to subsets of the classifier's empirical learning curve. These findings suggest that these intrinsic dataset properties have predictive value that can be useful for optimizing data collection and annotation efforts.
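
The baseline inverse power law mentioned in the abstract can be sketched as a log-log least-squares fit. This is an illustration only: the form error = a * n^(-b), the function name `fit_power_law`, and the synthetic learning curve are assumptions made here, and the sketch does not include the intrinsic-dimensionality terms that the poster adds.

```python
import numpy as np

def fit_power_law(sample_sizes, errors):
    """Fit error ~ a * n**(-b) by least squares on the log-log scale."""
    log_n = np.log(np.asarray(sample_sizes, dtype=float))
    log_err = np.log(np.asarray(errors, dtype=float))
    slope, intercept = np.polyfit(log_n, log_err, 1)
    return np.exp(intercept), -slope

# Synthetic learning curve generated from a known power law (a=2, b=0.5).
sizes = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
errors = 2.0 * sizes ** -0.5

a_hat, b_hat = fit_power_law(sizes, errors)

# Extrapolated error at a larger, not-yet-collected sample size.
predicted_error = a_hat * 3200.0 ** (-b_hat)
```

On real learning curves the points are noisy and often flatten toward a nonzero floor, which is one reason a plain power law can misestimate performance.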

Title: Pipelines for Extracting Features from "Cell Painting" Fluorescence Imaging Data
Presenter: Elizabeth Sweeney
Co-authors: Anil Bodepudi, Sagar Ksheera, Sudhir Sornapudi, Erin Weisbart, Beth Cimini, Zachary Sutake, Enrica Bianchi, Wei Chen, and Eric Sherer
Abstract: Chemical safety data traditionally relies on animal models and human epidemiological studies, greatly limiting the number of compound toxicity profiles that are studied and well understood. Exciting new advances in high-throughput imaging hold promise for supplementing or even replacing these methods and will allow for a broader range of compounds to be screened earlier in the development pipeline. Cell Painting is one such imaging assay, which performs morphological profiling using multiplexed fluorescent dyes. In Cell Painting experiments, cells are often exposed to compounds of interest at different concentrations to determine the doses which cause morphological changes in the cells. We will describe the Cell Painting assay and two pipelines for extracting features from this data to assess these morphological changes. These pipelines are based on software developed at the Broad Institute, namely CellProfiler and DeepProfiler. The first pipeline uses CellProfiler and extracts traditional size, shape, intensity, and texture-based imaging features at the cell level. The second pipeline uses DeepProfiler, which produces convolutional neural network-based features from a pre-trained model, also at the cell level. One of the major challenges of implementing these pipelines is the size of the Cell Painting data, as just one experiment can produce upwards of 500 gigabytes of data. We will also describe our distributed Amazon Web Services based solution for feature extraction, based largely on work from the Broad Institute. The features extracted from these two pipelines will be used to estimate compound phenotypic altering concentrations as well as for machine learning applications such as predicting in-vivo responses from Cell Painting data.

Title: Multiple latent structure models for statistical learning with applications to neuroimaging data
Presenter: John Koo
Co-authors: Xi Luo, Brian Caffo, Yi Zhao
Abstract: The random dot product graph (RDPG) has become a powerful modeling tool in uncovering latent structures within graphs. In particular, it has been shown that the RDPG describes a wide range of popular random graph models with rigid latent structures. More recently, joint models of multiple random graphs that share common properties or structures across graphs have been introduced, such as the multilayer RDPG, multiple RDPG, and multilayer stochastic block model. In this work, we use these joint random graph models in the context of statistical learning, such as classification and regression, by introducing the multiple latent structure model, in which the graphs share a common latent structure with different parameters that correspond to different response variables. Then we propose various estimation techniques involving manifold learning to estimate these parameters and in turn predict the responses, with theorems guaranteeing convergence of the predictions. Simulations, as well as applications on brain connectivity networks, verify the performance of our methods.

Title: Formal Bayesian Approach to a fused GRAPPA and SENSE parallel imaging technique augmenting task detection power
Presenter: Chase Sakitis
Co-authors: Daniel Rowe
Abstract: In fMRI, capturing brain activity during a physical or cognitive task is dependent on how quickly each volume k-space array is obtained. Acquiring the full k-space arrays can take a considerable amount of time. Under-sampling k-space reduces the acquisition time, but results in aliased, or "folded," images after applying the inverse Fourier transform (IFT). GeneRalized Autocalibrating Partial Parallel Acquisition (GRAPPA) and SENSitivity Encoding (SENSE) are parallel imaging techniques that yield full images from subsampled arrays of k-space. With GRAPPA operating in the spatial frequency domain and SENSE in image space, these techniques can be fused to reconstruct the subsampled k-space arrays more accurately. Here, we propose a Bayesian approach to this combined model where prior distributions for the unknown parameters are assessed from a priori k-space arrays and images. The prior information is utilized to estimate the missing spatial frequency values, unalias the voxel values from the posterior distribution, and reconstruct into full field-of-view images. Our Bayesian technique successfully reconstructed a simulated and experimental fMRI time series with no aliasing artifacts while decreasing temporal variation, increasing SNR, and improving the power of detecting task activation.

Title: A Bayesian Gaussian graphical model approach to combined neural functional connectivity and subnetwork structure estimation
Presenter: Julia M. Fisher
Co-authors: Antonio M. Rubio, Edward J. Bedrick
Abstract: The past few decades have seen a surge of interest in better understanding a) which regions of the human brain activate together (i.e., which regions are functionally connected) and b) how functionally connected regions organize into sub-networks of the full network that is the human brain. Functional connectivity between specific regions and in specific subnetworks (e.g., the default mode network) has the potential to serve as a disease biomarker or to provide insight into the neurological underpinnings of cognitive effects associated with treatments such as transcranial magnetic stimulation. Functional connectivity can be estimated from functional magnetic resonance imaging data, which measures the blood oxygen level dependent (BOLD) signal in the brain approximately every two seconds over intervals of often around 5-15 minutes. The correlations of BOLD signal between different neurological regions then serve as a proxy for functional connectivity. Analytical approaches for such data have often focused on the estimation of either (partial) correlations between specific regions of interest or subnetwork structure but not both. However, both aspects of the data are often of interest, and their estimation would ideally occur within the same modeling framework.

We propose a Bayesian Gaussian graphical model approach to combined connectivity and sub-network structure estimation. We consider de-meaned, pre-whitened multivariate timeseries data from a set number of neurological regions of interest. We model the data as having zero mean and place a Wishart prior on the precision matrix of the data; the conjugacy of that prior with the Gaussian likelihood allows direct and easy sampling from the posterior Wishart distribution of the precision matrix. The model gives posterior distributions over every partial correlation between brain regions. Moreover, we propose estimation of subnetwork structure by conducting Louvain community detection on posterior samples of the partial correlation matrix — a function of the precision matrix — thus producing a posterior distribution of sub-network structures. The Louvain algorithm non-deterministically optimizes a modularity score to find the partition of the network into sub-networks that results in higher connections between regions within a sub-network and lower connections between regions from different subnetworks.
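For readers unfamiliar with the conjugacy referred to above, the standard update can be written out as follows. This is a textbook sketch, not the authors' exact specification: the symbols (n time points, p regions, prior degrees of freedom \nu_0, prior scale matrix V_0, and the scale-matrix parameterization of the Wishart) are assumptions introduced here for illustration.

```latex
% Assumed notation: x_1, ..., x_n are the de-meaned, pre-whitened p-vectors;
% \Omega is the p x p precision matrix; W_p(\nu, V) uses the scale-matrix form.
\begin{align*}
x_t \mid \Omega &\sim N_p\!\left(0,\ \Omega^{-1}\right), \qquad t = 1, \dots, n, \\
\Omega &\sim W_p\!\left(\nu_0,\ V_0\right), \\
\Omega \mid x_1, \dots, x_n &\sim W_p\!\left(\nu_0 + n,\ \Big(V_0^{-1} + \sum_{t=1}^{n} x_t x_t^{\top}\Big)^{-1}\right), \\
\rho_{ij} &= -\,\frac{\Omega_{ij}}{\sqrt{\Omega_{ii}\,\Omega_{jj}}}, \qquad i \neq j .
\end{align*}
```

Each posterior draw of \Omega maps to one partial correlation matrix through the last line, which is the object on which community detection would then be run.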

We conduct initial explorations of the model by simulating data under a variety of conditions: different numbers of sub-networks, different densities of active edges within sub-networks, different numbers of active edges between sub-networks, and different methods of selecting active edges within sub-networks (e.g., Barabási-Albert preferential attachment, Erdős-Rényi fixed probability of attachment). We sample from the posterior distribution of the precision matrix for each data set and compare the Louvain-detected sub-network structures to ground truth via the Rand Index, which varies from zero to one with one indicating perfect identification of sub-network structure. We also calculate the average mean squared error of the unique partial correlation values for each sample. We find a sharp increase in Rand Index values across sub-network densities for lower numbers of sub-networks and more gradual increases for higher numbers of sub-networks, indicating success in estimating more coarsely-grained and dense sub-network structures but challenges as the network becomes more finely divided. For example, simulations of a 100-region network with five disjoint, equally-sized, Barabási-Albert sub-networks with approximately 75% of possible sub-network edges included in the ground-truth network produced an average Rand Index of 0.99, practically perfect identification of the sub-network structure. In contrast, ten sub-networks with the other parameters held constant produced an average Rand Index of 0.72. Comparisons of performance in other conditions and to other models will be made.
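As a rough illustration of the evaluation metric named above, the sketch below computes the (unadjusted) Rand Index between two partitions of the same regions. It is not the authors' code; the function name `rand_index` and the label encoding are hypothetical, and it assumes the plain pair-counting definition rather than the adjusted variant.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Unadjusted Rand Index between two partitions of the same items.

    Counts the fraction of item pairs on which the two partitions agree:
    both place the pair in the same group, or both place it in different groups.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("partitions must label the same number of items")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_in_true = labels_true[i] == labels_true[j]
        same_in_pred = labels_pred[i] == labels_pred[j]
        if same_in_true == same_in_pred:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


# Label names do not matter, only the grouping they induce.
truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same grouping, different names
crossed = [0, 1, 0, 1]      # splits every true group

print(rand_index(truth, relabelled))
print(rand_index(truth, crossed))
```

Because the index depends only on pairwise co-membership, it is invariant to how sub-networks are numbered, which is why it suits comparing detected communities with a ground-truth partition.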

Title: Segmenting 4D flow MRI using the variance of the cosine of the angle of flow direction
Presenter: Samuel J Eschker
Co-authors: Zehao Shao, Abhishek Singh, Vitaliy L. Rayz, Pavlos P. Vlachos, and Bruce A. Craig
Abstract: We propose a nonparametric steady state segmentation algorithm for 4D flow magnetic resonance imaging (MRI) based on the variance of the cosine of the angle (VCA) of flow direction. Using 4D flow MRI measurements of the flow in in vitro cerebral aneurysm models and in vivo 4D flow MRI acquired in cerebral arteries, we demonstrate that VCA falls close to zero in areas of flow and separably away from zero in tissue. Additionally, we show that VCA is robust under mild assumptions and that VCA produces precise segmentations before including voxels' spatial information. We then detail a Bayesian post-processing procedure for improving recall of VCA detected vessels with the initial VCA segmentation serving as an informative prior. Next, we present a comparison of post-processed VCA segmentations to our previously published segmentation algorithm based on Standardized Difference of Means (SDM) velocity using the aforementioned 4D flow MRI datasets. Finally, we outline a framework for utilizing VCA jointly with alternative segmentation techniques.

Title: SHAKER: A complete tool to simulate and explore functional MRI k-space time series data
Presenter: John Bodenschatz
Co-authors: Daniel Rowe
Abstract: Running experiments in an MRI machine is extremely expensive, costing both time and money. In addition, the entire imaging process can slow down the development of new statistical techniques to model and interpret fMRI data. Consequently, researchers evaluate new methods on simulated data as a cost-effective way of measuring potential. Currently, simulated fMRI data are largely developed in-house for each researcher using a variety of methods. The aim of this work is to present investigators with a single tool to generate realistic fMRI time series data based on the underlying physics of the machine and NMR phenomenon. All parameters of interest in an fMRI experiment are easily configurable, such as: T1 and T2* values, echo and repetition time (TE and TR), basic pulse sequences (including acceleration factors), task design and activation ROIs, etc. The tool will output simulated complex-valued k-space data (and, optionally, reconstructed magnitude and phase images) that can be internally analyzed or exported for further analysis using the researcher's preferred method and programming language.

Title: Analysis of simultaneous eye-tracking and fMRI data collected in children with ASD
Presenter: Xucheng (Fred) Huang
Co-authors: Sarah Robillard Shultz, Benjamin B. Risk
Abstract: Functional magnetic resonance imaging (fMRI) is a useful tool for understanding the complexities of brain activity, particularly in children with autism spectrum disorder (ASD). While recent studies have examined dynamic connectivity in ASD using resting-state fMRI, analyzing dynamic connectivity of task-based fMRI may offer novel insights. To enhance our understanding of brain connectivity in ASD, this thesis has two aims: 1) analyze the relationship between eye-tracking data and movie-watching tasks in children with and without ASD using the general linear model; 2) develop a novel model for analyzing dynamic connectivity during a task. For aim 1, we develop an analytic pipeline for convolving eye-blink and eye-fixation events with the Hemodynamic Response Function (HRF), which is then analyzed using a conventional task-based modeling approach. For aim 2, we propose a novel covariance regression in which we estimate the association between time-varying correlations between brain regions and the eye-tracking data. We analyzed data from 12 ASD children and 22 non-ASD children collected in the Brain Connectivity Study at Emory University. Brain activation was significantly lower in ASD during eye-fixation events in regions associated with sensory processing, attention networks, auditory processing, executive functions, and language processing. The covariance regression analysis further identified large individual variability in functional connectivity among the ASD group. Our two-stage modeling approach extends beyond studies of ASD, providing an analytical framework to complement traditional task-based fMRI analyses with dynamic connectivity modeling.

Title: Reliable multivariate deep regression using moment-matching prior networks
Presenter: Qingyi Pan
Co-authors: Ruqi Zhang
Abstract: When deep neural networks are deployed in high-stakes applications, uncertainty estimation is crucial for reliable predictions and decision-making. Despite rich studies in univariate deep regression, multivariate deep regression with accurate uncertainty estimation, especially concerning the covariance matrix, remains largely unexplored. In this paper, we propose a scalable evidential prior to capture both aleatoric and epistemic uncertainty, including the correlation of the multivariate response vector. Our method formulates a hierarchical probabilistic framework where the evidential prior is fitted using samples generated by a neural network based on moment-matching. Extensive empirical results on real-world multivariate regression tasks demonstrate that our method provides accurate prediction and uncertainty estimation with minimal computational overhead, significantly outperforming existing methods.

Title: Multi-task learning for brain network analysis in the ABCD study
Presenter: Xuang Kan
Co-authors: Hejie Cui, Keqi Han, Ying Guo, Carl Yang
Abstract: The Adolescent Brain Cognitive Development study provides a rich data resource for exploring the associations between brain connectome (network) and cognitive, personality, and mental health measures in adolescents. To leverage this rich dataset, we propose a novel multi-task learning framework that predicts these measures from multi-view brain network data using a graph transformer architecture. Our approach learns shared representations across tasks while allowing for task-specific predictions, improving performance compared to single-task learning. Ablation studies reveal the importance of our proposed techniques of Batch-Wise Loss Balancing and Target Standardization in ensuring successful multi-task learning. Furthermore, we develop innovative visualization techniques based on integrated gradients to interpret the learned task correlations and identify influential brain network edges for each task. Our findings contribute to understanding the complex relationships between brain connectome and behavioral outcomes, highlighting the potential of multi-task learning in this domain.

Title: Opposing effects of plasma LDL on white matter integrity in older APOE4 carriers
Presenter: Zhenyao Ye
Co-authors: Yezhi Pan, Rozalina G. McCoy, Chuan Bi, Mo Chen, Li Feng, Jiaao Yu, Tong Lu, Song Liu, Si Gao, Yizhou Ma, Chixiang Chen, Braxton D. Mitchell, Paul M. Thompson, L. Elliot Hong, Peter Kochunov, Tianzhou Ma, Shuo Chen
Abstract: Background: APOE4 is a strong genetic risk factor of Alzheimer's disease and metabolic dysfunction. However, whether APOE4 and markers of metabolic dysfunction synergistically impact the deterioration of white matter integrity in older adults remains unknown.

Methods: In the UK Biobank data, we conducted a multivariate analysis to investigate the moderation effects of APOE4 on the relationship between 249 plasma metabolites (measured using nuclear magnetic resonance spectroscopy) and whole-brain white matter (WM) integrity (measured by diffusion-weighted magnetic resonance imaging) in a cohort of 1,917 older adults (aged 65.0-81.0 years; 52.4% female).

Results: Of the examined biomarkers, higher concentrations of LDL and VLDL were associated with a lower level of WM integrity (b=-0.12, CI=[-0.14,-0.10]) among APOE4 carriers. Conversely, among noncarriers, they were associated with a higher level of WM integrity (b=0.05, CI=[0.04,0.07]), demonstrating a significant moderation effect of APOE4 (b=-0.18, CI=[-0.20,0.15], P<0.00001).

Conclusions: Altered lipid metabolism differentially affects APOE4 carriers compared to non-carriers. These findings support precision medicine approaches to mitigate impaired cognitive function among APOE4 carriers.

Title: GAMing the brain: investigating the cross-modal relationships between functional connectivity and structural features using generalized additive models
Presenter: Arunkumar Kannan
Co-authors: Archana Venkataraman, Brian Caffo
Abstract: Functional connectivity, reflecting synchronized brain activity across distinct regions, is crucial for understanding cognitive processes and neurological disorders. Despite the recent interest in exploring the relationship between functional connectivity and structural brain features, understanding the precise link remains challenging. We propose a novel analysis method that integrates structural factors, such as anatomical morphology summaries, voxel intensity, diffusion-weighted information, and geographic distance, to elucidate variation in functional connectivity. Our method employs a generalized additive model (GAM), leveraging region-pair or voxel-pair information, while accommodating individual subject differences in both template and subject spaces. Furthermore, we assess repeatability via the so-called discriminability of subjects under our approach, quantifying the probability of similarities between measurements for the same subject versus different subjects. Utilizing data from the Human Connectome Project, we analyze brain connectivity in twin pairs and non-twin pairs to evaluate the heritability of model-based connectivity patterns estimated via GAMs. Our findings suggest that direct structure/function regression models enhance our understanding of functional connectivity variation, providing insights into underlying mechanisms and heritability of brain connections.

Title: Study design features that improve effect sizes in cross-sectional and longitudinal brain-wide association studies
Presenter: Kaidi Kang
Co-authors: Jakob Seidlitz, Richard A.I. Bethlehem, Jiangmei Xiong, Megan T. Jones, Kahini Mehta, Arielle S. Keller, Ran Tao, Anita Randolph, Bart Larsen, Brenden Tervo-Clemmens, Eric Feczko, Oscar Miranda Dominguez, Steve Nelson, Jonathan Schildcrout, Damien Fair, Theodore D. Satterthwaite, Aaron Alexander-Bloch, Simon Vandekar
Abstract: Brain-wide association studies (BWAS) are a fundamental tool in discovering brain-behavior associations. Several recent studies showed that thousands of study participants are required to improve the replicability of BWAS because actual effect sizes are much smaller than those reported in smaller studies. Here, we perform analyses and meta-analyses of a robust effect size index (RESI) using 63 longitudinal and cross-sectional magnetic resonance imaging studies from the Lifespan Brain Chart Consortium (77,695 total scans) to demonstrate that optimizing study design is critical for improving standardized effect sizes and replicability in BWAS. A meta-analysis of brain volume associations with age indicates that BWAS with larger covariate variance have larger effect size estimates and that the longitudinal studies we examined have systematically larger standardized effect sizes than cross-sectional studies. We propose a cross-sectional RESI to adjust for the systematic difference in effect sizes between cross-sectional and longitudinal studies that allows investigators to quantify the benefit of conducting their study longitudinally. Analyzing age effects on global and regional brain measures from the United Kingdom Biobank and the Alzheimer's Disease Neuroimaging Initiative, we show that modifying longitudinal study design through sampling schemes can increase between-subject variability and adding a single additional longitudinal measurement per subject improves effect sizes. However, evaluating these longitudinal sampling schemes on cognitive, psychopathology, and demographic associations with structural and functional brain outcome measures in the Adolescent Brain and Cognitive Development dataset shows that commonly used longitudinal models can, counterintuitively, reduce effect sizes. 
We demonstrate that the benefit of conducting longitudinal studies depends on the strengths of the between- and within-subject associations of the brain and non-brain measures. Explicitly modeling between- and within-subject effects avoids conflating the effects and allows optimizing effect sizes for them separately. These findings underscore the importance of considering study design features to improve the replicability of BWAS.

Title: BSNMani: Bayesian scalar-on-network regression with manifold learning
Presenter: Yijun Li
Co-authors: Ying Guo, Jian Kang
Abstract: Brain connectivity analysis is crucial for understanding brain structure and neurological function, shedding light on the mechanisms of mental illness. To study the association between individual brain connectivity networks and the clinical characteristics, we develop BSNMani: a Bayesian scalar-on-network regression model with manifold learning. BSNMani comprises two components: the network manifold learning model for brain connectivity networks, which extracts shared connectivity structures and subject-specific network features, and the joint predictive model for clinical outcomes, which studies the association between clinical phenotypes and subject-specific network features while adjusting for potential confounding covariates. For posterior computation, we develop a novel two-stage hybrid algorithm combining Metropolis-Adjusted Langevin Algorithm (MALA) and Gibbs sampling. Our method is not only able to extract meaningful subnetwork features that reveal shared connectivity patterns, but can also reveal their association with clinical phenotypes, further enabling clinical outcome prediction. We demonstrate our method through simulations and through its application to real resting-state fMRI data from a study focusing on Major Depressive Disorder (MDD). Our approach sheds light on the intricate interplay between brain connectivity and clinical features, offering insights that can contribute to our understanding of psychiatric and neurological disorders, as well as mental health.

Title: Deep learning of sparse irregularly-observed multivariate longitudinal data
Presenter: Yunyi Li
Co-authors: Sujuang Gao, Hao Liu
Abstract: There is a growing interest in the analysis of multivariate longitudinal data across various scientific fields, including high dimensional imaging data that changes dynamically over time. In many studies, these types of data are observed sparsely and at randomly spaced follow-up times, which poses unique challenges for statistical modeling and data analysis. Dynamical changes in multivariate longitudinal data can be flexibly modeled using ordinary differential equations (ODEs). There is considerable interest in estimating ODEs using deep learning techniques, such as neural networks, to infer the underlying mechanisms directly from the data. However, the existing methods have not been designed to handle sparsely observed multivariate longitudinal data with irregularly spaced follow-up times. In this paper, we present a novel method that utilizes ODEs with neural networks to address the analysis of longitudinal data that is sparsely observed at irregularly spaced follow-up times. Our proposed method combines longitudinal observations and employs stochastic optimization to estimate the initial values of ODEs. Our approach is straightforward and computationally efficient. Extensive simulation studies showed that the proposed method performed well in various situations. We demonstrated the proposed method by analyzing a data set of multivariate longitudinal data including imaging data from the Alzheimer's Disease Neuroimaging Initiative study.

Title: Longitudinal principal manifold estimation
Presenter: Robert Zielinski
Co-authors: Kun Meng, Ani Eloyan
Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder affecting, among others, the structure of the brain. Longitudinal magnetic resonance imaging (MRI) data is used to model trajectories of change in brain regions of interest to identify areas more susceptible to atrophy. Most methods for extracting surfaces of brain regions are applied to individual scans from study participants independently. As a result, there is a wide variability of shape and volume estimates of brain regions of interest over time in longitudinal studies resulting in major implications for biomarker estimation and modeling, especially when used in therapeutic clinical trials. To address this problem, we propose a longitudinal principal manifold estimation method, with the goal of recovering smooth, longitudinally meaningful manifold estimates of shapes over time. The proposed approach uses a smoothing spline to smooth over the coefficients of the principal manifold embedding function estimated at each time point. This smoothing mitigates the effects of random disturbances to the manifold between time points. A novel data augmentation approach is used to allow the use of principal manifold estimation on self-intersecting manifolds. We use simulation studies with several classes of manifolds to demonstrate performance improvements over naive applications of principal manifold estimation and principal curve/surface methods. These improvements persist when considering varying between-time-point noise levels and the types and magnitudes of systematic change between time points. We apply the proposed method to estimate the volumes of hippocampi and thalamuses of participants in the Alzheimer's Disease Neuroimaging Initiative dataset.

Title: A novel interpretable deep ordinal classification framework for multi-class grading of pneumoconiosis
Presenter: Meiqi Liu
Co-authors: Ling Wang, Adam Alessio
Abstract: This study proposes an interpretable deep ordinal classification framework to automatically classify pneumoconiosis based on the International Labour Office (ILO) classification system. Through collaboration with the National Institute for Occupational Safety and Health (NIOSH), this study curated a custom dataset of chest radiographs with and without pneumoconiosis. The four-point major category scale of profusion (concentration) of small opacities (0, 1, 2, or 3) was considered in this study. We consider the ResNet method with different loss functions for the multi-class classification problem, specifically: 1) cross-entropy loss, 2) mean-squared error (MSE) loss, 3) multitask conditional loss, and 4) ResNet with 2-layer cross-entropy loss. These experiments inspire us to propose a new interpretable deep ordinal classification framework for multi-class grading of pneumoconiosis, which outperforms all the existing methods and can be generalized to many other tasks.


Organizing Committee

  • Amanda F. Mejia, Indiana University Bloomington
  • Simon Vandekar, Vanderbilt University
  • Tingting Zhang, University of Pittsburgh
  • Yi Zhao (chair), Indiana University School of Medicine
  • Yize Zhao, Yale University

Student Paper Competition Review Panel

  • Andrew Chen, Medical University of South Carolina
  • Jian Kang, University of Michigan
  • Dehan Kong, University of Toronto
  • Suprateek Kundu, MD Anderson Cancer Center
  • Xinyi Li, Clemson University
  • Eardi Lila, University of Washington
  • Meimei Liu, Virginia Tech
  • Josh Lukemire, Emory University
  • Tianwen Ma, Emory University
  • Tianzhou Ma, University of Maryland
  • Xin Ma, Columbia University
  • Jun Young Park, University of Toronto
  • Benjamin Risk, Emory University
  • Haochang Shou, University of Pennsylvania
  • Sean L. Simpson, Wake Forest University School of Medicine
  • Dayu Sun (chair), Indiana University School of Medicine
  • Selena Wang, Yale University
  • Yaotian Wang, Emory University
  • Shan Yu, University of Virginia
  • Panpan Zhang, Vanderbilt University Medical Center
  • Siyu Zhou, Emory University


  • Shari Stansbery
  • Thomas Inskeep

Student Volunteers

  • Gertrude Osei
  • Minmin Pan
  • Yi Shi
  • Ziyan Song
  • Yixi Xu