HotCSE Seminar
Computational Science & Engineering
Wednesday Oct 26, 12pm-1pm, [Meeting Link]

scDisInFact: Disentangled Learning for Integration and Prediction of Multi-Batch Multi-Condition Single-Cell RNA Sequencing Data

Ziqi Zhang
Advisor: Dr. Xiuwei Zhang


Single-cell RNA sequencing (scRNA-seq) measures the expression level of genes in each cell of an experimental batch. It has been widely used in disease studies, where samples are collected from donors at different stages of the disease. As a result, each sample's scRNA-seq count matrix is associated with one or more biological conditions, such as age, gender, drug treatment, or disease severity. At the same time, samples from different donors are often obtained in different experimental batches, which introduces technical confounders known as "batch effects". In practice, samples often differ in both condition and batch, so the differences among their count matrices reflect a mixture of technical batch effects and condition effects. Computational methods should remove the batch effects while preserving the biological variation caused by condition effects. Existing batch effect removal methods remove all systematic differences among samples, including both batch effects and condition effects. In contrast, existing perturbation prediction methods attribute all differences among samples to condition effects; because they ignore batch effects, their predicted gene expression data are inaccurate. Here we propose scDisInFact, a computational framework based on variational autoencoders that models both batch effects and condition effects among samples in scRNA-seq data. scDisInFact simultaneously performs three tasks: batch effect removal, condition-associated key gene detection, and perturbation prediction. We tested scDisInFact on both simulated and real datasets and compared it with baseline methods for each task. The results show that, by performing the three tasks jointly, scDisInFact outperforms existing methods that each address a single task.


Ziqi Zhang is a Ph.D. student in the School of Computational Science and Engineering at the Georgia Institute of Technology. His advisor is Dr. Xiuwei Zhang. He is broadly interested in developing machine learning algorithms to study cell regulatory mechanisms. His research focuses on integrating biological information from single-cell multi-omics datasets and from single-cell datasets across experimental conditions, on obtaining new biological insight from such integration, and on using graph learning algorithms to study cell regulatory mechanisms, including gene regulatory networks and cross-modality association networks.