Our overall research aim is to understand how genetic, epigenetic and gene expression variation cause variation in health and disease traits in humans, using statistical and machine learning approaches. Current projects in the group are divided along the following themes:

Theme 1: Gene regulatory networks

I am interested in the development of models, algorithms and software to infer causal gene regulatory networks from multiple omics data. The key challenge here is causal inference: to transform patterns of co-expression among transcripts, proteins, metabolites and phenotypes into truly predictive models of biological systems.

In genetics, the random segregation of alleles effectively results in massively parallel randomized experiments, where the direction of causality between co-expressed genes can be inferred from their joint genetic linkage to cis-regulatory DNA sequences. While this basic principle is well-established, important challenges remain. My group is building an expanded toolkit for causal inference in systems genetics that will include:

  1. efficient software for handling deep RNA-sequencing data from hundreds to thousands of individuals,
  2. more sensitive statistical models to account for multiple levels of known and unknown confounding factors and noise in the data,
  3. methods for the inference of global causal networks involving thousands of genes,
  4. methods for identifying causal regulatory variants using allele-specific eQTL mapping.
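
As a rough illustration of the basic principle described above, the sketch below simulates a cis-regulatory variant that acts as a natural randomized experiment for gene A, and uses it as an instrument to estimate the effect of gene A on a co-expressed gene B despite hidden confounding. It is a toy example only, not the group's software: the simulated effect sizes, the allele frequency, and the function names (`simulate`, `slope`, `wald_ratio`) are all assumptions made for illustration, and the Wald ratio is just one simple instrumental-variable estimator.

```python
import numpy as np

def simulate(n=20000, causal_effect=0.5, seed=0):
    """Simulate genotype -> gene A -> gene B with a hidden confounder (toy values)."""
    rng = np.random.default_rng(seed)
    # Genotype at a cis-regulatory variant of gene A (0, 1 or 2 alternative alleles).
    genotype = rng.binomial(2, 0.3, size=n).astype(float)
    # Hidden factor that affects both genes and induces non-causal co-expression.
    confounder = rng.normal(size=n)
    gene_a = 0.8 * genotype + confounder + rng.normal(size=n)
    gene_b = causal_effect * gene_a + confounder + rng.normal(size=n)
    return genotype, gene_a, gene_b

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]

def wald_ratio(genotype, exposure, outcome):
    """Instrumental-variable estimate of the exposure -> outcome effect."""
    return slope(genotype, outcome) / slope(genotype, exposure)

genotype, gene_a, gene_b = simulate()
naive_estimate = slope(gene_a, gene_b)            # biased upward by the confounder
iv_estimate = wald_ratio(genotype, gene_a, gene_b)  # should be close to the simulated effect
print(f"naive regression: {naive_estimate:.2f}, genotype-anchored: {iv_estimate:.2f}")
```

Because the genotype is randomly assigned at meiosis, it is independent of the confounder in this simulation, so the genotype-anchored estimate should sit near the simulated causal effect while the naive co-expression slope overstates it. Real data raise the further issues listed above, such as unknown confounders, pleiotropic variants and noise.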

Theme 2: Imaging genetics

A new research direction in the group is to expand from using only molecular traits (epigenetic and gene expression variation) for understanding the functional impact of genetic variation, to also include images from MRI and other techniques. In particular, we are developing convolutional neural network models to simultaneously annotate image features and discover the genetic factors that explain their inter-individual variation.

Theme 3: Graph representation learning

In another new research direction, we are exploring the use of graph convolutional network models and reinforcement learning for representing and optimizing graph structures, particularly for applications in large-scale Bayesian network structure learning.

 

Our group develops algorithms and software to facilitate the analysis and interpretation of large-scale genomic datasets. A complete list of tools is available at https://lab.michoel.info/software/, or you can follow us on GitHub: