Joris Mooij (University of Amsterdam)
Title: Can we predict the effects of a gene knockout in a purely data-driven way?
Since the pioneering work by Peirce and Fisher, the gold standard for causal discovery has been the randomized experiment. An intriguing alternative approach to causal discovery was proposed in the nineties, based on conditional independence patterns in the data. Over the past decades, numerous causal discovery methods based on that idea have been proposed. These methods have been demonstrated to work on simulated data when all their assumptions are met. However, demonstrating their usefulness on real data has been a challenge. In this talk, I will discuss some of our recent attempts at validating causal discovery methods on yeast gene expression data collected under large-scale genetic perturbations. I will discuss two microarray gene expression data sets that, at first sight, seem perfectly suited for validating causal discovery methods. As it turns out, however, both causal discovery on these data and the validation of such methods are more challenging than one might initially think. We find that even sophisticated modern causal discovery algorithms are outperformed by simple non-causal baselines on these data sets.
Joint work with Philip Versteeg