Causal models and study design

A causal model

Science aims to find and understand causal relations in Nature. Causal inference refers to drawing conclusions on the effects of causes on the basis of experimental and observational data and expert knowledge. Understanding the study design used to collect the data is an essential element in causal inference. As the resources for research are limited, it is important to design data collection cost-efficiently.

The research belongs to the thematic research area Decision analytics utilizing causal models and multiobjective optimization (DEMO) of the University of Jyväskylä.

Main Researchers:

Some Collaborators

Elias Bareinboim (Purdue U.), Antti Hyttinen (U. of Helsinki), Jarno Vanhatalo (U. of Helsinki), Samu Mäntyniemi (Luke), Kari Auranen (U. of Turku), Sangita Kulathinal (THL), Jaakko Reinikainen (THL), Mikko Sillanpää (U. of Oulu), Olli Saarela (U. of Toronto), Santtu Mikkonen (UEF)


Causal models

(Research team: J. Karvanen, S. Tikka)

Causal inference can be divided into three sub-areas: discovering the causal model from the data, identifying the causal effect when the causal structure is known, and estimating an identifiable causal effect from the data. Randomized controlled trials are the gold standard for causal inference (Fisher, 1935). In an ideal experiment, the experimental units are randomized into two or more treatment groups, and the differences between the group averages of the response variable estimate the average causal effects. However, running an experiment may be time-consuming, expensive, or practically or ethically impossible. Therefore, there has been growing interest in causal inference from observational data. A causal effect is called identifiable if it can be uniquely determined from the causal structure and the distribution of the observed variables alone.
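As a small illustration of identifiability, the sketch below uses a hypothetical three-variable model that is not taken from our publications: a binary confounder Z affects both the treatment X and the response Y. All probabilities are invented for the example. Given this structure, the causal effect of X on Y can be computed from the observational joint distribution with the back-door adjustment formula, whereas the naive comparison of conditional probabilities is confounded.

```python
from itertools import product

# Hypothetical model (all numbers are assumptions made for illustration):
# Z -> X, Z -> Y and X -> Y, with every variable binary.
P_Z1 = 0.5                         # P(Z = 1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}    # P(X = 1 | Z = z)


def p_y1_given_xz(x, z):
    """P(Y = 1 | X = x, Z = z); the true causal risk difference is 0.3."""
    return 0.1 + 0.3 * x + 0.4 * z


def bernoulli(p_one, value):
    """Probability that a binary variable with P(1) = p_one equals value."""
    return p_one if value == 1 else 1.0 - p_one


# Observational joint distribution P(z, x, y) implied by the model.
joint = {
    (z, x, y): bernoulli(P_Z1, z)
    * bernoulli(P_X1_GIVEN_Z[z], x)
    * bernoulli(p_y1_given_xz(x, z), y)
    for z, x, y in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Marginal probability of an event given as keywords, e.g. prob(x=1, y=1)."""
    total = 0.0
    for (z, x, y), p in joint.items():
        values = {"z": z, "x": x, "y": y}
        if all(values[name] == val for name, val in fixed.items()):
            total += p
    return total


def naive_risk(x):
    """P(Y = 1 | X = x), which mixes the causal effect with confounding by Z."""
    return prob(x=x, y=1) / prob(x=x)


def adjusted_risk(x):
    """Back-door adjustment: P(Y = 1 | do(X = x)) = sum_z P(Y = 1 | x, z) P(z)."""
    return sum(prob(x=x, z=z, y=1) / prob(x=x, z=z) * prob(z=z) for z in (0, 1))


naive_difference = naive_risk(1) - naive_risk(0)
adjusted_difference = adjusted_risk(1) - adjusted_risk(0)
print(f"naive difference:    {naive_difference:.3f}")
print(f"adjusted difference: {adjusted_difference:.3f}")
```

With these assumed numbers the naive difference is 0.54, while the adjusted difference recovers the causal risk difference of 0.3 that was built into the model. Both quantities are computed from the same observational distribution; only the adjusted one uses the causal structure.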

We have studied the identifiability of causal effects and implemented the key algorithms of the field as open source software.

We have also developed the concept of causal models with design, which describe the study design and the missing data mechanism together with the causal structure. The flow of the study is visualized by ordering the nodes of the causal diagram in two dimensions: by their causal order and by the time of the observation.

Selected publications

S. Tikka, J. Karvanen, Enhancing identification of causal effects by pruning, Journal of Machine Learning Research, 18(194):1-23, 2018.

S. Tikka, J. Karvanen, Simplifying probabilistic expressions in causal inference, Journal of Machine Learning Research, 18(36):1-30, 2017.

S. Tikka, J. Karvanen, Identifying causal effects with the R package causaleffect. Journal of Statistical Software, Volume 76, 2017.

J. Karvanen, Study design in causal models. Scandinavian Journal of Statistics, Volume 42, Issue 2, pages 361-377, DOI: 10.1111/sjos.12110, 2015.


Optimal design of observational studies

(Research Team: J. Karvanen)

The objective of our research is to interpret the design problems encountered in real-life research work in the framework of Bayesian optimal design, to derive guidelines for cost-efficiency, and to carry out efficient analysis of the data collected according to the selected design.
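The idea of weighing information against cost can be sketched with a toy example. The model, the utility function and all numbers below are assumptions chosen for illustration and do not come from our publications: a normal mean with a normal prior is estimated from n observations, and the design decision is the sample size that maximizes the expected utility, defined here as the negative posterior variance minus a linear cost.

```python
# Toy Bayesian design problem (every number below is an assumption):
# theta ~ Normal(0, PRIOR_VAR), each observation y_i ~ Normal(theta, NOISE_VAR),
# and each observation costs COST_PER_UNIT in the units of the utility.
PRIOR_VAR = 1.0
NOISE_VAR = 4.0
COST_PER_UNIT = 0.001


def posterior_variance(n):
    """Posterior variance of theta after n observations (conjugate normal model)."""
    return 1.0 / (1.0 / PRIOR_VAR + n / NOISE_VAR)


def expected_utility(n):
    """Negative posterior variance minus the data collection cost.

    In this model the posterior variance does not depend on the observed
    values, so no averaging over future data is needed.
    """
    return -posterior_variance(n) - COST_PER_UNIT * n


# Search a grid of candidate sample sizes for the best trade-off.
candidate_sizes = range(0, 501)
best_n = max(candidate_sizes, key=expected_utility)
print("optimal sample size:", best_n)
```

Under these assumptions the optimum is n = 59: beyond that point, the reduction in posterior variance from one more observation is smaller than its cost. Real design problems rarely have a closed-form expected utility and usually require simulation over possible future data.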

In observational studies, the design decisions are related to the data collection itself. In survey sampling, the questions of sample size and sample stratification are fundamental. Unequal sampling probabilities often improve the efficiency but also complicate the data analysis. In epidemiology, designs such as the case-control design and the case-cohort design are used to improve cost-efficiency. The basic principle behind these designs is to enrich the data with cases (e.g. deaths due to heart attack), which are relatively rare in the population to be studied. Compared to simple random sampling, this leads to significantly smaller sample sizes. In two-stage or multi-stage designs, subsamples of individuals are selected for expensive or time-consuming measurements, such as genotype or biomarker specification or brain imaging, on the basis of variables measured at the first stage of the study. The multi-stage observational design resembles the batch sequential design of experiments, but there are also important differences when causality is considered.
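The gain from enriching the data with cases can be illustrated with a rough calculation. The sketch below assumes a hypothetical population (1% cases, exposure probability 0.4 among cases and 0.2 among controls) and plugs expected cell counts into Woolf's approximate variance formula for the log odds ratio. It compares a simple random sample with a case-control sample of the same size; the numbers are invented, and the plug-in of expected counts is itself an approximation.

```python
def woolf_variance(a, b, c, d):
    """Approximate variance of the log odds ratio from a 2x2 table (Woolf's formula).

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    """
    return 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d


# Hypothetical population (all numbers are assumptions for illustration).
CASE_PREVALENCE = 0.01        # cases are rare
P_EXPOSED_CASE = 0.4          # P(exposed | case)
P_EXPOSED_CONTROL = 0.2       # P(exposed | control)


def expected_table(n_cases, n_controls):
    """Expected 2x2 cell counts for given numbers of cases and controls."""
    return (
        n_cases * P_EXPOSED_CASE,
        n_cases * (1 - P_EXPOSED_CASE),
        n_controls * P_EXPOSED_CONTROL,
        n_controls * (1 - P_EXPOSED_CONTROL),
    )


N = 2000
# Simple random sample: the expected number of cases follows the prevalence.
var_srs = woolf_variance(
    *expected_table(N * CASE_PREVALENCE, N * (1 - CASE_PREVALENCE))
)
# Case-control sample of the same size: half cases, half controls.
var_cc = woolf_variance(*expected_table(N / 2, N / 2))

# Under simple random sampling the variance is proportional to 1 / n, so this is
# the sample size that would match the precision of the case-control design.
n_srs_needed = N * var_srs / var_cc
print(f"var(log OR), simple random sample: {var_srs:.4f}")
print(f"var(log OR), case-control sample:  {var_cc:.4f}")
print(f"simple random sample size for equal precision: {n_srs_needed:.0f}")
```

With these assumed values, a simple random sample would need roughly 40,600 individuals to reach the precision that the case-control design attains with 2,000. The calculation presumes that 1,000 cases are actually available for sampling.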


Selected publications

J. Karvanen, J. Vanhatalo, S. Kulathinal, K. Auranen, S. Mäntyniemi, Optimal design of observational studies, Submitted, arXiv:1609.08347, 2017.

J. Reinikainen, J. Karvanen, H. Tolonen, Optimal selection of individuals for repeated covariate measurements in follow-up studies. Statistical Methods in Medical Research, Volume 25 issue 6, pages 2420-2433, 2016.


J. Reinikainen, J. Karvanen, H. Tolonen, How many longitudinal covariate measurements are needed for risk prediction? Journal of Clinical Epidemiology, Volume 69, pages 114-124, DOI: 10.1016/j.jclinepi.2015.06.022, 2016.


1) Academy of Finland: Decision analytics utilizing causal models and multiobjective optimization