# Statistics seminar

## Next talk

Friday May 17th at 10:15 in MaD380

**Speaker:** Tiina Manninen (Tampere University)

**Title:** Challenges in computational neuroscience

**Abstract:** This talk will give an overview of challenges in experimental and computational neuroscience, focusing mostly on the computational part. I will first present some of the deterministic and stochastic methods developed for modeling the dynamical behavior of interacting molecules and ions in neurons and non-neuronal cells. I will show how we analyzed these different stochastic methods in the time and frequency domains and give an idea of the computational time these methods require in simulation. I will address reproducibility and replicability of in silico neuroscience studies. This means testing whether the original simulation results produced by others are reproducible or replicable when implementing a model ourselves based on the information in the original article or when simulating an existing model code taken from a database, respectively. In addition, I will show whether these models are comparable to each other so that they could be used to explain the same experimental finding. Last, I will show our recent work where a computational model was fine-tuned to reproduce experimental data.

## Upcoming talks

Friday October 25th (Tentative)

**Speaker:** Iasonas Lamprianou (University of Cyprus)

**Title:** TBA

## Spring 2019

Wednesday April 17th at 10:15 in MaD302

(joint Statistics and DEMO seminar)

**Speaker:** Juha Karvanen (University of Jyväskylä)

**Title:** Causal inference and decision making

**Abstract:** The talk starts with an introduction to causality. Do-calculus and the ID algorithm are tools for checking the identifiability of causal effects from observational data. The connections between causality and decision making are natural: a decision maker optimizes the consequences of actions, and the estimation of these consequences, i.e., the causal effects of actions, from the data is a problem of causal inference. In the second part of the talk, counterfactuals are formally defined. A counterfactual definition of fairness in artificial intelligence and decision making is discussed. Finally, some new results on causal inference from multiple experimental and observational studies are presented.

Friday March 29th at 14:15 in MaD380

**Speaker:** Tuomas Rajala (Natural Resources Institute Finland Luke)

**Title:** Spatial inter-species interactions in plant communities: Independence, or lack of data?

**Abstract:** In this talk I will take a closer look at a K-function-based independence test for bivariate point patterns often used for discovering interactions in rainforest ecology. I will study the test's reliability, particularly its power, under some simplifying assumptions. I will illustrate how the power depends on the sample sizes and the strength of the true interaction. I will also show results that suggest that the power to detect spatial interactions at previously published sample sizes can be quite low, and that the observed positive relationship between the frequency of interspecific spatial independence and community species richness could be influenced by the statistical power of the tests being used and the abundances of the species within the forests.

Friday March 15th at **15:00 in MaA210**

**Speaker:** Tuomas Virtanen (Tampere University)

**Title:** Making Computers to Recognize Sounds – The Statistical Approach

**Abstract:** Sound carries lots of information, which can be automatically analyzed by computational analysis methods. This has several applications, for example in voice-user interfaces, acoustic monitoring, context-aware devices, and recommendation systems.

This talk will give an overview of the general methodology for computational sound analysis. We will first discuss technical and scientific problems related to the task. Since the methods used are heavily based on machine learning, we describe the core pattern classification methods used. We present state-of-the-art methods based on convolutional recurrent neural networks that have been shown to produce good results in a wide range of analysis tasks. The use of convolutional layers allows a suitable high-level representation for recognizing different types of sound sources to be learned automatically, and the use of recurrent layers enables modeling of the long-term temporal context that is required for reliable sound analysis.

We will use audio and video examples to demonstrate the methods in everyday sound recognition tasks. We will also analyze and discuss the performance of humans vs. machines in environmental sound recognition.

Friday February 1st at 14:15 in MaD380

**Speaker:** Santeri Karppinen (Jyväskylä)

**Title:** Improved leukocyte concentration predictions in paediatric acute lymphoblastic leukaemia maintenance therapy with Bayesian nonlinear state space models

**Abstract:** Acute lymphoblastic leukaemia is the most common cancer in childhood. During the last phase of its treatment, maintenance therapy, clinicians adjust the dosage of chemotherapy drugs to reach a target range in patient leukocyte concentration. Making good dosage decisions is important, as inadequate dosage will result in poor treatment outcomes. Due to substantial interindividual variability in the pharmacokinetics of the chemotherapy drugs, and a delay in reaching steady state response in the leukocyte concentration, deciding on the right dosage for a patient is difficult.

To ease decision making, models predicting patient leukocyte concentrations based on administered chemotherapy have been suggested in the literature. In this presentation, I present two novel Bayesian nonlinear state space models, which simplify some aspects of previously proposed models, but allow for some extra flexibility. I will discuss the models and compare their predictive performances against a model from the literature using time series cross-validation with real patient data. The results show that the new models outperform the model from the literature and appear more robust.

Friday January 25th at 14:15-16 in MaA210

**Speaker:** Ville Leinonen (UEF)

**Title:** Causal Model as a Tool for Analyzing Dependence Structure of Variables in the Evolution Process of Wood Combustion Emission

**Abstract:** Residential wood combustion and other combustion processes have a major impact on climate change. Instead of just constraining the amount of fresh emissions from the emission source, we should also take better account of the evolution of emissions in the atmosphere. It is important to understand the role of the different factors (e.g. qualities of fuel, burning device, and burning conditions) affecting the evolution of the atmospheric and health effects of emissions.

I will discuss our attempt to model the evolution of combustion emissions in atmospheric-like conditions. We have modeled the evolution process by using time series of measured variables and a causal model. Causal discovery algorithms have been applied to form the dependence structure between variables measured from the emission. Variables that could explain the change in a variable of interest between time points have been connected in the dependence structure, following the example of chemical reactions, where the amount of sources determines the amount of product formed during a specific time interval. The obtained dependence structure and estimated effects are evaluated by comparing observed data to model simulations.

In this presentation, the focus will be on the different solutions we have adopted regarding data analysis and modeling. In addition, the presentation will introduce common challenges related to atmospheric data that are relevant for this study.

## Fall 2018

Friday December 14th at 10:15-12 in MaD381

**Speaker:** Arno Solin (Aalto)

**Title:** A Pictorial Tour of Recent Advances in Probabilistic Sensor Fusion and Real-Time Inference

**Abstract:** One of the exciting trends in machine learning is combining structured (white-box) models, challenging estimation tasks, and probabilistic techniques. With the additional requirement of real-time computation, these tasks become demanding. This talk gives a brief (and rather pictorial) introduction to real-time inference using Gaussian processes, a powerful machine learning paradigm for learning latent functions. The application examples range from simultaneous localisation and mapping (SLAM) and sensor fusion to electricity consumption prediction.

Friday November 16th at 14:15-16 in MaA210

**Speaker:** Tarmo Ketola (Jyväskylä) & Michael Briga (University of Turku)

**Title:** Environment, pathogen and host – disease triangle in the wild and in pre-health care Finland

**Abstract:** Diseases are biologically very different. Some diseases spread mainly via human-to-human contact, but others have life cycles that are bound more to environmental conditions. This biological diversity among diseases can thus interact with spatial and social network properties and create a disease flora unique to different areas. Environmentally mediated diseases, caused by environmentally growing opportunistic pathogens, are driven mostly by environmental conditions affecting pathogen abundance and virulence. For obligate, host-dependent pathogens, the drivers of epidemics depend more strongly on host population structure.

In this talk I will briefly present some research on environmental drivers of the virulence of opportunistic pathogens, followed by a presentation of a dataset containing millions of death cases in pre-health care Finland. This large, underutilized dataset from the years 1800-1850 covers ca. 400 parishes in Finland, and combined with contemporary statistics it offers intriguing possibilities for epidemiological and historical work. With this data I have tested how parish size and number of villages affect the risk of dying of three contagious diseases: smallpox, pertussis and measles.

Friday November 9th at 14:15-16 in MaD381

**Speaker:** Lasse Leskelä (Aalto)

**Title:** Parameter estimators of sparse network models with thin overlapping communities

**Abstract:** This talk presents a statistical network model generated by a large number of randomly sized overlapping communities, where any pair of nodes sharing a community is linked with probability q via the community. In the special case with q = 1 the model reduces to a random intersection graph which is known to generate high levels of transitivity also in the sparse context. The parameter q adds a degree of freedom and leads to a parsimonious and analytically tractable network model with tunable density, transitivity, and degree fluctuations. We prove that the parameters of this model can be consistently estimated in the large and sparse limiting regime using moment estimators based on partially observed densities of links, 2-stars, and triangles. The talk is based on a research paper written in collaboration with Joona Karjalainen (Aalto University) and Johan van Leeuwaarden (TU Eindhoven), arXiv:1802.01171.

Friday November 2nd at 14:15-16 in MaD381

**Speaker:** Jukka Nyblom (Jyväskylä)

**Title:** Tilastotieteen varhaishistoriaa Suomessa (Early history of statistics in Finland; presentation in Finnish)

**Abstract:** In 1805 Legendre published his treatise on determining the orbits of comets, in the appendix of which he presented an algebraic version of his method of least squares. Gauss published his treatise on the motions of celestial bodies in 1809, in which he presented a probabilistic version of the least squares method and at the same time claimed to have used the method since 1795. This led to one of the great priority disputes in the history of science. From the viewpoint of the history of Finnish science, it is interesting that the method was applied at the Academy of Turku as early as 1815. Under the direction of G. G. Hällström, professor of physics, a series of master's theses on measuring the ellipticity of the Earth was published, four in 1810 and two in 1815. In the last of these, J. G. Bonsdorff applies the least squares method to measurements of the Earth's ellipticity made with a pendulum in different parts of the world. This thesis apparently went unnoticed in Finland until Pekka Pere, lecturer in statistics at the University of Tampere, found it. In my talk I examine this history of the least squares method, and of other treatment of observations, at the Academy of Turku in the early 19th century.

Friday October 26th at 14:15-15:15 in MaD355

**Speaker:** Juha Heikkinen (Natural Resources Institute Finland Luke)

**Title:** Wolves move fast near houses

**Abstract:** I present an exploratory analysis of the association between the velocity of wolf movement and the vicinity of human settlements. The analysis is based on 23,000 segments between two consecutive locations of GPS-collared wolves, with an interval of approximately 30 minutes between relocations. For each segment, the distance to the nearest human residence was determined from the CORINE Land Cover 2012 classification of 20 m squares, and the average velocity was determined as the length of the segment divided by the time interval between the two relocations. The velocities tended to be greater when the distance to the nearest residence was less than 400 m.

This little study is a spin-off from Academy project "Models of heterogeneity, contextuality and self-interaction in ordered spatial point patterns with applications to animal movement and forest inventory (ordSpat)". The latter part of the talk sketches what we really want to do in the project.

Friday October 5th at 14:15-16 in MaA210

**Speaker:** Marko Laine (Finnish Meteorological Institute)

**Title:** Dimension reduction for problems in satellite remote sensing of the environment

**Abstract:** I discuss two dimension reduction techniques that we have been using and developing at FMI. One is for statistical inverse problems in satellite retrieval of atmospheric constituents and uses the forward model Jacobian and prior information to decompose the parameter space into a part that is informed by the likelihood and a complement space determined by the prior. The other is related to spatio-temporal data fusion of satellite and in-situ observations. It uses a reduced basis of the model state space covariance for efficient estimation by data assimilation techniques based on the Kalman smoother.

Friday September 28th at 14:15-16 in MaD381

**Speaker:** Janne Kujala (ZenRobotics & University of Jyväskylä)

**Title:** Probabilistic foundations of contextuality: the Contextuality-by-Default theory

**Abstract:** Intuitively, contextuality means that the measurement of a property (perception of a stimulus, spin of a particle, etc.) may depend on what other properties it is measured with (the context). Contextuality is usually defined as the non-existence of a joint distribution of all random variables representing measurable properties given the observed joint distributions of certain subsets of them in each context. However, in a strict mathematical sense, noncontextuality defined like that is impossible, since overlapping jointly distributed subsets of random variables must all be jointly distributed. To avoid such contradictions one has to adopt the Contextuality-by-Default (CbD) approach: random variables representing measurements in different contexts are always distinct and stochastically unrelated to each other. Contextuality can then be defined as the non-existence of a coupling of all joint measurements such that each subcoupling corresponding to measurements of the same property in different contexts satisfies a certain property C. Traditional analysis of contextuality corresponds to property C being "all variables are equal with probability 1". However, in typical experiments both in psychology and in quantum mechanics, the so-called no-signaling property is violated: the distribution of a property may change depending on the context. This renders traditional approaches inapplicable unless the signaling is ignored. With CbD, we can generalize C to "all variables are equal with maximal possible probability". This allows testing whether a system has inherent quantum-like contextuality on top of any signaling.

We consider different measures quantifying the degree of contextuality as well as the challenges of their computation.

## Spring 2018

Friday April 27th at 10:15-12 in MaD381

**Speaker:** Anna-Kaisa Ylitalo

**Title:** Statistical analysis of eye movement data

**Abstract:** Eye tracking is a method for recording eye movements in order to find out where people look and when. The method has been used in various studies in psychology, marketing, car driving, and even in health research to study which kinds of salads people put on their plates. In this talk, I’ll present two kinds of eye movement applications and ideas on how to approach them. First, I will concentrate on an art study, in which people were looking at pictures of paintings while their eye movements were recorded. Here, a sequential spatial point process model suggested in Penttinen and Ylitalo (2016) is applied to extract a long-term memory effect (i.e. learning) from an eye movement sequence of a participant looking at an abstract painting. In the latter part of the talk I’ll give examples of music reading studies. In music reading the movement of the gaze is more restricted than in picture viewing, which makes the analysis more challenging. This work is part of the consortium project Reading Music, funded by the Academy of Finland 2014-2018.

Friday April 13th at 12:15-14 in MaD381

**Speaker:** Jenni Niku

**Title:** Comparing estimation methods for generalized linear latent variable models

**Abstract:** In many studies in community ecology, multivariate abundance data are collected. Such data are characterized by two main features. First, the data are high-dimensional in that the number of species often exceeds the number of sites. Second, the data almost always cannot be suitably transformed to be normally distributed. Instead, the most common types of responses recorded include presence-absence records, overdispersed species counts, biomass, and heavily discretized percent cover data. One promising approach for modelling the data described above is generalized linear latent variable models (GLLVMs). By extending the standard generalized linear modelling framework to include latent variables, we can account for covariation between species not accounted for by the predictors, species interactions and correlations driven by missing covariates.

The main challenge with using GLLVMs is computationally efficient estimation and inference. Since the responses are not normally distributed and the marginal likelihood involves integrating out the unknown latent variables, the likelihood does not possess a closed form. However, the most well-known methods for overcoming this issue, such as Gauss-Hermite quadrature, the expectation-maximization method and Bayesian Markov chain Monte Carlo estimation, are computationally very intensive, especially with multiple latent variables or with a large number of responses. We show how estimation and inference for the considered models can be performed efficiently using either the Laplace or the variational approximation method. We use simulations to study the finite-sample properties of the two approaches. Examples are used to illustrate the methods. An R package gllvm for fitting the models is also introduced.

Friday March 23rd at 12:15-14 in MaD380

**Speaker:** Anton Muravev

**Title:** Metaheuristics and Evolutionary Algorithms: The Overview

**Abstract:** Metaheuristics are general-purpose heuristic optimization algorithms that do not use any information about the problem, requiring only the evaluation of candidate solutions. In addition to solving black-box problems, their properties may be desirable when the domain knowledge is not easily applicable or the fitness landscape is too complex. In particular, evolutionary algorithms (EA) are some of the most widespread metaheuristics with numerous practical applications. We aim to provide a general overview of the field, its most essential concepts and achievements along with some practical considerations.

In this seminar we will consider some historical aspects of metaheuristic optimization, the origins of evolutionary computation, its fundamental advantages and limitations. We describe the terminology and the general framework of the EA design, as well as some commonly used operators and techniques. We then briefly cover the multitude of the most relevant variants of evolutionary algorithms and outline their respective application areas. Finally, we consider the problem of neuroevolution – the use of evolutionary algorithms to optimize the architecture and/or weights of the problem-specific neural network. As human-designed neural architectures are approaching their limits, the neuroevolution research is experiencing a newfound growth; we thus explore some of the current developments in this area.

Wednesday February 28th at 13:00-14:00 in MaA210

**Speaker:** Essi Syrjälä

**Title:** Joint modeling approaches of food consumption and the risk of islet autoimmunity (pre-T1D)

**Abstract:** Pre-T1D is a preclinical phase that is identified by the presence of type 1 diabetes (T1D)-associated autoantibodies. Some evidence exists on the association between early nutrition and the development of pre-T1D or T1D, but no specific dietary factor has yet been shown to be an unambiguous risk factor.

A prospective birth cohort of 6069 infants born in 1996-2004 with genetic susceptibility to T1D was recruited. Each child’s diet was measured with 3-day food records at the ages of 3, 6, 12, 24, 36, 48, 60 and 72 months, and T1D-associated autoantibodies were measured at 3 to 12-month intervals up to the age of 15 years.

We used a time-dependent Cox model, a basic joint model and a joint latent class mixed model, each separately, to investigate the association between food consumption and pre-T1D. Whereas a time-dependent Cox model is a single model, joint models couple a survival model with a linear mixed effects model, which enables efficient modeling of two phenomena at the same time. Joint models have great potential in nutritional epidemiological studies based on (i) their ability to identify the individual exposure trajectories even when information is observed only at some measuring points that can themselves include missing values, (ii) their ability to reduce the measurement error common with nutritional data and (iii) the ability of joint latent class mixed models to potentially detect periods of sensitivity and risk groups. We found that different models revealed different features of the nutritional data, and our findings on this will be presented.

Friday February 9th at 12:15-14 in MaD381

**Speaker:** Gleb Tikhonov (University of Helsinki)

**Title:** Analysis of ecological community data with latent factor models

**Abstract:** The last decade has brought a significant expansion of the methodological tools available to an ecologist interested in the analysis of data on ecological communities. Instead of the previously common ordination techniques, a new branch of model-based statistical methods has emerged, called joint species distribution models (JSDM). While different JSDMs have been constructed based on very different machine learning techniques, a particularly large group of powerful and flexible models is built upon the latent factor approach. In my talk I will present our ongoing development of one such latent factor-based JSDM, called the Hierarchical Model of Species Communities (HMSC). While in its simplest version HMSC is just a combination of a generalized linear mixed model with a sparse Bayesian latent factor model, we have implemented a set of important extensions that are much desired in the practical analysis of ecological data. Thus, our framework is capable of accounting for additional data on species traits and phylogenetic relationships, dealing with hierarchical and spatially explicit sampling designs, accounting for potential non-stationarity in species associations, and finally being used efficiently in time-series analysis.

Friday January 26th at 10:15-12 in MaD355

**Speaker:** Sara Taskinen (University of Jyväskylä)

**Title:** Blind source separation based on robust autocovariance matrices

**Abstract:** Assume a Blind Source Separation (BSS) model, that is, the observed p time series are assumed to be linear combinations of p latent uncorrelated weakly stationary time series. The aim is then to find an estimate for the unmixing matrix which transforms the observed time series back to uncorrelated latent time series. In the classical SOBI (Second Order Blind Identification) method, approximate joint diagonalization of the sample covariance matrix and sample autocovariance matrices with several lags is used to estimate the unmixing matrix. However, it is well known that in the presence of outliers, the sample covariance matrix and sample autocovariance matrices perform poorly and yield unreliable unmixing matrix estimates. In this talk we thus propose a robust SOBI method which uses so-called M-autocovariance matrices in the estimation. We use finite-sample simulation studies and a real data example to illustrate the performance of our method.