Statistics seminar

Welcome to the Statistics seminar, held on Fridays at 14:15-16:00


Upcoming talks

Friday October 5th at 14:15-16 in MaA210

Speaker: Marko Laine (Finnish Meteorological Institute)

Title: Dimension reduction for problems in satellite remote sensing of the environment

Abstract: I discuss two dimension reduction techniques that we have been using and developing at FMI. One is for statistical inverse problems in satellite retrieval of atmospheric constituents. It uses the forward model Jacobian and prior information to decompose the parameter space into a part that is informed by the likelihood and a complement space determined by the prior. The other problem is related to spatio-temporal data fusion of satellite and in-situ observations. It uses a reduced basis of the model state space covariance for efficient estimation by data assimilation techniques based on the Kalman smoother.


Friday October 26th at 14:15-16

Speaker: TBA

Title: TBA


Friday November 9th at 14:15-16 in MaD381

Speaker: Lasse Leskelä (Aalto)

Title: Statistical modeling and learning of overlapping network communities


Friday March 15th 

Speaker: Tuomas Virtanen (Tampere University of Technology)

Title: TBA


Friday May 17th

Speaker: Tiina Manninen (Tampere University of Technology)

Title: TBA

Spring 2018

Friday April 27th at 10:15-12 in MaD381

Speaker: Anna-Kaisa Ylitalo

Title: Statistical analysis of eye movement data

Abstract: Eye tracking is a method for recording eye movements in order to find out where people look and when. The method has been used in various studies in psychology, marketing, car driving, and even in health research to study what kinds of salads people pick for their plates. In this talk, I’ll present two kinds of eye movement applications and ideas on how to approach them. First, I will concentrate on an art study in which people looked at pictures of paintings while their eye movements were recorded. Here, a sequential spatial point process model suggested in Penttinen and Ylitalo (2016) is applied to extract a long-term memory effect (i.e. learning) from the eye movement sequence of a participant looking at an abstract painting. In the latter part of the talk, I’ll give examples of music reading studies. In music reading, the movement of the gaze is more restricted than in picture viewing, which makes the analysis more challenging. This work is part of the consortium project Reading Music, funded by the Academy of Finland in 2014-2018.


Friday April 13th at 12:15-14 in MaD381

Speaker: Jenni Niku

Title: Comparing estimation methods for generalized linear latent variable models

Abstract: Multivariate abundance data are often collected in community ecology studies. Such data are characterized by two main features. First, the data are high-dimensional in that the number of species often exceeds the number of sites. Second, the data can almost never be suitably transformed to be normally distributed. Instead, the most common types of responses recorded include presence-absence records, overdispersed species counts, biomass, and heavily discretized percent cover data. One promising approach for modelling such data is generalized linear latent variable models (GLLVMs). By extending the standard generalized linear modelling framework to include latent variables, we can account for covariation between species that is not explained by the predictors, for example species interactions and correlations driven by missing covariates.

The main challenge with using GLLVMs is computationally efficient estimation and inference. Since the responses are not normally distributed and the marginal likelihood involves integrating out the unknown latent variables, the likelihood does not possess a closed form. The best-known methods for overcoming this issue, such as Gauss-Hermite quadrature, the expectation-maximization algorithm and Bayesian Markov chain Monte Carlo estimation, are computationally very intensive, especially with multiple latent variables or with a large number of responses. We show how estimation and inference for the considered models can be performed efficiently using either the Laplace or the variational approximation method. We use simulations to study the finite-sample properties of the two approaches. Examples are used to illustrate the methods. The R package gllvm for fitting the models is also introduced.


Friday March 23rd at 12:15-14 in MaD380

Speaker: Anton Muravev

Title: Metaheuristics and Evolutionary Algorithms: The Overview

Abstract: Metaheuristics are general-purpose heuristic optimization algorithms that do not use any information about the problem, requiring only the evaluation of candidate solutions. In addition to solving black-box problems, their properties may be desirable when domain knowledge is not easily applicable or the fitness landscape is too complex. In particular, evolutionary algorithms (EAs) are some of the most widespread metaheuristics, with numerous practical applications. We aim to provide a general overview of the field, its most essential concepts and achievements, along with some practical considerations.

In this seminar we will consider some historical aspects of metaheuristic optimization, the origins of evolutionary computation, and its fundamental advantages and limitations. We describe the terminology and the general framework of EA design, as well as some commonly used operators and techniques. We then briefly cover the most relevant variants of evolutionary algorithms and outline their respective application areas. Finally, we consider the problem of neuroevolution, that is, the use of evolutionary algorithms to optimize the architecture and/or weights of a problem-specific neural network. As human-designed neural architectures are approaching their limits, neuroevolution research is experiencing renewed growth; we thus explore some of the current developments in this area.
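
As a rough illustration of the general EA framework mentioned in the abstract (this is not material from the talk), the sketch below runs a minimal elitist (mu + lambda) loop with Gaussian mutation. The toy fitness function `sphere`, the function name `evolve` and all parameter values are assumptions chosen only for this example. Note that the loop uses nothing but fitness evaluations of candidate solutions.

```python
import random

def sphere(x):
    """Toy fitness function (lower is better); an assumption for illustration."""
    return sum(v * v for v in x)

def evolve(fitness, dim=5, mu=10, lam=40, sigma=0.3, generations=100, seed=0):
    """Minimal (mu + lambda) evolutionary loop with Gaussian mutation."""
    rng = random.Random(seed)
    # Initial population: mu random candidates in [-5, 5]^dim.
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(mu)]
    for _ in range(generations):
        # Variation: each offspring is a mutated copy of a randomly chosen parent.
        offspring = []
        for _ in range(lam):
            parent = rng.choice(population)
            offspring.append([v + rng.gauss(0, sigma) for v in parent])
        # Selection: keep the mu best among parents and offspring (elitist).
        population = sorted(population + offspring, key=fitness)[:mu]
    # After at least one generation the population is sorted, best first.
    return population[0]

best = evolve(sphere)
print(sphere(best))
```

Because selection keeps parents in the pool, the best fitness never gets worse from one generation to the next. Practical EAs typically add recombination and step-size adaptation, which are omitted here for brevity.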


Wednesday February 28th at 13:00-14:00 in MaA210

Speaker: Essi Syrjälä

Title: Joint modeling approaches of food consumption and the risk of islet autoimmunity (pre-T1D)

Abstract: Pre-T1D is a preclinical phase that is identified by the presence of type 1 diabetes (T1D)-associated autoantibodies. There is some evidence of an association between early nutrition and the development of pre-T1D or T1D, but no specific dietary factor has yet been shown to be an unambiguous risk factor.

A prospective birth cohort of 6069 infants born in 1996-2004 with genetic susceptibility to T1D was recruited. Each child’s diet was measured with 3-day food records at the ages of 3, 6, 12, 24, 36, 48, 60 and 72 months, and T1D-associated autoantibodies were measured at 3- to 12-month intervals up to the age of 15 years.

We used a time-dependent Cox model, a basic joint model and a joint latent class mixed model separately to investigate the association between food consumption and pre-T1D. Whereas a time-dependent Cox model is a single model, joint models couple a survival model with a linear mixed effects model, which makes it possible to model two phenomena efficiently at the same time. Joint models have great potential in nutritional epidemiological studies based on (i) their ability to identify the individual exposure trajectories even when information is observed only at some measuring points that can themselves include missing values, (ii) their ability to reduce the measurement error common with nutritional data and (iii) the ability of joint latent class mixed models to potentially detect periods of sensitivity and risk groups. We found that different models revealed different features of the nutritional data, and these findings will be presented.


Friday February 9th at 12:15-14 in MaD381

Speaker: Gleb Tikhonov (University of Helsinki)

Title: Analysis of ecological community data with latent factor models

Abstract: The last decade has brought a significant expansion of the methodological tools available to ecologists interested in analysing data on ecological communities. Instead of the previously common ordination techniques, a new branch of model-based statistical methods has emerged, called joint species distribution models (JSDMs). While different JSDMs have been constructed based on very different machine learning techniques, a particularly large group of powerful and flexible models is built on the latent factor approach. In my talk I will present our ongoing development of such a latent factor-based JSDM, called the Hierarchical Model of Species Communities (HMSC). While in its simplest version HMSC is just a combination of a generalized linear mixed model with a sparse Bayesian latent factor model, we have implemented a set of important extensions that are much desired in practical analysis of ecological data. Thus, our framework is able to account for additional data on species traits and phylogenetic relationships, deal with hierarchical and spatially explicit sampling designs, account for potential non-stationarity in species associations, and finally be used efficiently in time-series analysis.


Friday January 26th at 10:15-12 in MaD355

Speaker: Sara Taskinen (University of Jyväskylä)

Title: Blind source separation based on robust autocovariance matrices

Abstract: Assume a Blind Source Separation (BSS) model, that is, the observed p time series are assumed to be linear combinations of p latent uncorrelated weakly stationary time series. The aim is then to find an estimate for the unmixing matrix which transforms the observed time series back to uncorrelated latent time series. In the classical SOBI (Second Order Blind Identification) method, approximate joint diagonalization of the sample covariance matrix and sample autocovariance matrices with several lags is used to estimate the unmixing matrix. However, it is well known that in the presence of outliers, the sample covariance matrix and sample autocovariance matrices perform poorly and yield unreliable unmixing matrix estimates. In this talk we thus propose a robust SOBI method which uses so-called M-autocovariance matrices in the estimation. We use finite-sample simulation studies and a real data example to illustrate the performance of our method.
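
For readers unfamiliar with the setting, the model and the classical SOBI criterion described in the abstract can be written out as follows. The notation is a common convention and an assumption here, not taken from the talk: the series are taken to be centered, A denotes the mixing matrix, Γ the unmixing matrix, S_k the lag-k autocovariance matrix, and K the number of lags.

```latex
% BSS model: observed p-variate series x_t, latent uncorrelated series z_t
x_t = A z_t, \qquad
\operatorname{Cov}(z_t) = I_p, \qquad
\mathrm{E}\!\left[z_t z_{t+k}^{\top}\right] = \Lambda_k \ \text{(diagonal)}, \quad k = 1, 2, \dots

% An unmixing matrix \Gamma recovers z_t up to sign and order,
% and therefore diagonalizes all autocovariance matrices S_k = E[x_t x_{t+k}^T]:
\Gamma S_0 \Gamma^{\top} = I_p, \qquad
\Gamma S_k \Gamma^{\top} = \Lambda_k

% Classical SOBI: approximate joint diagonalization of the
% (symmetrized) sample autocovariance matrices \hat S_1, ..., \hat S_K
\hat\Gamma = \arg\max_{\Gamma \,:\, \Gamma \hat S_0 \Gamma^{\top} = I_p}
\ \sum_{k=1}^{K} \left\lVert \operatorname{diag}\!\left(\Gamma \hat S_k \Gamma^{\top}\right) \right\rVert^{2}
```

In this notation, the robust variant proposed in the talk replaces the sample matrices by M-autocovariance matrices; their definition is not given in the abstract and is not reproduced here.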