Statistics seminar

Welcome to the Statistics seminar, held on Fridays at 14:15-16:00


Next talk

Friday November 16th at 14:15-16 in MaA210

Speaker: Tarmo Ketola (Jyväskylä)

Title: Environment, pathogen and host – disease triangle in the wild and in pre-health care Finland

Abstract: Diseases are biologically very different. Some diseases spread mainly via human-to-human contact, but others have life cycles that are bound more closely to environmental conditions. This biological diversity can thus interact with spatial and social network properties and create a disease flora unique to each area. Environmentally mediated diseases, caused by opportunistic pathogens that grow in the environment, are driven mostly by environmental conditions affecting pathogen abundance and virulence. In obligate, host-dependent pathogens, the drivers of epidemics depend more strongly on host population structure.

In this talk I will briefly present some research on environmental drivers of the virulence of opportunistic pathogens, followed by a presentation of a dataset containing millions of death cases in pre-health care Finland. This large, underutilized dataset from the years 1800-1850 covers ca. 400 parishes in Finland, and combined with contemporary statistics it offers intriguing possibilities for epidemiological and historical work. With this data I have tested how parish size and the number of villages affect the risk of dying from three contagious diseases: smallpox, pertussis and measles.


Upcoming talks

Friday 25th January at 14:15-16 in MaA210

Speaker: Ville Leinonen (UEF)

Title: TBA


Friday March 15th 

Speaker: Tuomas Virtanen (Tampere University of Technology)

Title: TBA


Friday May 17th

Speaker: Tiina Manninen (Tampere University of Technology)

Title: TBA

Fall 2018

Friday November 9th at 14:15-16 in MaD381

Speaker: Lasse Leskelä (Aalto)

Title: Parameter estimators of sparse network models with thin overlapping communities

Abstract: This talk presents a statistical network model generated by a large number of randomly sized overlapping communities, where any pair of nodes sharing a community is linked with probability q via the community. In the special case with q = 1 the model reduces to a random intersection graph which is known to generate high levels of transitivity also in the sparse context. The parameter q adds a degree of freedom and leads to a parsimonious and analytically tractable network model with tunable density, transitivity, and degree fluctuations. We prove that the parameters of this model can be consistently estimated in the large and sparse limiting regime using moment estimators based on partially observed densities of links, 2-stars, and triangles. The talk is based on a research paper written in collaboration with Joona Karjalainen (Aalto University) and Johan van Leeuwaarden (TU Eindhoven), arXiv:1802.01171
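The generative mechanism described in the abstract above can be illustrated with a small simulation. This is only a rough sketch under stated assumptions: the community size distribution (uniform on a small range), the uniform choice of members and all function and parameter names are hypothetical and are not taken from the paper. The code generates the graph and counts links, 2-stars and triangles; it does not implement the moment estimators discussed in the talk.

```python
import itertools
import random


def simulate_community_graph(n_nodes, n_communities, q, size_range=(2, 5), seed=0):
    """Return the edge set of a graph built from overlapping communities.

    Assumption (not from the abstract): community sizes are uniform on
    size_range and members are drawn uniformly without replacement.
    """
    rng = random.Random(seed)
    edges = set()
    for _ in range(n_communities):
        size = rng.randint(*size_range)
        members = rng.sample(range(n_nodes), size)
        # Each pair sharing this community is linked with probability q.
        for u, v in itertools.combinations(sorted(members), 2):
            if rng.random() < q:
                edges.add((u, v))
    return edges


def subgraph_counts(n_nodes, edges):
    """Count links, 2-stars and triangles in an undirected simple graph."""
    neighbours = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    links = len(edges)
    # A 2-star is a pair of distinct neighbours of a common centre node.
    two_stars = sum(len(nb) * (len(nb) - 1) // 2 for nb in neighbours.values())
    # Each triangle is found once per edge, hence the division by three.
    triangles = sum(len(neighbours[u] & neighbours[v]) for u, v in edges) // 3
    return links, two_stars, triangles
```

With q = 1 every community becomes a clique, which is the random intersection graph special case mentioned in the abstract. The empirical transitivity can be read off the counts as 3 * triangles / two_stars whenever two_stars is positive.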


Friday November 2nd at 14:15-16 in MaD381

Speaker: Jukka Nyblom (Jyväskylä)

Title: On the early history of statistics in Finland (presentation in Finnish)

Abstract: In 1805 Legendre published his treatise on determining the orbits of comets, in whose appendix he presented an algebraic version of his method of least squares (LS). Gauss published his treatise on the motions of celestial bodies in 1809, in which he presented a probabilistic version of the LS method and at the same time claimed to have used the method since 1795. This led to one of the great priority disputes in the history of science. From the viewpoint of the history of Finnish science, it is interesting that the method was applied at the Academy of Turku as early as 1815. Under the direction of G. G. Hällström, professor of physics, a series of master's theses on measuring the ellipticity of the Earth was published, four in 1810 and two in 1815. In the last of these, J. G. Bonsdorff applies the LS method to measurements of the Earth's ellipticity made with a pendulum in different parts of the world. This thesis apparently went unnoticed in Finland until Pekka Pere, lecturer in statistics at the University of Tampere, found it. In my talk I examine the history of the LS method and of other treatment of observations at the Academy of Turku in the early 1800s.


Friday October 26th at 14:15-15:15 in MaD355

Speaker: Juha Heikkinen (Natural Resources Institute Finland Luke)

Title: Wolves move fast near houses

Abstract: I present an exploratory analysis of the association between the velocity of wolf movement and the vicinity of human settlements. The analysis is based on 23,000 segments between two consecutive locations of GPS-collared wolves, with an interval of approximately 30 minutes between relocations. For each segment, the distance to the nearest human residence was determined from the CORINE Land Cover 2012 classification of 20 m squares, and the average velocity was determined as the length of the segment divided by the time interval between the two relocations. The velocities tended to be greater when the distance to the nearest residence was less than 400 m.

This little study is a spin-off from the Academy project "Models of heterogeneity, contextuality and self-interaction in ordered spatial point patterns with applications to animal movement and forest inventory (ordSpat)". The latter part of the talk sketches what we really want to do in the project.
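The velocity calculation described in the abstract above (segment length divided by the time between relocations) can be sketched as follows. The input layout, the function names and the assumption that coordinates are already projected to metres are all hypothetical; only the 400 m threshold comes from the abstract, and no real data are used.

```python
import math


def segment_speeds(fixes):
    """Average speed (m/s) of each segment between consecutive GPS fixes.

    fixes: list of (time_s, x_m, y_m) tuples sorted by time, with
    coordinates assumed to be in a projected metric system.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order fixes
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


def mean_speed_by_proximity(speeds, distances_m, threshold_m=400.0):
    """Mean speed of segments nearer than and at least threshold_m from a residence."""
    near = [v for v, d in zip(speeds, distances_m) if d < threshold_m]
    far = [v for v, d in zip(speeds, distances_m) if d >= threshold_m]

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(near), mean(far)
```

Comparing the two means is only the crudest version of the exploratory analysis; the abstract does not say how the association was actually assessed.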


Friday October 5th at 14:15-16 in MaA210

Speaker: Marko Laine (Finnish Meteorological Institute)

Title: Dimension reduction for problems in satellite remote sensing of the environment

Abstract: I discuss two dimension reduction techniques that we have been using and developing at FMI. One is for statistical inverse problems in satellite retrieval of atmospheric constituents and uses the forward model Jacobian and prior information to decompose the parameter space into a part that is informed by the likelihood and a complement space determined by the prior. The other problem is related to spatio-temporal data fusion of satellite and in-situ observations. It uses a reduced basis of the model state space covariance for efficient estimation by data assimilation techniques based on the Kalman smoother.


Friday September 28th at 14:15-16 in MaD381

Speaker: Janne Kujala (ZenRobotics & University of Jyväskylä)

Title: Probabilistic foundations of contextuality: the Contextuality-by-Default theory

Abstract: Intuitively, contextuality means that the measurement of a property (perception of a stimulus, spin of a particle, etc.) may depend on what other properties it is measured with (the context). Contextuality is usually defined as the non-existence of a joint distribution of all random variables representing measurable properties, given the observed joint distributions of certain subsets of them in each context. However, in a strict mathematical sense, noncontextuality defined like that is impossible, since overlapping jointly distributed subsets of random variables must all be jointly distributed. To avoid such contradictions, one has to adopt the Contextuality-by-Default (CbD) approach: random variables representing measurements in different contexts are always distinct and stochastically unrelated to each other. Contextuality can then be defined as the non-existence of a coupling of all joint measurements such that each subcoupling corresponding to measurements of the same property in different contexts satisfies a certain property C. Traditional analysis of contextuality corresponds to property C being "all variables are equal with probability 1". However, in typical experiments both in psychology and in quantum mechanics, the so-called no-signaling property is violated: the distribution of a property may change depending on the context. This renders traditional approaches inapplicable unless the signaling is ignored. With CbD, we can generalize C to "all variables are equal with maximal possible probability". This allows testing whether a system has inherent quantum-like contextuality on top of any signaling.

We consider different measures quantifying the degree of contextuality as well as the challenges of their computation.

Spring 2018

Friday April 27th at 10:15-12 in MaD381

Speaker: Anna-Kaisa Ylitalo

Title: Statistical analysis of eye movement data

Abstract: Eye tracking is a method for recording eye movements in order to find out where people look and when. The method has been used in various studies in psychology, marketing, car driving, and even in health research to study what kinds of salads people put on their plates. In this talk, I'll present two kinds of eye movement applications and ideas on how to approach them. First, I will concentrate on an art study, in which people were looking at pictures of paintings while their eye movements were recorded. Here, a sequential spatial point process model suggested in Penttinen and Ylitalo (2016) is applied to extract a long-term memory effect (i.e. learning) from the eye movement sequence of a participant looking at an abstract painting. In the latter part of the talk I'll give examples of music reading studies. In music reading the movement of the gaze is more restricted than in picture viewing, which makes the analysis more challenging. This work is part of the consortium project Reading Music, funded by the Academy of Finland 2014-2018.


Friday April 13th at 12:15-14 in MaD381

Speaker: Jenni Niku

Title: Comparing estimation methods for generalized linear latent variable models

Abstract: In many studies in community ecology, multivariate abundance data are collected. Such data are characterized by two main features. First, the data are high-dimensional in that the number of species often exceeds the number of sites. Second, the data can almost never be suitably transformed to be normally distributed. Instead, the most common types of responses recorded include presence-absence records, overdispersed species counts, biomass, and heavily discretized percent cover data. One promising approach for modelling such data is generalized linear latent variable models (GLLVMs). By extending the standard generalized linear modelling framework to include latent variables, we can account for covariation between species that is not explained by the predictors, such as species interactions and correlations driven by missing covariates.

The main challenge with using GLLVMs is computationally efficient estimation and inference. Since the responses are not normally distributed and the marginal likelihood involves integrating out the unknown latent variables, the likelihood does not possess a closed form. The most well-known methods for overcoming this issue, such as Gauss-Hermite quadrature, the Expectation-Maximization algorithm and Bayesian Markov chain Monte Carlo estimation, are computationally very intensive, especially with multiple latent variables or with a large number of responses. We show how estimation and inference for the considered models can be performed efficiently using either the Laplace or the variational approximation method. We use simulations to study the finite-sample properties of the two approaches. Examples are used to illustrate the methods. An R package, gllvm, for fitting the models is also introduced.


Friday March 23rd at 12:15-14 in MaD380

Speaker: Anton Muravev

Title: Metaheuristics and Evolutionary Algorithms: The Overview

Abstract: Metaheuristics are general-purpose heuristic optimization algorithms that do not use any information about the problem, requiring only the evaluation of candidate solutions. In addition to solving black-box problems, their properties may be desirable when the domain knowledge is not easily applicable or the fitness landscape is too complex. In particular, evolutionary algorithms (EAs) are some of the most widespread metaheuristics, with numerous practical applications. We aim to provide a general overview of the field, its most essential concepts and achievements, along with some practical considerations.

In this seminar we will consider some historical aspects of metaheuristic optimization, the origins of evolutionary computation, and its fundamental advantages and limitations. We describe the terminology and the general framework of EA design, as well as some commonly used operators and techniques. We then briefly cover the most relevant variants of evolutionary algorithms and outline their respective application areas. Finally, we consider the problem of neuroevolution – the use of evolutionary algorithms to optimize the architecture and/or weights of a problem-specific neural network. As human-designed neural architectures are approaching their limits, neuroevolution research is experiencing renewed growth; we thus explore some of the current developments in this area.
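To make the general EA framework mentioned in the abstract above concrete, here is a minimal sketch of one simple variant: an elitist evolutionary algorithm with uniform crossover, Gaussian mutation and truncation selection. It is only an illustration under assumed settings; the operator choices, the initial range and all parameter names are hypothetical and are not taken from the talk. The algorithm uses nothing but fitness evaluations of candidate solutions.

```python
import random


def evolve(fitness, dim, pop_size=20, generations=50, sigma=0.3, seed=0):
    """Minimise `fitness` over real vectors of length `dim`.

    Returns the best individual found and the best fitness per generation.
    """
    rng = random.Random(seed)
    # Initial population drawn uniformly from an assumed box [-5, 5]^dim.
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            # Uniform crossover followed by Gaussian mutation of every gene.
            child = [rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(a, b)]
            children.append(child)
        # Elitist truncation selection: parents and children compete together.
        pop = sorted(pop + children, key=fitness)[:pop_size]
        history.append(fitness(pop[0]))
    return pop[0], history
```

For example, evolve(lambda v: sum(x * x for x in v), dim=3) minimises the sphere function. Because parents survive alongside their children, the best fitness can never get worse from one generation to the next.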


Wednesday February 28th at 13:00-14:00 in MaA210

Speaker: Essi Syrjälä

Title: Joint modeling approaches of food consumption and the risk of islet autoimmunity (pre-T1D)

Abstract: Pre-T1D is a preclinical phase that is identified by the presence of type 1 diabetes (T1D)-associated autoantibodies. Some evidence exists for an association between early nutrition and the development of pre-T1D or T1D, but no specific dietary factor has yet been shown to be an unambiguous risk factor.

A prospective birth cohort of 6069 infants born in 1996-2004 with genetic susceptibility to T1D was recruited. The children's diet was measured with 3-day food records at the ages of 3, 6, 12, 24, 36, 48, 60 and 72 months, and T1D-associated autoantibodies were measured at 3 to 12-month intervals up to the age of 15 years.

We used a time-dependent Cox model, a basic joint model and a joint latent class mixed model, each separately, to investigate the association between food consumption and pre-T1D. Whereas a time-dependent Cox model is a single model, joint models couple a survival model with a linear mixed effects model, which makes it possible to model the two phenomena efficiently at the same time. Joint models have great potential in nutritional epidemiological studies based on (i) their ability to identify the individual exposure trajectories even when information is observed only at some measuring points that can themselves include missing values, (ii) their ability to reduce the measurement error common with nutritional data and (iii) the ability of joint latent class mixed models to potentially detect periods of sensitivity and risk groups. We found that different models revealed different features of the nutritional data, and our findings on this will be presented.


Friday February 9th at 12:15-14 in MaD381

Speaker: Gleb Tikhonov (University of Helsinki)

Title: Analysis of ecological community data with latent factor models

Abstract: The last decade has brought a significant expansion of the methodological tools available to an ecologist interested in the analysis of data on ecological communities. In place of the previously common ordination techniques, a new branch of model-based statistical methods has emerged, called joint species distribution models (JSDMs). While different JSDMs have been constructed based on very different machine learning techniques, a particularly large group of powerful and flexible models is built on the latent factor approach. In my talk I will present our ongoing development of one such latent factor-based JSDM, called Hierarchical Modelling of Species Communities (HMSC). While in its simplest version HMSC is just a combination of a generalized linear mixed model with a sparse Bayesian latent factor model, we have implemented a set of important extensions that are much desired in the practical analysis of ecological data. Thus, our framework is able to account for additional data on species traits and phylogenetic relationships, deal with hierarchical and spatially explicit sampling designs, account for potential non-stationarity in species associations, and finally be used efficiently in time-series analysis.


Friday January 26th at 10:15-12 in MaD355

Speaker: Sara Taskinen (University of Jyväskylä)

Title: Blind source separation based on robust autocovariance matrices

Abstract: Assume a Blind Source Separation (BSS) model, that is, the observed p time series are assumed to be linear combinations of p latent uncorrelated weakly stationary time series. The aim is then to find an estimate for the unmixing matrix which transforms the observed time series back to uncorrelated latent time series. In the classical SOBI (Second Order Blind Identification) method, approximate joint diagonalization of the sample covariance matrix and sample autocovariance matrices with several lags is used to estimate the unmixing matrix. However, it is well known that in the presence of outliers, the sample covariance matrix and sample autocovariance matrices perform poorly and yield unreliable unmixing matrix estimates. In this talk we thus propose a robust SOBI method which uses so-called M-autocovariance matrices in the estimation. We use finite-sample simulation studies and a real data example to illustrate the performance of our method.
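As a rough illustration of second-order blind source separation, the sketch below implements AMUSE, a simpler relative of SOBI that diagonalizes a single lagged autocovariance matrix exactly instead of jointly diagonalizing several. It is not the robust method proposed in the talk: it uses ordinary sample covariance and autocovariance matrices, and the function name and the NumPy-based layout (rows are time points, columns are series) are assumptions made here for illustration.

```python
import numpy as np


def amuse(x, lag=1):
    """Estimate an unmixing matrix from an (n, p) array of observed series.

    Non-robust sketch: whitening with the sample covariance, then an
    eigendecomposition of the symmetrised lagged autocovariance.
    """
    x = x - x.mean(axis=0)
    n = x.shape[0]
    cov = x.T @ x / n
    evals, evecs = np.linalg.eigh(cov)
    # Symmetric inverse square root of the covariance (whitening matrix).
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = x @ whitener
    # Lagged autocovariance of the whitened series, symmetrised.
    acov = z[:-lag].T @ z[lag:] / (n - lag)
    acov = (acov + acov.T) / 2
    _, u = np.linalg.eigh(acov)
    # Rows of the result map observed series to estimated sources.
    return u.T @ whitener
```

Given the returned matrix w, the estimated sources are (x - x.mean(axis=0)) @ w.T. They are recovered only up to sign and order, and AMUSE can separate two sources only if their autocovariances at the chosen lag differ, which is one motivation for using several lags as SOBI does.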