Robust non-linear multivariate methods

In this project, we will develop data analysis methods that can simultaneously (i) model non-linear dependencies and (ii) tolerate large amounts of contaminated and faulty data (a property known as robustness). Both properties are in high demand in the analysis of today's complex data sets. Our main focus will be on constructing estimators of location and scatter and on using them to develop robust non-linear dimension reduction. The developed methods will be evaluated both theoretically and through their performance in data analysis.

Assistant Professor Joni Virta is the project leader, and PhD student Lauri Heinonen is currently working on the project.

The research project is funded by the Academy of Finland (decision number 321883).

Synthetic Health and Research Data (SHARED)

The project researches and develops methods for producing synthetic data. Synthetic data are data that have been produced on the basis of some real-world data (the so-called original data) and that resemble them as closely as possible. Often, a further goal is that the observations in the original data can no longer be recognized, in which case the synthetic data are considered anonymous. Synthetic data that preserve the characteristics of the original data while also being considered anonymous can be used more freely, for example in scientific research and in development and innovation activities.

PhD candidate Katariina Perkonoja is currently working on the research project and focuses on studying synthetic longitudinal patient data.

The research project is funded by the Novo Nordisk Foundation (decision number NNF19SA00591

The International Childhood Cardiovascular Cohort Consortium (i3C Consortium) 

The International Childhood Cardiovascular Cohort (i3C) Consortium includes seven cohorts in the United States, Australia and Finland. Since the 1970s, the cohorts have collected data on cardiovascular risk factors in childhood and have followed participants into midlife. As participants are entering their 50s and early 60s, this consortium represents a unique opportunity to explore the association of childhood risk factors with adult cardiovascular events.

PhD candidate Noora Kartiosuo is currently working in the consortium.

The Cardiovascular Risk in Young Finns Study

The Cardiovascular Risk in Young Finns Study is one of the largest follow-up studies into cardiovascular risk from childhood to adulthood. The main aim is to determine the contribution made by childhood lifestyle, biological, and psychological measures to the risk of cardiovascular diseases in adulthood. 

PhD candidate Noora Kartiosuo is currently working on the study.

Special Turku Coronary Risk Factor Intervention Project (STRIP)

The main purpose of the STRIP Study (Special Turku Coronary Risk Factor Intervention Project) is the prevention of atherosclerosis and coronary heart disease through a dietary intervention that began in infancy and continued into early adulthood. The trial was launched in 1990, when 1062 children aged 7 months and their families were enrolled. Half of the families received individualized dietary and other lifestyle counseling at least twice a year, whereas the rest of the families served as a control group. The STRIP Study intervention continued until the participants reached the age of 20 years.

PhD candidate Noora Kartiosuo is currently working on the project.

Previous research in the Center of Statistics

Statistical methodology in cancer cell imaging, I. Ahonen, J. Nevalainen, M. Nees

Development of new statistical methods for multivariate and dependent time series, M. Matilainen, K. Nordhausen, H. Oja

New Statistical Procedures for Supervised Dimension Reduction, J. Virta, K. Nordhausen, H. Oja

Statistics for high-dimensional complex data with applications in lipidomics, M. Pesonen (Kujala), J. Nevalainen

Developed software

SpatialBSS: Blind Source Separation for Multivariate Spatial Data 

tsBSS: Blind Source Separation and Supervised Dimension Reduction for Time Series 

ICtest: Estimating and Testing the Number of Interesting Components in Linear Dimension Reduction 

tensorBSS: Blind Source Separation Methods for Tensor-Valued Observations