University of Jyväskylä

Dissertation: Unstable Feature Relevance in Classification Problems (Skrypnyk)

Start date: Dec 21, 2011 12:00 PM

End date: Dec 21, 2011 03:00 PM

M.Sc. Iryna Skrypnyk defends her doctoral dissertation in accounting titled “Unstable Feature Relevance in Classification Problems”. The opponent is Professor Keijo Ruotsalainen (University of Oulu) and the custos is Professor Seppo Puuronen (University of Jyväskylä).

 

Abstract

Over the last decade, data mining has gone through a significant transformation driven by advanced data collection technologies. Today, data mining faces the challenge of dealing with increasingly complex data structures. As a result, data often exhibits instability in the relevance of measured attribute values (features). In other words, the set of relevant features is not the same throughout the entire set of domain examples. Seen from another angle, the data includes regions with local properties that differ from each other, in particular with regard to their feature relevance profiles. Global models therefore cannot reflect the essential knowledge about the data structure.
This thesis describes the unstable feature relevance problem in classification tasks, elaborating the concept of heterogeneous classification problems and introducing different types of feature space heterogeneity. It also suggests a multi-model solution derived from the definition of a subproblem as a group of instances with easier class discrimination and lower complexity in the subspace of locally relevant features. The solution is presented within an ensemble learning framework. The search strategies suggested for decomposing classification problems with unstable feature relevance express different levels of granularity with respect to classes. Candidate subproblems are evaluated through profiles of feature relevance. These profiles are vectors of weights obtained from feature merit measures or, alternatively, from distance metric adaptation. Additional measures of complexity, including class boundary and density-based measures, are suggested to evaluate the decomposition and to serve as preliminary heterogeneity tests. This research contributes towards reaching complementary data analysis goals on classification problems and reveals important insights into the data structure and its complexity.
The effects on classification performance were studied through numerous experiments on synthetic, benchmark, and real data from a biomedical research domain. It was found that extraction of subproblems is possible in many cases and it provides meaningful data partitioning results. In many cases it also leads to improvement in predictive performance.
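The idea of a feature relevance profile that changes between regions of the data can be illustrated with a small sketch. This is not the method used in the dissertation: the synthetic data, the Fisher-style merit score, and the names `make_region` and `relevance_profile` are all assumptions made for illustration. The class depends only on feature 0 in one region and only on feature 1 in the other, so the local profiles differ while the global profile blurs both.

```python
import random
from statistics import mean, pvariance


def make_region(rng, relevant, n=200, n_features=2):
    """Hypothetical region: the class depends only on feature `relevant`."""
    rows = []
    for i in range(n):
        y = i % 2  # alternate labels so both classes are present
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        x[relevant] += 2.0 if y == 1 else -2.0  # shift only the relevant feature
        rows.append((x, y))
    return rows


def relevance_profile(rows):
    """Vector of per-feature weights: |mu1 - mu0| / sqrt(var0 + var1).

    A simple Fisher-style merit score, assumed here for illustration.
    """
    n_features = len(rows[0][0])
    profile = []
    for j in range(n_features):
        v0 = [x[j] for x, y in rows if y == 0]
        v1 = [x[j] for x, y in rows if y == 1]
        spread = (pvariance(v0) + pvariance(v1)) ** 0.5
        profile.append(abs(mean(v1) - mean(v0)) / spread)
    return profile


rng = random.Random(0)  # fixed seed for reproducibility
region_a = make_region(rng, relevant=0)
region_b = make_region(rng, relevant=1)

profile_a = relevance_profile(region_a)
profile_b = relevance_profile(region_b)
profile_global = relevance_profile(region_a + region_b)

for name, profile in [("region A", profile_a),
                      ("region B", profile_b),
                      ("global", profile_global)]:
    print(name, [round(w, 2) for w in profile])
```

In this toy setting each local profile puts most of its weight on a single feature, whereas the global profile gives both features a similar, lower weight, which is the kind of mismatch a single global model would miss.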

Further information

Iryna Skrypnyk, iryna.skrypnyk@jyu.fi

The dissertation is published in the series Jyväskylä Studies in Computing, Jyväskylä: University of Jyväskylä, 2011, 232 p. ISSN 1456-5390 ; 152. Inquiries: University Library, Publishing Unit, tel. 040 805 3825, myynti@library.jyu.fi.
