P. Toiviainen & T. Eerola
University of Jyväskylä, Finland

Self-Organizing Map of the Essen Collection

For a more detailed account of the method, see the ISSCM 2001 conference paper (PDF).

Purpose

The purpose of the method is to visualize large corpora of music, a data-mining task. The method draws on the finding that listeners judge the similarity of melodies on the basis of frequently occurring musical features (Castellano et al. 1984; Krumhansl et al. 1999, 2000). This statistical information contributes to the psychological similarity of melodies (Eerola et al. 2001).

Feature extraction

Common statistical measures of music are extracted from each melody separately:
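As an illustration of what such per-melody measures could look like, the sketch below computes two distributions of the kind this page mentions elsewhere (pitch distributions and note transitions). The representation of a melody as a list of MIDI note numbers, the function names, and the interval clipping range are assumptions made for the example, not details taken from the paper. Both functions assume a melody with at least two notes.

```python
from collections import Counter


def pitch_class_distribution(pitches):
    """Relative frequency of each of the 12 pitch classes in one melody."""
    counts = Counter(p % 12 for p in pitches)
    total = len(pitches)
    return [counts.get(pc, 0) / total for pc in range(12)]


def interval_distribution(pitches, max_interval=12):
    """Relative frequency of melodic intervals between successive notes.

    Intervals are measured in semitones and clipped to +/- max_interval
    (an assumption for this sketch) so every melody yields a vector of
    the same length.
    """
    intervals = [
        max(-max_interval, min(max_interval, b - a))
        for a, b in zip(pitches, pitches[1:])
    ]
    counts = Counter(intervals)
    total = len(intervals)
    return [counts.get(i, 0) / total
            for i in range(-max_interval, max_interval + 1)]


# A hypothetical melody given as MIDI note numbers (C major, up and down).
melody = [60, 62, 64, 65, 67, 67, 65, 64, 62, 60]
pc = pitch_class_distribution(melody)   # 12 values summing to 1
iv = interval_distribution(melody)      # 25 values summing to 1
```

Each melody is reduced to fixed-length vectors in this way, so melodies of different lengths become directly comparable.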

Visualization by SOM

Statistical features are multidimensional and therefore difficult to examine directly. Their dimensionality is reduced with a self-organizing map (SOM; Kohonen 1997).

The SOM is an artificial neural network that simulates the process of self-organization in the central nervous system with a simple, yet effective, numerical algorithm. It consists of a two-dimensional planar array of simple processing units, each of which is associated with a reference vector. The dimensionality of these reference vectors is equal to that of the vectors used as input.

After being trained with the input vectors, the SOM provides a non-linear topographic mapping from the multidimensional input space to the two-dimensional array. In other words, each input vector is mapped to some unit in the array, and vectors that are close to each other in the input space are mapped near each other. In addition, the SOM identifies the most salient features of the input set by detecting, in each part of the input vector distribution, the dimensions with the highest variance. Figure 3 depicts schematically the principles of the mapping provided by the SOM.

After being trained with statistical representations of different musical styles, the SOM can be used for visualizing the organization of the melodies. Melodies that display similar statistical properties in terms of pitch distributions and note transitions are located at adjacent positions on the map. Multiple features can be combined into one visualization by using the SOM of each feature as an input in the training of a supermap (Figure 4). The supermap can be made more perceptually relevant by weighting each separate feature map by its perceptual salience.
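The training and mapping described above can be sketched in a few lines. This is a minimal, generic SOM using only the Python standard library, not the authors' implementation. The grid size, the linearly decaying learning rate, the Gaussian neighbourhood, and the toy input vectors are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose reference vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows=5, cols=5, epochs=40, lr0=0.5, sigma0=2.5, seed=0):
    """Train a rows x cols map; each unit holds a reference vector of len(data[0])."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.5     # neighbourhood shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Units near the winner on the grid are pulled more strongly.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy input: two hypothetical "styles" with different 3-dimensional feature vectors.
rng = random.Random(1)
style_a = [[0.8 + rng.uniform(-0.05, 0.05), 0.1, 0.1] for _ in range(15)]
style_b = [[0.1, 0.1, 0.8 + rng.uniform(-0.05, 0.05)] for _ in range(15)]
som = train_som(style_a + style_b)
pos_a = best_matching_unit(som, [0.8, 0.1, 0.1])
pos_b = best_matching_unit(som, [0.1, 0.1, 0.8])
```

After training, `best_matching_unit` gives the map position of any feature vector, so plotting the positions of all melodies yields the kind of visualization described here. For real corpora a NumPy-based or dedicated SOM implementation would be considerably faster.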



Musical Material

Applications

The demonstration of the method is divided into three tools.

Tool 1 provides a coarse overview of features by displaying the organization of each map together with the entropy of each feature. Entropy is a measure of complexity that has been used previously in discriminating musical styles (Knopoff & Hutchinson, 1983; Snyder, 1990). This tool shows songs with similar features in proximate areas and can thus be used to investigate the similarity relationships between the songs. It also enables the playback of any chosen song on each SOM. A demonstration of this tool is available on the WWW (www.jyu.fi/musica/essen).

Tool 2 provides a visualization of the statistical features as represented by the SOMs.

Tool 3 combines keyword search with the similarity relations of the features. This tool can be used to find stylistic clusters or the specific locations of songs matching any selected criteria, such as "ballads", 3/4 time signatures, "Tirol", or any combination of these. This makes it easier to formulate and answer musically and culturally interesting questions about the corpus.
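The entropy reported by Tool 1 is not defined on this page. A minimal sketch, assuming the standard Shannon entropy of a feature distribution (optionally normalized by its maximum so that values fall between 0 and 1), would be:

```python
import math


def entropy(distribution, normalize=True):
    """Shannon entropy (base 2) of a list of non-negative weights.

    The weights are rescaled to sum to 1. With normalize=True the result
    is divided by log2(number of categories), giving a value in [0, 1].
    The normalization is an assumption made for this sketch.
    """
    total = sum(distribution)
    probs = [p / total for p in distribution if p > 0]
    h = -sum(p * math.log2(p) for p in probs)
    if normalize and len(distribution) > 1:
        h /= math.log2(len(distribution))
    return h


uniform = entropy([0.25, 0.25, 0.25, 0.25], normalize=False)  # 2.0 bits
peaked = entropy([1.0, 0.0, 0.0, 0.0])                        # no uncertainty
```

A melody that uses all categories equally often scores high, and one dominated by a single category scores low, which is the sense in which entropy can serve as a measure of complexity.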

Conclusions and future directions

A method for the analysis of large corpora of music and specific practical tools for musical data mining were presented. The method was based on the statistical distribution of symbolic events and the subsequent investigation of similarity relationships. A self-organizing map (SOM) was used to visualize the feature vectors. The method itself still leaves considerable room for improvement. For example, taking into account the overall melodic contour, hierarchical reduction of the melodic surface, perceptual weighting of events according to their metrical position and salience, and phrasing would add sophistication and increase the perceptual relevance of the method. Further research is needed to assess the applicability of the present method to audio-based material.


References