Saliency-Based Multilabel Linear Discriminant Analysis

Abstract

Linear discriminant analysis (LDA) is a classical statistical machine-learning method, which aims to find a linear data transformation increasing class discrimination in an optimal discriminant subspace. Traditional LDA sets assumptions related to the Gaussian class distributions and single-label data annotations. In this article, we propose a new variant of LDA to be used in multilabel classification tasks for dimensionality reduction on original data to enhance the subsequent performance of any multilabel classifier. A probabilistic class saliency estimation approach is introduced for computing saliency-based weights for all instances. We use the weights to redefine the between-class and within-class scatter matrices needed for calculating the projection matrix. We formulate six different variants of the proposed saliency-based multilabel LDA (SMLDA) based on different prior information on the importance of each instance for their class(es) extracted from labels and features. Our experiments show that the proposed SMLDA leads to performance improvements in various multilabel classification problems compared to several competing dimensionality reduction methods.

Existing System

• Multi-label databases exist for various real applications, such as the Yeast database for protein localization site prediction, the CAL500 database for music retrieval, or medical databases for text classification.
• Linear Discriminant Analysis (LDA) and its variants have been widely used to extract discriminant data representations for solving various problems involving supervised dimensionality reduction, e.g., in human action recognition, biological data classification, and facial image analysis.
• In this paper, we propose a novel method for multi-label data classification based on a probabilistic approach that is able to estimate the contribution of each data item to the classes it belongs to by taking into account prior information encoded using various types of metrics.

Disadvantages

• Compared to single-label problems, the characteristics of multilabel problems are more complicated and unpredictable.
• Moreover, different classes typically contain a varying number of data items, leading to class-imbalanced problems.
• The problem of multilabel learning (MLL) has been widely studied and various multilabel classifiers have been suggested.
• In multilabel LDA (MLDA) and its variants, these problems have been tackled by introducing different weights to take into account the label and/or feature correlation of different items.
• We formulate a novel SMLDA method that uses the saliency-based weights in the scatter matrices and can alleviate the problems related to imbalanced datasets.

Proposed System

• The proposed method is based on a probabilistic model for defining the weights of individual samples in a weighted multi-label LDA approach.
• The proposed Saliency-based weighted Multi-label LDA approach is shown to lead to performance improvements in various multi-label classification problems.
• We compare our proposed approach to related methods on 10 diverse multi-label data sets, and the results show considerable improvements in multilabel classification tasks using our approach.
• We introduce the general concepts of saliency estimation and the probabilistic saliency estimation approach needed to develop the proposed method.
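
The weighted multi-label LDA idea described above can be illustrated with a minimal sketch. It assumes a generic weight matrix W with one nonnegative weight per (instance, class) pair and builds the between-class and within-class scatter matrices from those weights. The function name `weighted_multilabel_lda`, the `reg` regularization term, and the use of the binary label matrix as weights are assumptions made for illustration; the probabilistic saliency estimation that produces the actual SMLDA weights is not reproduced here.

```python
import numpy as np

def weighted_multilabel_lda(X, W, n_components, reg=1e-6):
    """Generic weighted multilabel LDA sketch (hypothetical helper).

    X: (n, d) feature matrix.
    W: (n, c) nonnegative instance-to-class weights, zero where an
       instance does not belong to a class. Every class is assumed
       to have a positive total weight.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    d = X.shape[1]

    class_mass = W.sum(axis=0)                       # total weight per class
    class_means = (W.T @ X) / class_mass[:, None]    # weighted class means
    global_mean = (class_mass @ class_means) / class_mass.sum()

    S_b = np.zeros((d, d))  # between-class scatter
    S_w = np.zeros((d, d))  # within-class scatter
    for k in range(W.shape[1]):
        diff = (class_means[k] - global_mean)[:, None]
        S_b += class_mass[k] * (diff @ diff.T)
        centered = X - class_means[k]
        S_w += (centered * W[:, k][:, None]).T @ centered

    # Solve S_b v = lambda * S_w v; reg keeps S_w invertible (assumption).
    M = np.linalg.solve(S_w + reg * np.eye(d), S_b)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real, S_b, S_w

# Toy data: 8 instances, 4 features, 2 classes; some instances carry both labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
Y = np.array([[1, 0], [1, 0], [1, 1], [0, 1],
              [0, 1], [1, 1], [1, 0], [0, 1]], dtype=float)

# Using the binary labels as weights gives a plain multilabel LDA baseline.
P, S_b, S_w = weighted_multilabel_lda(X, Y, n_components=1)
Z = X @ P  # reduced representation to feed into any multilabel classifier
```

Replacing `Y` with instance-specific weights changes how strongly each instance contributes to the class means and scatter matrices, which is the mechanism the weighted variants rely on to reduce the effect of class imbalance.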

Advantages

• We integrate different label and feature information previously used as weights in dimensionality reduction into SMLDA by using them as prior information for probabilistic saliency estimation, and show experimentally that our approach leads to better performance.
• Performance is severely affected by imbalance in the input datasets.
• Although weighted LDA algorithms enhance performance in single-label classification tasks compared to traditional LDA, such variants are still not directly applicable to multilabel classification tasks.
• In all test cases, for both classifiers and every evaluation metric, the average performance of the proposed approach is better.
