When Does Diversity Help Generalization in Classification Ensembles


ABSTRACT :

Ensembles, a widely used and effective technique in the machine learning community, owe their success to a key element: "diversity." The relationship between diversity and generalization, unfortunately, is not fully understood and remains an open research issue. To reveal the effect of diversity on the generalization of classification ensembles, we investigate three questions about diversity: how to measure it, how the measured diversity relates to the generalization error, and how to exploit this relationship for ensemble pruning. For the measurement, we quantify diversity through an error decomposition inspired by regression ensembles, which splits the error of a classification ensemble into an accuracy term and a diversity term. We then formulate the relationship between the measured diversity and ensemble performance through margin-based generalization theory, and observe that the generalization error is reduced effectively only when the measured diversity increases within a few specific ranges; in other ranges, larger diversity contributes little to the generalization of the ensemble. In addition, we propose two pruning methods based on diversity management that exploit this relationship: they increase diversity appropriately and shrink the size of the ensemble without significant performance loss. Empirical results validate the proposed relationship between diversity and ensemble generalization error and the effectiveness of the proposed pruning methods.
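The error decomposition mentioned above is inspired by the classical ambiguity decomposition for regression ensembles (Krogh and Vedelsby, 1995): the squared error of the averaged ensemble equals the average member error minus the average ambiguity, with the ambiguity term playing the role of diversity. The sketch below verifies this identity numerically; the synthetic data and Gaussian noise are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression ensemble: M noisy predictors of a common target.
M, N = 5, 200
y = rng.normal(size=N)                           # targets
preds = y + rng.normal(scale=0.5, size=(M, N))   # member predictions

f_bar = preds.mean(axis=0)                       # ensemble (simple average)

ensemble_err = np.mean((f_bar - y) ** 2)         # E[(f_bar - y)^2]
avg_member_err = np.mean((preds - y) ** 2)       # average member error
ambiguity = np.mean((preds - f_bar) ** 2)        # diversity (ambiguity) term

# Ambiguity decomposition: ensemble error = accuracy term - diversity term.
assert np.isclose(ensemble_err, avg_member_err - ambiguity)
print(f"ensemble={ensemble_err:.4f}  members={avg_member_err:.4f}  "
      f"ambiguity={ambiguity:.4f}")
```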

EXISTING SYSTEM :

• Ensemble methods such as Bagging and Boosting, which combine the decisions of multiple hypotheses, are among the strongest existing machine learning methods.
• Boosting and Bagging provide diversity by sub-sampling or re-weighting the existing training examples.
• One ensemble approach that also utilizes artificial training data is the active learning method introduced by Cohn, Atlas, and Ladner (1994). Rather than improving accuracy, the goal of the committee there is to select good new training examples using the existing training data.
• We do this by rejecting a new classifier if adding it to the existing ensemble decreases the ensemble's accuracy (see the sketch after this list).
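The accept/reject rule in the last point can be made concrete with a short sketch, assuming scikit-learn decision trees as base learners, bootstrap sampling for candidate generation, and plain majority voting; all of these choices are illustrative assumptions rather than the system's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def vote(members, X):
    """Majority vote over the current ensemble members."""
    votes = np.stack([m.predict(X) for m in members])
    # Most frequent label per column (non-negative integer labels assumed).
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
ensemble, best_acc = [], 0.0
for i in range(20):                                  # candidate classifiers
    idx = rng.integers(0, len(X_tr), len(X_tr))      # bootstrap sample
    cand = DecisionTreeClassifier(random_state=i).fit(X_tr[idx], y_tr[idx])
    trial = ensemble + [cand]
    acc = np.mean(vote(trial, X_val) == y_val)
    if not ensemble or acc >= best_acc:              # reject if accuracy drops
        ensemble, best_acc = trial, acc

print(f"kept {len(ensemble)} of 20 candidates, val acc = {best_acc:.3f}")
```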

DISADVANTAGE :

• Moreover, we adopt two methods from diversity maximization via composable core-sets and adapt them slightly to make them suitable for pruning problems, namely Gonzalez's algorithm (GMA) and the local search algorithm (LCS); a GMA sketch follows this list.
• Some researchers also hold the view that diversity among the members of a team of classifiers is a crucial issue in classifier combination.
• The relationship between the proposed diversity and the ensemble generalization error is investigated and analyzed theoretically, which demonstrates that diversity has different impacts on ensemble generalization in different ranges.
• Methods that cannot fix the size of the pruned sub-ensemble may enlarge or shrink the sub-ensemble unpredictably and thereby affect its space cost.
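As a rough illustration of how Gonzalez's algorithm (GMA) can be adapted to ensemble pruning, the sketch below runs a farthest-first traversal over classifiers, using the pairwise disagreement rate between their predictions as the distance; the random prediction matrix and target size k are illustrative assumptions, and this is a sketch of the general technique, not the paper's exact procedure.

```python
import numpy as np

def disagreement(p, q):
    """Fraction of instances on which two classifiers' predictions differ."""
    return np.mean(p != q)

def gonzalez_prune(preds, k):
    """Farthest-first traversal (Gonzalez) over ensemble members.

    preds: (M, N) matrix of member predictions on a validation set.
    Returns indices of k mutually diverse classifiers.
    """
    chosen = [0]                                  # arbitrary starting member
    # Distance from every member to the chosen set (init: to member 0).
    d = np.array([disagreement(preds[0], p) for p in preds])
    while len(chosen) < k:
        nxt = int(np.argmax(d))                   # farthest from chosen set
        chosen.append(nxt)
        d = np.minimum(d, [disagreement(preds[nxt], p) for p in preds])
    return chosen

# Toy usage: 10 classifiers, 50 validation instances, keep 4.
rng = np.random.default_rng(0)
preds = rng.integers(0, 2, size=(10, 50))
print(gonzalez_prune(preds, k=4))
```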

PROPOSED SYSTEM :

• Our proposed work aims to show that DECORATE can be useful in many ways beyond improving classification accuracy in a purely supervised setting.
• We propose several extensions to our preliminary work; an introduction and related work for each of our extensions is provided separately in the chapter on proposed work.
• The goal of our proposed research is to show that DECORATE can also be effectively used for (1) active learning, (2) semi-supervised learning, (3) combining active learning with semi-supervision, (4) regression, (5) improving class membership probability estimates, and (6) relational learning.
• We also propose to implement co-training as a baseline semi-supervised algorithm for comparison.

ADVANTAGE :

• The performance of a diversity measure might depend on the data context and on how the diversity is used.
• It is possible to improve ensemble generalization performance with a smaller ensemble by using ensemble pruning; a greedy pruning sketch follows this list.
• Static pruning techniques focus on identifying a small sub-ensemble with good generalization performance, at the cost of expensive computation.
• In some cases, diversity needs to be decreased to increase the generalization performance of the ensemble, the basic idea being to keep the ensemble classifying the corresponding instances correctly.
• It aims to prune an ensemble with rare performance degradation, the basic idea being to utilize diversity and accuracy simultaneously.
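To make the pruning idea concrete, here is a minimal sketch of greedy forward pruning driven by validation accuracy, a common stand-in for static pruning that implicitly trades off accuracy and diversity; it is not the paper's proposed method, and the synthetic prediction matrix and target size are illustrative assumptions.

```python
import numpy as np

def greedy_prune(preds, y_val, k):
    """Greedy forward selection of k members by majority-vote accuracy.

    preds: (M, N) member predictions on a validation set; y_val: (N,) labels.
    """
    M = preds.shape[0]
    chosen, remaining = [], list(range(M))
    while len(chosen) < k:
        best_i, best_acc = None, -1.0
        for i in remaining:
            trial = preds[chosen + [i]]
            # Majority vote of the trial sub-ensemble on each instance.
            vote = np.apply_along_axis(
                lambda c: np.bincount(c).argmax(), 0, trial)
            acc = np.mean(vote == y_val)
            if acc > best_acc:
                best_i, best_acc = i, acc
        chosen.append(best_i)
        remaining.remove(best_i)
    return chosen, best_acc

# Toy usage: prune 12 noisy voters (each ~70% accurate) down to 5.
rng = np.random.default_rng(1)
y_val = rng.integers(0, 2, size=80)
preds = np.where(rng.random((12, 80)) < 0.7, y_val, 1 - y_val)
subset, acc = greedy_prune(preds, y_val, k=5)
print(subset, f"val acc = {acc:.3f}")
```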
