Representation Learning from Limited Educational Data with Crowdsourced Labels

Abstract: Representation learning has been shown to play an important role in the unprecedented success of machine learning models in numerous tasks, such as machine translation, face recognition and recommendation. Most existing representation learning approaches require large amounts of consistent, noise-free labels. However, for various reasons such as budget constraints and privacy concerns, labels are very limited in many real-world scenarios. Directly applying standard representation learning approaches to small labeled data sets easily leads to over-fitting and sub-optimal solutions. Even worse, in some domains such as education, the limited labels are usually annotated by multiple workers with diverse expertise, which introduces noise and inconsistency into the crowdsourced labels. In this paper, we propose a novel framework that aims to learn effective representations from limited data with crowdsourced labels. Specifically, we design a grouping-based deep neural network to learn embeddings from a limited number of training samples, and we present a Bayesian confidence estimator to capture the inconsistency among crowdsourced labels. Furthermore, to expedite training, we develop a hard example selection procedure that adaptively picks the training examples misclassified by the current version of the model. Extensive experiments on three real-world educational data sets demonstrate the superiority of our framework over various state-of-the-art baselines in learning representations from limited data with crowdsourced labels. In addition, to fully understand the proposed framework, we provide a comprehensive analysis of each of its main components and report the promising results it has achieved in our real production environment. To encourage reproducible results, we make our code available online.
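The hard example selection step described in the abstract can be illustrated with a minimal sketch. It assumes a binary task and uses a plain logistic-regression model as a stand-in for the paper's deep network; the function names, the 0.5 decision threshold and the learning rate are hypothetical choices for illustration, not the authors' implementation. At each epoch, only the examples that the current model misclassifies receive a gradient update.

```python
import numpy as np


def select_hard_examples(probs, labels, threshold=0.5):
    """Return the indices of examples the current model misclassifies.

    probs:     predicted probability of the positive class, shape (n,)
    labels:    binary labels in {0, 1}, shape (n,)
    threshold: decision threshold (hypothetical default of 0.5)
    """
    preds = (np.asarray(probs, dtype=float) >= threshold).astype(int)
    return np.flatnonzero(preds != np.asarray(labels, dtype=int))


def train_with_hard_examples(X, y, epochs=50, lr=0.5, seed=0):
    """Toy logistic regression (a stand-in for the DNN) trained only on hard examples."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        hard = select_hard_examples(probs, y)
        if hard.size == 0:
            break  # the current model classifies every example correctly
        # Gradient of the mean logistic loss, computed over hard examples only.
        err = probs[hard] - y[hard]
        w -= lr * X[hard].T @ err / hard.size
        b -= lr * err.mean()
    return w, b


X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
w, b = train_with_hard_examples(X, y)
print(select_hard_examples(1.0 / (1.0 + np.exp(-(X @ w + b))), y))  # no hard examples left
```

In this sketch, training stops as soon as no example is misclassified, so easy examples stop consuming updates once the model handles them correctly.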
 EXISTING SYSTEM :
 • Representation learning, especially deep learning, has greatly advanced the field of machine learning and its applications.
 • Such success typically requires a large amount of labeled data, which is unavailable in many domains.
 • Various types of techniques have been developed to enable learning with limited labeled data, and representative ones are reviewed next.
 DISADVANTAGE :
 • This problem becomes more critical when building ML models in educational scenarios, and the difficulties are two-fold.
 • First, label annotation in educational scenarios usually requires more domain knowledge than standard crowdsourcing tasks such as image classification or part-of-speech tagging. It is more ambiguous to label a 60-minute class (deciding whether its quality is good or bad) than to annotate an image, which leads to very inconsistent labels.
 • Second, labeling each sample in educational scenarios requires much more effort than standard annotation tasks. For example, a crowd worker may take less than 1 second to annotate an image, but has to watch a 60-minute video before judging the class quality.
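This summary does not specify how the Bayesian confidence estimator mentioned in the abstract is defined. As an assumption, the sketch below substitutes a standard Beta-Binomial model for one sample's binary crowd votes (for example, good versus bad class quality): with a Beta(a, b) prior, observing k positive and m negative votes gives a Beta(a + k, b + m) posterior, and the reported confidence is the posterior probability that the underlying positive rate lies on the side of 0.5 that matches the chosen label. The function names and the uniform prior are hypothetical.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def label_confidence(pos_votes, neg_votes, prior_a=1, prior_b=1):
    """Beta-Binomial summary of one sample's binary crowd votes.

    prior_a, prior_b: positive integer Beta prior parameters (1, 1 = uniform);
    integers are required by beta_cdf_int.
    Returns (label, confidence, posterior_mean).
    """
    a = prior_a + pos_votes
    b = prior_b + neg_votes
    posterior_mean = a / (a + b)
    p_positive = 1.0 - beta_cdf_int(0.5, a, b)  # P(theta > 0.5 | votes)
    label = 1 if p_positive >= 0.5 else 0
    confidence = p_positive if label == 1 else 1.0 - p_positive
    return label, confidence, posterior_mean


print(label_confidence(5, 0))  # unanimous votes: high confidence
print(label_confidence(3, 2))  # split votes: lower confidence
```

For instance, five unanimous positive votes give a confidence of 63/64 (about 0.98), while a 3-to-2 split gives 42/64 (about 0.66), so inconsistent annotations receive lower confidence than consistent ones.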
 PROPOSED SYSTEM :
 • We study the problem of representation learning with crowdsourced labels.
 • We design a novel representation learning framework, RLL, for crowdsourced labels under limited and inconsistent settings.
 • Experimental results on two real-world applications demonstrate that (1) the proposed framework outperforms the representative baselines, and (2) it is necessary to address the limited and inconsistent label problems simultaneously.
 ADVANTAGE :
 • Transfer learning makes it possible to use knowledge from a source domain to improve the performance of learning tasks in a target domain.
 • On the other hand, the training process becomes extremely long because the complete combination of groups may be enormous.
 • If all the groups are fed into the DNN, training is inefficient and does not guarantee optimal performance.
 • In our proposed representation learning framework, we design a grouping-based deep neural architecture that generates hundreds of thousands of training instances from only a limited number of samples labeled by crowd workers.
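To make the combinatorial growth of groups concrete, here is a sketch under an assumed grouping scheme: each group holds an anchor sample, one other sample with the same label, and k samples with the opposite label. The paper's actual group composition is not given in this summary, and count_groups and sample_groups are hypothetical helpers. Counting the complete set shows why feeding every group to the network is impractical, and uniform random sampling is one simple way to draw a manageable subset.

```python
import random
from math import comb


def count_groups(n_pos, n_neg, k):
    """Number of distinct groups under the assumed scheme.

    A group is (anchor, partner, contrast): an ordered anchor/partner pair
    with the same label plus an unordered set of k opposite-label samples.
    """
    from_pos = n_pos * (n_pos - 1) * comb(n_neg, k)
    from_neg = n_neg * (n_neg - 1) * comb(n_pos, k)
    return from_pos + from_neg


def sample_groups(pos, neg, k, num_groups, seed=0):
    """Draw num_groups random groups instead of enumerating all of them."""
    if min(len(pos), len(neg)) < max(2, k):
        raise ValueError("each class needs at least max(2, k) samples")
    rng = random.Random(seed)
    groups = []
    for _ in range(num_groups):
        # Pick which class supplies the anchor and its same-label partner.
        same, other = (pos, neg) if rng.random() < 0.5 else (neg, pos)
        anchor, partner = rng.sample(same, 2)
        contrast = tuple(rng.sample(other, k))
        groups.append((anchor, partner, contrast))
    return groups


print(count_groups(50, 50, 4))  # → 1128470000
```

With only 50 labeled samples per class and k = 4, the complete set already contains roughly 1.1 billion groups, which is why a sampling or selection strategy is needed rather than exhaustive enumeration.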
