Automatic detection of students’ engagement during online learning: A bagging ensemble deep learning approach
Abstract
The COVID-19 pandemic has reshaped education, shifting learning from in-person to online. This shift frees learning from constraints of time and place, but limited interaction makes it hard to detect student engagement during online learning. Student engagement, defined as the active involvement of students in the educational process, is a critical factor in the overall learning experience. This research addresses the challenge by proposing a model that applies bagging (bootstrap aggregating) ensemble learning to 1-dimensional convolutional neural networks (1D CNN), 1-dimensional residual networks (1D ResNet), and hybrid ensemble deep learning models. On the DAiSEE dataset, the bagging ensemble of the 1D CNN model achieves 93.25% accuracy, 3.25 percentage points above the individual 1D CNN model. The bagging ensemble of the 1D ResNet model attains 93.75%, 3.5 percentage points above the individual 1D ResNet model. The hybrid bagging ensemble achieves the highest accuracy, 94.25%, which is 1 percentage point above the 1D CNN ensemble and 0.5 percentage points above the 1D ResNet ensemble.
Existing System
The existing work makes two main contributions:
1. For manual annotation, two video lengths are used: a long clip of 60 seconds and a short clip of 10 seconds. Experiments showed that annotators understand the 10-second clips better.
2. Student engagement is identified by extracting hand-crafted features and passing them to a classifier.
The experimental results show that, on the binary classification task for 10-second clips, the machine learning model performs on par with human annotation. This indicates that machine learning can be used to recognize student engagement automatically. Since then, researchers have carried out more in-depth studies on predicting student engagement. The main approaches are frame-based methods, spatio-temporal feature-based methods, and multimodal feature-based methods. Spatio-temporal methods are sometimes combined with the other two.
Disadvantages
Need for large datasets: Deep learning models typically require large volumes of labeled data to train effectively. If the dataset is insufficient or imbalanced, the model's performance may degrade, leading to inaccurate engagement predictions.
High computational requirements: Deep learning models, particularly ensemble methods like bagging, can be computationally intensive. They require considerable hardware resources, which can be costly, especially for real-time applications or in resource-limited environments.
Difficulty in debugging: In ensemble models, particularly those with many deep learning components, it can be hard to identify which part of the model is making a wrong prediction or why the system fails.
Bias and fairness: Deep learning models can inadvertently learn biases from the data, leading to biased engagement predictions. For instance, a model trained on skewed data might assess certain demographic groups as less engaged, which can result in unfair outcomes.
Proposed System
The system integrates multimodal data inputs, including visual (e.g., facial expressions, eye tracking), auditory (e.g., speech analysis), and behavioral (e.g., mouse and keyboard interactions) signals, to capture a comprehensive view of student engagement. These diverse data sources are processed through a deep learning pipeline with multiple stages: data collection, preprocessing, feature extraction, and model training. At the core of the system is a bagging ensemble approach, which combines the outputs of multiple deep learning models to improve prediction accuracy and reduce overfitting. This ensemble method helps keep engagement detection robust across varying student behaviors and learning contexts. Specifically, the system uses different types of models, such as convolutional neural networks (CNNs) for image data (e.g., facial expression recognition), recurrent neural networks (RNNs) or long short-term memory (LSTM) networks for sequential behavioral data, and other deep learning architectures tailored to different input modalities. These models are trained on labeled datasets to classify student engagement into categories (e.g., engaged, disengaged) or to provide continuous engagement scores.
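The bagging step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner is a tiny nearest-centroid classifier standing in for the 1D CNN and 1D ResNet models, the two-feature toy data is made up for illustration, and majority voting is an assumed way of combining outputs (averaging predicted probabilities is a common alternative). All function and class names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Draw len(X) examples with replacement (one bootstrap replicate)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


class NearestCentroid:
    """Stand-in base learner (assumption); the source uses deep models here."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for j, value in enumerate(features):
                acc[j] += value
        # One mean feature vector per class seen in this bootstrap replicate.
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda c: sq_dist(self.centroids[c]))


def fit_bagging(X, y, n_models=11, seed=0):
    """Train each ensemble member on its own bootstrap replicate."""
    rng = random.Random(seed)
    return [
        NearestCentroid().fit(*bootstrap_sample(X, y, rng))
        for _ in range(n_models)
    ]


def predict_bagging(models, features):
    """Aggregate member predictions by majority vote."""
    votes = Counter(m.predict_one(features) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: label 1 = engaged, label 0 = disengaged.
X = [[0.8 + 0.01 * i, 0.9 - 0.01 * i] for i in range(10)]
X += [[0.1 + 0.01 * i, 0.2 - 0.01 * i] for i in range(10)]
y = [1] * 10 + [0] * 10

models = fit_bagging(X, y)
prediction = predict_bagging(models, [0.9, 0.9])
```

Using an odd number of members avoids tied votes in the two-class case. Swapping the stand-in learner for a trained network only requires an object with the same `fit` and `predict_one` methods.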
Advantages
Robustness to overfitting: By combining the predictions of multiple models, bagging reduces the risk of overfitting that individual models, especially deep learning models, might suffer from. This is particularly useful in educational environments, where engagement data can be noisy and diverse.
Adaptive learning: Continuous engagement tracking enables adaptive learning systems that can adjust course materials, pace, or teaching methods based on real-time engagement data, leading to more personalized learning experiences.
Comprehensive engagement detection: By processing various input signals (audio, visual, behavioral), deep learning models can detect subtle cues of engagement (or disengagement) that may be missed by human observers or simpler machine learning methods.
Automatic feedback generation: By automatically identifying disengaged students, the system can prompt instructors to intervene, or generate automated notifications and suggestions for students, enhancing the overall learning process without requiring constant human oversight.
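The automatic feedback idea above could be prototyped roughly as follows. This is a sketch under stated assumptions, not part of the described system: it assumes the model emits a continuous engagement score in [0, 1] per student, and the class name, window size, and 0.4 threshold are hypothetical values chosen only for illustration.

```python
from collections import deque


class EngagementMonitor:
    """Flags a student when the recent mean engagement score drops too low."""

    def __init__(self, window=5, threshold=0.4):
        self.window = window        # number of recent scores to average
        self.threshold = threshold  # assumed cut-off for "disengaged"
        self.history = {}           # student_id -> recent scores

    def update(self, student_id, score):
        """Record a score; return True if the student should be flagged."""
        scores = self.history.setdefault(
            student_id, deque(maxlen=self.window)
        )
        scores.append(score)
        # Wait for a full window so one low score does not trigger a flag.
        window_full = len(scores) == self.window
        mean = sum(scores) / len(scores)
        return window_full and mean < self.threshold


monitor = EngagementMonitor(window=3, threshold=0.4)
flags = [monitor.update("student-1", s) for s in [0.9, 0.3, 0.2, 0.1]]
```

Averaging over a window rather than reacting to single scores is one simple way to reduce false alarms from noisy predictions; a flagged result would then be the point at which an instructor prompt or student notification is generated.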
