Online Recruitment Fraud (ORF) Detection Using Deep Learning Approaches
Abstract
Most companies now use digital platforms to recruit new employees and simplify the hiring process. The rapid growth of online job posting has been accompanied by fraudulent advertising, through which scammers make money. Online recruitment fraud has emerged as an important issue in cybercrime, so detecting fake job postings is necessary to curb online job scams. Recent studies have applied traditional machine learning and deep learning algorithms to detect fake job postings. This research aims to use two transformer-based deep learning models, i.e., Bidirectional Encoder Representations from Transformers (BERT) and the Robustly Optimized BERT Pretraining Approach (RoBERTa), to detect fake job postings precisely. A novel dataset of fake job postings is proposed, formed by combining job postings from three different sources. Existing benchmark datasets are outdated and limited to knowledge of specific job postings, which limits the capability of existing models to detect fraudulent jobs. Hence, we extend them with the latest job postings. Exploratory data analysis (EDA) highlights the class imbalance problem in detecting fake jobs, which tends to bias the model toward the majority class at the expense of the minority (fraudulent) class. To overcome this problem, the work at hand implements ten top-performing Synthetic Minority Oversampling Technique (SMOTE) variants. The performance of the models balanced by each SMOTE variant is analyzed and compared. All implemented approaches performed competitively. However, BERT+SMOBD SMOTE achieved the highest balanced accuracy and recall, at about 90%.
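As an illustration of the oversampling idea behind the SMOTE family, the following is a minimal sketch of basic SMOTE interpolation. It is not SMOBD or any of the ten variants evaluated in this work, and it is not the authors' implementation. It assumes the job postings have already been converted to numeric feature vectors; the function name `smote_oversample`, its parameters, and the sample data are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between base and its neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for four fraudulent postings
fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(fraud, n_new=6)
print(len(new_points))
```

Because each synthetic point lies between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing samples exactly.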
Existing System
Employment scams are one of the serious issues recently addressed in the domain of online recruitment fraud (ORF). Many companies now prefer to post their vacancies online so that job seekers can access them easily and promptly. However, such a posting may be a scam run by fraudsters who offer employment to job seekers in exchange for money. Fraudulent job advertisements can also be posted in the name of a reputable company to damage its credibility. Detecting fraudulent job posts has therefore attracted considerable attention, with the aim of building an automated tool that identifies fake jobs and reports them so that people can avoid applying for them. For this purpose, a machine learning approach is applied that employs several classification algorithms to recognize fake posts.
Disadvantages
High computational cost: Deep learning models, especially those used for complex fraud detection, require significant computational resources for training and inference. This can lead to high operational costs in terms of hardware, energy consumption, and time, especially for real-time detection.
Data privacy and security concerns: Fraud detection systems often require access to large datasets containing sensitive user information, such as personal details, job history, and communication logs. Ensuring privacy and compliance with data protection regulations (e.g., GDPR, CCPA) can be challenging and requires additional safeguards.
Imbalanced data: Fraudulent activities are typically much less common than legitimate activities, leading to imbalanced datasets. Deep learning models may struggle with this class imbalance, often performing poorly on the minority fraudulent class unless specialized techniques (e.g., oversampling, class weights) are used.
Model interpretability and transparency: Deep learning models are often considered "black boxes," meaning that it can be difficult to interpret or explain their decision-making process. This lack of transparency is a major disadvantage when trying to understand why a particular recruitment attempt is flagged as fraudulent, especially in critical or legal contexts.
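The class-weight remedy for imbalanced data can be sketched as follows. This is a minimal example using the common inverse-frequency ("balanced") heuristic, n_samples / (n_classes * class_count), and is not necessarily the weighting used in any particular system. The helper name `balanced_class_weights` and the 95/5 label split are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency in the labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label set: 95 legitimate postings (0) and 5 fraudulent ones (1)
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, an error on a fraudulent posting costs the training loss far more than an error on a legitimate one, which counteracts the tendency to ignore the rare class.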
Proposed System
The target of this study is to detect whether a job post is fraudulent or not. Identifying and eliminating fake job advertisements will help job seekers concentrate on legitimate job posts only. In this context, a dataset from Kaggle is employed that provides information about jobs that may or may not be suspicious.
A. Implementation of classifiers
In this framework, classifiers are trained using appropriate parameters. Default parameters may not be sufficient to maximize the performance of these models. Adjusting the parameters enhances the reliability of the model, which may then be regarded as the optimised one for identifying fake job posts and keeping them away from job seekers.
B. Performance evaluation metrics
When evaluating the performance of a model, it is necessary to employ metrics that justify the evaluation. For this purpose, the following metrics are taken into consideration in order to identify the most suitable approach to the problem. Accuracy is the ratio of correct predictions to the total number of instances considered. However, accuracy alone may not be a sufficient metric for evaluating a model's performance, since it does not distinguish between kinds of wrong prediction. If a fake post is treated as a genuine one, it creates a significant problem. Hence, it is necessary to consider the false positive and false negative cases that account for misclassification.
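The point about accuracy can be made concrete with a small worked example. The sketch below derives accuracy, precision, recall, and balanced accuracy from the four confusion-matrix counts, treating the fraudulent class as positive. The function names and the ten sample labels are hypothetical and are not taken from the dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall, and balanced accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if tp + fn else 0.0       # fraud posts caught
    specificity = tn / (tn + fp) if tn + fp else 0.0  # genuine posts kept
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical labels: 1 = fraudulent, 0 = legitimate
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = evaluate(y_true, y_pred)
print(scores)
```

In this example the accuracy is 0.8, yet recall is only 0.5 because one of the two fraudulent posts is missed. Balanced accuracy, the mean of recall and specificity, comes out at 0.6875 and reflects that weakness where plain accuracy hides it.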
Advantages
Improved accuracy: Machine learning algorithms can be trained on large datasets to detect patterns and identify features that are indicative of fake job postings. This can lead to more accurate and reliable predictions, reducing the number of false positives and false negatives.
Efficiency: Machine learning models can process large volumes of data much more quickly and efficiently than manual review by human moderators, reducing the time required to review job postings and improving the overall efficiency of the job posting process.
Adaptability: Machine learning models can adapt to new fraud techniques and patterns as they emerge, making them more effective at identifying and preventing fake job postings.
Consistency: Machine learning algorithms can provide consistent results across all job postings, eliminating the potential for subjectivity and inconsistencies that can arise with human moderators.
