Spammer Detection in Twitter Dataset

      

ABSTRACT :

With the growing popularity of online social networks, spammers find these platforms an easy place to trap users in malicious activities by posting spam messages. In this work, we focus on the Twitter platform and perform spam tweet detection. Tools such as Google SafeBrowsing and Twitter’s BotMaker detect and block spam tweets. They can block malicious links, but they cannot protect the user in real time, as early as possible. Industry and researchers have therefore applied different approaches to make social network platforms spam-free. Some rely only on user-based features, while others rely only on tweet-based features. However, there is no comprehensive solution that consolidates a tweet’s text information with the user-based features. To address this, we propose a framework that uses user-based and tweet-based features together with the tweet text feature to classify tweets. The benefit of the tweet text feature is that spam tweets can be identified even when the spammer creates a new account, which was not possible with user-based and tweet-based features alone. We evaluated our solution with four machine learning algorithms: Support Vector Machine, Neural Network, Random Forest and Gradient Boosting. With the Neural Network, we achieve an accuracy of 91.65%, surpassing the existing solution by approximately 18%.
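As a rough illustration of consolidating the three feature groups, the sketch below concatenates user-based features, tweet-based features and simple word counts into one vector that a classifier could consume. The function name `build_feature_vector`, the feature names and the tiny vocabulary are assumptions made for this example; the abstract does not specify how the features are encoded.

```python
from typing import Dict, List


def build_feature_vector(
    user: Dict[str, float],
    tweet: Dict[str, float],
    text: str,
    vocabulary: List[str],
) -> List[float]:
    """Concatenate user, tweet and text features into one flat vector.

    Hypothetical helper: the feature names and encoding are illustrative.
    """
    # Sort keys so that every tweet produces features in the same order.
    user_part = [float(user[name]) for name in sorted(user)]
    tweet_part = [float(tweet[name]) for name in sorted(tweet)]
    # Text feature: how often each vocabulary word occurs in the tweet.
    tokens = text.lower().split()
    text_part = [float(tokens.count(word)) for word in vocabulary]
    return user_part + tweet_part + text_part


vec = build_feature_vector(
    user={"followers": 10, "following": 500},
    tweet={"hashtags": 2, "mentions": 1},
    text="Win a FREE prize, click free link",
    vocabulary=["free", "click", "news"],
)
print(vec)
```

Because the text counts sit in the same vector as the account statistics, a tweet from a brand-new account still carries a usable signal through its wording.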

EXISTING SYSTEM :

• Although many solutions exist, very few are comprehensive enough to block spam tweets in real time.
• One study detects spammers who post at least one tweet containing a URL unrelated to the actual content of the tweet.
• Our approach appears much simpler than the previous approaches, but we will show its effectiveness later.
• Spam takes various forms and definitions depending on the type of network. With millions of users worldwide, Twitter provides a variety of news and events.
• However, by making it easy to disseminate news and letting users discuss stories in their status updates, these services also open opportunities for another kind of spam.

DISADVANTAGE :

• They used word vectors to train their model, but they did not explore user-based or tweet-based features to address the problem.
• In this paper, we present a framework based on different machine learning approaches that addresses several problems, including low accuracy, time lag (BotMaker) and the high processing time needed to handle thousands of tweets per second.
• The benefit of including words selected by their entropy score in the feature set is that it reduces uncertainty in the prediction outcome, because these words occur with different frequencies in spam and non-spam tweets.
• We will consolidate these three approaches to handle the Spam Drift problem.
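One way to read the entropy score mentioned above is as the entropy of the spam/non-spam labels among the tweets that contain a given word: a word that appears almost only in one class has low entropy and is therefore informative. The sketch below follows that reading; the function `word_entropy`, the toy tweets and the treatment of unseen words are assumptions, not the scoring rule used in the original work.

```python
import math
from typing import List, Tuple


def word_entropy(word: str, tweets: List[Tuple[str, int]]) -> float:
    """Entropy in bits of the labels (1 = spam, 0 = non-spam) of tweets containing `word`."""
    labels = [label for text, label in tweets if word in text.lower().split()]
    if not labels:
        # Assumption: a word never seen is treated as maximally uncertain.
        return 1.0
    p_spam = sum(labels) / len(labels)
    entropy = 0.0
    for p in (p_spam, 1.0 - p_spam):
        if p > 0:  # by convention 0 * log2(0) contributes nothing
            entropy -= p * math.log2(p)
    return entropy


tweets = [
    ("free prize click now", 1),
    ("free followers here", 1),
    ("reading the news today", 0),
    ("click to read the news", 0),
]

for word in ("free", "click", "news"):
    print(word, word_entropy(word, tweets))
```

In this toy data "free" and "news" each occur in only one class, so they score lower than "click", which is split evenly between spam and non-spam.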

PROPOSED SYSTEM :

• In this study, the proposed spammer detection classifies accounts as spammer or non-spammer by identifying user behavior and tweet-based features (number of followers, following, mentions and hashtags).
• We propose a new approach that follows Twitter's rules to identify spammers.
• One work proposed a novel approach to detect spambots on Twitter.
• The tweets usually contain a malicious link. The work proposed graph-based and content-based features.
• Although the proposed approach seems simpler than the other work, it is promising.
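The features named above can be sketched as a small extraction step. The example below counts mentions and hashtags with regular expressions and carries the follower and following counts through unchanged; the URL count is an added assumption, motivated only by the remark that spam tweets usually contain a link. The function name `tweet_features` and the regular expressions are illustrative, not taken from the original system.

```python
import re
from typing import Dict


def tweet_features(text: str, followers: int, following: int) -> Dict[str, float]:
    """Extract simple user-based and tweet-based features from one tweet."""
    return {
        "followers": float(followers),
        "following": float(following),
        # "@name" tokens are treated as mentions, "#tag" tokens as hashtags.
        "mentions": float(len(re.findall(r"@\w+", text))),
        "hashtags": float(len(re.findall(r"#\w+", text))),
        # Assumption: link count is included because spam often carries URLs.
        "urls": float(len(re.findall(r"https?://\S+", text))),
    }


features = tweet_features(
    "@bob win a #free #prize http://example.com/x",
    followers=10,
    following=499,
)
print(features)
```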

ADVANTAGE :

• This research used information gain to rank the important features.
• We will measure Twitter spam detection performance on our dataset using four machine learning algorithms: Support Vector Machine with kernel, Neural Network, Gradient Boosting and Random Forest.
• One weakness of BotMaker is that it fails to protect a victim from new spam, i.e. it is not an efficient tool for real-time spam tweet detection.
• The application is built to evaluate performance in comparison with the C5.0 algorithm.
• Some comparisons were also performed, and Naïve Bayes performed much better than the other classification algorithms.
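Ranking features by information gain can be sketched as follows: for each feature, subtract the label entropy that remains after splitting on that feature from the entropy of the labels overall, then sort by the result. The binary feature names and the four-row toy dataset are assumptions chosen to keep the example small; real features such as follower counts would first need to be discretized.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy in bits of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values: Sequence[int], labels: Sequence[int]) -> float:
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_features(columns: Dict[str, List[int]], labels: List[int]) -> List[str]:
    """Return feature names ordered from most to least informative."""
    return sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )


labels = [1, 1, 0, 0]  # 1 = spam, 0 = non-spam (toy data)
columns = {
    "has_url": [1, 1, 0, 0],      # hypothetical feature, matches the labels exactly
    "has_hashtag": [1, 0, 1, 0],  # hypothetical feature, unrelated to the labels
}
ranking = rank_features(columns, labels)
print(ranking)
```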
