Cyberbully Detection Using Term Weighting Scheme and Naïve Bayes Classifier
DOI:
https://doi.org/10.11113/ijic.v10n1.254
Keywords:
Cyberbullying, text mining, detection
Abstract
The internet, especially social media, has become a major platform where people interact with each other. Advances in technology allow us to interact regardless of time and place. Unfortunately, not all of these interactions are positive. One negative form of online interaction is cyberbullying, which has increased rapidly over the years, whether through social media, email or texting. It is therefore important to prevent cyberbullying from occurring, which motivates this research. Detecting the presence of cyberbullying is one of the main issues in preventing it. Detection can be challenging because of the many languages used in the world, the frequent use of slang and informal language, and the use of special characters such as emoji in online conversation. The aim of this research is to detect the presence of textual cyberbullying in online posts. Two term weighting schemes and two classification algorithms are compared. The weighting schemes, Entropy and Term Frequency-Inverse Document Frequency (TF-IDF), are used for feature selection, and the Naïve Bayes algorithm is compared with the Support Vector Machine (SVM) algorithm. The results show that the Naïve Bayes classifier yields better accuracy when used with TF-IDF, reaching 97.60%. We hope this research gives other researchers some insight, particularly those interested in a similar area.
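
To make the TF-IDF plus Naïve Bayes pairing concrete, the following is a minimal Python sketch of one way the two pieces can fit together. It is not the authors' implementation: the function names (`fit_idf`, `tfidf`, `train_nb`, `predict`), the toy posts, the smoothed IDF variant and the Laplace smoothing constant `alpha` are all assumptions made for illustration, and the abstract does not specify the exact formulas or preprocessing used.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed inverse document frequency per term (an assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized post; terms unseen in training are skipped."""
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""
    weight_sums = defaultdict(lambda: defaultdict(float))  # class -> term -> weight
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for term, weight in tfidf(doc, idf).items():
            weight_sums[label][term] += weight

    vocab = list(idf)
    model = {}
    for label, n_docs in class_counts.items():
        denom = sum(weight_sums[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                t: math.log((weight_sums[label][t] + alpha) / denom) for t in vocab
            },
        }
    return model


def predict(doc, model, idf):
    """Return the class with the highest log posterior score."""
    weights = tfidf(doc, idf)
    scores = {
        label: p["log_prior"] + sum(w * p["log_like"][t] for t, w in weights.items())
        for label, p in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy posts, invented for illustration only.
train_docs = [
    "you are such a stupid loser".split(),
    "nobody likes you loser".split(),
    "shut up you stupid idiot".split(),
    "great game last night".split(),
    "thanks for the help today".split(),
    "see you at lunch tomorrow".split(),
]
train_labels = ["bully", "bully", "bully", "normal", "normal", "normal"]

idf = fit_idf(train_docs)
model = train_nb(train_docs, train_labels, idf)

print(predict("you stupid loser".split(), model, idf))
print(predict("thanks for lunch".split(), model, idf))
```

A sketch like this only shows the data flow from weighted terms to a class decision. A comparison along the lines described in the abstract would swap the weighting function for an entropy-based one, or the classifier for an SVM, and measure accuracy on held-out posts.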