Classification of Sexual Harassment on Facebook Using Term Weighting Schemes
Facebook is the largest social network service, and its user base is growing rapidly. Facebook has become one of the main sources of information for individuals and organizations, and this exponential increase in information has raised the issue of information security. In the United States alone, 62% of online abuse occurred through Facebook, and the most common form of online abuse is sexual harassment, at 44%. Victims of online sexual harassment live under pressure because a harasser can propagate messages at any time and under any identity. Several content filtering tools for web-based platforms, especially Facebook, have been proposed. Most of these approaches are unsuitable or have limitations when applied to current social network services such as Facebook. As a result, a content-based technique that incorporates a deeper understanding of text semantics would probably perform better at blocking illegal post content. In this project, three term weighting schemes, namely Entropy, TF-IDF, and Modified TF-IDF, are used for feature selection in filtering Facebook posts. The performance of these techniques is examined on datasets, and the accuracy of their results is measured using a Support Vector Machine (SVM) classifier. The three schemes are evaluated on accuracy, precision, recall, and F score. The results show that Modified TF-IDF performs better than Entropy and TF-IDF. It is hoped that this study will offer insight to other researchers, especially those who want to work in a similar area.
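To make the weighting and evaluation steps concrete, the following is a minimal sketch in Python. It assumes the textbook TF-IDF variant (term frequency normalized by document length, multiplied by log(N/df)) and the standard definitions of precision, recall, and F score. The exact formulas used in this study, in particular the Modified TF-IDF and Entropy schemes, are not stated in the abstract and are not reproduced here. The function names `tfidf` and `precision_recall_f1` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: weight = (count / doc_length) * log(N / df), the textbook
    variant; the study's own formula may differ.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In this sketch, a term that appears in every document receives a weight of zero, which is the property that lets TF-IDF suppress uninformative words before the weighted features are passed to a classifier such as an SVM.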