Classification of Online Grooming on Chat Logs Using Two Term Weighting Schemes

Nur Rafeeqkha Sulaiman; Maheyzah Md. Siraj

doi:10.11113/ijic.v9n2.239

Authors

Nur Rafeeqkha Sulaiman School of Computing Universiti Teknologi Malaysia 81310 UTM Johor Bahru, Johor, Malaysia
Maheyzah Md. Siraj School of Computing Universiti Teknologi Malaysia 81310 UTM Johor Bahru, Johor, Malaysia

DOI:

https://doi.org/10.11113/ijic.v9n2.239

Keywords:

TF.IDF.ICSdF, FRFS, SVM, online grooming, classification

Abstract

Due to the growth of Internet, it has not only become the medium for getting information, it has also become a platform for communicating. Social Network Service (SNS) is one of the main platform where Internet users can communicate by distributing, sharing of information and knowledge. Chatting has become a popular communication medium for Internet users whereby users can communicate directly and privately with each other. However, due to the privacy of chat rooms or chatting mediums, the content of chat logs is not monitored and not filtered. Thus, easing cyber predators preying on their preys. Cyber groomers are one of cyber predators who prey on children or minors to satisfy their sexual desire. Workforce expertise that involve in intelligence gathering always deals with difficulty as the complexity of crime increases, human errors and time constraints. Hence, it is difficult to prevent undesired content, such as grooming conversation, in chat logs. An investigation on two term weighting schemes on two datasets are used to improve the content-based classification techniques. This study aims to improve the content-based classification accuracy on chat logs by comparing two term weighting schemes in classifying grooming contents. Two term weighting schemes namely Term Frequency â€“ Inverse Document Frequency â€“ Inverse Class Space Density Frequency (TF.IDF.ICSdF) and Fuzzy Rough Feature Selection (FRFS) are used as feature selection process in filtering chat logs. The performance of these techniques were examined via datasets, and the accuracy of their result was measured by Support Vector Machine (SVM). TF.IDF.ICSdF and FRFS are judged based on accuracy, precision, recall and F score measurement.

Classification of Online Grooming on Chat Logs Using Two Term Weighting Schemes

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

IJIC