Classification of Online Grooming on Chat Logs Using Two Term Weighting Schemes
Keywords:TF.IDF.ICSdF, FRFS, SVM, online grooming, classification
Due to the growth of Internet, it has not only become the medium for getting information, it has also become a platform for communicating. Social Network Service (SNS) is one of the main platform where Internet users can communicate by distributing, sharing of information and knowledge. Chatting has become a popular communication medium for Internet users whereby users can communicate directly and privately with each other. However, due to the privacy of chat rooms or chatting mediums, the content of chat logs is not monitored and not filtered. Thus, easing cyber predators preying on their preys. Cyber groomers are one of cyber predators who prey on children or minors to satisfy their sexual desire. Workforce expertise that involve in intelligence gathering always deals with difficulty as the complexity of crime increases, human errors and time constraints. Hence, it is difficult to prevent undesired content, such as grooming conversation, in chat logs. An investigation on two term weighting schemes on two datasets are used to improve the content-based classification techniques. This study aims to improve the content-based classification accuracy on chat logs by comparing two term weighting schemes in classifying grooming contents. Two term weighting schemes namely Term Frequency â€“ Inverse Document Frequency â€“ Inverse Class Space Density Frequency (TF.IDF.ICSdF) and Fuzzy Rough Feature Selection (FRFS) are used as feature selection process in filtering chat logs. The performance of these techniques were examined via datasets, and the accuracy of their result was measured by Support Vector Machine (SVM). TF.IDF.ICSdF and FRFS are judged based on accuracy, precision, recall and F score measurement.