South China Sea Conflicts Classification Using Named Entity Recognition (NER) and Part-of-Speech (POS) Tagging
Keywords: Named Entity Recognition, Part-of-Speech, Machine Learning, Text Classification
The Internet connects everyone to everything globally and makes many daily tasks easier to complete. Thanks to the Internet, information is being digitized and shared openly with the public. Online news articles not only provide useful and reliable information and reports, they also make information extraction and gathering easier for research, especially in Natural Language Processing (NLP) and Machine Learning (ML). Topics concerning the South China Sea have recently drawn attention because of rising conflicts among several countries over their claims to the islands in the sea. Gathering data from the Internet and other online sources is easy, but processing a large amount of data and manually identifying only the useful information takes a long time. Important features can be extracted from a text document by using a single feature extraction method or a combination of several. There is therefore a need to extract relevant information from news articles and classify them with respect to the South China Sea conflicts. In this paper, a model is proposed that uses Named Entity Recognition (NER) to search for and classify important information regarding the conflicts. To do so, Part-of-Speech (POS) tagging and NER are combined to extract the type of conflict from the news. This study also classifies news articles using the Conditional Random Field (CRF) algorithm and Multinomial Naïve Bayes (MNB) as classification methods, which are trained and tested on the collected data.
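As an illustration of the MNB step only, the following is a minimal sketch, not the authors' implementation. It assumes each article has already been reduced to a list of feature tokens by a POS tagger and an NER model (not shown), encoded with a hypothetical `TAG:word` scheme such as `GPE:China` or `NOUN:vessel`. The conflict labels (`maritime_incident`, `diplomatic_dispute`) and the four training documents are invented toy examples, not data from this study.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on lists of feature tokens with Laplace smoothing."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # per-class token frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    # Log prior P(class) estimated from class frequencies.
    priors = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    # Log likelihood P(token | class) with additive (Laplace) smoothing.
    likelihoods = {}
    for c in class_counts:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {
            t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab
        }
    return priors, likelihoods, vocab


def predict_mnb(model, tokens):
    """Return the class with the highest log posterior; unseen tokens are ignored."""
    priors, likelihoods, vocab = model
    scores = {
        c: prior + sum(likelihoods[c][t] for t in tokens if t in vocab)
        for c, prior in priors.items()
    }
    return max(scores, key=scores.get)


# Hypothetical POS/NER feature tokens and labels, for illustration only.
train_docs = [
    ["GPE:China", "NOUN:vessel", "VERB:ram", "NOUN:boat"],
    ["GPE:Vietnam", "NOUN:ship", "VERB:collide", "NOUN:vessel"],
    ["GPE:Philippines", "NOUN:protest", "VERB:file", "ORG:ASEAN"],
    ["GPE:China", "NOUN:statement", "VERB:reject", "NOUN:protest"],
]
train_labels = [
    "maritime_incident",
    "maritime_incident",
    "diplomatic_dispute",
    "diplomatic_dispute",
]

model = train_mnb(train_docs, train_labels)
print(predict_mnb(model, ["NOUN:vessel", "VERB:collide", "GPE:Vietnam"]))
```

Prefixing each word with its POS or entity tag lets the same bag-of-features classifier use both sources of information. The CRF component mentioned in the abstract, which operates on token sequences rather than bags of features, is not covered by this sketch.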