Supervised Machine Learning Algorithms for Sentiment Analysis of Bangla Newspaper

Authors

  • Sabrina Jahan Maisha, Department of Computer Science and Engineering, BGC Trust University Bangladesh, "BGC Biddyanagar", Chandanaish, Chattogram, Bangladesh
  • Nuren Nafisa, Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Raozan, Chattogram, Bangladesh
  • Abdul Kadar Muhammad Masum, Department of Computer Science and Engineering, International Islamic University Chittagong, Kumira, Chattogram, Bangladesh

DOI:

https://doi.org/10.11113/ijic.v11n2.321

Keywords:

Bangla Newspaper, Document level, Natural Language Processing, Sentiment Analysis, Supervised Machine Learning Algorithm

Abstract

The Bangla language is rich enough to support a wide range of Natural Language Processing (NLP) tasks, yet it has received little attention from the NLP field. In this age of digitalization, a large amount of Bangla news content is generated on online platforms, and some of it is inappropriate for children or elderly readers. Motivated by the need to filter news content easily, this work performs document-level sentiment analysis (SA) on Bangla online news. The dataset was created by collecting news from an online Bangla newspaper archive, and the documents were manually annotated into positive and negative classes. A composite "Pipeline" class, consisting of a Count Vectorizer, a TF-IDF transformer and a machine learning (ML) classifier, is used to extract features and train on the dataset. Six supervised ML classifiers, namely Multinomial Naive Bayes (MNB), K-Nearest Neighbor (K-NN), Random Forest (RF), C4.5 Decision Tree (DT), Logistic Regression (LR) and Linear Support Vector Machine (LSVM), are compared to find the best classifier for the proposed model. Very few works have addressed SA of Bangla news, so this work is a small contribution to the field. The model gave good results under both validation procedures used, a percentage split and 10-fold cross-validation. Among the six classifiers, RF outperformed the others with 99% accuracy. LSVM showed the lowest accuracy, 80%, which is still considered a good result. The model also performed well on recent and critical Bangla news, which indicates that the feature extraction used to build it was appropriate.
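The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `Pipeline`, `CountVectorizer` and `TfidfTransformer`; the abstract does not state which library or parameters the authors used. The toy English documents, their labels, the helper `build_pipeline` and the choice of two of the six classifiers are hypothetical stand-ins, and 5-fold cross-validation replaces the paper's 10-fold only because the toy dataset is so small.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for annotated news documents
# (not the paper's Bangla dataset). 1 = positive, 0 = negative.
docs = [
    "team wins final and fans celebrate victory",
    "new hospital opens and brings hope to village",
    "students celebrate excellent exam results",
    "farmers happy with record harvest this year",
    "city celebrates festival with joy and music",
    "road accident kills three people near town",
    "flood destroys homes and crops in district",
    "fire kills workers in factory tragedy",
    "violent clash leaves many people injured",
    "robbery and murder shock the town",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def build_pipeline(classifier):
    """Chain term counts -> TF-IDF weighting -> classifier in one estimator."""
    return Pipeline([
        ("counts", CountVectorizer()),   # raw term counts per document
        ("tfidf", TfidfTransformer()),   # reweight counts by TF-IDF
        ("clf", classifier),             # any supervised classifier
    ])


# Two of the six classifier families named in the abstract; the others
# could be swapped in the same way.
candidates = {
    "MNB": MultinomialNB(),
    "LR": LogisticRegression(max_iter=1000),
}

# Mean cross-validated accuracy per classifier (5 folds on this toy data).
scores = {}
for name, clf in candidates.items():
    fold_scores = cross_val_score(build_pipeline(clf), docs, labels, cv=5)
    scores[name] = fold_scores.mean()

for name, score in scores.items():
    print(f"{name}: mean accuracy = {score:.2f}")
```

Wrapping the vectorizer and the TF-IDF transformer in the same pipeline as the classifier means they are refitted on the training folds only, so no vocabulary or document-frequency information leaks from the held-out fold into training.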

Published

2021-10-31

How to Cite

Maisha, S. J., Nafisa, N., & Muhammad Masum, A. K. (2021). Supervised Machine Learning Algorithms for Sentiment Analysis of Bangla Newspaper. International Journal of Innovative Computing, 11(2), 15–23. https://doi.org/10.11113/ijic.v11n2.321

Section

Computer Science