
A Comprehensive Pre-processing Approach for High-Performance Classification of Twitter Data with several Machine Learning Algorithms

Conference Paper


Abstract


  • With an average of five hundred million tweets produced per day, Twitter has become one of the most comprehensive sources of data for researchers. Various studies have previously been conducted on Twitter data, e.g., sentiment analysis. However, little research has been done on classifying tweets into categories so that they can be distributed according to user preferences. In this research, we first constructed four broad classes: politics, sports, crime and natural. Next, we applied our proposed preprocessing model to the raw Twitter dataset. We then applied several machine learning techniques (Random Forest, K-Nearest Neighbors, Naive Bayes, Logistic Regression, Decision Tree and Support Vector Machine) to classify the Twitter data. Finally, we compared the outcomes with and without preprocessing in terms of sensitivity, specificity, and accuracy. We found that our proposed preprocessing model improved the performance of all the machine learning classifiers.
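
The abstract names sensitivity, specificity, and accuracy as the evaluation measures but does not say how they are computed across the four classes. The sketch below shows one common reading, assuming a one-vs-rest treatment in which each class in turn is the "positive" class. The function name `one_vs_rest_metrics` and the toy labels are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter


def one_vs_rest_metrics(y_true, y_pred, classes=None):
    """Return (accuracy, per_class) for a multi-class prediction.

    per_class maps each class to its one-vs-rest sensitivity and
    specificity. This averaging scheme is an assumption; the abstract
    does not specify one.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    if classes is None:
        classes = sorted(set(y_true) | set(y_pred))

    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    accuracy = sum(pairs[(c, c)] for c in classes) / total

    per_class = {}
    for c in classes:
        tp = pairs[(c, c)]                                   # c predicted as c
        fn = sum(pairs[(c, p)] for p in classes if p != c)   # c predicted as other
        fp = sum(pairs[(t, c)] for t in classes if t != c)   # other predicted as c
        tn = total - tp - fn - fp                            # everything else
        per_class[c] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return accuracy, per_class


# Toy labels using the four classes named in the abstract (made-up data).
y_true = ["politics", "politics", "sports", "sports", "crime", "natural"]
y_pred = ["politics", "sports", "sports", "sports", "crime", "crime"]
accuracy, per_class = one_vs_rest_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for name, scores in per_class.items():
    print(name, scores)
```

Running the same function on predictions made with and without preprocessing would give the kind of side-by-side comparison the abstract describes.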

Publication Date


  • 2020

Citation


  • Sarker, A., Islam, M. R., & Srizon, A. Y. (2020). A Comprehensive Pre-processing Approach for High-Performance Classification of Twitter Data with several Machine Learning Algorithms. In 2020 IEEE Region 10 Symposium, TENSYMP 2020 (pp. 630-633). doi:10.1109/TENSYMP50017.2020.9230590

Scopus Eid


  • 2-s2.0-85096423862

Start Page


  • 630

End Page


  • 633
