Quick Summary: Examining Sentiments on Twitter for the 2019 Nigerian Presidential Elections

Somto Enendu
Jan 9, 2019

Twitter sentiment analysis is a useful tool for examining and capturing the mood of voters prior to an election. In the age of social media, many people share their perspectives, biases and opinions on political discourse, in favor of one candidate or another, on these platforms.

Sentiment Analysis, or Opinion Mining, is a sub-field of Natural Language Processing (NLP) that tries to identify and extract opinions or sentiments within a given text. Running sentiment analysis on a text helps to gauge the attitude, opinion and sentiments of its writer, based on the computational treatment of subjectivity in the text. This can, for example, prove useful in gauging the sentiments of a sample of voters, and may be a good indication of the way an election will swing.

2019 Nigerian Presidential Elections

Presidential elections in Nigeria are held every four years. The upcoming election will be held on the 16th of February 2019. About 40 candidates are vying for the presidential seat. However, based on past data and the strength of party structures across the country, only two candidates are likely to win: the incumbent, President Buhari of the APC, and his main opponent, Atiku Abubakar of the PDP.

Twitter Sentiment Analysis on both candidates

Project Examine Nigeria is a sentiment analysis project that examines the sentiment of Nigerians on Twitter regarding both candidates prior to the elections. The aim of the project is to capture the opinions and sentiments of a small sample of Nigerian voters who are Twitter users. This is done by analyzing their tweets daily over a period of time to see how these opinions change. Tweets are classified as positive, neutral or negative using a machine learning algorithm (Logistic Regression) and a dictionary-based sentiment analysis library (Sentimental). The two are combined using heuristic rules to produce a single output indicating whether a tweet is positive, neutral or negative. The methodology used in this project is discussed briefly below:

Methodology

Tweets are first analyzed and classified into their labels (positive, negative, neutral) using logistic regression, a supervised machine learning model. The model is trained on the Sentiment140 training dataset. We first apply the typical machine learning steps, including data pre-processing/cleaning, feature extraction and feature engineering. The dataset contains about 1.6 million records; however, we used only about 400,000 to train the model. The model was evaluated on a held-out test set of unseen records and returned an accuracy score of 82.5%.
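The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the project's actual code: the cleaning rules, the bag-of-words features, the learning rate and the four training tweets are all made up for the example, and the project itself may well have used a library implementation of logistic regression instead of the hand-written gradient descent shown here.

```python
import math
import re
from collections import Counter

def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits, punctuation, '#'
    return text.split()

def featurize(tokens, vocab):
    """Turn a token list into a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias with full-batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(tweet, vocab, w, b):
    """Probability that a raw tweet is positive."""
    x = featurize(clean_tweet(tweet), vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up toy training data (1 = positive, 0 = negative), for illustration only.
TRAIN = [
    ("I love this great candidate", 1),
    ("Great speech, love it", 1),
    ("I hate this bad policy", 0),
    ("Bad, awful speech. Hate it", 0),
]
tokenized = [clean_tweet(text) for text, _ in TRAIN]
vocab = sorted({word for tokens in tokenized for word in tokens})
X = [featurize(tokens, vocab) for tokens in tokenized]
y = [label for _, label in TRAIN]
w, b = train_logistic_regression(X, y)

print(predict_proba("love great", vocab, w, b) > 0.5)  # classified as positive
```

Words that occur only in positive examples end up with positive weights and words that occur only in negative examples with negative weights, which is what lets the model score unseen tweets.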

We also used a simple dictionary-based sentiment analysis system to classify tweets. This system returns an integer score based on the words in each tweet that also appear in a predefined dictionary. Tweets with more negative words than positive words return a negative score, and vice versa. A score of 0 indicates that the tweet may be neutral.
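The scoring idea described above can be sketched as follows. This does not use the Sentimental library or its API; the tiny lexicon and the `lexicon_score` function are hypothetical stand-ins meant only to show how an integer score falls out of word lookups.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "corrupt": -1, "fail": -1,
}

def lexicon_score(tweet):
    """Sum the lexicon values of every known word in the tweet (unknown words count as 0)."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(LEXICON.get(word, 0) for word in words)

print(lexicon_score("Great rally, love it"))     # 2
print(lexicon_score("Bad and corrupt"))          # -2
print(lexicon_score("He spoke in Lagos today"))  # 0
```

A real lexicon would be far larger and, as the Discussion notes, would benefit from terms specific to Nigerian political discourse.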

We then combine the results of these two methods using rule-based heuristics to determine the final sentiment of each tweet. Every day from the start of the project until election day, we collect tweets at the end of the day through the Twitter API using the keywords ‘buhari’ and ‘atiku’, and classify them as positive, negative or neutral. Note that we only consider tweets in which those keywords are the subject or object of the tweet. This filters out tweets that merely contain a candidate's name, such as headlines, quotes and article links. We use dependency parsing to achieve this, relying on the linguistic features of a spaCy model.
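The two ideas in this paragraph, the subject/object filter and the rule-based combination, might look roughly like the sketch below. Everything here is an assumption made for illustration: the post does not state the actual rules, so the voting rule in `combine`, the `margin` parameter and the set of dependency labels are hypothetical. The filter only relies on each token having `text` and `dep_` attributes, which spaCy tokens provide, so a small stand-in token type is used to keep the example runnable without a spaCy model.

```python
from collections import namedtuple

# Dependency labels treated as "subject or object" (an assumed choice;
# these labels are among those produced by spaCy's English models).
ARGUMENT_DEPS = {"nsubj", "nsubjpass", "dobj", "pobj"}

def is_about_candidate(tokens, names=("buhari", "atiku")):
    """True if a candidate's name appears as a subject or object in the parse."""
    return any(
        tok.text.lower() in names and tok.dep_ in ARGUMENT_DEPS
        for tok in tokens
    )

def combine(prob_positive, lexicon_score, margin=0.1):
    """Hypothetical rule: each method casts a vote of -1, 0 or +1; the sum decides."""
    if prob_positive >= 0.5 + margin:
        ml_vote = 1
    elif prob_positive <= 0.5 - margin:
        ml_vote = -1
    else:
        ml_vote = 0  # classifier is not confident either way
    lex_vote = (lexicon_score > 0) - (lexicon_score < 0)  # sign of the score
    total = ml_vote + lex_vote
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

# Stand-in for parsed tokens. With spaCy installed, a real parse would come from
# something like: tokens = spacy.load("en_core_web_sm")(tweet_text)
Tok = namedtuple("Tok", "text dep_")
opinion = [Tok("Buhari", "nsubj"), Tok("failed", "ROOT"), Tok("us", "dobj")]
headline = [Tok("Atiku", "compound"), Tok("rally", "nsubj"), Tok("begins", "ROOT")]

print(is_about_candidate(opinion))   # True: the name is the subject
print(is_about_candidate(headline))  # False: the name only modifies "rally"
print(combine(0.9, -3))              # neutral: the two methods disagree
```

Under this particular rule, a tweet is labelled neutral either when both methods abstain or when they contradict each other, which is one simple way to obtain three classes from a binary classifier and an integer score.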

The results of this process are documented daily on the Project Examine Nigeria social media page. An aggregation of the results over the period of the experiment could be a good indication of whose favor the election may swing in. Of course, there are other factors to consider in order to correctly predict the outcome of an election (including past trends, political affiliations and demographics). However, this project helps to uncover the opinions of a sample of voters leading up to the election, which may also indicate how the larger population will vote.
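As a rough sketch of what such a daily aggregation could involve, the snippet below turns classified tweets into per-day, per-candidate sentiment shares. The record format, the `daily_shares` function and the sample records are hypothetical placeholders, not the project's data or results.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")

def daily_shares(records):
    """Map (day, candidate) to the share of tweets carrying each sentiment label."""
    counts = defaultdict(Counter)
    for day, candidate, label in records:
        counts[(day, candidate)][label] += 1
    shares = {}
    for key, label_counts in counts.items():
        total = sum(label_counts.values())
        shares[key] = {label: label_counts[label] / total for label in LABELS}
    return shares

# Placeholder records of the form (day, candidate keyword, final label).
records = [
    ("day 1", "buhari", "positive"),
    ("day 1", "buhari", "negative"),
    ("day 1", "buhari", "positive"),
    ("day 1", "atiku", "neutral"),
]
shares = daily_shares(records)
print(shares[("day 1", "atiku")]["neutral"])  # 1.0
```

Working with shares and not raw counts keeps days with very different tweet volumes comparable.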

Discussion

The methodology used in this project is not flawless. An accuracy score of 82.5% means that the model classified 82.5% of the test records correctly, so it has a misclassification rate of about 17.5%. The inclusion of the dictionary-based sentiment analysis system helps to compensate for the weaknesses of the machine learning model. Still, the classification power of our system is not perfect.

To improve the system for future use, neural word embeddings (e.g. Word2Vec) can be used for feature extraction. More advanced models or algorithms can also be used for classification to achieve a higher accuracy. The dictionary can also be improved by adding more words with negative or positive connotations that are common in Nigerian society, especially in political discourse.

You can find the project code here

Visualizations of daily results can be found here

** This post will be updated with the aggregated results after the experiment is completed**

Authors: David Anda & Somto Enendu
