Analysing the Corruption Hashtags on Twitter

By Sidney Ochieng
Data Science Lab
  Published 11 May 2016

On February 24, 2016, the hashtags #whatcorruptioncostsus and #costofcorruption began to trend on Twitter as people took to the platform to express their frustration with the current state of affairs by pointing out the negative and dire effects of corruption, down to a personal level.

iHub Research, through the Umati project, has been monitoring Twitter and other social media platforms for almost 2 years now, using inbuilt software to collect conversational data around hashtags and users that may lead to conversations containing hate speech. To better understand the ebbs and flows of online conversations in Kenya, many of which are emotive or reactive and sometimes lead to hate speech, we have also started analysing datasets around trending discussions, such as this one on corruption. It is interesting to note that the conversation lasted over 24 hours.

The data was collected over 3 days, starting on the evening of the 24th. We collected 21,606 tweets during that period. The data was collected using Twitter’s Streaming API, which allows for real-time collection; however, due to limits on the API, it may not include every tweet generated during the collection period. For this analysis, we look only at the text of the tweets and none of the accompanying metadata. Further work may use some of that metadata to investigate the drivers of conversation.

The first step was to filter out duplicate tweets, leaving us with 3,599 tweets for the analysis. This already tells us that the conversation on these hashtags consisted mostly of amplifications of what others were saying, primarily in the form of retweets (18,007 of the 21,606 collected tweets, about 83%, were duplicates). This may be because the issues being raised are shared by a majority of the people, who choose to amplify a few core messages.
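As a rough illustration of this step, the Python sketch below keeps the first occurrence of each tweet text. The article does not say which tool or matching rule was used, so the function name `drop_duplicate_tweets` and the choice to compare texts after collapsing whitespace and ignoring case are assumptions made for the example.

```python
def drop_duplicate_tweets(tweets):
    """Return the tweets with repeated texts removed, keeping first occurrences.

    Assumption: two tweets count as duplicates when their texts match after
    collapsing runs of whitespace and ignoring letter case.
    """
    seen = set()
    unique = []
    for text in tweets:
        key = " ".join(text.split()).casefold()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


if __name__ == "__main__":
    # Hypothetical sample, not taken from the real dataset.
    sample = [
        "RT @user: leaders are corrupt #CostofCorruption",
        "RT @user: leaders are corrupt  #CostofCorruption",
        "youth have no jobs #CostofCorruption",
    ]
    unique = drop_duplicate_tweets(sample)
    print(len(sample), "collected,", len(unique), "unique")
```

A stricter or looser rule (for example, stripping the "RT @user:" prefix before comparing) would change how many tweets survive, so the counts reported in this article should not be expected from this sketch.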

The next step was data cleaning, to ensure that any unnecessary noise was removed before proceeding with the analysis. In this stage, we converted all text to lowercase and removed punctuation, unnecessary whitespace, stopwords (such as ‘the’, ‘at’, ‘we’ etc.) and hyperlinks. The hashtags themselves were also removed, since all tweets contain them. We also converted non-UTF8 characters to UTF8, because non-UTF8 characters break the algorithms we use.
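The cleaning steps described above could look roughly like the following Python sketch. It is not the code used for this analysis: the stopword list is a tiny illustrative one, the function name `clean_tweet` is hypothetical, and dropping characters that cannot be encoded as UTF-8 is only one possible way to handle the encoding problem.

```python
import re
import string

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "at", "we", "a", "an", "is", "to", "of", "and", "in", "for"}
# The tracked hashtags, lowercased, since every tweet contains them.
HASHTAGS = {"#costofcorruption", "#whatcorruptioncostsus"}

# Translation table that turns every ASCII punctuation mark into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text, stopwords=STOPWORDS, hashtags=HASHTAGS):
    """Lowercase a tweet and strip links, tracked hashtags, punctuation and stopwords."""
    # Drop anything that cannot be encoded as UTF-8 (e.g. lone surrogates).
    text = text.encode("utf-8", errors="ignore").decode("utf-8")
    text = text.lower()
    # Remove hyperlinks before punctuation, so URLs are not split into tokens.
    text = re.sub(r"https?://\S+", " ", text)
    # Remove the tracked hashtags before '#' is stripped as punctuation.
    for tag in hashtags:
        text = text.replace(tag, " ")
    text = text.translate(PUNCT_TO_SPACE)
    # split() also collapses any leftover runs of whitespace.
    tokens = [token for token in text.split() if token not in stopwords]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_tweet("We still vote for THE corrupt leaders! #CostOfCorruption http://t.co/abc"))
```

The order of the steps matters here: links and hashtags are removed before punctuation, otherwise their fragments would be left behind as ordinary words.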

Finally, we converted all of this to a term document matrix, a matrix that describes the frequency of terms that occur in a collection of documents, or in our case tweets. From this, we found the most frequently occurring words in our dataset. These are represented below as a word cloud.
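To make the idea of a term document matrix concrete, here is a minimal Python sketch that builds one from already-cleaned tweets and ranks terms by total frequency. The function names and the three sample documents are invented for the example; the article does not state which library was used to build the real matrix.

```python
from collections import Counter


def term_document_matrix(docs):
    """Build a term document matrix: one row per term, one column per document."""
    vocab = sorted({term for doc in docs for term in doc.split()})
    counts = [Counter(doc.split()) for doc in docs]
    matrix = [[doc_counts[term] for doc_counts in counts] for term in vocab]
    return vocab, matrix


def most_frequent_terms(vocab, matrix, n=10):
    """Return the n terms with the highest total count across all documents."""
    totals = {term: sum(row) for term, row in zip(vocab, matrix)}
    # Sort by descending count, breaking ties alphabetically.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Hypothetical cleaned tweets, not taken from the real dataset.
    docs = [
        "corrupt leaders steal",
        "youth jobs corrupt leaders",
        "vote corrupt leaders out",
    ]
    vocab, matrix = term_document_matrix(docs)
    for term, row in zip(vocab, matrix):
        print(term, row)
    print(most_frequent_terms(vocab, matrix, n=2))
```

The totals produced by `most_frequent_terms` are the kind of input a word cloud needs, since each word is sized according to its count.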

Figure: Corruption word cloud

We looked at some of the words that were mentioned most frequently along with the word “corruption” and created networked word graphs for some of them. We then chose four terms for further analysis: youth, voting, leaders and 2017. These terms were chosen based on a combination of the amount of data generated on each, their disparity from the corruption issues of the day, and the type of networks they produced.
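One simple way to find the words that appear alongside a chosen term, and to produce weighted edges for a word network, is sketched below in Python. This is an assumption about how such graphs could be built, not a description of the method used here; the function names and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurring_terms(docs, focus):
    """Count how many documents each term shares with the focus term."""
    counts = Counter()
    for doc in docs:
        terms = set(doc.split())
        if focus in terms:
            counts.update(terms - {focus})
    return counts


def cooccurrence_edges(docs, min_count=1):
    """Return weighted edges (term pairs) that co-occur in at least min_count documents."""
    edges = Counter()
    for doc in docs:
        # Sorting makes each pair appear in a single, consistent order.
        for pair in combinations(sorted(set(doc.split())), 2):
            edges[pair] += 1
    return {pair: weight for pair, weight in edges.items() if weight >= min_count}


if __name__ == "__main__":
    # Hypothetical cleaned tweets, not taken from the real dataset.
    docs = ["leaders greedy tribal", "leaders corrupt greedy", "youth jobs"]
    print(cooccurring_terms(docs, "leaders").most_common())
    print(cooccurrence_edges(docs, min_count=2))
```

The resulting edge dictionary can be handed to any graph-drawing tool, with the weight controlling line thickness; raising `min_count` prunes rare pairings and keeps the graph readable.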


The word “leaders” appeared 117 times in our dataset. Looking at the network graph around it, you get the sense that people find their leaders to be tribal, greedy and corrupt, as seen in the sample tweets below:

It thrives because these gluttonious leaders have mastered the art of piting us against each other through tribal coccon #CostofCorruption

#CostofCorruption is recycling the same greedy selfish leaders back into power who keep on stealing from us

All our leaders r corrupt to the core, let's suggest solutions without talking about political parties n present leadership#CostOfCorruption

Others pointed out that leadership is something that we as citizenry choose, and that it is a reflection of who we are as a society and whom we value.

#CostofCorruption got all pple whining an' yet they still tick ✔ the same corrupt, greedy an' non-conscience leaders..on the ballot paper

Figure: Word cloud for the term leader

most painful thing is that we still vote for this corrupt leaders #CostOfCorruption

Some people pointed out that the only way to change the leadership was through elections, and encouraged people to register to vote.

#LetsFightCorruptionBy by electing our leaders based on factual and good track records rather than being tribal

#LetsFightCorruptionBy registering as voters and electing the right people into positions of leadership #IEBC #2017

#LetsFightCorruptionBy voting out the corrupt leaders on the coming elections

Notice that the hashtag #costofcorruption was used to show the ills people observed, while #letsfightcorruptionby was used to suggest how to go about correcting them.



Figure: Network graph for the term youth

The word “youth” appeared 93 times in the dataset. The #costofcorruption hashtag was trending just as the story of the money missing from the Youth Fund broke, so it came up prominently in this subset. However, the key point in this subset was youth unemployment and how corruption was preventing the youth from finding jobs.

#CostofCorruption graduates still jobless without opportunities yet still being fined heavily for defaulting to repay HELB

Corruption Deprives the Kenyan Youths of 250,000 Jobs per Financial Year.Corruption Is An enemy of Development #CostOfCorruption

#CostofCorruption is thousands of youths go without employment, poor quality service delivery, the poor are getting poorer, list is endless

Others highlighted the fact that the youth did not have role models to follow:

Youth role models who would still[sic] from the sick & the poor #CostofCorruption

Kenyan youth want riches by hook or crook because role models are Mike Sonko, WSR & Steve Mbogo. #CostofCorruption

Because of the lack of jobs the youth take shortcuts, such as gambling, that may encourage criminality and debt.

#CostOfCorruption has led to so many youths like me, seek daily bread via sportpesa betway betin mcheza elitebet etc

CostofCorruption youths watabidi waingie sportpesa wapate payslip #CorruptionIsFlowingInOurBlood.


Figure: Network graph for the term 2017

Combined, the words “voting” and “2017” appeared 87 times (44 and 43 times respectively). The two words are analysed together because they were highly correlated.
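The article does not say how the correlation between the two terms was measured. One common option, shown here purely as an assumption, is the Pearson correlation between the presence (1) or absence (0) of each term across tweets. The function name `term_correlation` and the sample documents are hypothetical.

```python
from math import sqrt


def term_correlation(docs, term_a, term_b):
    """Pearson correlation between the presence of two terms across documents.

    Returns 0.0 when either term appears in all or none of the documents,
    because the correlation is undefined in that case.
    """
    xs = [1 if term_a in doc.split() else 0 for doc in docs]
    ys = [1 if term_b in doc.split() else 0 for doc in docs]
    n = len(docs)
    if n == 0:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical cleaned tweets, not taken from the real dataset.
    docs = ["vote 2017", "vote 2017 leaders", "youth jobs", "leaders greedy"]
    print(term_correlation(docs, "vote", "2017"))
    print(term_correlation(docs, "vote", "youth"))
```

A value near 1 means the two terms almost always appear in the same tweets, which is the situation in which analysing them together makes sense.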

Some expressed the view that people always vote from a tribal standpoint, that this does nothing to improve the state of corruption in the country, and that it needs to change:

#CostofCorruption come 2017 let Kenyans vote,, hii maneno ya tribal coalitions ndio inatumaliza, am sure am more wiser with my vote

#LetsFightCorruptionBy voting out n not electing corrupt leaders even if they are our tribesmates

Figure: Network graph for the term voting

Venting about the #CostofCorruption and then voting for tribal kingpins won't help. Voting - tribalism = low #CostofCorruption

Others chose to highlight what they’d want from new leaders in the coming elections:

#LetsFightCorruptionBy voting for candidates who say NO to CORRUPTION and Demonstrate Accountability, Integrity and Transparency.

#LetsFightCorruptionBy voting in leaders of integrity and that should start from kericho n malindi by-elections.

Looking at these 4 terms brings forth a familiar theme: our leaders are failing us, particularly the youth, by using tribalism to divide us, leading to unemployment, and the only way to change the status quo is via voting.

Research carried out by iHub Research on ICT and Governance in East Africa shows that voting is not the only way that the citizenry interacts with the government to monitor and change behaviour within it. There are several ICT tools, both high-tech and low-tech, through which government can interact with the citizenry, such as websites, mobile phones, radio and web applications. However, citizens make limited use of them because they are not optimistic that action will be taken on the issues they raise.

There is a need for the government, at all levels, to do more to inform Kenyans on how best to interact with it, so that elections are not seen as the only way citizens exercise their democratic rights.
