The Garissa Attack: Network and Sentiment Analysis of Twitter Data (Part 1)

By Chris Orwa
Data Science Lab
  Published 06 Oct 2015

By Sidney Ochieng

On 2 April 2015, Al Shabaab gunmen stormed Garissa University College, killing 147 people and injuring about 79 more. “Garissa” was already a keyword monitored by the Umati project's data collection. This post explains our analysis of the data captured between 2 April and 7 April 2015; it does not cover hashtags that emerged later, such as “#GarissaAttack”. During that time, we collected over 400,000 tweets.

Sentiment Analysis
As an initial step, we applied sentiment analysis to the dataset using the indico.io API. This gave every tweet a score between 0 and 1, where 1 is positive sentiment, 0 is negative and 0.5 is neutral. We used this as a rough gauge of how conversations and reactions changed over time, which pointed us to areas worth examining. Next, we used the sentiment data to create a moving average graph that showed how sentiment changed over time.

image04

Blue = raw data
Red = unique tweets
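
The smoothing step can be illustrated with a minimal sketch. It assumes the tweets are already sorted by time and scored between 0 and 1; the function name `moving_average`, the window size and the sample scores are hypothetical and are not taken from the original analysis.

```python
from collections import deque

def moving_average(scores, window=3):
    """Return the trailing moving average of a time-ordered list of scores."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # the most recent `window` scores
    running_sum = 0.0
    averages = []
    for score in scores:
        if len(recent) == window:
            # The oldest score is about to fall out of the window.
            running_sum -= recent[0]
        recent.append(score)
        running_sum += score
        averages.append(running_sum / len(recent))
    return averages

# Hypothetical sentiment scores in time order (0 = negative, 1 = positive).
scores = [1.0, 0.0, 0.0, 1.0]
smoothed = moving_average(scores, window=2)
print(smoothed)
```

A larger window gives a smoother curve but reacts more slowly to sudden changes such as the one discussed next.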

As indicated by the graph, there was a rather sharp decline between the 2nd and the 4th. In this article, we will refer to this decline as ‘the dip’. We focused on the dip, dividing the dataset into two: before the dip and after the dip.

To understand the context within which the conversations took place, we employed two data-mining techniques, frequent terms and association mining, to investigate the most relevant and frequently used terms in conversations.

Frequent terms is a simple yet powerful technique that counts which words, or in our case which users, appeared most often in our dataset. To reduce the level of noise, we drop stop words, i.e. words such as “the”, “a” and “an” that are used a lot in natural language but add little meaning to a sentence.
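
As a rough illustration of the frequent-terms step, the sketch below counts tokens after dropping stop words. The stop-word list, the tokenising pattern and the sample tweets are assumptions made for the example, not the ones used in our pipeline.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "at", "for", "from"}

def frequent_terms(tweets, top_n=5):
    """Count words, hashtags and @mentions across tweets, skipping stop words."""
    counts = Counter()
    for tweet in tweets:
        # Keep a leading # or @ so hashtags and user mentions stay distinct.
        tokens = re.findall(r"[#@]?\w+", tweet.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up tweets for illustration only.
tweets = [
    "Sad news from Garissa",
    "The attack in Garissa is sad",
    "Praying for Garissa",
]
top = frequent_terms(tweets, top_n=2)
print(top)
```

Because @mentions are kept as tokens, the same count also surfaces the most mentioned users.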

Association Mining
Next, we take some of the interesting terms from the frequent terms and run a process called association mining, a procedure meant to find frequent patterns, correlations, associations or causal structures in datasets. In the case of textual data, it measures the correlation between two words, that is, how often one word appears alongside the other. If we take two words, for example "garissa" and "university", and the correlation is 0.6, it means, roughly, that when the word "garissa" appears, the word "university" appears alongside it about 60% of the time.
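
The idea can be sketched on a handful of made-up tweets. The post does not state which association measure was used, so the example computes two common ones as an assumption: the Pearson correlation of the two words' presence indicators, and the simpler share of tweets containing the first word that also contain the second, which is what the "60% of the time" reading describes. All function names and data here are hypothetical.

```python
import math

def presence(tweets, term):
    """Return 1 for each tweet that contains the term as a word, else 0."""
    return [1 if term in tweet.lower().split() else 0 for tweet in tweets]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a word present in all or none of the tweets carries no signal
    return cov / math.sqrt(var_x * var_y)

def co_occurrence_share(xs, ys):
    """Share of tweets containing the first word that also contain the second."""
    with_first = sum(xs)
    if with_first == 0:
        return 0.0
    both = sum(1 for x, y in zip(xs, ys) if x and y)
    return both / with_first

# Made-up tweets for illustration only.
tweets = [
    "garissa university attack",
    "garissa university students",
    "garissa attack news",
    "nairobi traffic today",
]
garissa = presence(tweets, "garissa")
university = presence(tweets, "university")
correlation = pearson(garissa, university)
share = co_occurrence_share(garissa, university)
print(round(correlation, 3), round(share, 3))
```

The two numbers generally differ, so it is worth checking which one a given tool reports before reading a score as a percentage.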

The initial sentiment after news of the attack broke was mostly negative. This may be attributed to the gravity of the event, with people expressing their dismay at the violence. We see this in the most commonly used words during the moving average dip: "killed", "lives", "lost", "sad". At around 16:20:37 on 3 April, we see sentiment begin to trend upward; sympathy and empathy began to appear, as evidenced by the introduction of the #147notjustanumber hashtag, which happens after the moving average dip. Other trending terms showing a sense of solidarity with the victims included "jesuis african", "charliehebdo", "families" and "human".

Network Analysis
Next, we created network graphs for keywords. This allowed us to view the keywords used alongside certain terms, giving us a means to see what the conversation around those terms was and how the words were grouped.

Each keyword, or node, is illustrated by a dot. The colour of the node is determined by the cluster the node belongs to. Lines between nodes represent relationships; these lines are called edges. In general, nodes which are related are shown close together, whereas unrelated nodes are shown further apart. We also look at how dense the network is: if it is very dense, most nodes are connected, which suggests a high degree of collaboration and information flow within the network. Nodes with a high degree are easily identified as those with many connections (edges), and they appear in the middle of clusters or of the network. We examined the network graphs of posts generated during and around the dip.
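
Degree and density can be made concrete with a small sketch. It assumes an edge links two keywords whenever they appear in the same tweet, which is one common way to build such a graph and not necessarily how the graphs in this post were produced; the function names and keyword lists are hypothetical.

```python
from itertools import combinations

def build_graph(tweet_keywords):
    """Build an undirected graph linking keywords that share a tweet."""
    graph = {}
    for keywords in tweet_keywords:
        unique = sorted(set(keywords))
        for keyword in unique:
            graph.setdefault(keyword, set())
        for first, second in combinations(unique, 2):
            graph[first].add(second)
            graph[second].add(first)
    return graph

def degrees(graph):
    """Number of edges attached to each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def density(graph):
    """Fraction of possible edges that are present: 2E / (N * (N - 1))."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(neighbours) for neighbours in graph.values()) / 2
    return 2 * edges / (n * (n - 1))

# Made-up keyword lists, one per tweet, for illustration only.
tweet_keywords = [
    ["garissa", "media", "charliehebdo"],
    ["garissa", "ferguson"],
    ["garissa", "blacklivesmatter", "ferguson"],
]
graph = build_graph(tweet_keywords)
node_degrees = degrees(graph)
print(node_degrees["garissa"], density(graph))
```

In this toy graph "garissa" has the highest degree, so a layout algorithm would typically place it near the centre.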

We noted that before and after the dip, certain Twitter users stood out: Harry_styles, zvieneuve, lunarnomad and mamesslidaf. Harry_styles stood out because of the significant response that one of his tweets (shown below) received.

 

image02

 

The resultant network graph from Harry_styles’s tweet is illustrated below.

harrystyles

 

Looking at the others, @lunarnomad is particularly interesting because of the clusters “religion”, “media” and “blacklivesmatter” that appeared on her network graph. This indicates that her tweets, and the reactions to them, were about the coverage of the event. She also linked it to events happening elsewhere, in the United States, highlighting police brutality against African Americans by using the hashtag #blacklivesmatter, which began in the wake of the July 2013 acquittal of George Zimmerman in the Florida shooting death of African-American teen Trayvon Martin. Lunarnomad’s network graph and tweets are below.

image05

 

image03

The charliehebdo keyword trended as the sentiment began to rise from the dip. This came up as a reaction to what people felt was the indifference of the world and the media to the deaths of Africans. People were comparing the response to the Garissa attack with the response to the shooting of 12 people at the offices of the Charlie Hebdo magazine in France in January 2015. We see clusters around “media coverage” in the network graph. People also linked the Garissa attack to events that had trended in the United States, such as the protests in Ferguson, Missouri and the Black Lives Matter campaign, both of which have clusters in the network graph below (“Ferguson” and “BlackLivesMatter” respectively). We also see a new hashtag created: “#africanlivesmatter”.

image00

The application of sentiment analysis to the Garissa attack is part of the larger Umati project. While the goal of Umati is to monitor and analyse dangerous speech, we used data from the Garissa attack to validate the tools we wish to apply to the Umati monitoring process. With the sentiment analysis described here, the focus was not on identifying dangerous speech as such, but more broadly on how events catalyse online speech and, in turn, how computational methods can be used to track and study this speech online (including dangerous speech). From this brief study we noted that events in Kenya can lead to conversations that go beyond the geographical boundaries of the country, and can consequently prompt global discussion of issues that Kenyan Twitter users relate to. In other words, while the focus of Umati is on events in Kenya and dangerous speech about these local events, international events can also catalyse positive or negative discussion in the Kenyan online space.

The next step is to refine the tools and techniques we applied here and focus them on dangerous speech, rather than on all the speech we collected, as we did in this post. Read more about the Umati project here.

Some of the tools used to produce this analysis are available on our GitHub page here.
