The Umati project emerged out of concern that mobile and digital technologies may have played a catalyzing role in Kenya's 2007/08 post-election violence. The project seeks to better understand the use of dangerous speech in the Kenyan online space. It monitors selected blogs, forums, online newspapers, Facebook, and Twitter. The content monitored includes tweets, status updates and comments, posts, and blog entries.
The Umati project has so far relied on a largely manual process for collecting and categorizing online hate speech. Human input proved necessary both to review content in local vernacular languages and local vocabulary accurately and to build a database of inflammatory speech. More on the methodology can be found in the Umati Phase 1 Final Report.
Now in its second phase, Umati is employing machine learning (ML) and natural language processing (NLP) techniques to detect, collect, select, and sort hate speech and dangerous speech from the Kenyan online space. We are looking to automate aspects of the current Umati process in order to improve the scalability of the system.
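To give a rough sense of what automating the sorting step could look like, here is a minimal sketch of a bag-of-words naive Bayes text classifier written with only the Python standard library. It is not Umati's actual pipeline: the class name, the "review" and "ok" labels, and the four training sentences are all invented for illustration, and a real system would be trained on the human-annotated database described above and would need to handle vernacular languages.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        """Return the unnormalized log-probability of each label."""
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data: "review" means "send to a human annotator".
TRAIN = [
    ("those people should be driven out", "review"),
    ("attack them before they attack us", "review"),
    ("traffic was terrible this morning", "ok"),
    ("great match last night", "ok"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])

if __name__ == "__main__":
    for post in ["they should be driven out of town",
                 "the match this morning was great"]:
        print(post, "->", model.predict(post))
```

In a sketch like this the classifier only prioritizes content; items labeled "review" would still go to human annotators, which keeps the local-language judgment that the manual process relied on.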