The Trouble With Open Data

By Conrad Akunga
  Published 12 Nov 2012

Unless you have been living under the proverbial rock, you must have come across the term OpenData, either professionally or personally. However, as with many novel concepts penetrating the local ICT lexis, much more heat than light has been generated about OpenData.
The premise is simple. Government is the largest collector and custodian of data. This data is usually sitting ignored in silos across government bodies. Suppose this raw data were exposed to citizens, so that innovative applications can be built to dissect, view and extrapolate additional intelligence using this data?
Kenya Open Data launch: the President of Kenya, Mwai Kibaki, visits the Ushahidi and iHub Nairobi stand at the Kenya Open Government Data Portal launch in July 2011.
The ICT board has long been a stellar champion of this concept, the genesis of which was laid out on an ICT policy discussion mailing list, KICTANET, culminating in the development and launch of the portal. Since the launch, there have been several initiatives to make use of the data and derive new intelligence. However, the question arises: have we really fully exploited the concept of OpenData? I would postulate that no, we have not come close to harnessing it to the full. And here is why.
Seeing The Trees, Missing the Forest
For some reason, whether by accident or by design, the definition of OpenData seems to have been restricted to government data. This is an unnecessary and arbitrary restriction. Businesses can also harness this data and dissect, transform and combine it to yield additional intelligence. Supermarkets, for example, collect a lot of information every time a customer is at the till. Consider what business intelligence can be derived if this data is cross-referenced with anonymized customer information. A forward-thinking business can mine this information, and the resulting intelligence can guide it in various ways:
  1. Customer segmentation
  2. Product inventory level optimization
  3. Product matching
  4. Purchase patterns
The business does not need to invest in its own capacity for this. It can simply upload its data publicly and incentivize developers and statisticians to mine this information and provide new insights.
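To make the "product matching" and "purchase patterns" ideas concrete, here is a minimal sketch of what a developer might do with published till data. It assumes each till receipt is available as a simple list of product names; that format, the function name `pair_counts` and the sample baskets are all illustrative assumptions, not anything a supermarket actually publishes.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each pair of distinct products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so ("bread", "milk") and ("milk", "bread") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Made-up baskets, purely for illustration.
SAMPLE_BASKETS = [
    ["bread", "milk"],
    ["bread", "milk", "sugar"],
    ["sugar", "tea"],
]

# Products most often bought together come first.
for pair, count in pair_counts(SAMPLE_BASKETS).most_common(3):
    print(pair, count)
```

Pairs that are frequently bought together are one simple input to shelf placement and stocking decisions; real analyses would go considerably further than raw counts.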
Dependence On Government Benevolence
One of the first questions that arose after the launch of the OpenData portal was how does one request data that is not currently there? The official answer was to write a “request” for what datasets one requires and await feedback. This, unfortunately, is not good enough. Not nearly good enough. Because this means someone decides whether or not to accede to a request for data. This in effect means the data on the portal is subject to somebody’s benevolence. This is neither scalable, sustainable nor transparent. For instance, during the Olympics Kenya sent an absurdly large delegation of officials to London. I wrote a request for the following:

1. Who exactly traveled to London at taxpayer expense?

2. How much was spent on travel, accommodation, per diems and other expenses for each of these?

As of now I am yet to receive anything other than a promise to write to the relevant parties to request this information. This I find wanting. If we are committed to true transparency and openness of data, we should be operating on two simple premises:

1. The public has a right to details of any and all public expenditure, with the exception of security spending and a few other cases. After all, it is not government money. It is public money.

2. This data should be continuously and automatically uploaded. It should not await a request.

With these two premises in place, it will be possible for the public to aggressively audit government on various levels – expenditure, income, personnel, projects, etc. Imagine being able to quickly answer questions such as:
  1. What is the total expenditure on tea and refreshments across all ministries?
  2. What is the total expenditure on travel across government, broken down by air, road, rail and sea?
  3. What percentage of government expenditure is recurrent versus developmental across government bodies?
  4. What is the rate of hiring across government by quarter?
  5. How many airline tickets does government purchase per quarter, and from which airlines?
  6. What are all the aggregated line item expenditures, broken down across ministries?
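Once expenditure is published as raw line items, questions like these reduce to a few lines of code. The sketch below assumes a hypothetical record format with `ministry`, `category` and `amount` fields; the field names, the category labels and every figure are placeholders invented for illustration, not the schema or contents of any real dataset.

```python
from collections import defaultdict

# Made-up placeholder records; names and amounts are assumptions only.
LINE_ITEMS = [
    {"ministry": "Ministry A", "category": "travel/air", "amount": 500},
    {"ministry": "Ministry A", "category": "travel/road", "amount": 200},
    {"ministry": "Ministry A", "category": "refreshments", "amount": 50},
    {"ministry": "Ministry B", "category": "travel/air", "amount": 300},
    {"ministry": "Ministry B", "category": "refreshments", "amount": 70},
]


def total_by(records, key, category_prefix=""):
    """Sum 'amount' grouped by `key`, keeping only categories starting with `category_prefix`."""
    totals = defaultdict(int)
    for record in records:
        if record["category"].startswith(category_prefix):
            totals[record[key]] += record["amount"]
    return dict(totals)


# Total refreshments spending across all ministries.
refreshments_total = sum(total_by(LINE_ITEMS, "ministry", "refreshments").values())

# Travel spending broken down by mode of transport.
travel_by_mode = total_by(LINE_ITEMS, "category", "travel/")

# All line items aggregated per ministry.
spend_by_ministry = total_by(LINE_ITEMS, "ministry")

print(refreshments_total, travel_by_mode, spend_by_ministry)
```

The point of the sketch is that none of this requires a formal request to anyone: with the raw data continuously uploaded, any citizen with basic tools could run the audit.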
Leading By Example

The most obvious dataset I would expect government to release is the complete raw results of the last census, anonymized of course so that the data is no longer personally identifiable.

This would be a very powerful tool for:

  1. Policy makers
  2. Statisticians
  3. Actuarial scientists
  4. Computer scientists
  5. Economists

The potential of mining all that information – across all axes – demographic, health, agriculture, religion, employment, etc. and deriving new insights is breathtaking.

Policy bodies, government institutions, businesses, health practitioners and many other stakeholders would be ready and willing to fund initiatives to develop applications and tools to derive this intelligence from the raw data.

This dataset is still not forthcoming.

Some of the available open data sets.

Fixing Fundamentals

OpenData datasets are, by definition, outputs. The question then arises, what are the inputs?

One of the biggest problems government faces is automation. Many of us know only too well that not all government institutions are making full use of ICT. One of the best examples is the Kenya Police Service.

A Report Corruption sign at Fort Jesus, Mombasa.

Those of us who have visited a police station will know that most still use manual quire books to keep records such as the occurrence book.

How many of us have had the experience of a relative or a friend who has failed to return home, and been forced to look for them? You must physically visit every police station and read through the occurrence book.

Imagine that police stations automate and the occurrence book (OB) goes online. You could simply search by the ID number or name of your friend or relative, and the tool would tell you:

  1. What police station they are in
  2. Why they have been arrested
  3. What you need to do next
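The lookup described above can be sketched in a few lines. Everything here is hypothetical: the `OBEntry` fields, the `search_ob` function and the sample records are assumptions made for illustration, since no such online occurrence book exists to model it on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OBEntry:
    """One occurrence book record. Every field here is a hypothetical example."""
    id_number: str
    name: str
    station: str
    reason: str
    next_step: str


def search_ob(entries, query):
    """Return entries whose ID number equals `query` or whose name contains it.

    Name matching is case-insensitive; an empty query returns no results.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    return [
        entry for entry in entries
        if entry.id_number == needle or needle in entry.name.lower()
    ]


# Placeholder records, invented purely to show the lookup.
SAMPLE_ENTRIES = [
    OBEntry("00000001", "Jane Example", "Station A", "Reason on record", "Visit Station A with ID"),
    OBEntry("00000002", "John Sample", "Station B", "Reason on record", "Call Station B"),
]

for match in search_ob(SAMPLE_ENTRIES, "jane"):
    print(match.station, "|", match.reason, "|", match.next_step)
```

A real service would search a central database rather than a list in memory, and would need to decide who is allowed to look up whom, but the shape of the answer (station, reason, next step) stays the same.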

Beyond this, data on crime and misdemeanours could also be published and mined to glean new insights, in addition to informing citizens.

If government can fix the fundamentals of automation across government, OpenData becomes that much easier to realize.

Conrad Akunga has worked in the software industry for over 10 years. He is a co-founder of Innova Limited, a software company specializing in the development of software and tools for the finance and investments industry. He is also the co-founder of a civic education and governance watchdog portal, and sits on the Board of Advisors of the Nairobi iHub. He is also a philosopher, writer and all-round good guy.
