Imagine you are stuck in outer space and need to find your way back to planet Earth. A good bet would be to follow the stars, but a problem exists: how do you deduce patterns that can act as makeshift landmarks? A similar problem arises when data is visualized on a Cartesian (X-Y) plane: how do you draw valid conclusions from spread-out data points?
A clever solution was found in bioinformatics, and more specifically in genetics. In order to identify genotypes and locate their positions, biostatisticians developed spatial exploration tools to infer genetic structures. One such tool of interest to us is the R package adegenet. It contains several methods for analysing any form of spatial data. Let's look at an example - the map below shows areas covered by Zuku internet fibre cable in Nairobi.
What's the best pattern for laying cable that minimises material usage and covers all locations? Generate a Minimum Spanning Tree (MST). An MST connects all the points, without loops, using the set of links that has the lowest total cost. Treating latitude and longitude as coordinates on an X-Y plane and running the MST in the adegenet package in R, we get the diagram below.
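To make the idea concrete, here is a minimal sketch of how an MST can be built, written in plain Python rather than R. It uses Prim's algorithm, which is one standard way to compute an MST and is not necessarily what adegenet does internally. The function name `minimum_spanning_tree`, the site names and the coordinates are all hypothetical and are not taken from the Zuku map. Distances are straight-line distances in degrees, which is only a rough approximation at city scale.

```python
import math


def minimum_spanning_tree(points):
    """Build an MST with Prim's algorithm.

    points: dict mapping a site name to an (x, y) pair,
            here (longitude, latitude).
    Returns a list of (from_site, to_site, distance) links.
    """
    names = list(points)
    if not names:
        return []
    in_tree = {names[0]}  # start growing the tree from the first site
    links = []
    while len(in_tree) < len(names):
        best = None
        # Find the cheapest link from a connected site to an unconnected one.
        for a in names:
            if a not in in_tree:
                continue
            for b in names:
                if b in in_tree:
                    continue
                d = math.dist(points[a], points[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        links.append(best)
        in_tree.add(best[1])
    return links


# Hypothetical sites as (longitude, latitude); illustrative values only.
sites = {
    "site_a": (36.82, -1.28),
    "site_b": (36.80, -1.26),
    "site_c": (36.78, -1.30),
    "site_d": (36.89, -1.31),
}

for start, end, length in minimum_spanning_tree(sites):
    print(f"{start} -> {end}: {length:.4f} degrees")
```

For n sites the result always has n - 1 links, which is the smallest number that can connect every site. This simple version compares every pair of sites on each step, which is fine for a handful of neighbourhoods but slow for very large networks.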
From the diagram above, the optimised cable route joins the entire network at the CBD and shows the nodes where cables from different neighbourhoods should meet. We do hope this is how Zuku has laid out their cables.
The Data Science Lab at iHub Research capitalises on the growing opportunities presented by technology and data to surface information and tell stories through data analytics.