Our esteemed Mayor of London, Boris Johnson, has 'Boris bikes' and 'Boris buses', and we thought he might also like to have a 'Boris Board'. So we grabbed some data from the London Data Store and put together this light-hearted attempt to imagine the 'Boris Board' as an interactive data visualisation that Mayor Johnson could use to run the show!
(NB: The Boris Board works in all modern browsers, so if Boris uses Internet Explorer 8 or lower he will need to upgrade to a modern browser.)
The idea behind this write-up is to give readers some insight into how we approached creating this data visualisation. We've added some code snippets and some ideas on how to handle different problems that you might encounter along the way. Our hope is that any budding data visualisation folk out there might find this useful in their own attempts to visualise data in an interesting way.
For any of our existing customers reading this (and any of our future customers), we hope this gives some insight into the data visualisation process and how it might work in your context (shameless corporate plug over).
Right, where to begin? Well, a very good friend of Coolgarif Tech (hat tip to Mr. Deasy!) brought the London Data Store and New York Open Data repositories to our attention a few weeks back. We had been looking for an opportunity to design and showcase an interesting data visualisation using our latest interactive dashboard design ideas. The key point for any visualisation is that it's only as strong as the data underpinning it, and finding interesting open-source datasets can prove challenging. However, with the data available through both data stores, there was plenty to get stuck into.
We planned to work on the project in and around our existing client work and were keen to get Boris Board completed before the end of the summer silly season. In total we worked on it periodically over a five-week period, although actual development time was shorter than this: probably 15 developer days all in. I worked mostly on getting the data into a consumable form and James worked his magic with the dashboard implementation. We also enlisted the advice and skills of our graphic designer Brad to provide the colour palette, look and feel.
An initial pass of the data suggested a visualisation that mashed up some of the comparable statistics for the two cities to draw out some interesting comparisons and contrasts between them. Certainly an interesting concept, but one that would have taken considerably more effort than we were willing to expend. So after some debate, we narrowed the project down to focus purely on London and, more specifically, on the more pleasant aspects of the data.
I'm not sure what it is about the statisticians who choose and collect these data sets but there are some pretty grim numbers in there (violence to staff at ambulance call-outs anyone?). We didn't want to end up with a visualisation of London full of crime and horribleness. If we were asked to define the theme underpinning the Boris Board it would probably be something along the lines of "How can I compare the boroughs by nice stuff". That probably doesn't make a tremendous amount of sense but we hope you get the rough idea. For anyone looking for more depressing statistics on life in London or the UK at large we refer you to the Mail Online website.
For anyone looking to make use of the London Data Store, we found a good starting point in the data store catalogue, which gives a nice overview of the various datasets available. Eyeballing this list in Excel allowed us to quickly draw up a short-list of interesting themes that we could use as a starting point. And yes, I do use Excel even though I'm supposedly a developer. This is generally considered a sin against software developer humanity, but honestly it's one of my favourite tools for hacking data quickly. Comes from all those years spread-sheeting around on the trading floors of Canary Wharf!
Anyhow, armed with this short list of potential targets, my task was then to download each dataset as Excel or CSV and take a look at the data structure provided and the quality of the data. For the datasets that looked sufficiently interesting, I'd mock up a data structure on paper and discuss their inclusion with James. Once we'd decided that a particular dataset made sense, I wrote a Python script to parse the data into the correct format.
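For readers wondering what such a parsing script might look like, here is a minimal sketch using only the Python standard library. It assumes the sheet has already been exported to CSV with a header row, one row per borough, and a key column literally named "Borough"; the function name csv_to_borough_json and that column name are illustrative choices, not the script used for the Boris Board.

```python
import csv
import io
import json


def csv_to_borough_json(csv_text, key_column="Borough"):
    """Convert CSV text (one row per borough) into a JSON string keyed by borough.

    Hypothetical helper: assumes a header row and a key column named "Borough".
    Numeric cells (thousands separators allowed) become floats; blanks become null.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        borough = row.pop(key_column).strip()
        clean = {}
        for column, raw in row.items():
            if column is None:
                continue  # ignore stray cells that extend beyond the header row
            text = (raw or "").strip()
            if text == "":
                clean[column] = None  # missing value in the source sheet
                continue
            try:
                clean[column] = float(text.replace(",", ""))
            except ValueError:
                clean[column] = text  # keep non-numeric cells as strings
        result[borough] = clean
    return json.dumps(result, indent=2, sort_keys=True)
```

Writing the returned string to a file (say, a hypothetical trees.json) is then enough for a front end to fetch it as a static document, with no server-side code involved.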
From an implementation perspective, given that the various datasets are pretty small, we decided to serve the final visualisation straight out of JSON documents stored on the server. For a larger project with variable or streaming data, I would probably have chosen my go-to Python framework, Flask, and exposed the data through a RESTful endpoint producing JSON for the front end to pick up.
The primary element of the visualisation is the map of London split out by borough. The slider on the heat map shows the cumulative change in population, based on the Population Projections from 2001 to 2031 for London Boroughs by single year of age and gender, using the Strategic Housing and Land Availability Assessment (SHLAA) housing data and 2008 CLG household projections. These datasets are available here. These data are the sole copyright of the Greater London Authority (© Greater London Authority, 2010). We also provided a simple line graph, visible in the top right-hand corner of the dashboard, which illustrates the cumulative estimated population growth for the entire city over the period. This is based on the aggregate of the population data provided.
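To make "cumulative change" concrete: one reasonable reading, and it is an assumption on our part here, is the difference between each year's projected population and the 2001 base year, with the city-wide line being the year-by-year sum of the borough projections. A small Python sketch of both calculations follows; the function names are hypothetical and any numbers you feed it in testing are made up, not GLA figures.

```python
def cumulative_change(series, base_year):
    """Return {year: population - population[base_year]} for one series.

    Assumes `series` maps year -> projected population and contains base_year.
    """
    base = series[base_year]
    return {year: value - base for year, value in sorted(series.items())}


def city_totals(projections):
    """Sum borough projections year by year to get a city-wide series.

    Assumes `projections` maps borough -> {year: projected population}.
    """
    totals = {}
    for series in projections.values():
        for year, value in series.items():
            totals[year] = totals.get(year, 0) + value
    return dict(sorted(totals.items()))
```

Feeding city_totals(...) into cumulative_change(...) gives the series behind a city-wide growth line, while calling cumulative_change on a single borough gives the values a heat-map slider would step through.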
To make the visualisation more interactive, we also made each individual borough clickable, so that the user can select a borough of interest on the map and then use the slider to see the cumulative change in population over time broken down by age cohort: Child (0-18), Adult (19-45), Middle-Age (45-65) and Senior (65+). This appears as a horizontal bar chart once a borough is clicked and shows how the estimated age profile of each borough is expected to shift over time.
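Because the projections come by single year of age (and gender), the cohorts have to be built by binning ages. The ranges as listed touch at 45 and 65, so this sketch assumes non-overlapping cut-offs of 0-18, 19-45, 46-64 and 65+; those boundaries, and the helper names, are our assumptions rather than the exact bins behind the chart. It also assumes male and female counts have already been summed per age.

```python
# Assumed, non-overlapping cut-offs (inclusive). The published ranges touch at
# 45 and 65, so 46-64 is our guess for Middle-Age to avoid double counting.
COHORTS = [
    ("Child", 0, 18),
    ("Adult", 19, 45),
    ("Middle-Age", 46, 64),
    ("Senior", 65, None),  # None marks an open-ended upper bound
]


def cohort_for(age):
    """Return the cohort name for a single year of age."""
    for name, low, high in COHORTS:
        if age >= low and (high is None or age <= high):
            return name
    raise ValueError(f"age out of range: {age}")


def aggregate_by_cohort(counts_by_age):
    """Collapse {single year of age: count} into {cohort name: total count}."""
    totals = {name: 0 for name, _, _ in COHORTS}
    for age, count in counts_by_age.items():
        totals[cohort_for(age)] += count
    return totals
```

Running aggregate_by_cohort once per projection year for the selected borough yields one bar per cohort for each slider position.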
From a design perspective, we were keen to keep all of the data relating to the population dataset above the fold. For the lower half of the page we wanted to pick out some interesting datasets that allowed further comparison between the boroughs. Each of these is accessible through the various Boris-themed icons provided.
The first of these, imaginatively titled 'Trees', illustrates the available data on street trees per borough. We thought that this would be interesting for all those folk who secretly hanker for a life in the countryside. The horizontal bar chart shows the estimated total street trees by borough in 2011, and the side-by-side bar chart shows the number of trees planted and felled per borough in both 2009 and 2010, including trees planted under the Mayor's Street Trees Initiative. The total numbers of trees for 2007 were available, but the notes to the dataset suggested that these were mostly estimates, so we ignored this data. The original dataset is available here.
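The side-by-side chart needs the planted and felled figures lined up per borough, and a net figure makes it easy to spot boroughs that felled more trees than they planted. A sketch, assuming each year's figures arrive as a {borough: count} dictionary; the helper names and row layout are hypothetical.

```python
def side_by_side_rows(planted, felled):
    """Build chart rows of planted, felled and net counts per borough.

    Boroughs missing from either input are skipped rather than guessed.
    """
    rows = []
    for borough in sorted(planted.keys() & felled.keys()):
        rows.append({
            "borough": borough,
            "planted": planted[borough],
            "felled": felled[borough],
            "net": planted[borough] - felled[borough],
        })
    return rows


def net_losers(rows):
    """Return the boroughs where more trees were felled than planted."""
    return [row["borough"] for row in rows if row["net"] < 0]
```

Skipping boroughs that lack a figure for either measure keeps the chart honest at the cost of dropping a bar; filling gaps with zero would silently overstate planting or felling.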
The next two graphs are 'Walking' and 'Cycling' and come from the same dataset, which shows the proportion of residents who walk for at least 30 minutes at a given frequency, and the proportion of residents who cycle (any length or purpose) at a given frequency. For each dataset the results were broken into different cohorts of once per month, once per week, three times per week and five times per week. To visualise the data we decided on a stacked bar graph for each. The data comes from the Active People Survey produced by Sport England in August 2012. The original dataset is available here.
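One subtlety with stacking frequency cohorts: if the survey reports "at least" frequencies, then the groups are nested (someone who walks five times a week is also counted in the once-a-week group), and stacking the raw percentages would double count. Whether this particular dataset is nested is an assumption you should check against its notes. If it is, the sketch below converts the figures into non-overlapping bands; the column labels and the function name are hypothetical.

```python
# Hypothetical labels, ordered from least to most frequent.
# Assumption: each value is the share of residents doing the activity AT LEAST this often.
FREQUENCIES = ["1x per month", "1x per week", "3x per week", "5x per week"]


def exclusive_bands(cumulative):
    """Turn nested 'at least N times' percentages into non-overlapping bands.

    Each band is the share reaching that frequency but not the next one up;
    the most frequent band keeps its full value.
    """
    bands = {}
    for i, label in enumerate(FREQUENCIES):
        current = cumulative[label]
        higher = cumulative[FREQUENCIES[i + 1]] if i + 1 < len(FREQUENCIES) else 0.0
        if higher > current:
            raise ValueError(f"{label} is smaller than a more frequent band; data may not be nested")
        bands[label] = current - higher
    return bands
```

With this conversion, the full height of each stacked bar equals the "at least once a month" figure, which is what a reader naturally expects the total to mean. If the source already reports exclusive groups, the raw values can be stacked directly and this step should be skipped.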
Finally, having looked at nature and exercise, we thought that a touch of culture would be welcome in the overview of the different boroughs, so we used data from a survey by the Department for Culture, Media and Sport on the percentage of respondents who made use of public libraries, visited museums and galleries, or engaged in the arts in 2008 and 2009. We employed a grouped bar chart to illustrate these percentages across the different cultural activities. The original dataset is available here.
Datasets that didn't make the cut were number of public toilets, number of licensed premises, number and type of schools, Boris Bike hires, hours worked and happiness index per borough.
A couple of points to make here on the format of the data in the London Data Store. Generally this is an excellent resource and an excellent initiative that should be applauded; these are, if anything, minor quibbles from an implementation perspective. Some, but not all, of the datasets come in both Excel and CSV format. Where a dataset comes in Excel only, you need to hack the sheets around a bit to export them into a usable CSV format, which can delay the process. Actually, I must remember to send the guys at the Data Store the CSV files (and JSON if they wish) that we created so that they can add them to the store for future reference.
Another minor complaint is that there isn't as much time series data available as we would have ideally liked. A lot of the datasets come from one-off surveys or are aggregations of surveys across multiple years. This means that the data in our visualisation is a mashup of different dates and time horizons, which isn't ideal. We solved this by providing individual charts for each of the sets rather than trying to merge them into one visualisation component. Also, we're guessing that as the London Data Store is a relatively new project, the depth of the available datasets will grow over time.
What could one, say Mayor Johnson perhaps, do with the information presented? What insights did we glean from this effort?
Well, the aggregate population numbers suggest that life isn't going to get any quieter in this fine city any time soon, with the population expected to top 9 million by 2013. At a borough-specific level it looks like Tower Hamlets is going to get older and Wandsworth is going to get busier and younger! So if you are setting up a care home or planning a new childminding service, you might want to bear that in mind!
If you are into greenery, it looks like Bromley has the most street trees (or did back in 2010 when the data was collected) and that the City of London has lots of catching up to do with the rest of the boroughs on this front. Worryingly, Waltham Forest lost more trees than it planted, so perhaps there was a big bloody storm out that direction in 2010. But generally there is a positive trend towards planting new trees, particularly following the Mayor's Street Trees Initiative, with Enfield, Harrow and Hounslow coming out as the big winners here. It will be interesting to track this data over time.
On the walking front, the folk of Camden seem to get out and about on foot most frequently (the City of London also scored well here, although this is based on a pathetic number of responses with a wide confidence interval, so we took those results with a pinch of salt!). On the cycling stakes Boris will be delighted, as the percentage of folk who cycle more than once a week seems pretty consistent across all the boroughs! Good news if you are thinking about setting up a hipster fixie bike shop any time soon! It will be interesting to see how this data changes over time given the recent investment in cycling infrastructure across the city. Finally, for the Thespians and culture vultures, it looks like library attendance is lower than we would have expected, while museum and gallery attendance is strong in certain boroughs. Again, though, we'd like to see these surveys repeated over multiple time periods to see how things are developing at a borough level, especially as this survey data originated back in 2009.