This notebook goes through some steps to convert a large unwieldy file from the Council's Litter Bin Sensor Project into a simpler dataset for visualisation. We use a number of functions provided by the Python pandas library.
We assume that we've already downloaded the file to our local file system. So our first step is to import the CSV file as a pandas DataFrame
:
In [2]:
import pandas as pd
table = pd.read_csv("../data/binsensors.csv")
Let's have a look at the column labels:
In [3]:
list(table.columns.values)
Out[3]:
Suppose we just want to select a couple of columns, we can use the column labels like this:
In [4]:
table[['ID', 'Address']]
Out[4]:
But a couple of interesting columns (for the collection date and the weight measured by the sensor) have very complicated labels, so let's simplify them.
First, we'll just make a list of all the labels, then we'll bind the relevant string values to a couple of variables. This means that we don't have to worry about mis-typing things like 'Date & Time of bin collection (Europe/London)
!
In [5]:
l = list(table.columns.values)
date = l[8]
fill = l[10]
date, fill
Out[5]:
Now that we've got short variables date
and time
in place of the long strings, let's go ahead and replace those labels with something simpler:
In [6]:
table = table.rename(columns={date: 'Date', fill: 'Fill_level'})
Now we'll make a new table with just four columns:
In [7]:
table1 = table[['ID', 'Address', 'Date', 'Fill_level']]
And we'll just take the first 30 rows:
In [8]:
tabletop = table1.head(30)
In [9]:
tabletop
Out[9]:
Finally, we'll write the result to a JSON formatted file.
In [10]:
tabletop.to_json('../data/binsensorsimple.json', orient="records")