How to get a simple dataset from a big CSV file

This notebook goes through some steps to convert a large unwieldy file from the Council's Litter Bin Sensor Project into a simpler dataset for visualisation. We use a number of functions provided by the Python pandas library.

We assume that we've already downloaded the file to our local file system. So our first step is to import the CSV file as a pandas DataFrame:



In [2]:

    
import pandas as pd
table = pd.read_csv("../data/binsensors.csv")

Let's have a look at the column labels:



In [3]:

    
list(table.columns.values)









    Out[3]:





['ID',
 'Site',
 'Site name',
 'Address',
 'City',
 'Site content type',
 'Content type',
 'Content type name',
 'Date & Time of bin collection (Europe/London)',
 'Frozen',
 "Fill level before collection (%) [Nb 100% is actually a bin that's 80% full]",
 'Fill level after',
 ' Volume (litres3) ',
 ' Weight (kg) ',
 'Partial',
 'Container slot:ID',
 'Container slot:Name',
 'Container slot:Time (Europe/London)',
 'Container slot:Fill level before',
 'Container slot:Fill level after',
 'Container slot:Volume',
 'Container slot:Weight',
 'Container slot:Confidence',
 'Confidence']

Suppose we just want to select a couple of columns, we can use the column labels like this:



In [4]:

    
table[['ID', 'Address']]









    Out[4]:






  
    
      
      ID
      Address
    
  
  
    
      0
      12371082.0
      Princes Street
    
    
      1
      12370969.0
      Princes Street
    
    
      2
      12370940.0
      Princes Street
    
    
      3
      12370931.0
      Princes Street
    
    
      4
      12370895.0
      Shandwick Place
    
    
      5
      12370835.0
      Shandwick Place
    
    
      6
      12370992.0
      Coates Crescent
    
    
      7
      12371305.0
      West Maitland Street
    
    
      8
      12371189.0
      Clifton Terrace
    
    
      9
      12370352.0
      80 Haymarket Terrace
    
    
      10
      12371102.0
      83 Haymarket Terrace
    
    
      11
      12370927.0
      20 Atholl Cresent
    
    
      12
      12370899.0
      Atholl Crescent
    
    
      13
      12370879.0
      Shandwick Place
    
    
      14
      12370388.0
      Princes Street
    
    
      15
      12369997.0
      Princes Street
    
    
      16
      12370884.0
      Princes Street
    
    
      17
      12370088.0
      3 Chambers Street
    
    
      18
      12370302.0
      Newkirkgate
    
    
      19
      12370476.0
      22 Chambers Street
    
    
      20
      12370263.0
      Chambers Street
    
    
      21
      12370983.0
      Cathedral Lane
    
    
      22
      12370275.0
      Cathedral Lane
    
    
      23
      12370041.0
      The Mound
    
    
      24
      12370063.0
      The Mound
    
    
      25
      12370161.0
      Mound Place
    
    
      26
      12370147.0
      54 Leith Walk
    
    
      27
      12370172.0
      Ramsay Lane
    
    
      28
      12369991.0
      96 Leith Walk
    
    
      29
      12369917.0
      210 Leith Walk
    
    
      ...
      ...
      ...
    
    
      9971
      10854972.0
      Nicolson Street
    
    
      9972
      10856192.0
      Shandwick Place
    
    
      9973
      10845164.0
      Princes Street Gardens East
    
    
      9974
      10855307.0
      South Bridge
    
    
      9975
      10854871.0
      7 Canongate
    
    
      9976
      10851260.0
      271 Canongate
    
    
      9977
      10852254.0
      Canongate
    
    
      9978
      10849062.0
      162 Canongate
    
    
      9979
      10848466.0
      Regent Road
    
    
      9980
      12359624.0
      Waterloo Place
    
    
      9981
      10848617.0
      Regent Road
    
    
      9982
      12358547.0
      South Bridge
    
    
      9983
      10851500.0
      86 Canongate
    
    
      9984
      12360158.0
      Waterloo Place
    
    
      9985
      10848023.0
      Shandwick Place
    
    
      9986
      10847071.0
      Shandwick Place
    
    
      9987
      10850092.0
      Shandwick Place
    
    
      9988
      12360654.0
      Coates Crescent
    
    
      9989
      12358088.0
      Coates Crescent
    
    
      9990
      12363496.0
      Coates Crescent
    
    
      9991
      12367091.0
      Clifton Terrace
    
    
      9992
      10849669.0
      80 Haymarket Terrace
    
    
      9993
      10847247.0
      90 Haymarket Terrace
    
    
      9994
      12364372.0
      20 Atholl Cresent
    
    
      9995
      10850459.0
      22 Haymarket Terrace
    
    
      9996
      12367706.0
      Clifton Terrace
    
    
      9997
      10839663.0
      West Maitland Street
    
    
      9998
      12363241.0
      Festival Square
    
    
      9999
      12362887.0
      Festival Square
    
    
      10000
      NaN
      NaN
    
  

10001 rows × 2 columns

But a couple of interesting columns (for the collection date and the weight measured by the sensor) have very complicated labels, so let's simplify them.

First, we'll just make a list of all the labels, then we'll bind the relevant string values to a couple of variables. This means that we don't have to worry about mis-typing things like 'Date & Time of bin collection (Europe/London)!



In [5]:

    
l = list(table.columns.values)
date = l[8]
fill = l[10]
date, fill









    Out[5]:





('Date & Time of bin collection (Europe/London)',
 "Fill level before collection (%) [Nb 100% is actually a bin that's 80% full]")

Now that we've got short variables date and time in place of the long strings, let's go ahead and replace those labels with something simpler:



In [6]:

    
table = table.rename(columns={date: 'Date', fill: 'Fill_level'})

Now we'll make a new table with just four columns:



In [7]:

    
table1 = table[['ID', 'Address', 'Date', 'Fill_level']]

And we'll just take the first 30 rows:



In [8]:

    
tabletop = table1.head(30)



In [9]:

    
tabletop









    Out[9]:






  
    
      
      ID
      Address
      Date
      Fill_level
    
  
  
    
      0
      12371082.0
      Princes Street
      2016-08-09 15:46:34
      78.0
    
    
      1
      12370969.0
      Princes Street
      2016-08-09 15:37:28
      54.0
    
    
      2
      12370940.0
      Princes Street
      2016-08-09 15:32:35
      86.0
    
    
      3
      12370931.0
      Princes Street
      2016-08-09 15:32:19
      61.0
    
    
      4
      12370895.0
      Shandwick Place
      2016-08-09 15:27:16
      31.0
    
    
      5
      12370835.0
      Shandwick Place
      2016-08-09 15:25:38
      23.0
    
    
      6
      12370992.0
      Coates Crescent
      2016-08-09 15:21:17
      99.0
    
    
      7
      12371305.0
      West Maitland Street
      2016-08-09 15:16:15
      88.0
    
    
      8
      12371189.0
      Clifton Terrace
      2016-08-09 15:01:12
      55.0
    
    
      9
      12370352.0
      80 Haymarket Terrace
      2016-08-09 14:59:18
      91.0
    
    
      10
      12371102.0
      83 Haymarket Terrace
      2016-08-09 14:56:28
      80.0
    
    
      11
      12370927.0
      20 Atholl Cresent
      2016-08-09 14:45:10
      67.0
    
    
      12
      12370899.0
      Atholl Crescent
      2016-08-09 14:43:32
      92.0
    
    
      13
      12370879.0
      Shandwick Place
      2016-08-09 14:38:21
      94.0
    
    
      14
      12370388.0
      Princes Street
      2016-08-09 14:30:27
      87.0
    
    
      15
      12369997.0
      Princes Street
      2016-08-09 14:25:53
      100.0
    
    
      16
      12370884.0
      Princes Street
      2016-08-09 14:25:44
      95.0
    
    
      17
      12370088.0
      3 Chambers Street
      2016-08-09 14:18:21
      42.0
    
    
      18
      12370302.0
      Newkirkgate
      2016-08-09 14:18:15
      93.0
    
    
      19
      12370476.0
      22 Chambers Street
      2016-08-09 14:16:19
      84.0
    
    
      20
      12370263.0
      Chambers Street
      2016-08-09 14:13:56
      68.0
    
    
      21
      12370983.0
      Cathedral Lane
      2016-08-09 14:07:19
      87.0
    
    
      22
      12370275.0
      Cathedral Lane
      2016-08-09 14:06:33
      58.0
    
    
      23
      12370041.0
      The Mound
      2016-08-09 14:02:54
      37.0
    
    
      24
      12370063.0
      The Mound
      2016-08-09 13:58:57
      39.0
    
    
      25
      12370161.0
      Mound Place
      2016-08-09 13:54:31
      88.0
    
    
      26
      12370147.0
      54 Leith Walk
      2016-08-09 13:54:11
      100.0
    
    
      27
      12370172.0
      Ramsay Lane
      2016-08-09 13:53:07
      100.0
    
    
      28
      12369991.0
      96 Leith Walk
      2016-08-09 13:51:42
      95.0
    
    
      29
      12369917.0
      210 Leith Walk
      2016-08-09 13:49:04
      99.0

Finally, we'll write the result to a JSON formatted file.



In [10]:

    
tabletop.to_json('../data/binsensorsimple.json', orient="records")

	ID	Address
0	12371082.0	Princes Street
1	12370969.0	Princes Street
2	12370940.0	Princes Street
3	12370931.0	Princes Street
4	12370895.0	Shandwick Place
5	12370835.0	Shandwick Place
6	12370992.0	Coates Crescent
7	12371305.0	West Maitland Street
8	12371189.0	Clifton Terrace
9	12370352.0	80 Haymarket Terrace
10	12371102.0	83 Haymarket Terrace
11	12370927.0	20 Atholl Cresent
12	12370899.0	Atholl Crescent
13	12370879.0	Shandwick Place
14	12370388.0	Princes Street
15	12369997.0	Princes Street
16	12370884.0	Princes Street
17	12370088.0	3 Chambers Street
18	12370302.0	Newkirkgate
19	12370476.0	22 Chambers Street
20	12370263.0	Chambers Street
21	12370983.0	Cathedral Lane
22	12370275.0	Cathedral Lane
23	12370041.0	The Mound
24	12370063.0	The Mound
25	12370161.0	Mound Place
26	12370147.0	54 Leith Walk
27	12370172.0	Ramsay Lane
28	12369991.0	96 Leith Walk
29	12369917.0	210 Leith Walk
...	...	...
9971	10854972.0	Nicolson Street
9972	10856192.0	Shandwick Place
9973	10845164.0	Princes Street Gardens East
9974	10855307.0	South Bridge
9975	10854871.0	7 Canongate
9976	10851260.0	271 Canongate
9977	10852254.0	Canongate
9978	10849062.0	162 Canongate
9979	10848466.0	Regent Road
9980	12359624.0	Waterloo Place
9981	10848617.0	Regent Road
9982	12358547.0	South Bridge
9983	10851500.0	86 Canongate
9984	12360158.0	Waterloo Place
9985	10848023.0	Shandwick Place
9986	10847071.0	Shandwick Place
9987	10850092.0	Shandwick Place
9988	12360654.0	Coates Crescent
9989	12358088.0	Coates Crescent
9990	12363496.0	Coates Crescent
9991	12367091.0	Clifton Terrace
9992	10849669.0	80 Haymarket Terrace
9993	10847247.0	90 Haymarket Terrace
9994	12364372.0	20 Atholl Cresent
9995	10850459.0	22 Haymarket Terrace
9996	12367706.0	Clifton Terrace
9997	10839663.0	West Maitland Street
9998	12363241.0	Festival Square
9999	12362887.0	Festival Square
10000	NaN	NaN

	ID	Address	Date	Fill_level
0	12371082.0	Princes Street	2016-08-09 15:46:34	78.0
1	12370969.0	Princes Street	2016-08-09 15:37:28	54.0
2	12370940.0	Princes Street	2016-08-09 15:32:35	86.0
3	12370931.0	Princes Street	2016-08-09 15:32:19	61.0
4	12370895.0	Shandwick Place	2016-08-09 15:27:16	31.0
5	12370835.0	Shandwick Place	2016-08-09 15:25:38	23.0
6	12370992.0	Coates Crescent	2016-08-09 15:21:17	99.0
7	12371305.0	West Maitland Street	2016-08-09 15:16:15	88.0
8	12371189.0	Clifton Terrace	2016-08-09 15:01:12	55.0
9	12370352.0	80 Haymarket Terrace	2016-08-09 14:59:18	91.0
10	12371102.0	83 Haymarket Terrace	2016-08-09 14:56:28	80.0
11	12370927.0	20 Atholl Cresent	2016-08-09 14:45:10	67.0
12	12370899.0	Atholl Crescent	2016-08-09 14:43:32	92.0
13	12370879.0	Shandwick Place	2016-08-09 14:38:21	94.0
14	12370388.0	Princes Street	2016-08-09 14:30:27	87.0
15	12369997.0	Princes Street	2016-08-09 14:25:53	100.0
16	12370884.0	Princes Street	2016-08-09 14:25:44	95.0
17	12370088.0	3 Chambers Street	2016-08-09 14:18:21	42.0
18	12370302.0	Newkirkgate	2016-08-09 14:18:15	93.0
19	12370476.0	22 Chambers Street	2016-08-09 14:16:19	84.0
20	12370263.0	Chambers Street	2016-08-09 14:13:56	68.0
21	12370983.0	Cathedral Lane	2016-08-09 14:07:19	87.0
22	12370275.0	Cathedral Lane	2016-08-09 14:06:33	58.0
23	12370041.0	The Mound	2016-08-09 14:02:54	37.0
24	12370063.0	The Mound	2016-08-09 13:58:57	39.0
25	12370161.0	Mound Place	2016-08-09 13:54:31	88.0
26	12370147.0	54 Leith Walk	2016-08-09 13:54:11	100.0
27	12370172.0	Ramsay Lane	2016-08-09 13:53:07	100.0
28	12369991.0	96 Leith Walk	2016-08-09 13:51:42	95.0
29	12369917.0	210 Leith Walk	2016-08-09 13:49:04	99.0