I have had two simple Raspberry Pi weather stations running for a while now.

Both have pressure, temperature and humidity sensors.

One I have in the carefully controlled environment of my study, the other is hanging out of the window.

The study one is known as pijessie as it started life as a Raspberry Pi running the Jessie version of Raspbian.

The outside station is known as kittycam as I intend at some point to attach a camera so I can watch our cat come and go.

For a while I have been noticing that the pressure values have been quite a way apart. The software I am using includes a conversion to altitude, and I find these numbers more natural to think about.

The altitude conversion assumes the pressure at sea level is 1013.25 hPa, which is the mean pressure at sea level.

When the pressure is higher than this, the altitude comes out below sea level; when the pressure is lower, it comes out above sea level.

As always, Wikipedia has good information on this: https://en.wikipedia.org/wiki/Atmospheric_pressure
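
Here is a minimal sketch of such a conversion, assuming the software uses the standard international barometric formula (the one typically found in pressure sensor driver libraries). The function name and the constants are illustrative and not taken from the actual code running on the stations.

```python
# Assumed reference pressure at sea level, in pascals (1013.25 hPa).
SEA_LEVEL_PA = 101325.0


def pressure_to_altitude(pressure_pa, sealevel_pa=SEA_LEVEL_PA):
    """Convert a pressure reading in pascals to an altitude in metres.

    Uses the international barometric formula. This is an assumption
    about how the altitude column is derived, not the stations' code.
    """
    return 44330.0 * (1.0 - (pressure_pa / sealevel_pa) ** (1.0 / 5.255))


# Higher pressure than the reference gives a negative altitude,
# lower pressure gives a positive one.
for reading in (102000.0, 101325.0, 100000.0):
    print(reading, round(pressure_to_altitude(reading), 1))
```

Near sea level this works out at roughly 8 metres of altitude per hPa, so a 10 metre disagreement between the two stations corresponds to a pressure difference of a little over 1 hPa.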

For a while I had been noticing the two sensors giving values differing by about 10 metres altitude.

I had put this down to the sensors not being calibrated accurately, but also noticed that kittycam was more prone to weird glitches.

Now the glitches I put down to the fact I have one process collecting data every minute and another process creating a display on my laptop so I can glance over and see what the weather is doing. The latter was just polling the sensor every 10 minutes.

The code does not do anything smart like get a lock and my guess was that the two processes were occasionally trampling on each other's feet.
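
For illustration, one way to stop two processes reading the sensor at the same time is an advisory file lock. This is only a sketch under assumptions: the lock file path and the `read_sensor` function mentioned in the comment are hypothetical, and both processes would need to use the same lock file for it to help.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def sensor_lock(path="/tmp/weather_sensor.lock"):
    """Hold an exclusive advisory lock on `path` while the block runs.

    The default path is a made-up example. fcntl is only available on
    Unix-like systems, which includes Raspbian.
    """
    with open(path, "w") as handle:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


# Usage, with a hypothetical read_sensor() function:
#
#     with sensor_lock():
#         reading = read_sensor()
```

The lock is advisory, so it only protects against processes that also take it; a process that reads the sensor without calling `sensor_lock` would still be able to interfere.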

Long story short, I decided to take a closer look.

``````

In [34]:

# Tell matplotlib to plot inline
%matplotlib inline

import datetime

# import pandas
import pandas

# seaborn magically adds a layer of goodness on top of Matplotlib
# mostly this is just changing matplotlib defaults, but it does also
# provide some higher level plotting methods.
import seaborn

# Tell seaborn to set things up
seaborn.set()

``````
``````

In [35]:

# input files: the data from the two sensors
infiles = ["../files/kittycam_weather.csv", "../files/pijessie_weather.csv"]

``````
``````

In [36]:

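data = []

for infile in infiles:
    # NOTE: the loop body is missing from the original post. Assumed here:
    # the first CSV column is a timestamp, parsed as the index.
    data.append(pandas.read_csv(infile, index_col=0, parse_dates=True))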

``````
``````

In [37]:

# take a look at what we got
data[0].describe()

``````
``````

Out[37]:

               temp       pressure      altitude  sealevel_pressure      humidity      temp_dht
count  42768.000000   42768.000000  42768.000000       42768.000000  42764.000000  42764.000000
mean      28.114357  101385.375842     -5.068642      101387.094487     76.806599     27.730374
std        2.205370     430.276252     36.334073         409.738250      8.512122      2.068583
min       22.800000   56117.000000  -1314.018026       67136.000000     29.500000     22.299999
25%       26.500000  101141.750000    -26.939647      101142.000000     71.599998     26.200001
50%       27.700000  101411.000000     -7.157439      101412.000000     77.099998     27.400000
75%       29.900000  101648.000000     15.246750      101649.000000     83.500000     29.299999
max       43.200000  102242.000000   3397.334521      118353.000000     94.300003     47.099998
``````
``````

In [38]:

# plots are always good

data[0].plot(subplots=True)

``````
``````

Out[38]:

array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6b1e4128>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6a7e40f0>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f1a69ef79b0>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6afef0f0>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6ae8d7b8>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6aeb2908>], dtype=object)

``````

Now the two sets of data have different indices since the processes collecting the data are not in sync.

So we need to align the data and then fill in the missing values.

``````

In [39]:

# align returns two new dataframes, now aligned
d1, d2 = data[0].align(data[1])

``````
``````

In [40]:

# have a look, note the count is just the valid data.
# Things have been aligned, but missing values are set to NaN
d1.describe()

``````
``````

Out[40]:

               temp       pressure      altitude  sealevel_pressure      humidity      temp_dht
count  42768.000000   42768.000000  42768.000000       42768.000000  42764.000000  42764.000000
mean      28.114357  101385.375842     -5.068642      101387.094487     76.806599     27.730374
std        2.205370     430.276252     36.334073         409.738250      8.512122      2.068583
min       22.800000   56117.000000  -1314.018026       67136.000000     29.500000     22.299999
25%       26.500000  101141.750000    -26.939647      101142.000000     71.599998     26.200001
50%       27.700000  101411.000000     -7.157439      101412.000000     77.099998     27.400000
75%       29.900000  101648.000000     15.246750      101649.000000     83.500000     29.299999
max       43.200000  102242.000000   3397.334521      118353.000000     94.300003     47.099998

``````
``````

In [41]:

# Use interpolation to fill in the missing values
d1 = d1.interpolate(method='time')
d2 = d2.interpolate(method='time')

``````
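
To see what aligning and then interpolating by time does, here is a toy example with made-up readings. The two frames are sampled at different minutes, much like the two stations; the values and timestamps are invented for illustration.

```python
import pandas

# Two tiny frames whose timestamps do not coincide.
a = pandas.DataFrame(
    {"altitude": [0.0, 10.0]},
    index=pandas.to_datetime(["2016-01-01 00:00", "2016-01-01 00:02"]),
)
b = pandas.DataFrame(
    {"altitude": [5.0, 7.0]},
    index=pandas.to_datetime(["2016-01-01 00:01", "2016-01-01 00:03"]),
)

# align gives both frames the union of the two indices,
# putting NaN wherever a frame had no reading at that time.
a2, b2 = a.align(b)

# Time interpolation fills each NaN from its neighbours,
# weighted by how far apart the timestamps are.
a2 = a2.interpolate(method="time")
b2 = b2.interpolate(method="time")

print(a2)
print(b2)
```

Both results have four rows. The gap in `a2` at 00:01 is filled with a value halfway between its neighbours, while the first row of `b2` stays NaN because there is no earlier reading to interpolate from.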
``````

In [42]:

# Now plot
d1.altitude.plot()
print(len(d1))

``````
``````

80476

``````
``````

In [43]:

# For convenience, add a new series to d1 with the altitude data from d2
d1['altitude2'] = d2.altitude

``````
``````

In [44]:

# Now plot the two
d1[['altitude', 'altitude2']][10000:30000].clip(-60,60).plot()

``````
``````

Out[44]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f1a6aa1fe48>

``````
``````

In [45]:

(d1.altitude - d1.altitude2)[10000:30000].clip(-20,15).plot()

``````
``````

Out[45]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f1a6a463630>

``````

So we do have a difference around 5m. More interestingly, there seems to be some sort of daily pattern to the data.
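
One possible way to check for a daily pattern is to average the difference by hour of day. The sketch below uses synthetic data (a constant 5 m offset plus an invented 24 hour cycle) in place of the real series, so the numbers it produces say nothing about the actual sensors.

```python
import numpy
import pandas

# Synthetic stand-in for (d1.altitude - d1.altitude2): three days of
# one-minute samples with a 5 m offset and a made-up daily cycle.
index = pandas.date_range("2016-01-01", periods=3 * 24 * 60, freq="min")
hours = numpy.asarray(index.hour + index.minute / 60.0)
diff = pandas.Series(5.0 + 2.0 * numpy.sin(2 * numpy.pi * hours / 24.0), index=index)

# Average the difference within each hour of the day (0 to 23).
hourly = diff.groupby(diff.index.hour).mean()

print(hourly)
```

If the real difference does follow the time of day, the hourly means would vary smoothly over the 24 hours rather than sitting flat around the overall offset.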