I have had two simple Raspberry Pi weather stations running for a while now.

Both have pressure, temperature and humidity sensors.

One I have in the carefully controlled environment of my study, the other is hanging out of the window.

The study one is known as pijessie as it started life as a Raspberry Pi running the Jessie version of Raspbian.

The outside station is known as kittycam as I intend at some point to attach a camera so I can watch our cat come and go.

For a while I have been noticing that the pressure values have been quite a way apart. The software I am using includes a conversion to altitude, and I find these numbers more natural to think about.

The altitude conversion assumes the pressure at sea level is 1013.25 hPa (101325 Pa), which is the mean pressure at sea level.

When the pressure is higher than this, the altitude comes out below sea level; when it is lower, the altitude comes out above sea level.

As always, Wikipedia has good information on this: https://en.wikipedia.org/wiki/Atmospheric_pressure
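To make the conversion concrete, here is a minimal sketch using the common international barometric formula. I have not checked that this is exactly what the sensor software does, so treat the function name and the constants 44330 and 5.255 as assumptions.

```python
# Sketch only: the constants are the standard-atmosphere values, not
# necessarily the ones the sensor library uses.
SEA_LEVEL_PA = 101325.0  # 1013.25 hPa expressed in pascals


def pressure_to_altitude(pressure_pa, sea_level_pa=SEA_LEVEL_PA):
    """Approximate altitude in metres from a pressure in pascals."""
    return 44330.0 * (1.0 - (pressure_pa / sea_level_pa) ** (1.0 / 5.255))


print(pressure_to_altitude(101325.0))  # pressure equal to the reference
print(pressure_to_altitude(101445.0))  # 1.2 hPa higher: below sea level
print(pressure_to_altitude(101205.0))  # 1.2 hPa lower: above sea level
```

Near sea level this works out at a little over 8 metres per hPa, so a 10 metre gap between two sensors corresponds to a pressure difference of only about 1.2 hPa.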

Specifically, the two sensors had been giving altitudes about 10 metres apart.

I had put this down to the sensors not being calibrated accurately, but also noticed that kittycam was more prone to weird glitches.

Now the glitches I put down to the fact I have one process collecting data every minute and another process creating a display on my laptop so I can glance over and see what the weather is doing. The latter was just polling the sensor every 10 minutes.

The code does not do anything smart like get a lock and my guess was that the two processes were occasionally trampling on each other's feet.
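For what it is worth, one way to stop two processes reading the sensor at the same time is an advisory file lock. The sketch below is not what my code does today: the lock file path and the function names are made up for illustration, and `fcntl` is only available on Unix-like systems such as Raspbian.

```python
import fcntl
from contextlib import contextmanager

# Hypothetical lock file shared by every process that reads the sensor.
LOCK_PATH = "/tmp/weather_sensor.lock"


@contextmanager
def sensor_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock for the duration of the block."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def read_with_lock(read_sensor, path=LOCK_PATH):
    """Call read_sensor() while holding the lock and return its result."""
    with sensor_lock(path):
        return read_sensor()


# Stand-in for a real sensor read.
print(read_with_lock(lambda: {"pressure": 101325}))
```

Both the collector and the display process would need to go through `read_with_lock` for this to help, since the lock is only advisory.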

Long story short, I decided to take a closer look.



In [34]:

    
# Tell matplotlib to plot in line
%matplotlib inline

import datetime

# import pandas
import pandas

# seaborn magically adds a layer of goodness on top of Matplotlib
# mostly this is just changing matplotlib defaults, but it does also
# provide some higher level plotting methods.
import seaborn

# Tell seaborn to set things up
seaborn.set()



In [35]:

    
# input files: the data from the two sensors
infiles = ["../files/kittycam_weather.csv", "../files/pijessie_weather.csv"]



In [36]:

    
# Read the data

data = []

for infile in infiles:
    data.append(pandas.read_csv(infile, index_col='date', parse_dates=['date']))



In [37]:

    
# take a look at what we got
data[0].describe()









    Out[37]:






  
    
      
                   temp       pressure      altitude  sealevel_pressure      humidity      temp_dht
    count  42768.000000   42768.000000  42768.000000       42768.000000  42764.000000  42764.000000
    mean      28.114357  101385.375842     -5.068642      101387.094487     76.806599     27.730374
    std        2.205370     430.276252     36.334073         409.738250      8.512122      2.068583
    min       22.800000   56117.000000  -1314.018026       67136.000000     29.500000     22.299999
    25%       26.500000  101141.750000    -26.939647      101142.000000     71.599998     26.200001
    50%       27.700000  101411.000000     -7.157439      101412.000000     77.099998     27.400000
    75%       29.900000  101648.000000     15.246750      101649.000000     83.500000     29.299999
    max       43.200000  102242.000000   3397.334521      118353.000000     94.300003     47.099998



In [38]:

    
# plots are always good 

data[0].plot(subplots=True)









    Out[38]:





array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6b1e4128>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6a7e40f0>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x7f1a69ef79b0>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6afef0f0>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6ae8d7b8>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x7f1a6aeb2908>], dtype=object)

Now the two sets of data have different indices, since the processes collecting the data are not in sync.

So we need to align the data and then fill in the missing values.
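To see what aligning and filling does, here is a small sketch on made-up data: two tiny frames sampled at alternating minutes. The values and the names `inside` and `outside` are invented for illustration only.

```python
import pandas

# Two toy sensors whose timestamps never coincide.
inside = pandas.DataFrame(
    {"altitude": [0.0, 10.0]},
    index=pandas.to_datetime(["2017-01-01 00:00", "2017-01-01 00:02"]),
)
outside = pandas.DataFrame(
    {"altitude": [5.0, 7.0]},
    index=pandas.to_datetime(["2017-01-01 00:01", "2017-01-01 00:03"]),
)

# align() gives both frames the union of the two indices,
# inserting NaN wherever a frame had no reading.
aligned_in, aligned_out = inside.align(outside)

# Time-weighted linear interpolation fills the gaps between readings.
filled_in = aligned_in.interpolate(method="time")
filled_out = aligned_out.interpolate(method="time")

print(filled_in)
print(filled_out)
```

Note that a gap before a frame's first reading has nothing to interpolate from, so it stays as NaN.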



In [39]:

    
# align returns two new dataframes, now aligned
d1, d2 = data[0].align(data[1])



In [40]:

    
# have a look, note the count is just the valid data.
# Things have been aligned, but missing values are set to NaN
d1.describe()









    Out[40]:






  
    
      
                   temp       pressure      altitude  sealevel_pressure      humidity      temp_dht
    count  42768.000000   42768.000000  42768.000000       42768.000000  42764.000000  42764.000000
    mean      28.114357  101385.375842     -5.068642      101387.094487     76.806599     27.730374
    std        2.205370     430.276252     36.334073         409.738250      8.512122      2.068583
    min       22.800000   56117.000000  -1314.018026       67136.000000     29.500000     22.299999
    25%       26.500000  101141.750000    -26.939647      101142.000000     71.599998     26.200001
    50%       27.700000  101411.000000     -7.157439      101412.000000     77.099998     27.400000
    75%       29.900000  101648.000000     15.246750      101649.000000     83.500000     29.299999
    max       43.200000  102242.000000   3397.334521      118353.000000     94.300003     47.099998



In [41]:

    
# Use interpolation to fill in the missing values
d1 = d1.interpolate(method='time')
d2 = d2.interpolate(method='time')



In [42]:

    
# Now plot
d1.altitude.plot()
print(len(d1))



In [43]:

    
# For convenience, add a new series to d1 with the altitude data from d2
d1['altitude2'] = d2.altitude



In [44]:

    
# Now plot the two
d1[['altitude', 'altitude2']][10000:30000].clip(-60,60).plot()









    Out[44]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1a6aa1fe48>



In [45]:

    
(d1.altitude - d1.altitude2)[10000:30000].clip(-20,15).plot()









    Out[45]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1a6a463630>

So we do have a difference of around 5 m. More interestingly, there seems to be some sort of daily pattern to the data.
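One way to check for a daily pattern is to average the difference by hour of day. The sketch below uses a synthetic series (a 5 m offset plus a 2 m daily cycle) as a stand-in for the real difference, so the numbers it prints say nothing about my sensors; `hourly_profile` is a name I have made up.

```python
import numpy
import pandas


def hourly_profile(difference):
    """Mean of a time-indexed series for each hour of the day (0-23)."""
    return difference.groupby(difference.index.hour).mean()


# Synthetic stand-in: three days of one-minute samples.
index = pandas.date_range("2017-01-01", periods=3 * 24 * 60, freq="min")
hours = index.hour.to_numpy() + index.minute.to_numpy() / 60.0
synthetic = pandas.Series(
    5.0 + 2.0 * numpy.sin(2 * numpy.pi * hours / 24.0), index=index
)

profile = hourly_profile(synthetic)
print(profile)
```

With the real data this would be something like `hourly_profile((d1.altitude - d1.altitude2).clip(-20, 15))`, keeping the clip so that the glitches do not dominate the means.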
