I have had two simple Raspberry Pi weather stations running for a while now.
Both have pressure, temperature and humidity sensors.
One I have in the carefully controlled environment of my study, the other is hanging out of the window.
The study one is known as pijessie as it started life as a Raspberry Pi running the Jessie version of Raspbian.
The outside station is known as kittycam as I intend at some point to attach a camera so I can watch our cat come and go.
For a while I have been noticing that the pressure values have been quite a way apart. The software I am using includes a conversion to altitude, and I find these numbers more natural to think about.
The altitude conversion assumes the pressure at sea level is 1013.25 hPa, which is the mean pressure at sea level.
When the pressure is higher than this the altitude comes out below sea level; when the pressure is lower it comes out above sea level.
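To make the conversion concrete, here is a minimal sketch of the kind of formula such software typically uses. I am assuming the common international barometric formula with a reference pressure of 1013.25 hPa; the function name and constants are my own illustration, and the code actually running on the stations may differ.

```python
# Assumed formula: the international barometric formula often used with
# hobby pressure sensors. The station software may do something different.
SEA_LEVEL_HPA = 1013.25  # standard mean sea-level pressure


def pressure_to_altitude(pressure_hpa, sea_level_hpa=SEA_LEVEL_HPA):
    """Return the approximate altitude in metres for a pressure in hPa."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


# Pressure below the reference gives a positive altitude, above gives negative
for p in (1003.25, 1013.25, 1023.25):
    print(p, round(pressure_to_altitude(p), 1))
```

Near sea level this works out at roughly 8 metres per hPa, so a gap of a few metres between two sensors corresponds to well under 1 hPa of pressure difference.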
As always, Wikipedia has good information on this: https://en.wikipedia.org/wiki/Atmospheric_pressure
For a while I had been noticing the two sensors giving values differing by about 10 metres altitude.
I had put this down to the sensors not being calibrated accurately, but also noticed that kittycam was more prone to weird glitches.
Now the glitches I put down to the fact I have one process collecting data every minute and another process creating a display on my laptop so I can glance over and see what the weather is doing. The latter was just polling the sensor every 10 minutes.
The code does not do anything smart like get a lock and my guess was that the two processes were occasionally trampling on each other's feet.
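For what it is worth, here is a rough sketch of what "getting a lock" could look like. It assumes both processes run on the same Linux machine and can agree on a lock file; the path, the `sensor_lock` helper and the `read_sensor` placeholder are all hypothetical and not part of my actual code.

```python
import fcntl
from contextlib import contextmanager

LOCK_PATH = "/tmp/weather_sensor.lock"  # hypothetical path shared by both processes


@contextmanager
def sensor_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on a file while the block runs."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until any other holder releases
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def read_sensor():
    """Placeholder standing in for the real sensor read."""
    return {"pressure": 1013.25}


# Each process wraps its sensor access in the same lock
with sensor_lock():
    reading = read_sensor()
print(reading)
```

The lock is advisory, so it only helps if every process that talks to the sensor uses it.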
Long story short, I decided to take a closer look.
In [34]:
# Tell matplotlib to plot in line
%matplotlib inline
import datetime
# import pandas
import pandas
# seaborn magically adds a layer of goodness on top of Matplotlib
# mostly this is just changing matplotlib defaults, but it does also
# provide some higher level plotting methods.
import seaborn
# Tell seaborn to set things up
seaborn.set()
In [35]:
# input files: the data from the two sensors
infiles = ["../files/kittycam_weather.csv", "../files/pijessie_weather.csv"]
In [36]:
# Read the data
data = []
for infile in infiles:
    data.append(pandas.read_csv(infile, index_col='date', parse_dates=['date']))
In [37]:
# take a look at what we got
data[0].describe()
Out[37]:
In [38]:
# plots are always good
data[0].plot(subplots=True)
Out[38]:
Now the two sets of data have different indices since the processes collecting the data are not in sync.
So we need to align the data and then fill in the missing values.
In [39]:
# align returns two new dataframes, now aligned
d1, d2 = data[0].align(data[1])
In [40]:
# have a look, note the count is just the valid data.
# Things have been aligned, but missing values are set to NaN
d1.describe()
Out[40]:
In [41]:
# Use interpolation to fill in the missing values
d1 = d1.interpolate(method='time')
d2 = d2.interpolate(method='time')
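To see what align and interpolate are doing, here is a small self-contained example on made-up data. The timestamps and values are invented purely for illustration and have nothing to do with the real sensor files.

```python
import pandas

# Two tiny made-up frames sampled at different times
a = pandas.DataFrame(
    {"altitude": [10.0, 12.0]},
    index=pandas.to_datetime(["2017-01-01 00:00", "2017-01-01 00:02"]),
)
b = pandas.DataFrame(
    {"altitude": [20.0, 26.0]},
    index=pandas.to_datetime(["2017-01-01 00:01", "2017-01-01 00:04"]),
)

# align defaults to an outer join: both results share the union of the indices
a2, b2 = a.align(b)

# Fill the gaps, weighting by the actual time between readings
a2 = a2.interpolate(method="time")
b2 = b2.interpolate(method="time")

print(a2)
print(b2)
```

Note that values before a frame's first real reading stay as NaN, since there is nothing earlier to interpolate from.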
In [42]:
# Now plot
d1.altitude.plot()
print(len(d1))
In [43]:
# For convenience, add a new series to d1 with the altitude data from d2
d1['altitude2'] = d2.altitude
In [44]:
# Now plot the two
d1[['altitude', 'altitude2']][10000:30000].clip(-60,60).plot()
Out[44]:
In [45]:
(d1.altitude - d1.altitude2)[10000:30000].clip(-20,15).plot()
Out[45]:
So we do have a difference of around 5 m. More interestingly, there seems to be some sort of daily pattern to the data.
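One way to check for a daily pattern would be to average the difference by hour of day. The sketch below uses synthetic data (a constant 5 m offset plus an invented daily cycle) as a stand-in for `d1.altitude - d1.altitude2`, so the numbers it prints say nothing about the real sensors.

```python
import numpy
import pandas

# Synthetic stand-in for the altitude difference: three days of minute data
index = pandas.date_range("2017-01-01", periods=3 * 24 * 60, freq="min")
hours = numpy.asarray(index.hour) + numpy.asarray(index.minute) / 60.0
diff = pandas.Series(5.0 + 2.0 * numpy.sin(2 * numpy.pi * hours / 24.0), index=index)

# Average by hour of day: a flat result would mean no daily pattern
hourly = diff.groupby(diff.index.hour).mean()
print(hourly.round(2))
```

With the real data, the same groupby on the difference series should show whether the gap really does rise and fall over the day.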