Notebook to Illustrate Handling, Calculation, and Plotting of DDC Trend File Data

This Notebook shows how to read and work with DDC Trend File data. It uses a Siemens CSV reader function found in the ddc_readers module to read the data into a Pandas DataFrame. It then shows how to do some quick exploration and summarization of the data. Next, some plotting techniques are shown. Finally, calculation of new values is shown, with specific emphasis on how to deal with missing data and non-overlapping timestamps.


In [1]:
# Import the needed libraries
import pandas as pd
import numpy as np
import ddc_readers      # the module that has DDC trend file readers

# import matplotlib pyplot commands
import matplotlib.pyplot as plt
 
# Show Plots in the Notebook
%matplotlib inline

# Increase the size of plots and their fonts
plt.rcParams['figure.figsize'] = (10, 8)  # set chart size
plt.rcParams['font.size'] = 14            # set font size in chart

# Style the plot using the 'bmh' style
plt.style.use('bmh')

Read in the DDC data into a Pandas Dataframe and Examine it


In [2]:
# The parameter in the function below controls whether the location of each 
# point is included in its point name.
df = ddc_readers.siemens_reader('data/siemens_sample.csv', include_location_in_point_name=False)

# Show the first 5 rows of the DataFrame
df.head()


Out[2]:
bh_100e_tec_room_temp bh_100w_tec_rm_rh bhmb_no2_w bh_101_tec_rm_co2 bhmb_blr_hgr bh_uhe_tec_rm_rh bhmb_hrv01_en bhmb_100_occ bhmb_co_w bh_100w_tec_htg_loopout ... bh_102_tec_rm_rh bhmb_101_occ bh_uhe_tec_room_temp bh_uhw_tec_htg_loopout bh_101_tec_rm_rh bh_uhe_tec_rm_co2 bh_100w_tec_rm_co2 bh_uhw_tec_rm_rh bhmb_co_e bhmb_hrv01_sat
2017-10-10 00:00:00 81.25 50.0 0.02 1000.0 188.78 31.2 NaN NaN 0.0 0.0 ... 50.0 NaN 71.25 0.0 50.0 306.0 1000.0 30.4 0.0 73.75
2017-10-10 00:15:00 81.25 50.0 0.01 1000.0 181.24 31.2 NaN NaN 0.0 0.0 ... 50.0 NaN 71.25 0.0 50.0 304.0 1000.0 30.4 0.0 73.78
2017-10-10 00:30:00 81.25 50.0 0.01 1000.0 179.66 31.2 NaN NaN 0.0 0.0 ... 50.0 NaN 71.25 0.0 50.0 307.0 1000.0 30.4 0.0 73.79
2017-10-10 00:45:00 81.50 50.0 0.02 1000.0 188.72 31.2 NaN NaN 0.0 0.0 ... 50.0 NaN 71.25 0.0 50.0 306.0 1000.0 30.4 0.0 73.80
2017-10-10 01:00:00 81.50 50.0 0.03 1000.0 181.39 31.2 NaN NaN 0.0 0.0 ... 50.0 NaN 71.25 0.0 50.0 308.0 1000.0 30.4 0.0 73.81

5 rows × 55 columns

You can see that there is a separate column for each point. The index of the DataFrame holds the timestamps of the readings. Some of the points don't have data for every possible timestamp, as indicated by the NaN values (these are numpy.nan values).
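A quick way to see how sparse each point is, is to count the NaN values per column with isna().sum(). Here is a minimal sketch using a tiny synthetic frame standing in for the real df (the real DataFrame works the same way):

```python
import pandas as pd
import numpy as np

# A tiny synthetic frame standing in for the real df
idx = pd.date_range('2017-10-10', periods=4, freq='15min')
demo = pd.DataFrame({'temp': [81.25, np.nan, 81.0, np.nan],
                     'rh':   [50.0, 50.0, np.nan, 50.0]}, index=idx)

# Number of missing readings for each point
print(demo.isna().sum())   # temp: 2, rh: 1
```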

The head method can show however many rows you want:


In [3]:
df.head(3)


Out[3]:
bh_100e_tec_room_temp bh_100w_tec_rm_rh bhmb_no2_w bh_101_tec_rm_co2 bhmb_blr_hgr bh_uhe_tec_rm_rh bhmb_hrv01_en bhmb_100_occ bhmb_co_w bh_100w_tec_htg_loopout ... bh_102_tec_rm_rh bhmb_101_occ bh_uhe_tec_room_temp bh_uhw_tec_htg_loopout bh_101_tec_rm_rh bh_uhe_tec_rm_co2 bh_100w_tec_rm_co2 bh_uhw_tec_rm_rh bhmb_co_e bhmb_hrv01_sat
2017-10-10 00:00:00 81.25 50.0 0.02 1000.0 188.78 31.2 NaN NaN 0.0 0.0 ... 50.0 NaN 71.25 0.0 50.0 306.0 1000.0 30.4 0.0 73.75
2017-10-10 00:15:00 81.25 50.0 0.01 1000.0 181.24 31.2 NaN NaN 0.0 0.0 ... 50.0 NaN 71.25 0.0 50.0 304.0 1000.0 30.4 0.0 73.78
2017-10-10 00:30:00 81.25 50.0 0.01 1000.0 179.66 31.2 NaN NaN 0.0 0.0 ... 50.0 NaN 71.25 0.0 50.0 307.0 1000.0 30.4 0.0 73.79

3 rows × 55 columns

The iloc index locator approach also allows you to limit the number of columns shown.


In [4]:
# Here we will look at the first 5 rows and only the first 3 columns
# Note the use of square brackets, not parentheses!
df.iloc[:5, :3]


Out[4]:
bh_100e_tec_room_temp bh_100w_tec_rm_rh bhmb_no2_w
2017-10-10 00:00:00 81.25 50.0 0.02
2017-10-10 00:15:00 81.25 50.0 0.01
2017-10-10 00:30:00 81.25 50.0 0.01
2017-10-10 00:45:00 81.50 50.0 0.02
2017-10-10 01:00:00 81.50 50.0 0.03

In [5]:
# Look at the last 5 rows
df.tail()


Out[5]:
bh_100e_tec_room_temp bh_100w_tec_rm_rh bhmb_no2_w bh_101_tec_rm_co2 bhmb_blr_hgr bh_uhe_tec_rm_rh bhmb_hrv01_en bhmb_100_occ bhmb_co_w bh_100w_tec_htg_loopout ... bh_102_tec_rm_rh bhmb_101_occ bh_uhe_tec_room_temp bh_uhw_tec_htg_loopout bh_101_tec_rm_rh bh_uhe_tec_rm_co2 bh_100w_tec_rm_co2 bh_uhw_tec_rm_rh bhmb_co_e bhmb_hrv01_sat
2017-10-12 03:30:00 81.00 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2017-10-12 03:45:00 81.00 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2017-10-12 04:00:00 80.75 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2017-10-12 04:15:00 81.00 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2017-10-12 04:30:00 80.75 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 55 columns


In [6]:
# Look at the last 5 rows, but only 3 columns.
# the -5 means start with the 5th row back from the end
df.iloc[-5:, :3]


Out[6]:
bh_100e_tec_room_temp bh_100w_tec_rm_rh bhmb_no2_w
2017-10-12 03:30:00 81.00 NaN NaN
2017-10-12 03:45:00 81.00 NaN NaN
2017-10-12 04:00:00 80.75 NaN NaN
2017-10-12 04:15:00 81.00 NaN NaN
2017-10-12 04:30:00 80.75 NaN NaN

You can export the entire DataFrame to an Excel spreadsheet if you want. When you look at the spreadsheet you see that the bh_100e_tec_room_temp point spans a greater range of date/times than the other points. That is why you see so many NaN's at the end of this DataFrame.


In [7]:
df.to_excel('siemens.xlsx')

You can see the names of all the points present by looking at the columns attribute of the DataFrame.


In [8]:
df.columns


Out[8]:
Index(['bh_100e_tec_room_temp', 'bh_100w_tec_rm_rh', 'bhmb_no2_w',
       'bh_101_tec_rm_co2', 'bhmb_blr_hgr', 'bh_uhe_tec_rm_rh',
       'bhmb_hrv01_en', 'bhmb_100_occ', 'bhmb_co_w', 'bh_100w_tec_htg_loopout',
       'bh_uhe_tec_heat_cool', 'bhmb_hrv01_spd', 'bh_102_tec_room_temp',
       'bh_100e_tec_heat_cool', 'bhmb_blr_hgs', 'bh_100e_tec_rm_co2',
       'bh_100e_tec_htg_loopout', 'bh_uhw_tec_heat_cool', 'bh_102_tec_rm_co2',
       'bh_uhw_tec_rm_co2', 'bhmb_cp1b_ss', 'bhmb_hrv01_oads',
       'bh_102_tec_heat_cool', 'bhmb_hrv01_eads', 'bhmb_hrv01_eat',
       'bh_100w_tec_room_temp', 'bh_100e_tec_rm_rh', 'bhmb_osat',
       'bh_102_tec_vlv_2_comd', 'bh_102_tec_htg_loopout', 'bhmb_euh_rmt',
       'bh_100w_tec_vlv_2_comd', 'bhmb_hrv01_pat', 'bh_101_tec_room_temp',
       'bh_101_tec_vlv_2_comd', 'bh_101_tec_heat_cool', 'bhmb_cp1a_ss',
       'bhmb_hrv01_phcv', 'bh_uhe_tec_htg_loopout', 'bhmb_no2_e',
       'bh_uhw_tec_room_temp', 'bh_101_tec_htg_loopout',
       'bh_100e_tec_vlv_2_comd', 'bhmb_ht_gutter', 'bhmb_hrv01_hcv',
       'bh_102_tec_rm_rh', 'bhmb_101_occ', 'bh_uhe_tec_room_temp',
       'bh_uhw_tec_htg_loopout', 'bh_101_tec_rm_rh', 'bh_uhe_tec_rm_co2',
       'bh_100w_tec_rm_co2', 'bh_uhw_tec_rm_rh', 'bhmb_co_e',
       'bhmb_hrv01_sat'],
      dtype='object')

If you want to know how many different timestamps there are, you can use the len function on the DataFrame. Remember that most DDC points do not have a value for every timestamp.


In [9]:
len(df)


Out[9]:
209
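To see how many actual (non-NaN) readings each point has, as opposed to the number of timestamps, the count() method is handy. A sketch on a small synthetic frame standing in for the real df:

```python
import pandas as pd
import numpy as np

# Synthetic stand-in: 5 timestamps, but neither point has a value at all of them
idx = pd.date_range('2017-10-10', periods=5, freq='15min')
demo = pd.DataFrame({'temp1': [81.0, 81.0, np.nan, np.nan, 80.75],
                     'temp2': [71.0, np.nan, 71.25, np.nan, np.nan]}, index=idx)

print(len(demo))      # number of timestamps: 5
print(demo.count())   # non-NaN readings per point: temp1 has 3, temp2 has 2
```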

The describe() method will give you statistical summaries for each column. The count value shown below only counts the rows where there is actual data, not a NaN. describe() only includes the numeric columns, which is why there are fewer columns in this summary than in the whole DataFrame. Some of these DDC points have text values like 'On' and 'Off' or 'Heat' and 'Cool'.


In [10]:
df.describe()


Out[10]:
bh_100e_tec_room_temp bh_100w_tec_rm_rh bhmb_no2_w bh_101_tec_rm_co2 bhmb_blr_hgr bh_uhe_tec_rm_rh bhmb_co_w bh_100w_tec_htg_loopout bhmb_hrv01_spd bh_102_tec_room_temp ... bhmb_hrv01_hcv bh_102_tec_rm_rh bh_uhe_tec_room_temp bh_uhw_tec_htg_loopout bh_101_tec_rm_rh bh_uhe_tec_rm_co2 bh_100w_tec_rm_co2 bh_uhw_tec_rm_rh bhmb_co_e bhmb_hrv01_sat
count 195.000000 101.0 101.000000 101.0 101.000000 101.000000 101.0 101.0 101.0 101.000000 ... 100.0 101.0 101.000000 101.0 101.0 101.000000 101.0 101.000000 101.0 101.000000
mean 80.608974 50.0 0.021386 1000.0 181.250495 32.578218 0.0 0.0 18.0 69.821782 ... 0.0 50.0 70.856436 0.0 50.0 313.702970 1000.0 31.695050 0.0 73.243564
std 0.899232 0.0 0.012809 0.0 6.537937 0.905936 0.0 0.0 0.0 0.750280 ... 0.0 0.0 0.260640 0.0 0.0 6.042424 0.0 0.867453 0.0 0.813843
min 74.000000 50.0 0.000000 1000.0 153.420000 31.200000 0.0 0.0 18.0 69.000000 ... 0.0 50.0 70.250000 0.0 50.0 302.000000 1000.0 30.400000 0.0 71.820000
25% 80.500000 50.0 0.010000 1000.0 177.790000 31.200000 0.0 0.0 18.0 69.250000 ... 0.0 50.0 70.750000 0.0 50.0 309.000000 1000.0 30.400000 0.0 72.630000
50% 80.750000 50.0 0.020000 1000.0 181.740000 32.800000 0.0 0.0 18.0 69.500000 ... 0.0 50.0 70.750000 0.0 50.0 314.000000 1000.0 32.000000 0.0 73.090000
75% 81.000000 50.0 0.030000 1000.0 186.310000 33.200000 0.0 0.0 18.0 71.000000 ... 0.0 50.0 71.250000 0.0 50.0 318.000000 1000.0 32.400000 0.0 73.940000
max 81.500000 50.0 0.050000 1000.0 191.550000 33.600000 0.0 0.0 18.0 71.250000 ... 0.0 50.0 71.250000 0.0 50.0 324.000000 1000.0 32.800000 0.0 74.860000

8 rows × 42 columns

Plotting the Data

Pandas provides a number of plotting functions, which call Matplotlib plotting functions under the hood. Here is how you would plot a couple of columns using the Pandas plotting functions:


In [11]:
df[['bh_101_tec_room_temp', 'bh_uhw_tec_room_temp']].plot()


Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x240a58b9c88>

I'll address the gap problems in a moment. Notice that you need to use double square brackets to select particular columns from the DataFrame. Here is how you would implement the above graph using Matplotlib function calls. The values to use for the X axis are the date/times in the index of the DataFrame:


In [12]:
plt.plot(df.index, df.bh_101_tec_room_temp, label='bh_101_tec_room_temp')
plt.plot(df.index, df.bh_uhw_tec_room_temp, label='bh_uhw_tec_room_temp')
plt.xticks(rotation=35)
plt.legend()


Out[12]:
<matplotlib.legend.Legend at 0x240a5aede80>

This is more complicated, but sometimes you need the flexibility of Matplotlib to accomplish what you need, so it is good to know both methods. Also, you can mix Pandas plotting functions with Matplotlib functions in the same plot. I'll stick to Pandas plotting calls for the rest of the tutorial.
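As one example of mixing the two, you can let Pandas draw the lines and then use Matplotlib calls to finish the chart. This sketch uses made-up data, the non-interactive 'Agg' backend so it runs without a display, and a hypothetical output filename:

```python
import matplotlib
matplotlib.use('Agg')        # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Made-up data standing in for a real sensor point
idx = pd.date_range('2017-10-10', periods=8, freq='15min')
demo = pd.DataFrame({'room_temp': 70 + np.random.rand(8)}, index=idx)

# Pandas draws the line...
ax = demo['room_temp'].plot(legend=True)

# ...and Matplotlib calls add finishing touches on the same Axes
plt.ylabel('Temperature (deg F)')
plt.title('Mixing Pandas and Matplotlib')
plt.savefig('mix_demo.png')   # hypothetical filename
```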

Eliminating the Gaps in the Lines

The gaps in the lines are caused by NaN values scattered throughout the data. These occur because the timestamps for each sensor point are not exactly the same as the timestamps for other sensors. In the graph above, some sensor has values on October 10 near 8 am, but the two sensors we are plotting do not have values then. To have these gaps spanned by a connecting line, you can drop the NaN values before plotting using the dropna() method:


In [13]:
df.bh_101_tec_room_temp.dropna().plot(legend=True)
df.bh_uhw_tec_room_temp.dropna().plot(legend=True)


Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x240a5b110b8>

Note that I had to use the legend=True parameter in the plot call because when Pandas plots one series, it defaults to not showing a legend. I had to plot the series separately because the NaN values for each series may not occur at the same set of timestamps, so separate dropna() calls are needed on each series.

If you have a number of columns you want to plot, you can write a loop to simplify the code. The code below first creates a list of columns to plot and then loops through to plot each one individually. The list is generated by finding every sensor point that has "temp" in its name using a Python "list comprehension".


In [14]:
temp_pts = [col for col in df.columns if 'temp' in col]
temp_pts


Out[14]:
['bh_100e_tec_room_temp',
 'bh_102_tec_room_temp',
 'bh_100w_tec_room_temp',
 'bh_101_tec_room_temp',
 'bh_uhw_tec_room_temp',
 'bh_uhe_tec_room_temp']

In [15]:
for pt in temp_pts:
    df[pt].dropna().plot(legend=True)
    plt.legend(loc='upper right')    # mixed in a Matplotlib command


Dealing with NaN's in Calculated Values

This section will address how to perform calculations with the point data but deal with the fact that the timestamps of the sensors are not always aligned. For purposes of instruction, I'll create a DataFrame with two sensors that have obvious misalignment of the timestamps. The first sensor will be the actual sensor bh_100w_tec_room_temp, which generally has 15 minute spacing between points, and the second sensor will be fabricated with some random values but with 5 minute spacing.


In [16]:
# Select out the bh_100w_tec_room_temp sensor, drop the NaNs, 
# and give the column a shorter name.
df_t1 = df[['bh_100w_tec_room_temp']].dropna()
df_t1.columns = ['temp1']                    # rename column
df_t1.head()


Out[16]:
temp1
2017-10-10 00:00:00 87.25
2017-10-10 00:15:00 87.00
2017-10-10 00:30:00 87.25
2017-10-10 00:45:00 87.00
2017-10-10 01:00:00 87.25

In [17]:
# Create an index for the fabricated sensor that spans the time range
# of temp1 but has 5 minute spacing.
start_ts = df_t1.index[0]      # first timestamp in df_t1
end_ts = df_t1.index[-1]       # last timestamp in df_t1
fab_index = pd.date_range(start_ts, end_ts, freq='5Min')
fab_index


Out[17]:
DatetimeIndex(['2017-10-10 00:00:00', '2017-10-10 00:05:00',
               '2017-10-10 00:10:00', '2017-10-10 00:15:00',
               '2017-10-10 00:20:00', '2017-10-10 00:25:00',
               '2017-10-10 00:30:00', '2017-10-10 00:35:00',
               '2017-10-10 00:40:00', '2017-10-10 00:45:00',
               ...
               '2017-10-11 04:15:00', '2017-10-11 04:20:00',
               '2017-10-11 04:25:00', '2017-10-11 04:30:00',
               '2017-10-11 04:35:00', '2017-10-11 04:40:00',
               '2017-10-11 04:45:00', '2017-10-11 04:50:00',
               '2017-10-11 04:55:00', '2017-10-11 05:00:00'],
              dtype='datetime64[ns]', length=349, freq='5T')

In [18]:
# Create random temperature values, normally distributed, with a mean
# of 71.0 and a standard deviation of 3.0
vals = np.random.normal(71.0, 3.0, size=len(fab_index))

# make a new DataFrame with these values and the prior index
df_fab = pd.DataFrame(vals, index=fab_index, columns=['temp2'])
df_fab.head()


Out[18]:
temp2
2017-10-10 00:00:00 72.420794
2017-10-10 00:05:00 73.827610
2017-10-10 00:10:00 73.291292
2017-10-10 00:15:00 69.191258
2017-10-10 00:20:00 69.168449

In [19]:
# Combine the two DataFrames horizontally to make the final DataFrame
# that we will work with in this section
df_final = pd.concat([df_t1, df_fab], axis=1)
df_final.head(6)


Out[19]:
temp1 temp2
2017-10-10 00:00:00 87.25 72.420794
2017-10-10 00:05:00 NaN 73.827610
2017-10-10 00:10:00 NaN 73.291292
2017-10-10 00:15:00 87.00 69.191258
2017-10-10 00:20:00 NaN 69.168449
2017-10-10 00:25:00 NaN 69.041880

So, let's say we want to calculate the temperature difference between temp1 and temp2 and create a new column with that value. We can just do the calculation like the following:


In [20]:
df1 = df_final.copy()     # make a copy of the DataFrame to work with in this example
df1['temp_diff'] = df1.temp1 - df1.temp2
df1.head(6)


Out[20]:
temp1 temp2 temp_diff
2017-10-10 00:00:00 87.25 72.420794 14.829206
2017-10-10 00:05:00 NaN 73.827610 NaN
2017-10-10 00:10:00 NaN 73.291292 NaN
2017-10-10 00:15:00 87.00 69.191258 17.808742
2017-10-10 00:20:00 NaN 69.168449 NaN
2017-10-10 00:25:00 NaN 69.041880 NaN

But the temperature difference is only calculated at timestamps that are common to both sensors. The temp2 data at intervening timestamps has no influence on the calculation, so that information is entirely wasted. This problem is especially severe with wireless sensors, which are not coordinated at all in when they report data; two wireless sensors may have no overlapping timestamps whatsoever.

There are at least two reasonable ways to deal with the problem. First, you can average the data points into long enough intervals such that all the sensors have at least one data point in the interval. For this example, we can average the data into 15 minute intervals using the Pandas resample function:


In [21]:
df2 = df_final.copy()     # make a copy of the DataFrame to work with in this example
df2 = df2.resample('15min').mean()
df2.head()


Out[21]:
temp1 temp2
2017-10-10 00:00:00 87.25 73.179899
2017-10-10 00:15:00 87.00 69.133862
2017-10-10 00:30:00 87.25 72.241514
2017-10-10 00:45:00 87.00 70.946824
2017-10-10 01:00:00 87.25 72.474898

Each column is averaged into 15 minute intervals, and those intervals have a common timestamp (you can affect where that timestamp is located in the interval using the label and loffset parameters of resample). All of the temp2 values are used to come up with the average values for that sensor, so no data points are being thrown away. You can now subtract the two columns in the same way as before:


In [22]:
df2['temp_diff'] = df2.temp1 - df2.temp2
df2.head()


Out[22]:
temp1 temp2 temp_diff
2017-10-10 00:00:00 87.25 73.179899 14.070101
2017-10-10 00:15:00 87.00 69.133862 17.866138
2017-10-10 00:30:00 87.25 72.241514 15.008486
2017-10-10 00:45:00 87.00 70.946824 16.053176
2017-10-10 01:00:00 87.25 72.474898 14.775102
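As an aside on the label parameter of resample mentioned above: by default each bin is stamped with the start of its interval, while label='right' stamps it with the end instead. A sketch on synthetic 5-minute data:

```python
import pandas as pd

# Synthetic 5-minute data covering two 15-minute bins
idx = pd.date_range('2017-10-10 00:00', periods=6, freq='5min')
demo = pd.DataFrame({'temp2': [70., 71., 72., 73., 74., 75.]}, index=idx)

# Default: bins are labeled with the left (starting) edge of the interval
left = demo.resample('15min').mean()

# label='right' stamps each average with the end of its interval instead
right = demo.resample('15min', label='right').mean()

print(left.index[0])    # 2017-10-10 00:00:00
print(right.index[0])   # 2017-10-10 00:15:00 (same averages, shifted labels)
```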

That's a pretty good solution in many situations. Sometimes, though, you would like to keep a higher time-resolution for your final DataFrame. To do this, instead of averaging the sensors, you can interpolate the values to fill in those timestamps that contain NaN's. In this example, no NaN's appear for temp2, so we don't need to interpolate that sensor, but we do need to interpolate temp1:


In [23]:
df3 = df_final.copy()     # make a copy of the DataFrame to work with in this example
df3.temp1.interpolate(inplace=True)
# inplace=True saves me from having to do:  df3.temp1 = df3.temp1.interpolate()
df3.head()


Out[23]:
temp1 temp2
2017-10-10 00:00:00 87.250000 72.420794
2017-10-10 00:05:00 87.166667 73.827610
2017-10-10 00:10:00 87.083333 73.291292
2017-10-10 00:15:00 87.000000 69.191258
2017-10-10 00:20:00 87.083333 69.168449

A linear interpolation was used to fill in the missing temp1 values. Other types of interpolation are available with the Pandas interpolate function. Now we can do the subtraction:


In [24]:
df3['temp_diff'] = df3.temp1 - df3.temp2
df3.head()


Out[24]:
temp1 temp2 temp_diff
2017-10-10 00:00:00 87.250000 72.420794 14.829206
2017-10-10 00:05:00 87.166667 73.827610 13.339056
2017-10-10 00:10:00 87.083333 73.291292 13.792042
2017-10-10 00:15:00 87.000000 69.191258 17.808742
2017-10-10 00:20:00 87.083333 69.168449 17.914885
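One of the other interpolation types mentioned above, method='time', matters when timestamps are unevenly spaced: it weights the fill by the actual elapsed time rather than by row position. A sketch on made-up data with a 5-minute gap followed by a 10-minute gap:

```python
import pandas as pd
import numpy as np

# Unevenly spaced timestamps: 5 minutes, then 10 minutes
idx = pd.to_datetime(['2017-10-10 00:00', '2017-10-10 00:05', '2017-10-10 00:15'])
s = pd.Series([70.0, np.nan, 73.0], index=idx)

# Default 'linear' treats rows as equally spaced: fills with 71.5
print(s.interpolate())

# 'time' uses the real gaps: 5 of 15 minutes elapsed, so it fills with ~71.0
print(s.interpolate(method='time'))
```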

You can compare the results of these approaches in the following graph. The first hour of data is shown.


In [25]:
df1.loc[:'2017-10-10 1:00:00', 'temp_diff'].dropna().plot(legend=True, marker='o', label='no processing')
df2.loc[:'2017-10-10 1:00:00', 'temp_diff'].plot(legend=True, marker='o', label='15 min averages')
df3.loc[:'2017-10-10 1:00:00', 'temp_diff'].plot(legend=True, marker='o', label='interpolation')
plt.title('Calculated Temperature Difference');
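Beyond averaging and interpolation, a third technique worth knowing for badly misaligned sensors is pandas merge_asof, which pairs each reading with the nearest-in-time reading from the other sensor. This sketch uses two made-up sensors that share no timestamps at all (both indexes must be sorted for merge_asof to work):

```python
import pandas as pd

# Two sensors with completely non-overlapping timestamps
s1 = pd.DataFrame({'temp1': [70.0, 71.0]},
                  index=pd.to_datetime(['2017-10-10 00:01', '2017-10-10 00:16']))
s2 = pd.DataFrame({'temp2': [60.0, 62.0]},
                  index=pd.to_datetime(['2017-10-10 00:03', '2017-10-10 00:14']))

# Pair each temp1 reading with the temp2 reading closest to it in time
paired = pd.merge_asof(s1, s2, left_index=True, right_index=True,
                       direction='nearest')
paired['temp_diff'] = paired.temp1 - paired.temp2
print(paired)   # temp_diff: 10.0 at 00:01, 9.0 at 00:16
```

This keeps the timestamps of one sensor intact, which can be preferable to averaging when that sensor's readings drive the analysis.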