T. Martz-Oberlander, 2015-11-12, CO2 and Speed of Sound

Formatting ENVIRONMENTAL CONDITIONS pipe organ data for Python operations

NOTE: Here, pitch and frequency are used interchangeably to signify the speed of sound from organ pipes.

The entire script looks for mathematical relationships between CO2 concentration changes and pitch changes from a pipe organ. This script uploads, cleans data and organizes new dataframes, creates figures, and performs statistical tests on the relationships between variable CO2 and frequency of sound from a note played on a pipe organ.

This uploader script:

1) Uploads CO2, temp, and RH data files;

2) Munges it (creates a Date Time column for the time stamps), establishes column contents as floats;

3) Calculates expected frequency, as per Cramer's equation;

4) Imports output from pitch_data.py script, the dataframe with measured frequency;

5) Plots expected frequency curve, CO2 (ppm) curve, and measured pitch points in a figure.

[ Here I pursue data analysis route 1 (as mentionted in my organ_pitch/notebook.md file), which involves comparing one pitch dataframe with one dataframe of environmental characteristics taken at one sensor location. Both dataframes are compared by the time of data recorded. ]


In [1]:
# I import useful libraries (with functions) so I can visualize my data
# I use Pandas because this dataset has word/string column titles and I like the readability features of commands and finish visual products that Pandas offers

import pandas as pd
import matplotlib.pyplot as plt
import re
import numpy as np

%matplotlib inline

#I want to be able to easily scroll through this notebook so I limit the length of the appearance of my dataframes 
from pandas import set_option
set_option('display.max_rows', 10)

Uploaded RH and temp data into Python

First I upload my data set(s). I am working with environmental data from different locations in the church at differnet dates. Files include: environmental characteristics (CO2, temperature (deg C), and relative humidity (RH) (%) measurements).

I can discard the CO2_2 column values since they are false measurements logged from an empty input jack in the CO2 HOBOWare ^(r) device.


In [12]:
#I import a temp and RH data file
env=pd.read_table('../Data/CO2May.csv', sep=',')

#assigning columns names
env.columns=[['test', 'time','temp C', 'RH %', 'CO2_1', 'CO2_2']]

#I display my dataframe
env


Out[12]:
test time temp C RH % CO2_1 CO2_2
0 1 04/17/10 11:00:00 AM 20.650 35.046 452.4 689.9
1 2 04/17/10 11:02:00 AM 20.579 35.105 450.5 677.0
2 3 04/17/10 11:04:00 AM 20.507 35.229 450.5 663.6
3 4 04/17/10 11:06:00 AM 20.460 35.291 448.7 652.0
4 5 04/17/10 11:08:00 AM 20.412 35.352 442.0 641.0
... ... ... ... ... ... ...
10853 10854 2005-02-10 12:46 21.581 44.604 501.2 483.5
10854 10855 2005-02-10 12:48 21.581 44.604 504.3 482.9
10855 10856 2005-02-10 12:50 21.581 44.604 503.7 482.3
10856 10857 2005-02-10 12:52 21.604 44.575 503.1 481.7
10857 10858 2005-02-10 12:54 21.604 44.575 498.8 480.5

10858 rows × 6 columns


In [3]:
#change data time variable to actual values of time. 
env['time']= pd.to_datetime(env['time'])

#print the new table and the type of data. 
print(env)

env.dtypes


        test                time  temp C    RH %  CO2_1  CO2_2
0          1 2010-04-17 11:00:00  20.650  35.046  452.4  689.9
1          2 2010-04-17 11:02:00  20.579  35.105  450.5  677.0
2          3 2010-04-17 11:04:00  20.507  35.229  450.5  663.6
3          4 2010-04-17 11:06:00  20.460  35.291  448.7  652.0
4          5 2010-04-17 11:08:00  20.412  35.352  442.0  641.0
...      ...                 ...     ...     ...    ...    ...
10853  10854 2005-02-10 12:46:00  21.581  44.604  501.2  483.5
10854  10855 2005-02-10 12:48:00  21.581  44.604  504.3  482.9
10855  10856 2005-02-10 12:50:00  21.581  44.604  503.7  482.3
10856  10857 2005-02-10 12:52:00  21.604  44.575  503.1  481.7
10857  10858 2005-02-10 12:54:00  21.604  44.575  498.8  480.5

[10858 rows x 6 columns]
Out[3]:
test               int64
time      datetime64[ns]
temp C           float64
RH %             float64
CO2_1            float64
CO2_2            float64
dtype: object

In [ ]:

Next

1. Create a function for expected pitch (frequency of sound waves) from CO2 data

2. Add expected_frequency to dataframe

Calculated pitch from CO2 levels

Here I use Cramer's equation for frequency of sound from CO2 concentration (1992).

freq = a0 + a1(T) + ... + (a9 +...) +... + a14(xc^2) where xc is the mole fraction of CO2 and T is temperature. Full derivation of these equations can be found in the "Doc" directory.

I will later plot measured pitch (frequency) data points from my "pitch" data frame on top of these calculated frequency values for comparison.


In [4]:
#Here I am trying to create a function for the above equation. 
#I want to plug in each CO2_ave value for a time stamp (row) from the "env" data frame above. 

#define coefficients (Cramer, 1992)
a0 = 331.5024
#a1 = 0.603055
#a2 = -0.000528
a9 = -(-85.20931) #need to account for negative values
#a10 = -0.228525
a14 = 29.179762

#xc =  CO2 values from dataframe

In [7]:
#test function
def test_cramer():
    assert a0 + ((a9)*400)/100 + a14*((400/1000000)**2) == 672.33964466, 'Equation failure'
    return()

test_cramer()


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-7-f64a6c70166a> in <module>()
      4     return()
      5 
----> 6 test_cramer()

<ipython-input-7-f64a6c70166a> in test_cramer()
      1 #test function
      2 def test_cramer():
----> 3     assert a0 + ((a9)*400)/100 + a14*((400/1000000)**2) == 672., 'Equation failure'
      4     return()
      5 

AssertionError: Equation failure

In [21]:
#This function also converts ppm to mole fraction (just quantity as a proportion of total)
def cramer(data):
    '''Calculate pitch from CO2_1 concentration'''
 
    calc_freq = a0 + ((a9)*data)/100 + a14*((data/1000000)**2)
 
    return(calc_freq)

In [ ]:
#run the cramer values for the calculated frequency 
#calc_freq = cramer(env['calc_freq'])

In [ ]:
#define the new column as the output of the cramer function
#env['calc_freq'] = calc_freq

In [16]:
#Run the function for the input column (CO2 values)
env['calc_freq'] = cramer(env['CO2_1'])

cramer(env['CO2_1'])


Out[16]:
0        716.989324
1        715.370347
2        715.370347
3        713.836580
4        708.127556
            ...    
10853    758.571469
10854    761.212958
10855    760.701702
10856    760.190446
10857    756.526446
Name: CO2_1, dtype: float64

In [17]:
#check the dataframe
#calculated frequency values seem reasonable based on changes in CO2
env


Out[17]:
test time temp C RH % CO2_1 CO2_2 calc_freq
0 1 04/17/10 11:00:00 AM 20.650 35.046 452.4 689.9 716.989324
1 2 04/17/10 11:02:00 AM 20.579 35.105 450.5 677.0 715.370347
2 3 04/17/10 11:04:00 AM 20.507 35.229 450.5 663.6 715.370347
3 4 04/17/10 11:06:00 AM 20.460 35.291 448.7 652.0 713.836580
4 5 04/17/10 11:08:00 AM 20.412 35.352 442.0 641.0 708.127556
... ... ... ... ... ... ... ...
10853 10854 2005-02-10 12:46 21.581 44.604 501.2 483.5 758.571469
10854 10855 2005-02-10 12:48 21.581 44.604 504.3 482.9 761.212958
10855 10856 2005-02-10 12:50 21.581 44.604 503.7 482.3 760.701702
10856 10857 2005-02-10 12:52 21.604 44.575 503.1 481.7 760.190446
10857 10858 2005-02-10 12:54 21.604 44.575 498.8 480.5 756.526446

10858 rows × 7 columns


In [27]:
#Now I call in my measured pitch data, 
#to be able to visually compare calculated and measured

#Import the measured pitch values--the output of pitch_data.py script
measured_freq = pd.read_table('../Data/pitches.csv', sep=',')

#change data time variable to actual values of time. 
env['time']= pd.to_datetime(env['time'])

#I test to make sure I'm importing the correct data
measured_freq


Out[27]:
time div note freq1 freq2 freq3 freq4 freq5 freq6 freq7 freq8 freq9
0 2010-04-13 8:37 pedal c3 131.17 131.20 131.18 131.11 131.17 131.14 131.21 NaN NaN
1 2010-04-13 8:37 pedal c4 262.08 262.12 262.09 262.05 262.07 262.10 262.08 NaN NaN
2 2010-04-13 8:40 swell c3 131.42 131.47 131.45 131.47 131.50 131.47 131.45 NaN NaN
3 2010-04-13 8:40 swell c4 262.90 262.87 262.84 262.85 262.90 262.87 262.88 NaN NaN
4 2010-04-13 8:42 great c4 262.04 262.05 262.01 262.03 261.97 261.98 261.99 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ...
52 2010-04-17 10:35 pedal c4 261.95 261.95 262.02 262.00 261.97 262.01 261.95 261.97 NaN
53 2010-04-17 10:37 great c4 261.69 261.69 261.68 261.71 261.74 261.66 261.68 261.69 261.67
54 2010-04-17 9:54 choir c5 NaN 523.73 523.61 523.66 523.77 523.63 523.65 523.69 NaN
55 2010-04-17 10:35 pedal c4 NaN 261.95 261.95 262.02 262.00 261.97 262.01 261.95 261.97
56 2010-04-17 10:37 great c4 NaN 261.69 261.69 261.68 261.71 261.74 261.66 261.68 261.69

57 rows × 12 columns


In [ ]:


In [ ]:


In [ ]:

Visualizing the expected pitch values by time

1. Plot calculated frequency, CO2 (ppm), and measured frequency values


In [28]:
print(calc_freq)


        calc_freq
0      716.989324
1      715.370347
2      715.370347
3      713.836580
4      708.127556
...           ...
10853  758.571469
10854  761.212958
10855  760.701702
10856  760.190446
10857  756.526446

[10858 rows x 1 columns]

In [29]:
#define variables from dataframe columns
CO2_1 = env[['CO2_1']]

calc_freq=env[['calc_freq']]

#measured_pitch = output_from_'pitch_data.py'

In [31]:
#want to set x-axis as date_time
#how do I format the ax2 y axis scale

def make_plot(variable_1, variable_2):
    '''Make a three variable plot with two axes'''

#plot title
    plt.title('CO2 and Calculated Pitch', fontsize='14')

#twinx layering
    ax1=plt.subplot()
    ax2=ax1.twinx()
    #ax3=ax1.twinx()

#call data for the plot
    ax1.plot(CO2_1, color='g', linewidth=1)
    ax2.plot(calc_freq, color= 'm', linewidth=1) 
    #ax3.plot(measured_freq, color = 'b', marker= 'x')

#axis labeling
    ax1.yaxis.set_tick_params(labelcolor='grey')
    ax1.set_xlabel('Sample Number')
    ax1.set_ylabel('CO2 (ppm)', fontsize=12, color = 'g')
    ax2.set_ylabel('Calculated Pitch (Hz)', fontsize=12, color='m') 
    #ax3.set_ylabel('Measured Pitch')

#axis limits
    ax1.set_ylim([400,1300])
    ax2.set_ylim([600, 1500])

    #plt.savefig('../Figures/fig1.pdf')

#Close function
    return()#'../Figures/fig1.pdf')


#Call my function to test it 
make_plot(CO2_1, calc_freq)


Out[31]:
()

In [ ]:
measured_freq.head()

In [ ]:
env.head()

In [ ]:
Freq vs. CO2

In [ ]:


In [ ]:


In [ ]:
plt.plot(env.CO2_1, measured_freq.time, color='g', linewidth=1)

In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:
#def make_fig(datasets, variable_1, variable_2, savename):

#twinx layering
ax1=plt.subplot()
ax2=ax1.twinx()

#plot 2 variables in predertermined plot above
ax1.plot(dataset.index, variable_1, 'k-', linewidth=2)
ax2.plot(dataset.index, variable_2, )

#moving plots lines
variable_2_spine=ax2.spines['right']
variable_2_spine.set_position(('axes', 1.2))

ax1.yaxi.set_tick_params(labelcolor='k')
ax1.set_ylabel(variable_1.name, fontsize=13, colour = 'k')
ax2.sey_ylabel(variable_2.name + '($^o$C)', fontsize=13, color='grey')

#plt.savefig(savename)
return(savename)

In [ ]:
fig = plt.figure(figsize=(11,14))
plt.suptitle('')

ax1.plot(colum1, colum2, 'k-', linewidth=2)
" "

ax1.set_ylim([0,1])
ax2.set_ylim([0,1])

ax1.set_xlabel('name', fontsize=14, y=0)
ax1.set_ylabel
ax2.set_ylabel

In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:
#convert 'object' (CO2_1) to float 
new = pd.Series([env.CO2_1], name = 'CO2_1')

CO2_1 = new.tolist()

CO2_array = np.array(CO2_1)

#Test type of data in "CO2_1" column
env.CO2_1.dtypes

In [ ]:


In [ ]:


In [ ]:
#How can I format it so it's not an object?

cramer(CO2_array)

In [ ]:


In [ ]:


In [ ]:


In [ ]:
#'float' object not callable--the data in "CO2_1" are objects and cannot be called into the equation
#cramer(env.CO2_ave)

In [ ]:
env.dtypes

In [ ]:
env.CO2_1.dtypes

In [ ]:
new = pd.Series([env.CO2_1], name = 'CO2_1')

CO2_1 = new.tolist()

CO2_array = np.array(CO2_1)

#Test type of data in "CO2_1" column
env.CO2_1.dtypes

In [ ]:
cramer(CO2_array)

In [ ]:
type(CO2_array)

In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:
# To choose which CO2 value to use, I first visualize which seems normal 

#Create CO2-only dataframs
CO2 = env[['CO2_1', 'CO2_2']]

#Make a plot
CO2_fig = plt.plot(CO2)

plt.ylabel('CO2 (ppm)')
plt.xlabel('Sample number')
plt.title('Two CO2 sensors, same time and place')

#plt.savefig('CO2_fig.pdf')

In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [11]:
input_file = env



#Upload environmental data file
env = pd.read_table('', sep=',')



#assigning columns names
env.columns=[['test', 'date_time','temp C', 'RH %', 'CO2_1', 'CO2_2']]

#change data time variable to actual values of time.
env['date_time']= pd.to_datetime(env['date_time'])

#test function
        #def test_cramer():
                #assert a0 + ((a9)*400)/100 + a14*((400/1000000)**2) == 672.339644669, 'Equation failure, math-mess-up'
                #return()

#Call the test function
        #test_cramer()

#pitch calculator function from Cramer equation
def cramer(data):
    '''Calculate pitch from CO2_1 concentration'''
    calc_freq = a0 + ((a9*data)/100) + a14*((data)**2)
    return(calc_freq)

#Run the function for the input column (CO2 values) to get a new column of calculated_frequency
env['calc_freq'] = cramer(env['CO2_1'])

#Import the measured pitch values--the output of pitch_data.py script
measured_freq = pd.read_table('../organ_pitch/Data/munged_pitch.csv', sep=',')

#change data time variable to actual values of time.
env['time']= pd.to_datetime(env['time'])

#Function to make and save a plot


---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-11-a4e72689e698> in <module>()
      4 
      5 #Upload environmental data file
----> 6 env = pd.read_table(input_file, sep=',')
      7 
      8 #assigning columns names

/Users/shubbymartz-oberlander/anaconda/lib/python3.4/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    489                     skip_blank_lines=skip_blank_lines)
    490 
--> 491         return _read(filepath_or_buffer, kwds)
    492 
    493     parser_f.__name__ = name

/Users/shubbymartz-oberlander/anaconda/lib/python3.4/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    266 
    267     # Create the parser.
--> 268     parser = TextFileReader(filepath_or_buffer, **kwds)
    269 
    270     if (nrows is not None) and (chunksize is not None):

/Users/shubbymartz-oberlander/anaconda/lib/python3.4/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    581             self.options['has_index_names'] = kwds['has_index_names']
    582 
--> 583         self._make_engine(self.engine)
    584 
    585     def _get_options_with_defaults(self, engine):

/Users/shubbymartz-oberlander/anaconda/lib/python3.4/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    722     def _make_engine(self, engine='c'):
    723         if engine == 'c':
--> 724             self._engine = CParserWrapper(self.f, **self.options)
    725         else:
    726             if engine == 'python':

/Users/shubbymartz-oberlander/anaconda/lib/python3.4/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1091         kwds['allow_leading_cols'] = self.index_col is not False
   1092 
-> 1093         self._reader = _parser.TextReader(src, **kwds)
   1094 
   1095         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3229)()

pandas/parser.pyx in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6219)()

OSError: Expected file path name or file-like object, got <class 'pandas.core.frame.DataFrame'> type

In [ ]: