NOTE: Here, pitch and frequency are used interchangeably for the sound produced by the organ pipes, which varies with the speed of sound in the surrounding air.
The overall project looks for mathematical relationships between changes in CO2 concentration and changes in pitch from a pipe organ. This script loads and cleans the data, organizes it into new dataframes, creates figures, and performs statistical tests on the relationship between varying CO2 concentration and the frequency of a note played on a pipe organ.
This uploader script:
1) Uploads CO2, temperature, and RH data files;
2) Munges them (creates a Date Time column for the time stamps) and casts column contents to floats;
3) Calculates expected frequency, as per Cramer's equation;
4) Imports the output of the pitch_data.py script, the dataframe with measured frequency;
5) Plots expected frequency curve, CO2 (ppm) curve, and measured pitch points in a figure.
[ Here I pursue data analysis route 1 (as mentioned in my organ_pitch/notebook.md file), which compares one pitch dataframe with one dataframe of environmental characteristics taken at one sensor location. The two dataframes are matched on the time each measurement was recorded. ]
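A minimal sketch of what matching two dataframes by recording time could look like, using `pd.merge_asof`. The frame names `env_demo` and `pitch_demo`, the `freq` column, the dates, and all values are made up for illustration; this is not necessarily how the rest of the notebook does the comparison.

```python
import pandas as pd

# Hypothetical stand-ins for the environmental and pitch dataframes.
# All timestamps and values are invented for illustration.
env_demo = pd.DataFrame({
    'time': pd.to_datetime(['2000-01-01 10:00',
                            '2000-01-01 10:05',
                            '2000-01-01 10:10']),
    'CO2_1': [450.0, 520.0, 610.0],
})
pitch_demo = pd.DataFrame({
    'time': pd.to_datetime(['2000-01-01 10:01',
                            '2000-01-01 10:09']),
    'freq': [440.2, 440.6],
})

# merge_asof requires both frames to be sorted by the key column.
env_demo = env_demo.sort_values('time')
pitch_demo = pitch_demo.sort_values('time')

# For each pitch reading, attach the environmental row nearest in time,
# but only if it lies within 5 minutes.
matched = pd.merge_asof(
    pitch_demo, env_demo, on='time',
    direction='nearest', tolerance=pd.Timedelta('5min'),
)
print(matched)
```

Matching on the nearest timestamp avoids requiring the sensor and the pitch recordings to share exactly the same clock ticks; the 5-minute tolerance is an arbitrary choice here.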
In [1]:
# I import useful libraries (with functions) so I can visualize my data
# I use Pandas because this dataset has string column titles, and I like the readable commands and finished visual output that Pandas offers
import pandas as pd
import matplotlib.pyplot as plt
import re
import numpy as np
%matplotlib inline
#I want to be able to easily scroll through this notebook so I limit the length of the appearance of my dataframes
from pandas import set_option
set_option('display.max_rows', 10)
First I upload my data sets. I am working with environmental data from different locations in the church on different dates. The files contain environmental characteristics: CO2, temperature (deg C), and relative humidity (RH) (%) measurements.
I can discard the CO2_2 column values since they are false measurements logged from an empty input jack on the CO2 HOBOware® device.
In [12]:
#I import a temp and RH data file
env=pd.read_table('../Data/CO2May.csv', sep=',')
#assign column names (a flat list, so each column is selected by a single label)
env.columns=['test', 'time','temp C', 'RH %', 'CO2_1', 'CO2_2']
#I display my dataframe
env
Out[12]:
In [3]:
#convert the time column from strings to datetime values.
env['time']= pd.to_datetime(env['time'])
#print the new table and the type of data.
print(env)
env.dtypes
Out[3]:
In [ ]:
Here I use Cramer's equation for frequency of sound from CO2 concentration (1992).
freq = a0 + a1(T) + ... + (a9 + ...)(xc) + ... + a14(xc^2), where xc is the mole fraction of CO2 and T is temperature. The full derivation of these equations can be found in the "Doc" directory.
I will later plot measured pitch (frequency) data points from my "pitch" data frame on top of these calculated frequency values for comparison.
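As a worked check of the truncated expression that the code cells in this notebook evaluate (only the a0, a9, and a14 terms, with the sign of a9 flipped as in the coefficient cell, the linear term divided by 100, and ppm divided by 10^6 in the quadratic term), here is the arithmetic at 400 ppm. This only restates the notebook's own expression; it is not the full Cramer equation.

```latex
\begin{aligned}
f(400) &= a_0 + \frac{a_9 \cdot 400}{100} + a_{14}\left(\frac{400}{10^{6}}\right)^{2} \\
       &= 331.5024 + 85.20931 \times 4 + 29.179762 \times \left(4 \times 10^{-4}\right)^{2} \\
       &\approx 331.5024 + 340.83724 + 0.0000047 \\
       &\approx 672.339645
\end{aligned}
```

The quadratic term contributes only a few millionths here, so the result is dominated by the constant and linear terms.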
In [4]:
#Here I am trying to create a function for the above equation.
#I want to plug in each CO2_ave value for a time stamp (row) from the "env" data frame above.
#define coefficients (Cramer, 1992)
a0 = 331.5024
#a1 = 0.603055
#a2 = -0.000528
a9 = -(-85.20931) #need to account for negative values
#a10 = -0.228525
a14 = 29.179762
#xc = CO2 values from dataframe
In [7]:
#test function
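def test_cramer():
    '''Check the truncated Cramer expression at 400 ppm CO2'''
    result = a0 + ((a9)*400)/100 + a14*((400/1000000)**2)
    #compare floats with a tolerance rather than exact equality
    assert abs(result - 672.33964466) < 1e-6, 'Equation failure'
test_cramer()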
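A short sketch of why a floating-point result like this is usually compared with a tolerance rather than `==`. The coefficients are copied from the coefficient cell; the tolerance of 1e-6 is my own assumption, not a value taken from Cramer's paper.

```python
import math

# Coefficients as defined in the coefficient cell (a9 with its sign flipped).
a0 = 331.5024
a9 = 85.20931
a14 = 29.179762

value = a0 + (a9 * 400) / 100 + a14 * ((400 / 1000000) ** 2)
expected = 672.33964466  # reference value, rounded to 8 decimals

# Exact equality is fragile: the computed value carries more digits
# than the rounded reference.
exact_match = (value == expected)

# A tolerance-based comparison accepts differences below 1e-6
# (an assumed threshold for this illustration).
close_match = math.isclose(value, expected, rel_tol=0.0, abs_tol=1e-6)

print(exact_match, close_match)
```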
In [21]:
#This function also converts ppm to mole fraction (just quantity as a proportion of total)
def cramer(data):
    '''Calculate pitch from CO2_1 concentration'''
    calc_freq = a0 + ((a9)*data)/100 + a14*((data/1000000)**2)
    return calc_freq
In [ ]:
#run the cramer values for the calculated frequency
#calc_freq = cramer(env['calc_freq'])
In [ ]:
#define the new column as the output of the cramer function
#env['calc_freq'] = calc_freq
In [16]:
#Run the function for the input column (CO2 values)
env['calc_freq'] = cramer(env['CO2_1'])
cramer(env['CO2_1'])
Out[16]:
In [17]:
#check the dataframe
#calculated frequency values seem reasonable based on changes in CO2
env
Out[17]:
In [27]:
#Now I call in my measured pitch data,
#to be able to visually compare calculated and measured
#Import the measured pitch values--the output of pitch_data.py script
measured_freq = pd.read_table('../Data/pitches.csv', sep=',')
#convert the time column from strings to datetime values.
env['time']= pd.to_datetime(env['time'])
#I test to make sure I'm importing the correct data
measured_freq
Out[27]:
In [28]:
print(calc_freq)
In [29]:
#define variables from dataframe columns
CO2_1 = env[['CO2_1']]
calc_freq=env[['calc_freq']]
#measured_pitch = output_from_'pitch_data.py'
In [31]:
#want to set x-axis as date_time
#how do I format the ax2 y axis scale
def make_plot(variable_1, variable_2):
    '''Plot two variables on twin y-axes (a third, measured pitch, is commented out)'''
    #twinx layering
    ax1 = plt.subplot()
    ax2 = ax1.twinx()
    #ax3 = ax1.twinx()
    #plot title
    ax1.set_title('CO2 and Calculated Pitch', fontsize=14)
    #call data for the plot (use the arguments, not the global variables)
    ax1.plot(variable_1, color='g', linewidth=1)
    ax2.plot(variable_2, color='m', linewidth=1)
    #ax3.plot(measured_freq, color = 'b', marker= 'x')
    #axis labeling
    ax1.yaxis.set_tick_params(labelcolor='grey')
    ax1.set_xlabel('Sample Number')
    ax1.set_ylabel('CO2 (ppm)', fontsize=12, color='g')
    ax2.set_ylabel('Calculated Pitch (Hz)', fontsize=12, color='m')
    #ax3.set_ylabel('Measured Pitch')
    #axis limits
    ax1.set_ylim([400, 1300])
    ax2.set_ylim([600, 1500])
    #plt.savefig('../Figures/fig1.pdf')
#Call my function to test it
make_plot(CO2_1, calc_freq)
Out[31]:
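On the two open questions in the plotting cell (using date_time on the x-axis and formatting the ax2 y-axis scale), here is one possible sketch. The dataframe `demo` and all of its values are made up, and the tick spacing of 100 Hz and the `%H:%M` format are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as mticker
import pandas as pd

# Made-up data standing in for the env dataframe.
demo = pd.DataFrame({
    'time': pd.date_range('2000-01-01 10:00', periods=4, freq='30min'),
    'CO2_1': [450.0, 520.0, 610.0, 580.0],
    'calc_freq': [715.0, 774.0, 851.0, 826.0],
})

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# Passing the time column as x puts timestamps on the shared x-axis.
ax1.plot(demo['time'], demo['CO2_1'], color='g', linewidth=1)
ax2.plot(demo['time'], demo['calc_freq'], color='m', linewidth=1)

# Show tick labels as hour:minute and slant them so they do not overlap.
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
fig.autofmt_xdate()

# Control the second y-axis: fixed limits and a tick every 100 Hz.
ax2.set_ylim(600, 1500)
ax2.yaxis.set_major_locator(mticker.MultipleLocator(100))

ax1.set_ylabel('CO2 (ppm)', color='g')
ax2.set_ylabel('Calculated Pitch (Hz)', color='m')
```

Inside the notebook the `matplotlib.use('Agg')` line is unnecessary, since `%matplotlib inline` already selects a backend.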
In [ ]:
measured_freq.head()
In [ ]:
env.head()
In [ ]:
Freq vs. CO2
In [ ]:
plt.plot(env.CO2_1, measured_freq.time, color='g', linewidth=1)
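A frequency-versus-CO2 comparison needs one frequency value paired with each CO2 reading. Assuming such pairs are available, a first look at the strength of the linear relationship could be a Pearson correlation coefficient, sketched here with NumPy. The arrays `co2_ppm` and `freq_hz` hold invented numbers, not measurements from this project.

```python
import numpy as np

# Invented paired observations: CO2 (ppm) and measured frequency (Hz).
co2_ppm = np.array([450.0, 520.0, 610.0, 580.0])
freq_hz = np.array([440.1, 440.3, 440.6, 440.5])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two arrays.
r = np.corrcoef(co2_ppm, freq_hz)[0, 1]

# Slope and intercept of a least-squares straight line through the pairs.
slope, intercept = np.polyfit(co2_ppm, freq_hz, 1)

print(r, slope, intercept)
```

A correlation coefficient alone is not a significance test; it only summarizes how closely the pairs follow a straight line.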
In [ ]:
def make_fig(dataset, variable_1, variable_2, savename):
    '''Plot two columns of a dataframe on twin y-axes'''
    #twinx layering
    ax1 = plt.subplot()
    ax2 = ax1.twinx()
    #plot 2 variables in predetermined plot above
    ax1.plot(dataset.index, variable_1, 'k-', linewidth=2)
    ax2.plot(dataset.index, variable_2)
    #moving plot lines
    variable_2_spine = ax2.spines['right']
    variable_2_spine.set_position(('axes', 1.2))
    ax1.yaxis.set_tick_params(labelcolor='k')
    ax1.set_ylabel(variable_1.name, fontsize=13, color='k')
    ax2.set_ylabel(variable_2.name + '($^o$C)', fontsize=13, color='grey')
    #plt.savefig(savename)
    return savename
In [ ]:
fig = plt.figure(figsize=(11,14))
plt.suptitle('')
ax1.plot(colum1, colum2, 'k-', linewidth=2)
" "
ax1.set_ylim([0,1])
ax2.set_ylim([0,1])
ax1.set_xlabel('name', fontsize=14, y=0)
ax1.set_ylabel
ax2.set_ylabel
In [ ]:
#convert 'object' (CO2_1) to float; entries that cannot be parsed become NaN
env['CO2_1'] = pd.to_numeric(env['CO2_1'], errors='coerce')
CO2_array = env['CO2_1'].to_numpy()
#Test type of data in "CO2_1" column
env.CO2_1.dtypes
In [ ]:
#How can I format it so it's not an object?
cramer(CO2_array)
In [ ]:
#'float' object not callable--the data in "CO2_1" are objects and cannot be called into the equation
#cramer(env.CO2_ave)
In [ ]:
env.dtypes
In [ ]:
env.CO2_1.dtypes
In [ ]:
type(CO2_array)
In [ ]:
# To choose which CO2 column to use, I first plot both to see which looks normal
#Create CO2-only dataframe
CO2 = env[['CO2_1', 'CO2_2']]
#Make a plot
CO2_fig = plt.plot(CO2)
plt.ylabel('CO2 (ppm)')
plt.xlabel('Sample number')
plt.title('Two CO2 sensors, same time and place')
#plt.savefig('CO2_fig.pdf')
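Alongside the plot, a numeric summary can help confirm which CO2 column holds real measurements before the other one is discarded. This is a small sketch with made-up values in `co2_demo`; the near-constant `CO2_2` column imitates an empty input jack and is an assumption about what such a channel might log.

```python
import pandas as pd

# Made-up readings: CO2_1 varies like a real sensor, CO2_2 barely moves.
co2_demo = pd.DataFrame({
    'CO2_1': [450.0, 520.0, 610.0, 580.0],
    'CO2_2': [0.0, 0.0, 0.1, 0.0],
})

# One row per statistic, one column per sensor channel.
summary = co2_demo.agg(['min', 'max', 'std'])
print(summary)

# Once the false channel is identified, drop it from the dataframe.
co2_clean = co2_demo.drop(columns=['CO2_2'])
```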
In [11]:
input_file = env
#Upload environmental data file
env = pd.read_table('', sep=',')
#assign column names (a flat list, so each column is selected by a single label)
env.columns=['test', 'date_time','temp C', 'RH %', 'CO2_1', 'CO2_2']
#convert the date_time column from strings to datetime values.
env['date_time']= pd.to_datetime(env['date_time'])
#test function
#def test_cramer():
#assert a0 + ((a9)*400)/100 + a14*((400/1000000)**2) == 672.339644669, 'Equation failure, math-mess-up'
#return()
#Call the test function
#test_cramer()
#pitch calculator function from Cramer equation
def cramer(data):
    '''Calculate pitch from CO2_1 concentration'''
    #convert ppm to mole fraction in the quadratic term, matching the test above
    calc_freq = a0 + ((a9*data)/100) + a14*((data/1000000)**2)
    return calc_freq
#Run the function for the input column (CO2 values) to get a new column of calculated_frequency
env['calc_freq'] = cramer(env['CO2_1'])
#Import the measured pitch values--the output of pitch_data.py script
measured_freq = pd.read_table('../organ_pitch/Data/munged_pitch.csv', sep=',')
#convert the date_time column from strings to datetime values.
env['date_time']= pd.to_datetime(env['date_time'])
#Function to make and save a plot
In [ ]: