T. Martz-Oberlander, 2015-11-12, CO2 and Speed of Sound

Formatting PITCH pipe organ data for Python operations

The overall project looks for mathematical relationships between changes in CO2 concentration and changes in pitch from a pipe organ. It loads and cleans the data, organizes new dataframes, creates figures, and performs statistical tests on the relationship between CO2 concentration and the frequency of a note played on a pipe organ.

This uploader script:

1) Loads the organ note pitch data files

2) Munges them (converts the time stamps to a datetime column) and checks that the frequency columns are floats

Here I pursue data analysis route 1 (as mentioned in my notebook.md file), which involves comparing one pitch dataframe with one dataframe of environmental characteristics taken at one sensor location. The two dataframes are matched on the time at which the data were recorded.
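Matching two dataframes by recording time can be sketched with `pd.merge_asof`, which pairs each pitch row with the nearest environmental reading. This is a minimal illustration, not the method used later in this project: the toy values, the `mean_freq` and `co2_ppm` column names, and the two-minute tolerance are all assumptions made for the example.

```python
import pandas as pd

# Toy data: values and column names are illustrative only.
pitch_demo = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2010-04-13 08:37", "2010-04-13 08:40", "2010-04-13 08:42"]
    ),
    "mean_freq": [131.17, 131.46, 262.01],
})
env_demo = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2010-04-13 08:36", "2010-04-13 08:39", "2010-04-13 08:43"]
    ),
    "co2_ppm": [400.0, 410.0, 420.0],
})

# merge_asof needs both frames sorted by the key column.
# direction="nearest" picks the closest environmental reading in time;
# tolerance drops matches that are more than two minutes apart (assumed limit).
aligned = pd.merge_asof(
    pitch_demo.sort_values("date_time"),
    env_demo.sort_values("date_time"),
    on="date_time",
    direction="nearest",
    tolerance=pd.Timedelta("2min"),
)
print(aligned)
```

Each pitch row keeps its own time stamp and gains the CO2 value from the closest sensor reading. A pitch row with no reading inside the tolerance would get NaN instead.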



In [1]:

    
# I import useful libraries (with functions) so I can load, clean, and visualize my data
# I use Pandas because this dataset has word/string column titles, and I like the readable commands and finished visual products that Pandas offers

import pandas as pd
import matplotlib.pyplot as plt
import re
import numpy as np

%matplotlib inline

#I want to be able to easily scroll through this notebook so I limit the length of the appearance of my dataframes 
from pandas import set_option
set_option('display.max_rows', 10)

Loading data into Python

First I load my data sets. I am working with two: one for pitch measurements and another for environmental characteristics (CO2, temperature (deg C), and relative humidity (RH) (%) measurements). My data come from environmental sensing logger devices in the "Choir Division" section of the organ console.



In [11]:

    
# I import a pitch data file

# Comment by Nick: changed the path the data is loaded from, making it compatible with cloned copies of the project
pitch = pd.read_csv('../Data/pitches.csv')

# Assign column names (a flat list, so each name is an ordinary column label)
pitch.columns = ['date_time', 'section', 'note', 'freq1', 'freq2', 'freq3', 'freq4', 'freq5', 'freq6', 'freq7', 'freq8', 'freq9']

#I display my dataframe
pitch









    Out[11]:






  
    
      
           date_time section note   freq1   freq2   freq3   freq4   freq5   freq6   freq7   freq8   freq9
0    2010-04-13 8:37   pedal   c3  131.17  131.20  131.18  131.11  131.17  131.14  131.21     NaN     NaN
1    2010-04-13 8:37   pedal   c4  262.08  262.12  262.09  262.05  262.07  262.10  262.08     NaN     NaN
2    2010-04-13 8:40   swell   c3  131.42  131.47  131.45  131.47  131.50  131.47  131.45     NaN     NaN
3    2010-04-13 8:40   swell   c4  262.90  262.87  262.84  262.85  262.90  262.87  262.88     NaN     NaN
4    2010-04-13 8:42   great   c4  262.04  262.05  262.01  262.03  261.97  261.98  261.99     NaN     NaN
..               ...     ...  ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
52  2010-04-17 10:35   pedal   c4  261.95  261.95  262.02  262.00  261.97  262.01  261.95  261.97     NaN
53  2010-04-17 10:37   great   c4  261.69  261.69  261.68  261.71  261.74  261.66  261.68  261.69  261.67
54   2010-04-17 9:54   choir   c5     NaN  523.73  523.61  523.66  523.77  523.63  523.65  523.69     NaN
55  2010-04-17 10:35   pedal   c4     NaN  261.95  261.95  262.02  262.00  261.97  262.01  261.95  261.97
56  2010-04-17 10:37   great   c4     NaN  261.69  261.69  261.68  261.71  261.74  261.66  261.68  261.69

57 rows × 12 columns



In [13]:

    
# Tell Python that the date_time column holds datetime values, so it isn't read as a string or object.
# The time stamps are parsed from the pitch dataframe's own date_time column.
pitch['date_time'] = pd.to_datetime(pitch['date_time'])

# Print the new table to check that all columns are in line with the column names
print(pitch)

# Check the type of data in each column. This shows strings (object), floats, and datetime. This is good for analysing.
pitch.dtypes



             date_time section note   freq1   freq2   freq3   freq4   freq5  \
0  2010-04-13 08:37:00   pedal   c3  131.17  131.20  131.18  131.11  131.17   
1  2010-04-13 08:37:00   pedal   c4  262.08  262.12  262.09  262.05  262.07   
2  2010-04-13 08:40:00   swell   c3  131.42  131.47  131.45  131.47  131.50   
3  2010-04-13 08:40:00   swell   c4  262.90  262.87  262.84  262.85  262.90   
4  2010-04-13 08:42:00   great   c4  262.04  262.05  262.01  262.03  261.97   
..                 ...     ...  ...     ...     ...     ...     ...     ...   
52 2010-04-17 10:35:00   pedal   c4  261.95  261.95  262.02  262.00  261.97   
53 2010-04-17 10:37:00   great   c4  261.69  261.69  261.68  261.71  261.74   
54 2010-04-17 09:54:00   choir   c5     NaN  523.73  523.61  523.66  523.77   
55 2010-04-17 10:35:00   pedal   c4     NaN  261.95  261.95  262.02  262.00   
56 2010-04-17 10:37:00   great   c4     NaN  261.69  261.69  261.68  261.71   

     freq6   freq7   freq8   freq9  
0   131.14  131.21     NaN     NaN  
1   262.10  262.08     NaN     NaN  
2   131.47  131.45     NaN     NaN  
3   262.87  262.88     NaN     NaN  
4   261.98  261.99     NaN     NaN  
..     ...     ...     ...     ...  
52  262.01  261.95  261.97     NaN  
53  261.66  261.68  261.69  261.67  
54  523.63  523.65  523.69     NaN  
55  261.97  262.01  261.95  261.97  
56  261.74  261.66  261.68  261.69  

[57 rows x 12 columns]






    Out[13]:





date_time    datetime64[ns]
section              object
note                 object
freq1               float64
freq2               float64
                  ...      
freq5               float64
freq6               float64
freq7               float64
freq8               float64
freq9               float64
dtype: object
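The dtype check passes here because pandas inferred floats for every frequency column. If a file contained a stray text entry, that column would be read as `object` instead. Below is a minimal sketch of forcing the types explicitly, assuming a small made-up dataframe (the `"n/a"` entry is hypothetical and does not come from the pitch files).

```python
import pandas as pd

# Made-up rows: freq1 arrives as strings because of one non-numeric entry.
demo = pd.DataFrame({
    "date_time": ["2010-04-13 08:37", "2010-04-13 08:40"],
    "freq1": ["131.17", "n/a"],
    "freq2": [131.20, 131.47],
})

# Parse the time stamps into datetime64 values.
demo["date_time"] = pd.to_datetime(demo["date_time"])

# Convert every freq* column to floats; entries that cannot be parsed become NaN.
freq_cols = [c for c in demo.columns if c.startswith("freq")]
demo[freq_cols] = demo[freq_cols].apply(pd.to_numeric, errors="coerce")

print(demo.dtypes)
```

Using `errors="coerce"` keeps the row and marks the bad entry as missing, which matches how the empty frequency slots already appear as NaN in the pitch table.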

1. Find the average pitch value for each date_time

2. Select out the pitch data for one division at a time, with the division passed in as a function argument

3. Append other pitch files
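The three steps above could be sketched as follows. This is one possible approach, not finished project code: the helper names `mean_pitch`, `select_division`, and `append_pitch_files` are hypothetical, the toy rows use only three frequency columns, and the average is taken across the repeated readings in each row (one row per note and time stamp).

```python
import pandas as pd

# Toy rows modelled on the pitch table (first three frequency columns only).
demo = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2010-04-13 08:37", "2010-04-13 08:37", "2010-04-13 08:40"]
    ),
    "section": ["pedal", "pedal", "swell"],
    "note": ["c3", "c4", "c3"],
    "freq1": [131.17, 262.08, 131.42],
    "freq2": [131.20, 262.12, 131.47],
    "freq3": [131.18, 262.09, 131.45],
})

def mean_pitch(df):
    """Step 1: add a 'mean_freq' column, the row-wise mean of all freq* columns."""
    out = df.copy()
    freq_cols = [c for c in out.columns if c.startswith("freq")]
    out["mean_freq"] = out[freq_cols].mean(axis=1)  # NaN readings are skipped
    return out

def select_division(df, division):
    """Step 2: keep only the rows for one organ division, e.g. 'pedal'."""
    return df[df["section"] == division].reset_index(drop=True)

def append_pitch_files(frames):
    """Step 3: stack several pitch dataframes into one, renumbering the rows."""
    return pd.concat(frames, ignore_index=True)

averaged = mean_pitch(demo)
pedal = select_division(averaged, "pedal")
combined = append_pitch_files([demo, demo])  # stands in for several loaded files
print(pedal)
```

Keeping the division as an argument means the same function can be reused for the swell, great, and choir divisions without copying code.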


