The purpose of this notebook is to demonstrate how to import and run the various portions of our eRivers project.
Bokeh can be installed in either of two ways. With conda:
conda install bokeh
This installs the most recent published Bokeh release from the Continuum Analytics Anaconda repository, along with all of its dependencies. Alternatively, with pip:
pip install bokeh
In [3]:
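# Move into the project's Analysis/ directory so its modules can be imported
%cd Analysis/
# eRivers analysis modules (regression tools and data-cleaning helpers)
from regression import *
from data_cleaning_utils import *
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
# Render matplotlib figures inline in the notebook
%matplotlib inline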
For this walkthrough, we use a .csv file containing in-situ chemical data collected from the Mississippi River. See the image below for details on how the samples were collected.
The first portion of our code imports the data, recognizes the time variable, and provides tools that let the user clean the data.
In [4]:
%cd ../Data/mississippi
%ls
The import_data function attempts to recognize the date format automatically, but also asks the user for input. It returns a pandas DataFrame of the raw imported data, indexed by the datetime column.
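The date-format detection described above can be sketched as "try a list of candidate formats and keep the first one that parses every sample; otherwise ask the user". This is a minimal sketch, not the project's import_data: the names detect_date_format, resolve_date_format and CANDIDATE_FORMATS, as well as the specific candidate formats, are hypothetical.

```python
from datetime import datetime

# Hypothetical candidate formats; the real import_data may try different ones.
CANDIDATE_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%Y %H:%M:%S",
    "%m/%d/%Y %H:%M",
    "%d/%m/%Y %H:%M:%S",
]


def detect_date_format(samples, candidates=CANDIDATE_FORMATS):
    """Return the first candidate format that parses every sample, or None."""
    if not samples:
        return None
    for fmt in candidates:
        try:
            for s in samples:
                datetime.strptime(s.strip(), fmt)
        except ValueError:
            continue  # at least one sample failed; try the next format
        return fmt
    return None


def resolve_date_format(samples, ask=input):
    """Use the detected format, falling back to asking the user (hypothetical helper)."""
    fmt = detect_date_format(samples)
    if fmt is None:
        fmt = ask("Could not detect the date format; enter it (e.g. %m/%d/%Y %H:%M): ")
    return fmt
```

For example, `detect_date_format(["08/04/2015 10:15:30"])` returns `"%m/%d/%Y %H:%M:%S"`. Note that this string would also parse as day-first (`"%d/%m/%Y %H:%M:%S"`); ambiguities like this are one plausible reason for confirming the format with the user rather than relying on detection alone.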
In [42]:
raw_data_miss = import_data('UMR_Day42015-08-04_Nulls_Removed.csv')
%cd ../Amazon/
raw_data_amz = import_data('TROCAS2_clean.csv')
In [43]:
raw_data_miss = nullRemover(raw_data_miss)
raw_data_amz = nullRemover(raw_data_amz)
We created a smoothing function that applies a running mean to our time-series data. The user chooses the window length for the running mean, and the function plots the time series before and after smoothing so the user can judge the best window size.
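A running mean over a time-based window can be sketched as follows. This is an illustrative sketch, not the code inside test_smooth_data: the function name running_mean and its arguments (sample times in seconds, a window length in seconds) are assumptions, and it computes a trailing window, averaging every sample with a time in `(t - window, t]`.

```python
from collections import deque


def running_mean(times, values, window):
    """Trailing running mean over a time-based window (hypothetical helper).

    times:  increasing sample times in seconds
    values: readings taken at those times
    window: window length in seconds (must be positive)
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()  # (time, value) pairs currently inside the window
    total = 0.0
    smoothed = []
    for t, v in zip(times, values):
        buf.append((t, v))
        total += v
        # Drop samples that have fallen out of the window (t - window, t].
        while buf[0][0] <= t - window:
            _, old = buf.popleft()
            total -= old
        smoothed.append(total / len(buf))
    return smoothed
```

For example, `running_mean([0, 1, 2, 3, 4], [0, 0, 10, 10, 10], window=2)` returns `[0.0, 0.0, 5.0, 10.0, 10.0]`: the smoothed output does not fully reflect a step change until about one window length after it, so features shorter than the window get flattened. That is the trade-off the before/after plot helps the user weigh. For a DataFrame with a DatetimeIndex, pandas offers a comparable trailing time-based mean via `df["pH"].rolling("45s").mean()`.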
In [44]:
# Smooth the pH series: prompts for the running-mean window and plots before/after
cleaned_miss = test_smooth_data("pH", raw_data_miss)
A 45-second window removes much of the noise from the data without smoothing away the rapidly occurring trends we see in it.
In [46]:
cleaned_miss = nullRemover(cleaned_miss)
In [47]:
resample = reducer(cleaned_miss)
In [17]:
%cd ../../Analysis/timeseries
import TimeSeries as ts
%cd ../../Data/mississippi/
data = pd.read_csv('UMR_Day42015-08-04_Nulls_Removed.csv')
data.head()
In [49]:
ts.timeplot(data)