In [1]:
%%html
<script type="text/javascript">
show=true;
function toggle(){
if (show){$('div.input').hide();}else{$('div.input').show();}
show = !show}
</script>
<h2><a href="javascript:toggle()" target="_self">Click to toggle code input</a></h2>
In [2]:
%pylab inline
In [3]:
import seaborn as sns
sns.set_context("notebook", font_scale=1.5, rc={"lines.linewidth": 2.5})
cmap = sns.cubehelix_palette(light=1, as_cmap=True)
sns.palplot(sns.cubehelix_palette(light=1))
In [4]:
import pandas as pd
pd.set_option('display.max_columns', 100)
Carleen, Sierra, and I went to CNM to use the Cary 5000. We took many repeated empty measurements to characterize the baseline and how its variance drifts over time. We had to stop one measurement partway through, so that column contains missing data.
| Parameter | Value |
|---|---|
| $\Delta\lambda$ (nm) | 1255-1775 |
| $\delta\lambda$ (nm) | 2 |
| Sampling (nm) | 10 |
| $N$ samples | 53 |
| $N$ columns | 61 |
| $N$ baselines | 2 |
| $N$ non-baselines | 59 |
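The sample count in the table follows from the scan range and the sampling interval; a one-line check:

```python
# Quick consistency check on the table: a 1255-1775 nm scan sampled
# every 10 nm yields 53 points (values taken from the table above).
start_nm, stop_nm, step_nm = 1255, 1775, 10
n_samples = (stop_nm - start_nm) // step_nm + 1
print(n_samples)  # 53
```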
In [5]:
#! subl ../data/20150206_cary5000.csv
Here is the code to parse the raw data file. It extracts the relevant column names and then picks out only those columns.
fn = '../data/20150206_cary5000.csv'
nrows = 53
# First pass: read the file once just to recover the measurement names,
# which sit in every other header column of the Cary export.
dat = pd.read_csv(fn, sep=',', skiprows=[1], nrows=nrows, engine='python')
vals = dat.columns[0:-1:2].values
n_cols = vals.shape[0]
# Second pass: keep the shared wavelength column plus each value column.
cols = [0] + range(1, n_cols*2, 2)
names = ['wavelength'] + vals.tolist()
df = pd.read_csv(fn, sep=',', skiprows=[0,1], na_values='',
                 nrows=nrows, engine='python', usecols=cols, names=names)
df.set_index('wavelength', inplace=True)
df = df/100.0  # convert percent transmission to a fraction
del df['empty_4_2']  # this measurement was stopped mid-scan
df.to_csv('../data/cln_20150206_cary5000.csv')
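The two-pass trick can be exercised on a tiny stand-in file. The layout here (paired wavelength/value columns, with measurement names in alternating header slots) is assumed from the parsing code above, and the scan names are made up:

```python
import io
import pandas as pd

# A tiny stand-in for the Cary 5000 export (layout assumed from the
# parsing code above): each measurement contributes a wavelength column
# and a value column, with measurement names in alternating header slots.
raw = io.StringIO(
    "scan_a,,scan_b,\n"
    "Wavelength (nm),%T,Wavelength (nm),%T\n"
    "1255,99.8,1255,99.7\n"
    "1265,99.9,1265,99.6\n"
)
hdr = pd.read_csv(raw, nrows=0)                 # first pass: header only
names = hdr.columns[0:-1:2].tolist()            # measurement names
raw.seek(0)
cols = [0] + list(range(1, 2 * len(names), 2))  # wavelength + value columns
toy_df = pd.read_csv(raw, skiprows=[0, 1], usecols=cols,
                     names=['wavelength'] + names)
print(toy_df)
```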
Read in the cleaned dataset.
In [6]:
nrows = 53
df = pd.read_csv('../data/cln_20150206_cary5000.csv')
df.set_index('wavelength', inplace=True)
df.head()
Out[6]:
In [7]:
plt.figure(figsize=(12,20));
sns.heatmap(df.iloc[::1,::1], vmin=0, vmax=1.05);
plt.yticks(rotation=0);
plt.title(u'Transmission through bonded Si');
One of the columns, 'empty_4_2', was cancelled midway through. Let's simply delete it.
In [8]:
del df['empty_4_2']
Let's look at the variance across the repeated empty measurements.
In [9]:
empty_cols = [col_name[0:5] == 'empty' for col_name in df.columns.values]
df[df.columns[empty_cols]].head()
Out[9]:
In [10]:
empty = df[df.columns[empty_cols]]
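The five-character slice in the mask works, but pandas' string methods on the column index express the same selection more directly. A minimal sketch on a made-up frame (the column names are illustrative):

```python
import pandas as pd

# Toy frame standing in for df; the column names are made up for
# illustration, following the empty_* naming used in this notebook.
toy = pd.DataFrame(0.99, index=[1255, 1265],
                   columns=['empty_1', 'empty_2', 'si_bonded_1'])
mask = toy.columns.str.startswith('empty')  # one boolean per column
empty_toy = toy.loc[:, mask]
print(empty_toy.columns.tolist())  # ['empty_1', 'empty_2']
```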
In [11]:
plt.plot(empty.index, empty)
plt.ylabel('$T$')
plt.xlabel('$\lambda$ (nm)')
plt.ylim(0.9, 1.1)
Out[11]:
In [12]:
n_empty = len(empty.columns)
print n_empty
t = np.arange(0, n_empty) # rank order of time
In [13]:
cmap=sns.diverging_palette(-100, 0, as_cmap=True)
sns.palplot(sns.diverging_palette(-100, 0))
In [14]:
for i in range(0, nrows, 1):
    plt.scatter(t, empty.iloc[i]/empty.iloc[i, 24],
                c=[empty.index[i]]*n_empty, alpha=0.35,
                cmap=cmap, vmin=1250, vmax=1780)
plt.plot([-10, 100], [1.0, 1.0], 'k--')
plt.ylabel("$T$")
plt.xlabel("$t$ (rank order)")
plt.xlim(0, 30)
plt.ylim(0.98, 1.002)
plt.legend(loc='best')
plt.colorbar()
Out[14]:
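The per-row normalization in the loop above (dividing each wavelength's row by its value in a reference column) can also be done in one vectorized call. A sketch on synthetic data, with column `t1` standing in for the reference measurement:

```python
import numpy as np
import pandas as pd

# Divide every row by that row's entry in a chosen reference column,
# without the Python loop. Column 't1' stands in for the reference
# measurement (the notebook above uses column index 24).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.uniform(0.98, 1.0, size=(4, 3)),
                   index=[1255, 1265, 1275, 1285],
                   columns=['t0', 't1', 't2'])
norm = toy.div(toy.iloc[:, 1], axis=0)  # align on the wavelength index
print(norm['t1'].tolist())  # the reference column becomes exactly 1.0
```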
In [15]:
for j in range(0, n_empty, 1):
    #plt.scatter(empty.index, empty.iloc[:, j]/np.mean(empty.iloc[:, j]), c=[t[j]]*nrows, alpha=0.35, cmap=cmap, vmin=0, vmax=27)
    plt.scatter(empty.index, empty.iloc[:, j], c=[t[j]]*nrows,
                alpha=1.0, cmap=cmap, vmin=0, vmax=27)
#plt.plot([-10, 100], [1.0, 1.0], 'k--')
plt.ylabel("$T$")
plt.xlabel("$\lambda$ (nm)")
plt.xlim(1250, 1780)
plt.ylim(0.98, 1.02)
#plt.legend(loc='best')
plt.colorbar()
Out[15]:
In [16]:
print df.columns.values
In [17]:
plt.errorbar(range(len(df.columns)),
             df.apply(np.mean, axis=0),
             yerr=df.apply(np.std, axis=0), fmt='.')
plt.ylim(0.99, 1.02)
Out[17]:
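The `apply` calls above have built-in equivalents; one caveat, shown on a toy frame, is that pandas' `.std()` defaults to the sample estimate (`ddof=1`) while `np.std` uses `ddof=0`:

```python
import pandas as pd

# Column-wise mean and std without apply(); pass ddof=0 explicitly to
# reproduce the apply(np.std) numbers, since pandas defaults to ddof=1.
toy = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [2.0, 2.0, 2.0]})
means = toy.mean(axis=0)
stds = toy.std(axis=0, ddof=0)
print(means['a'], stds['b'])  # 2.0 0.0
```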
Read in the time information from the original data file.