Python Coffee, November 5, 2015

Import required libraries


In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

%matplotlib inline

The imports above require pandas, numpy and matplotlib. If you are using conda, all of these libraries are already installed; otherwise, install them with pip. The magic command %matplotlib inline sets up what is needed to embed matplotlib figures in an IPython notebook.
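
If you are unsure whether the libraries are present, a quick check like the following may help. It is only a sketch: it looks for the three packages named above without importing them, and the variable names are illustrative.

```python
import importlib.util

# Libraries the notebook relies on
required = ("pandas", "numpy", "matplotlib")

# find_spec returns None when a top-level package cannot be found
missing = [name for name in required if importlib.util.find_spec(name) is None]

print("missing:", missing if missing else "none")
```

Anything listed as missing can then be installed with pip or conda.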

Import optional libraries to use plotly.

Plot.ly is a cloud-based visualization tool with a mature Python API. It is very useful for creating professional-looking, interactive plots. The plots are shared publicly on the cloud, so be careful to publish only data that you want to (and are allowed to) share.

Installing plot.ly is easy with pip or conda, but it requires you to create an account and then request an API token. If you don't want to install it, you can skip this section.


In [ ]:
import plotly.tools as tls
import plotly.plotly as py  # in plotly >= 4 this module lives in chart_studio.plotly
import cufflinks as cf
import plotly
plotly.offline.init_notebook_mode()
cf.offline.go_offline()

Import data file with pandas


In [ ]:
df = pd.read_csv('data_files/baseline_channels_phase.txt', sep=' ')

df is an instance of the pandas data structure pandas.DataFrame. A DataFrame instance has several methods (functions) that operate on the object. For example, it is easy to display the first rows for an initial look at the contents using .head():


In [ ]:
df.head()

A DataFrame can be converted into a numpy array by using the attribute .values:


In [ ]:
df.values
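
As a small illustration of what the conversion returns, here is a sketch on a made-up table (the column names echo the data file, but the numbers are invented). Columns of different numeric types are upcast to a common dtype in the resulting array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the baseline table; values are invented
toy = pd.DataFrame({"chan": [0, 1, 2], "freq": [100.0, 100.5, 101.0]})

arr = toy.values  # 2D numpy array, one row per DataFrame row

print(type(arr).__name__, arr.shape, arr.dtype)
```

Recent pandas versions also offer toy.to_numpy(), which returns the same array here.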

For numpy experts, there are also methods to access the data following numpy conventions. To extract the value at position (0, 1), i.e. first row, second column, you can do:


In [ ]:
df.iloc[0,1]

You can also use column names and index keys, for example to extract the name of the first antenna in the baseline pair in row 3:


In [ ]:
df.loc[3, 'ant1name']
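
To make the difference between position-based and label-based access concrete, here is a minimal sketch on an invented table. The antenna names are hypothetical; .iloc takes integer positions, while .loc takes index and column labels.

```python
import pandas as pd

# Hypothetical baseline pairs; names are invented for illustration
toy = pd.DataFrame({
    "ant1name": ["A1", "A1", "A2", "A3"],
    "ant2name": ["A2", "A3", "A3", "A4"],
})

by_position = toy.iloc[0, 1]       # row 0, column 1 (the 'ant2name' column)
by_label = toy.loc[3, "ant1name"]  # row labelled 3, column 'ant1name'

print(by_position, by_label)
```

With the default integer index, the row label 3 and the row position 3 coincide; after sorting or filtering they may not.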

DataFrames are objects containing tabular data that can be grouped by columns and then aggregated. Let's say you want to obtain the mean frequency for each baseline and the number of channels used:


In [ ]:
data_group = df.groupby(['ant1name', 'ant2name'])
df2 = data_group.agg({'freq': np.mean, 'chan': np.count_nonzero}).reset_index()
df2.head()
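
The same grouping pattern can be tried on a tiny invented table, which makes the result easy to check by hand. This sketch uses the string aggregators 'mean' and 'count' as an assumption; 'count' counts every row in a group, whereas np.count_nonzero would not count a channel numbered 0.

```python
import pandas as pd

# Invented data: two channels for baseline A1-A2, one for each of the others
toy = pd.DataFrame({
    "ant1name": ["A1", "A1", "A1", "A2"],
    "ant2name": ["A2", "A2", "A3", "A3"],
    "chan": [0, 1, 0, 0],
    "freq": [100.0, 102.0, 100.0, 100.0],
})

summary = (
    toy.groupby(["ant1name", "ant2name"])
    .agg({"freq": "mean", "chan": "count"})
    .reset_index()  # turn the group keys back into ordinary columns
)

print(summary)
```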

In [ ]:
data_raw = df.groupby(['ant1name', 'ant2name', 'chan']).y.mean()
data_raw.head(30)

In [ ]:
data_raw.unstack().head(20)

In [ ]:
pd.options.display.max_columns = 200
data_raw.unstack().head(20)

In [ ]:
data_raw = data_raw.unstack().reset_index()
data_raw.head()
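
The cells above group by baseline and channel, then use unstack() to move the channel level into columns. A minimal sketch of that reshaping on invented numbers (column names follow the data file, values are made up):

```python
import pandas as pd

# Invented phases for two baselines and two channels
toy = pd.DataFrame({
    "ant1name": ["A1", "A1", "A1", "A1"],
    "ant2name": ["A2", "A2", "A3", "A3"],
    "chan": [0, 1, 0, 1],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# Series indexed by (ant1name, ant2name, chan)
per_chan = toy.groupby(["ant1name", "ant2name", "chan"]).y.mean()

# unstack() pivots the last index level (chan) into columns:
# one row per baseline, one column per channel
wide = per_chan.unstack().reset_index()

print(wide)
```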

In [ ]:
data_raw.to_excel('test.xlsx', index=False)

In [ ]:
todegclean = np.degrees(np.arcsin(np.sin(np.radians(data_raw.iloc[:,2:]))))

In [ ]:
todegclean.head()
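
The arcsin(sin(...)) round trip above appears to fold every angle into the range [-90, 90] degrees. A short sketch on a few hand-picked angles shows the effect; the input values are invented for illustration.

```python
import numpy as np

# Angles in degrees, some outside [-90, 90]
angles = np.array([10.0, 100.0, 190.0, 370.0])

# Same expression as in the notebook, applied to a plain array
folded = np.degrees(np.arcsin(np.sin(np.radians(angles))))

print(np.round(folded, 6))  # approximately 10, 80, -10, 10
```

Note that the folding is not one-to-one: 80 and 100 degrees both end up at 80.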

In [ ]:
todegclean['mean'] = todegclean.mean(axis=1)

In [ ]:
todegclean.head()

In [ ]:
data_clean = todegclean.iloc[:,:-1].apply(lambda x: x - todegclean.iloc[:,-1])
data_clean.head(20)
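
The cell above subtracts each row's mean from every channel in that row. One possible alternative way to write that step is DataFrame.sub with axis=0, sketched here on invented numbers:

```python
import pandas as pd

# Two invented baselines (rows) with two channels (columns)
toy = pd.DataFrame({0: [1.0, 10.0], 1: [3.0, 20.0]})

row_mean = toy.mean(axis=1)           # one mean per row: 2.0 and 15.0
centered = toy.sub(row_mean, axis=0)  # align the means with the rows

print(centered)
```

After the subtraction, every row has zero mean.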

In [ ]:
data_ready = pd.merge(data_raw[['ant1name', 'ant2name']], todegclean, left_index=True, right_index=True)
data_ready.head()

Plot.ly


In [ ]:
data_clean2 = data_clean.unstack().reset_index().copy()
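
The column names used in the following cells ('chan', 'level_1' and 0) come from this unstack().reset_index() step. A small sketch on invented data shows how they arise, assuming the columns index is named 'chan' and the row index is unnamed, as in the table built above.

```python
import pandas as pd

# Wide table: rows are baselines, columns are channels (values invented)
wide = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    columns=pd.Index([0, 1], name="chan"),
)

# unstack() on a DataFrame with a plain row index yields a Series
# indexed by (chan, row); reset_index() turns both levels into columns.
long = wide.unstack().reset_index()

print(long)
```

The unnamed row index becomes 'level_1', and the values land in a column named 0.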

In [ ]:
data_clean2.query('100 < level_1 < 200')

In [ ]:
data_clean2.query('100 < level_1 < 200').iplot(kind='scatter3d', x='chan', y='level_1', mode='markers', z=0, size=6, 
                  title='Phase BL', filename='phase_test', width=1, opacity=0.8, colors='blue', symbol='circle',
                  layout={'scene': {'aspectratio': {'x': 1, 'y': 3, 'z': 0.7}}})

In [ ]:
ploting = data_clean2.query('100 < level_1 < 200').figure(kind='scatter3d', x='chan', y='level_1', mode='markers', z=0, size=6, 
                  title='Phase BL', filename='phase_test', width=1, opacity=0.8, colors='blue', symbol='circle',
                  layout={'scene': {'aspectratio': {'x': 1, 'y': 3, 'z': 0.7}}})

In [ ]:
# ploting

In [ ]:
ploting.data[0]['marker']['color'] = 'blue'
ploting.data[0]['marker']['line'] = {'color': 'blue', 'width': 0.5}
ploting.data[0]['marker']['opacity'] = 0.5

In [ ]:
plotly.offline.iplot(ploting)

Matplotlib


In [ ]:
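fig = plt.figure()
# Create 3D axes; the projection is passed to add_subplot
ax = fig.add_subplot(111, projection='3d')

# Column (channel) and row (baseline) positions
X = np.arange(0, data_clean.shape[1], 1)
Y = np.arange(0, data_clean.shape[0], 1)

X, Y = np.meshgrid(X, Y)

# One point per (channel, baseline), height and color given by the phase
surf = ax.scatter(X, Y, data_clean.values, c=data_clean.values, s=2, lw=0, cmap='winter')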

In [ ]:
%matplotlib notebook

In [ ]:
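fig = plt.figure()
# Create 3D axes; the projection is passed to add_subplot
ax = fig.add_subplot(111, projection='3d')

# Column (channel) and row (baseline) positions
X = np.arange(0, data_clean.shape[1], 1)
Y = np.arange(0, data_clean.shape[0], 1)

X, Y = np.meshgrid(X, Y)

# One point per (channel, baseline), height and color given by the phase
surf = ax.scatter(X, Y, data_clean.values, c=data_clean.values, s=2, lw=0, cmap='winter')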

In [ ]:
data_clean2.plot(kind='scatter', x='chan', y=0)

In [ ]:
import seaborn as sns

In [ ]:
data_clean2.plot(kind='scatter', x='level_1', y=0)

In [ ]:
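# Note: todegclean has no antenna columns, so iloc[:, 2:] skips the first two channels and keeps 'mean'
data_ready['noise'] = todegclean.iloc[:,2:].std(axis=1)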

In [ ]:
data_ready[['ant1name', 'ant2name', 'noise']].head(10)

In [ ]:
corr = data_ready[['ant1name', 'ant2name', 'noise']].pivot_table(index=['ant1name'], columns=['ant2name'])

In [ ]:
corr.columns.levels[1]

In [ ]:
corr2 = pd.DataFrame(corr.values, index=corr.index.values, columns=corr.columns.levels[1].values)

In [ ]:
corr2.head(10)
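
The pivot above turns the list of baselines into an antenna-by-antenna matrix. A sketch on invented noise values shows the shape of the result. Passing values='noise' explicitly is an assumption made here; it yields single-level columns, whereas omitting it (as in the cell above) yields two column levels, which is why levels[1] is extracted afterwards.

```python
import pandas as pd

# Invented noise values for three baselines
toy = pd.DataFrame({
    "ant1name": ["A1", "A1", "A2"],
    "ant2name": ["A2", "A3", "A3"],
    "noise": [1.0, 2.0, 3.0],
})

# Rows: first antenna, columns: second antenna, cells: noise
table = toy.pivot_table(index="ant1name", columns="ant2name", values="noise")

print(table)
```

Antenna pairs that do not appear in the input are filled with NaN.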

In [ ]:
f, ax = plt.subplots(figsize=(11, 9))
cmap = sns.diverging_palette(220, 10, as_cmap=True)
sns.heatmap(corr2, cmap=cmap,
            square=True, xticklabels=5, yticklabels=5,
            linewidths=.5, cbar_kws={"shrink": .5}, ax=ax)

In [ ]:
?sns.heatmap