In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
The previous import code requires that you have pandas, numpy and matplotlib installed. If you are using conda, you already have all of these libraries installed. Otherwise, use pip to install them. The magic command %matplotlib inline loads the variables and tools needed to embed matplotlib figures in an IPython notebook.
Plot.ly is a cloud-based visualization tool with a mature Python API. It is very useful for creating professional-looking, interactive plots, which are shared publicly on the cloud, so be careful to publish only data that you want to (and are allowed to) share.
Installing plot.ly is easily done with pip or conda, but it requires you to create an account and then request an API token. If you don't want to install it, you can skip this section.
In [ ]:
import plotly.tools as tls
import plotly.plotly as py
import cufflinks as cf
import plotly
plotly.offline.init_notebook_mode()
cf.offline.go_offline()
In [ ]:
df = pd.read_csv('data_files/baseline_channels_phase.txt', sep=' ')
df is an instance of the pandas data structure pandas.DataFrame. A DataFrame instance has several methods (functions) that operate on the object. For example, it is easy to display the data for a first exploration of what it contains using .head():
In [ ]:
df.head()
A DataFrame can be converted into a numpy array by using the attribute .values:
In [ ]:
df.values
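As a minimal sketch of this conversion, the cell below builds a small stand-in table (the name toy and its values are invented for illustration and are not the real data file) and shows that .values returns a plain numpy array. Recent pandas versions also provide .to_numpy(), which gives the same result here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real table; the values are invented.
toy = pd.DataFrame({'chan': [0, 1, 2], 'freq': [100.0, 100.5, 101.0]})

arr = toy.values        # 2-D numpy array, one row per DataFrame row
arr2 = toy.to_numpy()   # equivalent call available in pandas >= 0.24

print(type(arr), arr.shape)
```

Because the columns mix integers and floats, the resulting array is upcast to a single floating-point dtype.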
For numpy experts, there are also ways to access the data that follow the numpy conventions. If you want to extract the data at the coordinate (0,1) you can do:
In [ ]:
df.iloc[0,1]
You can also use the column names and index keys to extract, for example, the name of the first antenna in a baseline pair from row 3:
In [ ]:
df.loc[3, 'ant1name']  # .ix was removed from pandas; .loc is the label-based accessor
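The difference between position-based and label-based access can be sketched on a small invented table. The antenna names below are hypothetical; only the column names follow this notebook.

```python
import pandas as pd

# Hypothetical rows; the antenna names are invented for illustration.
toy = pd.DataFrame({
    'ant1name': ['A001', 'A001', 'A002', 'A003'],
    'ant2name': ['A002', 'A003', 'A003', 'A004'],
    'chan': [0, 0, 0, 0],
})

by_position = toy.iloc[0, 1]       # row 0, column 1 -> 'A002'
by_label = toy.loc[3, 'ant1name']  # index label 3, column 'ant1name' -> 'A003'

print(by_position, by_label)
```

With the default integer index, the row label and the row position coincide, so the two accessors only differ in how the column is addressed.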
DataFrames are objects containing tabular data, which can be grouped by columns and then used to aggregate data. Let's say you want to obtain the mean frequency for the baselines and the number of channels used:
In [ ]:
data_group = df.groupby(['ant1name', 'ant2name'])
# 'nunique' counts distinct channels; np.count_nonzero would skip channel 0
df2 = data_group.agg({'freq': 'mean', 'chan': 'nunique'}).reset_index()
df2.head()
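To make the grouping step concrete, here is a small sketch on invented data. It assumes string aggregation names ('mean', 'nunique'), which is one reasonable way to express "mean frequency" and "number of channels"; the table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical baselines: A001-A002 has three channels, A002-A003 has two.
toy = pd.DataFrame({
    'ant1name': ['A001'] * 3 + ['A002'] * 2,
    'ant2name': ['A002'] * 3 + ['A003'] * 2,
    'chan': [0, 1, 2, 0, 1],
    'freq': [100.0, 101.0, 102.0, 100.0, 101.0],
})

# One output row per (ant1name, ant2name) pair.
summary = (toy.groupby(['ant1name', 'ant2name'])
              .agg({'freq': 'mean', 'chan': 'nunique'})
              .reset_index())

print(summary)
```

The call to reset_index() turns the group keys back into ordinary columns, which makes the result easier to merge or export later.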
In [ ]:
data_raw = df.groupby(['ant1name', 'ant2name', 'chan']).y.mean()
data_raw.head(30)
In [ ]:
data_raw.unstack().head(20)
In [ ]:
pd.options.display.max_columns = 200
data_raw.unstack().head(20)
In [ ]:
data_raw = data_raw.unstack().reset_index()
data_raw.head()
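The reshaping done by unstack() is easier to see on a tiny example. The sketch below uses invented values: the innermost group key (chan) becomes the set of columns, giving one row per baseline and one column per channel.

```python
import pandas as pd

# Hypothetical long-format data: one row per baseline and channel.
toy = pd.DataFrame({
    'ant1name': ['A001', 'A001', 'A002', 'A002'],
    'ant2name': ['A002', 'A002', 'A003', 'A003'],
    'chan': [0, 1, 0, 1],
    'y': [10.0, 20.0, 30.0, 40.0],
})

# Series indexed by (ant1name, ant2name, chan).
long_form = toy.groupby(['ant1name', 'ant2name', 'chan']).y.mean()

# Move 'chan' from the index to the columns, then flatten the index.
wide_form = long_form.unstack().reset_index()

print(wide_form)
```

Note that the channel columns keep their integer names (0, 1, ...), which is why later cells refer to a column called 0.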
In [ ]:
# Write .xlsx (needs openpyxl); writing legacy .xls files is no longer supported in pandas 2.x
data_raw.to_excel('test.xlsx', index=False)
In [ ]:
todegclean = np.degrees(np.arcsin(np.sin(np.radians(data_raw.iloc[:,2:]))))
In [ ]:
todegclean.head()
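The expression arcsin(sin(x)) used above folds any angle into the range from -90 to 90 degrees. This is not the same as a plain wrap to ±180 degrees: an input of 100 degrees comes out as 80 degrees. The notebook does not state why this folding was chosen, so the sketch below only illustrates what the expression does, using made-up angles.

```python
import numpy as np

# Illustrative phases in degrees (not taken from the data file).
angles = np.array([10.0, 100.0, 190.0, 350.0])

# Same expression as in the notebook: degrees -> radians -> sin -> arcsin -> degrees.
folded = np.degrees(np.arcsin(np.sin(np.radians(angles))))

print(np.round(folded, 6))  # approximately [10, 80, -10, -10]
```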
In [ ]:
todegclean['mean'] = todegclean.mean(axis=1)
In [ ]:
todegclean.head()
In [ ]:
data_clean = todegclean.iloc[:,:-1].apply(lambda x: x - todegclean.iloc[:,-1])
data_clean.head(20)
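The apply call above subtracts each row's mean from every channel column. A small sketch on invented numbers shows the effect, together with an equivalent formulation using DataFrame.sub with axis=0, which some readers may find easier to follow. The names toy, row_mean and the two result variables are hypothetical.

```python
import pandas as pd

# Hypothetical table: two baselines (rows), two channels (columns 0 and 1).
toy = pd.DataFrame({0: [1.0, 10.0], 1: [3.0, 20.0]})

row_mean = toy.mean(axis=1)  # one mean per row: [2.0, 15.0]

# Column-by-column subtraction, as done in the notebook.
centered_apply = toy.apply(lambda col: col - row_mean)

# Equivalent: subtract the Series along the row axis.
centered_sub = toy.sub(row_mean, axis=0)

print(centered_sub)
```

Both forms align the mean with the rows by index, so each row of the result averages to zero.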
In [ ]:
data_ready = pd.merge(data_raw[['ant1name', 'ant2name']], todegclean, left_index=True, right_index=True)
data_ready.head()
In [ ]:
data_clean2 = data_clean.unstack().reset_index().copy()
In [ ]:
data_clean2.query('100 < level_1 < 200')
In [ ]:
data_clean2.query('100 < level_1 < 200').iplot(kind='scatter3d', x='chan', y='level_1', mode='markers', z=0, size=6,
title='Phase BL', filename='phase_test', width=1, opacity=0.8, colors='blue', symbol='circle',
layout={'scene': {'aspectratio': {'x': 1, 'y': 3, 'z': 0.7}}})
In [ ]:
ploting = data_clean2.query('100 < level_1 < 200').figure(kind='scatter3d', x='chan', y='level_1', mode='markers', z=0, size=6,
title='Phase BL', filename='phase_test', width=1, opacity=0.8, colors='blue', symbol='circle',
layout={'scene': {'aspectratio': {'x': 1, 'y': 3, 'z': 0.7}}})
In [ ]:
# ploting
In [ ]:
ploting.data[0]['marker']['color'] = 'blue'
ploting.data[0]['marker']['line'] = {'color': 'blue', 'width': 0.5}
ploting.data[0]['marker']['opacity'] = 0.5
In [ ]:
plotly.offline.iplot(ploting)
In [ ]:
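fig = plt.figure()
# fig.gca(projection=...) is no longer supported in recent matplotlib versions
ax = fig.add_subplot(111, projection='3d')
# Grid of column (channel) and row (baseline) positions
X, Y = np.meshgrid(np.arange(data_clean.shape[1]), np.arange(data_clean.shape[0]))
Z = data_clean.values
surf = ax.scatter(X.ravel(), Y.ravel(), Z.ravel(), c=Z.ravel(), s=2, lw=0, cmap='winter')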
In [ ]:
%matplotlib notebook
In [ ]:
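fig = plt.figure()
# fig.gca(projection=...) is no longer supported in recent matplotlib versions
ax = fig.add_subplot(111, projection='3d')
# Grid of column (channel) and row (baseline) positions
X, Y = np.meshgrid(np.arange(data_clean.shape[1]), np.arange(data_clean.shape[0]))
Z = data_clean.values
surf = ax.scatter(X.ravel(), Y.ravel(), Z.ravel(), c=Z.ravel(), s=2, lw=0, cmap='winter')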
In [ ]:
data_clean2.plot(kind='scatter', x='chan', y=0)
In [ ]:
import seaborn as sns
In [ ]:
data_clean2.plot(kind='scatter', x='level_1', y=0)
In [ ]:
# Standard deviation across all channel columns, excluding the helper 'mean' column
data_ready['noise'] = todegclean.drop(columns='mean').std(axis=1)
In [ ]:
data_ready[['ant1name', 'ant2name', 'noise']].head(10)
In [ ]:
corr = data_ready[['ant1name', 'ant2name', 'noise']].pivot_table(index=['ant1name'], columns=['ant2name'])
In [ ]:
corr.columns.levels[1]
In [ ]:
corr2 = pd.DataFrame(corr.values, index=corr.index.values, columns=corr.columns.levels[1].values)
In [ ]:
corr2.head(10)
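The pivot step turns the list of baselines into an antenna-by-antenna matrix. The sketch below shows the idea on invented values. It passes values='noise' explicitly, which is an assumption on our part: doing so yields single-level column labels directly, so the manual rebuilding of the DataFrame is not needed in this small case.

```python
import pandas as pd

# Hypothetical noise values for three baselines.
toy = pd.DataFrame({
    'ant1name': ['A001', 'A001', 'A002'],
    'ant2name': ['A002', 'A003', 'A003'],
    'noise': [1.0, 2.0, 3.0],
})

# Rows are the first antenna, columns the second; missing pairs become NaN.
matrix = toy.pivot_table(index='ant1name', columns='ant2name', values='noise')

print(matrix)
```

Pairs that do not appear in the input (here A002 with itself) are left as NaN, which a heatmap typically renders as blank cells.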
In [ ]:
f, ax = plt.subplots(figsize=(11, 9))
cmap = sns.diverging_palette(220, 10, as_cmap=True)
sns.heatmap(corr2, cmap=cmap,
square=True, xticklabels=5, yticklabels=5,
linewidths=.5, cbar_kws={"shrink": .5}, ax=ax)
In [ ]:
?sns.heatmap