Python has a strong community in data analytics and scientific computing. Many Python packages support different kinds of analysis, but a few are key:
You will have access to all of these after installing Anaconda and the additional packages described in Session 0. (The additional packages relate to spatial analysis; you can skip them if you don't need them.)
Where possible, veneer-py functions will accept and return objects that are directly usable by these packages. In particular, time series and other tabular data structures are returned as pandas DataFrame objects.
This session gives very brief introductions to most of these packages. In most cases, the links in Session 0 are relevant for more information.
numpy represents multi-dimensional arrays and operations on those arrays. The arrays are typed (e.g. float, double precision float, integer) and are indexed by integers (one per dimension).
In veneer-py, we use pandas DataFrames more than numpy arrays, but the array operations in numpy are the foundation on which pandas is built.
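As a minimal sketch of what "typed" and "indexed by integers" mean in practice, the following builds a small two-dimensional array with an explicit element type. The variable names and values are made up for illustration.

```python
import numpy as np

# Build a 2 x 3 array with an explicit element type
grid = np.array([[1, 2, 3],
                 [4, 5, 6]], dtype=np.float64)

print(grid.dtype)   # the element type shared by every entry
print(grid.shape)   # one length per dimension: (2, 3)

# Index with one integer per dimension: row 1, column 2
value = grid[1, 2]
print(value)        # 6.0
```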
You can create an array of random numbers using functions under the np.random namespace. The following example creates 100 random floats using a normal distribution.
Note: numpy is typically imported as np.
In [3]:
import numpy as np
random = np.random.normal(size=100)
random
Out[3]:
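Random numbers differ on every run, which can make a notebook hard to reproduce. One option, sketched here, is numpy's seeded Generator interface (np.random.default_rng, available in recent numpy versions). The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# A seeded Generator gives the same "random" numbers on every run
rng = np.random.default_rng(seed=42)   # 42 is an arbitrary seed
sample = rng.normal(loc=0.0, scale=1.0, size=100)

# Re-creating the generator with the same seed reproduces the sample
repeat = np.random.default_rng(seed=42).normal(loc=0.0, scale=1.0, size=100)
print(np.array_equal(sample, repeat))  # True
```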
The functions in np.random return one-dimensional arrays when size is a single integer, as above. You can check this with .shape and change it with .reshape().
In [4]:
random.shape
Out[4]:
In [6]:
threed = random.reshape(10,5,2)
threed
Out[6]:
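The reshape above works because 10 x 5 x 2 equals the 100 elements in the array. The sketch below uses np.arange rather than random numbers so the behaviour is easy to follow. It also shows the -1 shorthand and what happens when the sizes do not match.

```python
import numpy as np

values = np.arange(100)          # 0, 1, ..., 99 in a one-dimensional array
print(values.shape)              # (100,)

cube = values.reshape(10, 5, 2)  # 10 * 5 * 2 == 100, so this is allowed
print(cube.shape)                # (10, 5, 2)

# -1 asks numpy to infer that dimension from the total element count
inferred = values.reshape(-1, 4)
print(inferred.shape)            # (25, 4)

# A shape whose sizes do not multiply to 100 raises ValueError
try:
    values.reshape(7, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```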
You can perform basic arithmetic on arrays, using scalars or other arrays.
For example, given the following two arrays
In [7]:
a1 = np.array([20.0,12.0,77.0,77.0])
a2 = np.array([25.0,6.0,80.0,80.0])
In [8]:
# You can add:
a1 + a2
Out[8]:
In [9]:
# Multiply (element wise):
a1 * a2
Out[9]:
In [10]:
# Compute a dot product
a1.dot(a2)
Out[10]:
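The cells above combine two arrays. Arithmetic with a scalar works the same way, with the scalar applied to every element. A short sketch, reusing the a1 values from above:

```python
import numpy as np

a1 = np.array([20.0, 12.0, 77.0, 77.0])

doubled = a1 * 2     # every element multiplied by 2
shifted = a1 - 10    # 10 subtracted from every element
above_50 = a1 > 50   # comparisons are also element-wise and give booleans

print(doubled)
print(shifted)
print(above_50)
```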
In [17]:
# You can also perform matrix operations
# First tell numpy that your array is a matrix,
# Then transpose to compatible shapes
# Then multiply
np.matrix(a1).transpose() * np.matrix(a2)
Out[17]:
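The numpy documentation no longer recommends np.matrix for new code. As an alternative sketch, the same column-times-row product can be written with ordinary arrays, either by reshaping and using the @ operator or with np.outer.

```python
import numpy as np

a1 = np.array([20.0, 12.0, 77.0, 77.0])
a2 = np.array([25.0, 6.0, 80.0, 80.0])

# Column vector (4 x 1) times row vector (1 x 4) gives a 4 x 4 matrix
column = a1.reshape(-1, 1)
row = a2.reshape(1, -1)
product = column @ row

# np.outer computes the same result directly from the 1-D arrays
outer = np.outer(a1, a2)

print(product.shape)                 # (4, 4)
print(np.allclose(product, outer))   # True
```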
Pandas DataFrame objects are one of the key data types used in veneer-py.
A DataFrame is a tabular, two-dimensional data structure, which can be indexed in a range of ways, including by date or date/time. DataFrames are arranged in named columns, each with a particular type (e.g. double, string, integer), and in this sense they are more flexible than numpy arrays.
Each column in a DataFrame is a pandas Series, which is useful in its own right.
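To make the relationship between a DataFrame and its columns concrete, here is a small sketch that builds a DataFrame by hand. The column names and values are hypothetical and do not come from a Source model.

```python
import pandas as pd

# Hypothetical data: five daily timesteps for two made-up columns
dates = pd.date_range("2000-01-01", periods=5, freq="D")
flows = pd.DataFrame(
    {"Gauge A": [1.0, 2.0, 3.0, 4.0, 5.0],
     "Gauge B": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=dates,
)

gauge_a = flows["Gauge A"]      # selecting one column gives a Series
print(type(gauge_a).__name__)   # Series
print(flows.dtypes)             # each column carries its own type
print(gauge_a.max())            # 5.0
```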
In [20]:
import veneer
v = veneer.Veneer(port=9876)
downstream_flow_vol = v.retrieve_multiple_time_series(criteria={'RecordingVariable':'Downstream Flow Volume'})
Pandas DataFrames have a tabular presentation in the Jupyter notebook.
It's also possible to slice subsets of rows
In [22]:
downstream_flow_vol[0:10] # <-- Look at first 10 rows (timesteps)
Out[22]:
In [27]:
downstream_flow_vol[0::3000] # <-- Look at every 3000th timestep
Out[27]:
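The slices above select rows by position. When a DataFrame has a date index, rows can also be selected by date using .loc. The sketch below uses a made-up daily series rather than results from veneer-py, so the column name and values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering 365 timesteps from 1 January 2000
dates = pd.date_range("2000-01-01", periods=365, freq="D")
df = pd.DataFrame({"Flow": np.arange(365, dtype=float)}, index=dates)

first_ten = df[0:10]                         # by position
every_30th = df[0::30]                       # every 30th row
january = df.loc["2000-01-01":"2000-01-31"]  # label slice includes both ends
february = df.loc["2000-02"]                 # partial date string: whole month

print(len(first_ten), len(every_30th), len(january), len(february))
```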
You can quickly get summary statistics for each column in a DataFrame:
In [29]:
downstream_flow_vol.mean()
Out[29]:
You can get the same stats along rows:
In [31]:
downstream_flow_vol.mean(axis=1)
Out[31]:
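The effect of the axis argument is easier to see on a small table. This sketch uses made-up values. It also shows describe(), which reports several statistics per column in one call.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

col_means = df.mean()         # one value per column: A=2.0, B=20.0
row_means = df.mean(axis=1)   # one value per row: 5.5, 11.0, 16.5
summary = df.describe()       # count, mean, std, min, quartiles, max per column

print(col_means)
print(row_means)
print(summary)
```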
It is worth spending some time exploring the capabilities of the Jupyter notebook.
In terms of managing your work:
At this stage, most visualisation in Python notebooks is handled by matplotlib.
Matplotlib is powerful, but the learning curve can be steep.
In [38]:
import matplotlib.pyplot as plt
%matplotlib inline
Typically, you'll create a single plot from a single cell
In [40]:
plt.hist(np.random.normal(size=500))
Out[40]:
However, the matplotlib subplots functionality allows you to create grids of plots from a single cell.
In [51]:
methods = [np.random.uniform, np.random.normal, np.random.exponential]
n = len(methods)
# Create n sets of random numbers, where n is the number of methods specified
random_sets = [method(size=1000) for method in methods]
for i in range(n):
    # Arrange subplots 2 rows x 3 columns
    # Access the i'th column on the first row
    ax = plt.subplot(2, 3, i+1)
    # Plot the random numbers
    ax.plot(random_sets[i])
    # Access the i'th column on the second row
    ax = plt.subplot(2, 3, n+i+1)
    # Plot a histogram of the corresponding numbers
    ax.hist(random_sets[i])
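The same grid can also be built with plt.subplots, which creates the figure and all of its axes in one call and returns the axes as an array indexed by row and column. This is an alternative sketch rather than a replacement for the cell above. The figure size and titles are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

methods = [np.random.uniform, np.random.normal, np.random.exponential]
random_sets = [method(size=1000) for method in methods]

# One figure with 2 rows x 3 columns of axes, returned as a 2-D array
fig, axes = plt.subplots(2, len(methods), figsize=(12, 6))

for i, (method, data) in enumerate(zip(methods, random_sets)):
    axes[0, i].plot(data)                  # top row: the raw numbers
    axes[0, i].set_title(method.__name__)  # label each column by its method
    axes[1, i].hist(data)                  # bottom row: the histogram

fig.tight_layout()  # avoid overlapping labels between subplots
```

Indexing the axes as axes[row, column] avoids the index arithmetic needed with plt.subplot(2, 3, n+i+1).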