Python is an interpreted, high-level programming language designed to be easy to read and usable for a wide range of purposes. Much of its power comes from libraries that provide tools for quick, efficient data analysis and visualization. These libraries are like Lego blocks: you can pick and choose which ones you need to build your end product. The Scientific Python Ecosystem is built on a few key libraries (e.g., NumPy, SciPy, Pandas, Matplotlib) that serve as the basis for most other libraries (e.g., MetPy). In this notebook, we'll briefly touch on several of these foundational libraries of the SciPy Ecosystem.
Anaconda provides distributions of Python and the main third-party packages, either as a full distribution or as a lighter-weight version, "Miniconda". We recommend using Anaconda to build and maintain your Python stack, as it provides command-line tools to download and update Python libraries. You can check it out at https://www.anaconda.com/distribution/.
The Jupyter library provides "literate programming" interfaces for Python and other programming languages. This file is displayed using the Jupyter library, within either Jupyter Notebook or JupyterLab. A notebook combines code, prose, and other text (equations, HTML) in small blocks, called cells, to make a seamless document for your analysis or presentation. Working in small blocks also allows for quick prototyping and debugging of code as you write!
While Python is the basis for everything, this figure demonstrates how packages build on top of each other (creating dependencies). Additionally, packages are constantly under development, so this structure is somewhat transient as the SciPy world continues to expand (see Dask as a recent addition to this framework).
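One way to see these dependencies for yourself is to ask Python which requirements an installed library declares. The sketch below uses the standard-library `importlib.metadata` module. The helper name `declared_dependencies` and the package names passed to it are illustrative choices, not part of the original notebook, and the output depends on what is installed in your environment.

```python
from importlib import metadata


def declared_dependencies(package):
    """Return (version, requirements) for an installed package.

    Returns (None, []) if the package is not installed.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None, []
    # requires() returns None when a package declares no dependencies
    requirements = metadata.requires(package) or []
    return version, requirements


# Package names here are examples; swap in any library you have installed
for name in ["numpy", "pandas", "xarray"]:
    version, requirements = declared_dependencies(name)
    if version is None:
        print(f"{name}: not installed")
    else:
        print(f"{name} {version} requires: {requirements}")
```

If the libraries are installed, the requirement lists should echo the layering described above, with higher-level packages naming the foundational ones they build on.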
In [ ]:
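import numpy as np
# Two 10-element integer arrays: 1..10 and 100..109
x = np.arange(1, 11)
y = np.arange(100, 110)
# Mean over all 20 values in both arrays (a single number)
mean_x_y = np.mean([x, y])
print(mean_x_y)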
In [ ]:
import pandas as pd
# Read a tab-separated text file into a DataFrame
df = pd.read_csv('../Pandas/Jan17_CO_ASOS.txt', sep='\t')
# Show the first five rows
df.head()
In [ ]:
import xarray as xr
# Open the NetCDF file as an xarray Dataset
ds = xr.open_dataset('../../data/NARR_19930313_0000.nc')
# Display a summary of the Dataset
ds
Dask is a parallel-computing library for Python. You can use it on your laptop, in a cloud environment, or on a high-performance computer (NCAR's Cheyenne, for example). It uses lazy evaluation, so computations only occur after you've chained all of your operations together and asked for the result. Additionally, it has a built-in scheduler that scales with your computational demand to make good use of your parallel resources.
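As a rough illustration of lazy evaluation, the sketch below builds a small Dask array, chains two operations, and only then asks for the answer. It assumes Dask is installed with its array support. The array size and chunking are arbitrary values chosen for the demo.

```python
import dask.array as da

# Build a lazy array of the integers 1..10, split into two chunks of 5
values = da.arange(1, 11, chunks=5)

# Chaining operations only builds a task graph; nothing is computed yet
doubled_sum = (values * 2).sum()
print(type(doubled_sum))

# compute() hands the graph to the scheduler and returns a concrete result
result = doubled_sum.compute()
print(result)
```

Until `compute()` is called, `doubled_sum` is a Dask array describing the work to be done rather than a number.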
In [ ]:
import matplotlib.pyplot as plt
# x and y are the NumPy arrays created in the NumPy cell above
plt.plot(x, y)
plt.title('Demo of Matplotlib')
plt.show()
For more information on the SciPy Ecosystem, check out these links: https://www.scipy.org/about.html and https://scipy-lectures.org/intro/intro.html