If you're using anaconda, you probably already have most (if not all) of these installed. If you installed miniconda:
conda install numpy
Conda also has channels, which allow anybody to distribute their own conda packages. There is an "astropy" channel for AstroPy-affiliated packages. You can do:
conda install -c astropy astroml
To check if a package is available on conda:
conda search numpy
Many smaller packages are not available via the conda package manager. For these, use pip:
pip install --no-deps corner
conda is a full package manager that takes care of resolving dependencies across all installed packages; pip's --no-deps flag keeps it from interfering with conda-managed dependencies.
In [1]:
from __future__ import print_function
import math
import numpy as np
If you use Python for any amount of time, you'll quickly find that there are some things it is not so good at. In particular, performing repeated operations via loops is one of its weaknesses.
For example, in pure Python:
In [2]:
def add_one(x):
    return [xi + 1 for xi in x]
In [3]:
x = list(range(1000000))
%timeit add_one(x)
Using numpy we would do:
In [4]:
x = np.arange(1000000)
%timeit np.add(x, 1)
In [5]:
# Point coordinates
x = np.random.rand(100000)
y = np.random.rand(100000)
In [7]:
%%timeit
dist = np.empty(len(x))
for i in range(len(x)):
    dist[i] = math.sqrt(x[i]**2 + y[i]**2)
In [9]:
%%timeit
dist = np.sqrt(x**2 + y**2)
Aside: How many arrays are created in the above cell?
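One way to answer the aside: each intermediate expression (x**2, y**2, their sum) allocates a temporary array, plus the final result. A minimal sketch (not from the original notebook) showing how the out= argument of ufuncs can reuse a single buffer instead:

```python
import numpy as np

x = np.random.rand(1000)
y = np.random.rand(1000)

# Naive version: x**2, y**2, x**2 + y**2, and the sqrt result
# are each a separate allocation (the last is the returned array).
dist = np.sqrt(x**2 + y**2)

# Ufuncs accept an `out=` argument, so one buffer can be reused,
# trading readability for fewer allocations:
tmp = np.empty_like(x)
np.multiply(x, x, out=tmp)   # tmp = x**2
tmp += y**2                  # y**2 still allocates one temporary
np.sqrt(tmp, out=tmp)        # tmp = sqrt(x**2 + y**2)

assert np.allclose(dist, tmp)
```

This micro-optimization rarely matters, but it makes the hidden temporaries visible.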
Sometimes you have to get a little creative to "vectorize" things:
In [12]:
x = np.arange(10)**2
x
Out[12]:
In [14]:
# difference between adjacent elements
x[1:] - x[:-1]
Out[14]:
In [16]:
# by the way, this is basically the implementation of `np.ediff1d`
np.ediff1d??
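The comment above says the slice trick is basically how np.ediff1d works; here is a minimal sketch of that idea (ignoring ediff1d's to_begin/to_end options):

```python
import numpy as np

def ediff1d_sketch(ary):
    """Differences between consecutive elements of a flattened array,
    in the spirit of np.ediff1d (options omitted)."""
    ary = np.asanyarray(ary).ravel()
    return ary[1:] - ary[:-1]

x = np.arange(10)**2
assert np.array_equal(ediff1d_sketch(x), np.ediff1d(x))
```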
In [17]:
x = np.arange(5)
y = np.arange(1, 6)
x + y
Out[17]:
All operators (like +) actually call an underlying numpy function: in this case np.add:
In [18]:
np.add(x, y)
Out[18]:
These ufuncs have some interesting and useful properties:
In [19]:
np.add.accumulate(x)
Out[19]:
In [20]:
np.multiply.accumulate(x)
Out[20]:
In [21]:
np.multiply.accumulate(y)
Out[21]:
In [24]:
x
Out[24]:
In [28]:
np.maximum.accumulate(x[::-1])
Out[28]:
In [29]:
np.multiply.identity
Out[29]:
In [30]:
np.add.outer(x, x)
Out[30]:
In [31]:
np.multiply.outer(x, x)
Out[31]:
In [32]:
z = np.arange(10, dtype=np.float64).reshape((2, 5))
z
Out[32]:
In [40]:
np.sum(z, axis=0)
Out[40]:
In [34]:
# alternate spelling:
z.sum()
Out[34]:
In [36]:
np.mean(z)
Out[36]:
In [38]:
np.min(z), np.max(z)
Out[38]:
In [44]:
np.sum(z, axis=0)
Out[44]:
In [41]:
# could also use ufunc
np.add.reduce(z, axis=0)
Out[41]:
In [45]:
x = np.arange(15)
x
Out[45]:
In [46]:
x[0:5]
Out[46]:
In [47]:
x[0:10:2] # with a stride
Out[47]:
In [48]:
x[10:0:-2] # reversed
Out[48]:
This sort of indexing does not make a copy:
In [49]:
y = x[0:10:2]
y[0] = 100. # modify y
y
Out[49]:
In [50]:
# x is modified:
x
Out[50]:
In [51]:
x = np.arange(15)
y = x[[1, 2, 4]]
y
Out[51]:
In [52]:
y[0] = 100
y
Out[52]:
In [53]:
# x is not modified
x
Out[53]:
In [54]:
x = np.arange(5)
x
Out[54]:
In [55]:
mask = np.array([True, True, False, True, False])
x[mask]
Out[55]:
In [56]:
# creates a copy
y = x[mask]
y[0] = 100.
print(y)
print(x)
How do you remember which type of indexing creates a copy?
NumPy arrays are stored as a chunk of data plus a set of strides in each dimension. A slice can be represented by adjusting the strides over the same data, but boolean and arbitrary-index selections cannot, so numpy must make a copy.
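The strides story above can be checked directly with np.shares_memory; a small sketch:

```python
import numpy as np

x = np.arange(15)

# A slice is a view: same underlying buffer, different strides.
view = x[0:10:2]
print(view.strides)             # stride of 2 * itemsize for step 2
assert np.shares_memory(x, view)

# Fancy and boolean indexing can't be expressed as strides
# over the original buffer, so numpy copies the elements.
fancy = x[[1, 2, 4]]
boolean = x[x % 2 == 0]
assert not np.shares_memory(x, fancy)
assert not np.shares_memory(x, boolean)
```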
In [57]:
x = np.array([1, 2, 3, -999, 2, 4, -999])
How might you clean this array, setting all negative values to, say, zero?
In [58]:
for i in range(len(x)):
if x[i] < 0:
x[i] = 0
x
Out[58]:
In [59]:
x = np.array([1, 2, 3, -999, 2, 4, -999])
mask = (x < 0)
mask
Out[59]:
And the mask can be used directly to set the value you desire:
In [60]:
x[mask] = 0
x
Out[60]:
Often you'll see this done in a separate step:
In [62]:
x = np.array([1, 2, 3, -999, 2, 4, -999])
x[x < 0] = 0
x
Out[62]:
In [68]:
# additional boolean operations: invert
x = np.array([1, 2, 3, -999, 2, 4, -999])
x[~(x < 0)] = 0
x
Out[68]:
In [71]:
x = np.array([1, 2, 3, -999, 2, 4, -999])
x[(x < 0) | (x > 3)] = 0
x
Out[71]:
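An alternative to in-place masked assignment, not shown in the cells above, is np.where, which builds a cleaned copy and leaves the original untouched:

```python
import numpy as np

x = np.array([1, 2, 3, -999, 2, 4, -999])

# np.where(cond, a, b) selects elementwise: a where cond is True, else b.
cleaned = np.where(x < 0, 0, x)

assert np.array_equal(cleaned, [1, 2, 3, 0, 2, 4, 0])
assert x[3] == -999  # original array is unmodified
```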
In [72]:
x = np.arange(4)
x
Out[72]:
In [73]:
x + 3
Out[73]:
In [74]:
x = np.array([[0, 0, 0],
[10, 10, 10],
[20, 20, 20],
[30, 30, 30]])
y = np.array([0, 1, 2])
print("x shape:", x.shape)
print("y shape: ", y.shape)
In [75]:
# If x and y are different dimensions, shape is padded on left with 1s
# before broadcasting.
x + y
Out[75]:
In [76]:
x = np.array([[0],
[10],
[20],
[30]])
y = np.array([0, 1, 2])
print("x shape:", x.shape)
print("y shape: ", y.shape)
In [81]:
x + y
Out[81]:
Broadcasting rules:
If the two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side.
If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
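The three rules can be traced through a small example (shapes chosen here for illustration):

```python
import numpy as np

x = np.arange(12).reshape((4, 3))   # shape (4, 3)
y = np.arange(3)                    # shape (3,)

# Rule 1: y's shape is padded on the left: (3,) -> (1, 3)
# Rule 2: the size-1 dimension is stretched: (1, 3) -> (4, 3)
assert (x + y).shape == (4, 3)

# Rule 3: sizes disagree and neither is 1 -> error
z = np.arange(4)                    # (4,) -> (1, 4); 4 != 3
try:
    x + z
except ValueError:
    print("shapes (4, 3) and (4,) do not broadcast")

# A common fix: add a trailing axis so z aligns with the rows
assert (x + z[:, np.newaxis]).shape == (4, 3)
```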
Assume you have $N$ points in $D$ dimensions, represented by an array of shape (N, D), where there are some missing values scattered throughout the points.

1. Count the number of points (rows) with no missing values, using np.any or np.all.
2. Clean the array by removing the rows with missing values.
3. Construct the matrix M, the centered and normalized version of the X array: $$ M_{ij} = (X_{ij} - \mu_j) / \sigma_j $$ using np.mean and np.std. This is one version of whitening the array.
In [83]:
np.random.seed(0)
X = np.random.rand(5000)
X[np.random.randint(0, 5000, size=500)] = np.nan # ~10% missing
X = X.reshape((1000, 5)) # 1000 points in 5 dimensions
In [112]:
# 1. Compute the number of points (rows) with no missing values, using `np.any` or `np.all`.
mask = np.all(~np.isnan(X), axis=1)
mask.sum()
Out[112]:
In [96]:
# 2. Clean the array, leaving only rows with no missing values
Y = X[mask]
Y.shape
Out[96]:
In [99]:
# 3. Compute the whitened version of the array using np.mean and np.std.
(Y - np.mean(Y, axis=0)) / np.std(Y, axis=0)
Out[99]:
In [ ]:
print(np.random.__doc__)
In [101]:
# contents of scipy:
import scipy
print(scipy.__doc__)
Note the overlap:

- numpy.fft / scipy.fft
- numpy.linalg / scipy.linalg

Why the duplication? The scipy routines are based on Fortran libraries, whereas numpy is C-only.
Project started in 2011, in response to increasing duplication in the Python astronomy ecosystem.

Initially brought together several existing Python packages:

- astropy.io.fits (formerly pyfits)
- astropy.io.ascii (formerly asciitable)
- astropy.wcs (formerly pywcs)
- astropy.cosmology (formerly cosmolopy)

Now also contains:

- astropy.table (Table class and table operations)
- astropy.units (Quantity: arrays with units)
- astropy.coordinates (astronomical coordinate systems)
- astropy.time (UTC, UT, MJD, etc.)
- astropy.stats (additional stats not in scipy)
- astropy.modeling (simple model fitting)
- astropy.vo (virtual observatory)
- astropy.io.votable
- astropy.analytic_functions
In [102]:
from astropy import coordinates as coords
from astropy import units as u
In [103]:
ra = 360. * np.random.rand(100)
dec = -90. + 180. * np.random.rand(100)
print("RA:", ra[:5], "...")
print("Dec:", dec[:5], "...")
In [104]:
c = coords.SkyCoord(ra, dec, unit=(u.deg, u.deg))
c
Out[104]:
In [105]:
# convert to galactic
g = c.galactic
g
Out[105]:
In [106]:
# access longitude or latitude
g.l
Out[106]:
In [107]:
type(g.l)
Out[107]:
In [108]:
# get underlying numpy array
g.l.degree
Out[108]:
In [ ]: