Two dimensional data arrays are normally stored in column-major or row-major order. In row-major order adjacent elements in a row are stored next to each other in memory. In column-major order adjacent elements in a column are stored next to each other in memory. See also https://en.wikipedia.org/wiki/Matrix_representation
For the usual mathematical matrix notation $A_{ij}$, where $i$ is the row, and $j$ the column, we have in the case of a $3x4$ matrix: $$ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ a_{31} & a_{32} & a_{33} & a_{34}\\ \end{bmatrix} $$ Classic languages such as Fortran store their arrays in so-called column-major order. FDATA(NR,NC), and indices started at 1 with the first versions. More modern language, such a C, store their arrays in row-major order, CDATA[NR][NC], with indices starting at 0.
col major: fdata(1,1), fdata(2,1), ... first index runs fastest
row major: cdata[0][0], cdata[0][1], ... last index runs fastest
Examples of column major are: Fortran, [FITS], MatLab, IDL, R, Julia
Examples of row major are: C, Python, (java)
Images are often referred to in X and Y coordinates, like a mathematical system. The origin would be at (0,0) in the lower left corner. Image processing software normally puts the (0,0) origin at the top left corner, which corresponds a bit how the matrix above is printed. This, together with row-major and column-major can make it challenging to interchange data and plot them on the screen.
Add to this that for very large data, re-ordering axes can be a very expensive operation.
See also https://en.wikipedia.org/wiki/Iliffe_vector for another view on storing data in multi-dimensional arrays.
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# import pyfits as fits # deprecated
from astropy.io import fits
Get the Header-Data-Units (hdu's) from a fits file. This particular one only has 1.
In [ ]:
hdu = fits.open('../data/cube432.fits')
print(len(hdu))
In [ ]:
h = hdu[0].header
d = hdu[0].data
print(d.shape)
print(d)
This 4x3x2 matrix can actually also be generated from scratch using basic numpy:
In [ ]:
d1 = np.zeros(2*3*4).reshape(2,3,4)
for z in range(2):
for y in range(3):
for x in range(4):
d1[z,y,x] = x + 10*y + 100*z
print(d1)
In [ ]:
print(d1.flatten())
In [ ]:
# are two arrays the same (or close enough?)
np.allclose(d,d1)
We now want to take a plane from this cube, and plot this in a heatmap or contour map. We are now faced deciding how columns and rows translate to X and Y on a plot. Math, Astronomy, Geography and Image Processing groups all differ a bit how they prefer to see this, so numpy comes with a number of function to help you with this:
the important thing to realize is that they all give a new view of the array, which often is more efficient as moving the actual values.
In [ ]:
p0 = d[0,:,:]
p1 = d[1,:,:]
print(np.flipud(p0))
plt.imshow(p0)
plt.colorbar()
In [ ]:
plt.matshow(p0,origin='lower')
Note that for a small 4x3 matrix this image has been artificially made smooth by interpolating in imshow(); however you can already see that the integer coordinates are at the center of a cell: (0.0) is the center of the lower left cell. This is a little more obvious when you turn off interpolation:
In [ ]:
plt.imshow(p0,interpolation='none')
plt.colorbar()
if you want to print the array values on the terminal with 0 at the bottom left, use the np.flipup() function:
In [ ]:
print(np.flipud(p0))
plt.imshow(np.flipud(p0),interpolation='none')
plt.colorbar()
Arrays in numpy are in C-order (row-major) by default, but you can actually change it to Fortran-order (column-major):
In [ ]:
d2 = np.arange(3*4).reshape(3,4,order='C')
d3 = np.arange(3*4).reshape(3,4,order='F')
print('C\n',d2)
print('F\n',d3)
d3.transpose()
CASA is a python package used in radio astronomy (ALMA, VLA etc.), but is peculiar in the sense that it caters to astronomers with a fortran background, or mathematicians with a DATA(x,y) expectation: CASA uses column-major arrays with an index starting at 0. CASA images can also store a mask alongside the data, but the logic is the reverse from the masking used in numpy.ma: in CASA a True means a good data point, in numpy it means a bad point!
Notebooks don't work within casa (yet), but if you install casacore in your local python, the examples below should work. The kernsuite software should give you one easy option to install casacore, another way is to compile the code directly from https://github.com/casacore/casacore
Hence the example here is shown inline, and not in the notebook form yet. (note CASA currently uses python2)
casa
ia.open('../data/cube432.fits')
d1 = ia.getchunk()
d1.shape
(4,3,2)
d1[3,2,1]
123.0
print d1
[[[ 0. 100.]
[ 10. 110.]
[ 20. 120.]]
[[ 1. 101.]
[ 11. 111.]
[ 21. 121.]]
[[ 2. 102.]
[ 12. 112.]
[ 22. 122.]]
[[ 3. 103.]
[ 13. 113.]
[ 23. 123.]]]
p0 = d1[:,:,0]
print p0
[[ 0. 10. 20.]
[ 1. 11. 21.]
[ 2. 12. 22.]
[ 3. 13. 23.]]
print np.flipud(np.rot90(p0))
[[ 0. 1. 2. 3.]
[ 10. 11. 12. 13.]
[ 20. 21. 22. 23.]]
print np.flipud(np.rot90(p0)).flatten()
[ 0. 1. 2. 3. 10. 11. 12. 13. 20. 21. 22. 23.]
# mask boolean in CASA is the opposite of the one in numpy.ma
d1m = ia.getchunk(getmask=True)
print d1[0,0,0],d1m[0,0,0]
0.0 True
# or create the array from scratch
ia.fromshape(shape=[4,3,2])
p2 = ia.getchunk()
p2.shape
(4,3,2)
etc.etc.
In [ ]:
try:
import casacore.images.image as image
print("we have casacore")
im = image('../data/cube432.fits')
print(im.shape()) # -> [2, 3, 4]
print(im.datatype()) # -> 'float'
d=im.getdata()
m=im.getmask()
print(d.shape) # -> (2,3,4)
print(d[0,0,0],m[0,0,0])
"""
[[[[ 0. 1. 2. 3.]
[ 10. 11. 12. 13.]
[ 20. 21. 22. 23.]]
[[ 100. 101. 102. 103.]
[ 110. 111. 112. 113.]
[ 120. 121. 122. 123.]]
"""
except:
print("no casacore")
In [ ]:
import numpy.ma as ma
a = np.arange(4)
am = ma.masked_equal(a,2)
print(a.sum(),am.sum())
print(am.data,am.mask)
In [ ]:
%%time
n = 100
n1 = n
n2 = n+1
n3 = n+2
np.random.seed(123)
a = np.random.normal(size=n1*n2*n3).reshape(n1,n2,n3)
print(len(a.flatten()))
print(a[0,0,0])
a.flatten()[0]=-1
print(a[0,0,0]) # how come?
In [ ]:
%%time
b = a.transpose()
# note B is another view of A
In [ ]:
%%time
n = 2
m = n+1
np.random.seed(123)
a = np.random.normal(size=m*n).reshape(m,n)
x = np.random.normal(size=n)
print(x[0])
#
#a = np.arange(n*n).reshape(n,n)
#x = np.arange(n)
In [ ]:
%%time
b = np.matmul(a,x)
print(a.shape,x.shape,b.shape)
In [ ]:
%%time
b1 = np.zeros(m)
for i in range(m):
for j in range(n):
b1[i] = b1[i] + a[i,j]*x[j]
In [ ]:
%%time
b2 = np.zeros(m)
for i in range(m):
ai = a[i,:]
b2[i] = np.inner(ai,x)
In [ ]:
%%time
b3 = np.zeros(m)
for j in range(n):
for i in range(m):
b3[i] = b3[i] + a[i,j]*x[j]
In [ ]:
if n < 3:
print('a',a,'\nx',x)
print('b',b,'\nb1',b1,'\nb2',b2,'\nb3',b3)
else:
print(n)
In [ ]:
from numpy.linalg import inv
In [ ]:
n = 2
a1 = np.random.normal(size=n*n).reshape(n,n)
In [ ]:
%%time
ainv = inv(a1)
In [ ]:
print(a1)
print(ainv)
In [ ]:
i1=np.matmul(a1,ainv)
In [ ]:
i0=np.eye(n)
In [ ]:
print(np.allclose(i0,i1,atol=1e-10))
print(i1)
The last execution should be In[28]
In [ ]: