NDArray Tutorial

In MXNet, NDArray is the core data structure for all mathematical computations. An NDArray represents a multidimensional, fixed-size, homogeneous array. If you're familiar with the scientific computing Python package NumPy, you might notice that mxnet.ndarray is similar to numpy.ndarray. Like the corresponding NumPy data structure, MXNet's NDArray enables imperative computation.

So you might wonder, why not just use NumPy? MXNet offers two compelling advantages. First, MXNet's NDArray supports fast execution on a wide range of hardware configurations, including CPU, GPU, and multi-GPU machines. MXNet also scales to distributed systems in the cloud. Second, MXNet's NDArray executes code lazily, allowing it to automatically parallelize multiple operations across the available hardware.

The basics

An NDArray is a multidimensional array of numbers with the same type. We could represent the coordinates of a point in 3D space, e.g. [2, 1, 6], as a 1D array with shape (3,). Similarly, we could represent a 2D array. Below, we present an array with length 2 along the first axis and length 3 along the second axis.

[[0, 1, 2]
 [3, 4, 5]]

Note that here the use of "dimension" is overloaded. When we say a 2D array, we mean an array with 2 axes, not an array with two components.

Each NDArray supports some important attributes that you'll often want to query:

ndarray.shape: The dimensions of the array. It is a tuple of integers indicating the length of the array along each axis. For a matrix with n rows and m columns, its shape will be (n, m).
ndarray.dtype: A NumPy type object describing the type of its elements.
ndarray.size: The total number of components in the array, equal to the product of the components of its shape.
ndarray.context: The device on which this array is stored, e.g. cpu() or gpu(1).
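
The relationship between shape and size can be checked directly. The sketch below uses NumPy as a stand-in (an assumption, so that the snippet runs without MXNet installed); NumPy arrays expose shape, dtype, and size under the same names, but have no context attribute.

```python
import math
import numpy as np

# NumPy stand-in for an NDArray with 2 rows and 3 columns.
x = np.zeros((2, 3), dtype=np.float32)

shape = x.shape   # length of the array along each axis
size = x.size     # total number of components
dtype = x.dtype   # type of the elements

# size equals the product of the components of shape
assert size == math.prod(shape)
print(shape, size, dtype)  # (2, 3) 6 float32
```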

Array Creation

There are a few different ways to create an NDArray.

We can create an NDArray from a regular Python list or tuple by using the array function:



In [1]:

    
import mxnet as mx
# create a 1-dimensional array with a python list
a = mx.nd.array([1,2,3])
# create a 2-dimensional array with a nested python list 
b = mx.nd.array([[1,2,3], [2,3,4]])
{'a.shape':a.shape, 'b.shape':b.shape}









    Out[1]:





{'a.shape': (3,), 'b.shape': (2, 3)}

We can also create an MXNet NDArray from a numpy.ndarray object:



In [2]:

    
import numpy as np
c = np.arange(15).reshape(3,5)
# create a 2-dimensional array from a numpy.ndarray object
a = mx.nd.array(c)
{'a.shape':a.shape}









    Out[2]:





{'a.shape': (3, 5)}

We can specify the element type with the dtype option, which accepts a NumPy type. By default, float32 is used:



In [3]:

    
# float32 is used by default
a = mx.nd.array([1,2,3])
# create an int32 array
b = mx.nd.array([1,2,3], dtype=np.int32)
# create a 16-bit float array
c = mx.nd.array([1.2, 2.3], dtype=np.float16)
(a.dtype, b.dtype, c.dtype)









    Out[3]:





(numpy.float32, numpy.int32, numpy.float16)

If we know the size of the desired NDArray, but not the element values, MXNet offers several functions to create arrays with placeholder content:



In [4]:

    
# create a 2-dimensional array full of zeros with shape (2,3) 
a = mx.nd.zeros((2,3))
# create an array of the same shape full of ones
b = mx.nd.ones((2,3))
# create an array of the same shape with all elements set to 7
c = mx.nd.full((2,3), 7)
# create an array of the same shape whose initial content is arbitrary
# and depends on the state of the memory
d = mx.nd.empty((2,3))

Printing Arrays

When inspecting the contents of an NDArray, it's often convenient to first extract its contents as a numpy.ndarray using the asnumpy method. NumPy prints arrays using the following layout:

The last axis is printed from left to right,
The second-to-last is printed from top to bottom,
The rest are also printed from top to bottom, with each slice separated from the next by an empty line.



In [5]:

    
b = mx.nd.arange(18).reshape((3,2,3))
b.asnumpy()









    Out[5]:





array([[[  0.,   1.,   2.],
        [  3.,   4.,   5.]],

       [[  6.,   7.,   8.],
        [  9.,  10.,  11.]],

       [[ 12.,  13.,  14.],
        [ 15.,  16.,  17.]]], dtype=float32)

Basic Operations

When applied to NDArrays, the standard arithmetic operators apply elementwise calculations. The returned value is a new array whose content contains the result.



In [6]:

    
a = mx.nd.ones((2,3))
b = mx.nd.ones((2,3))
# elementwise plus
c = a + b
# elementwise negation
d = -c
# elementwise pow and sin, and then transpose
e = mx.nd.sin(c**2).T
# elementwise max
f = mx.nd.maximum(a, c)  
f.asnumpy()









    Out[6]:





array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.]], dtype=float32)

As in NumPy, * represents element-wise multiplication. For matrix-matrix multiplication, use dot.



In [7]:

    
a = mx.nd.arange(4).reshape((2,2))
b = a * a
c = mx.nd.dot(a,a)
print("b: %s, \n c: %s" % (b.asnumpy(), c.asnumpy()))









    



b: [[ 0.  1.]
 [ 4.  9.]], 
 c: [[  2.   3.]
 [  6.  11.]]

The assignment operators such as += and *= modify arrays in place, and thus don't allocate new memory to create a new array.



In [8]:

    
a = mx.nd.ones((2,2))
b = mx.nd.ones(a.shape)
b += a
b.asnumpy()









    Out[8]:





array([[ 2.,  2.],
       [ 2.,  2.]], dtype=float32)
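
To make the difference between in-place and out-of-place updates visible, the sketch below tracks object identity. It uses NumPy as a stand-in (an assumption; the text above states that NDArray's assignment operators behave the same way), and the name alias is purely illustrative.

```python
import numpy as np

a = np.ones((2, 2), dtype=np.float32)
b = np.ones((2, 2), dtype=np.float32)
alias = b                 # second name for the same array object

b += a                    # in place: writes into the existing buffer
in_place_same_object = b is alias

b = b + a                 # out of place: allocates a new array and rebinds b
rebound_same_object = b is alias

print(in_place_same_object, rebound_same_object)  # True False
print(alias.tolist())     # alias kept the in-place result: [[2.0, 2.0], [2.0, 2.0]]
```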

Indexing and Slicing

The slice operator [] applies to axis 0.



In [9]:

    
a = mx.nd.array(np.arange(6).reshape(3,2))
a[1:2] = 1
a[:].asnumpy()









    Out[9]:





array([[ 0.,  1.],
       [ 1.,  1.],
       [ 4.,  5.]], dtype=float32)

We can also slice a particular axis with the method slice_axis:



In [10]:

    
d = mx.nd.slice_axis(a, axis=1, begin=1, end=2)
d.asnumpy()









    Out[10]:





array([[ 1.],
       [ 1.],
       [ 5.]], dtype=float32)

Shape Manipulation

Using reshape, we can manipulate any array's shape as long as the size remains unchanged.



In [11]:

    
a = mx.nd.array(np.arange(24))
b = a.reshape((2,3,4))
b.asnumpy()









    Out[11]:





array([[[  0.,   1.,   2.,   3.],
        [  4.,   5.,   6.,   7.],
        [  8.,   9.,  10.,  11.]],

       [[ 12.,  13.,  14.,  15.],
        [ 16.,  17.,  18.,  19.],
        [ 20.,  21.,  22.,  23.]]], dtype=float32)

The concatenate method stacks multiple arrays along the first axis. (Their shapes must be the same along the other axes.)



In [12]:

    
a = mx.nd.ones((2,3))
b = mx.nd.ones((2,3))*2
c = mx.nd.concatenate([a,b])
c.asnumpy()









    Out[12]:





array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]], dtype=float32)

Reduce

Some functions, like sum and mean, reduce arrays to scalars.



In [13]:

    
a = mx.nd.ones((2,3))
b = mx.nd.sum(a)
b.asnumpy()









    Out[13]:





array([ 6.], dtype=float32)

We can also reduce an array along a particular axis:



In [14]:

    
c = mx.nd.sum_axis(a, axis=1)
c.asnumpy()









    Out[14]:





array([ 3.,  3.], dtype=float32)

Broadcast

We can also broadcast an array. Broadcasting operations duplicate an array's values along an axis with length 1. The following code broadcasts along axis 1:



In [15]:

    
a = mx.nd.array(np.arange(6).reshape(6,1))
b = a.broadcast_to((6,4))
b.asnumpy()









    Out[15]:





array([[ 0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.]], dtype=float32)

It's possible to simultaneously broadcast along multiple axes. In the following example, we broadcast along axes 1 and 2:



In [16]:

    
c = a.reshape((2,1,1,3))
d = c.broadcast_to((2,2,2,3))
d.asnumpy()









    Out[16]:





array([[[[ 0.,  1.,  2.],
         [ 0.,  1.,  2.]],

        [[ 0.,  1.,  2.],
         [ 0.,  1.,  2.]]],


       [[[ 3.,  4.,  5.],
         [ 3.,  4.,  5.]],

        [[ 3.,  4.,  5.],
         [ 3.,  4.,  5.]]]], dtype=float32)

Broadcasting can be applied automatically when executing some operations, e.g. * and + on arrays of different shapes.



In [17]:

    
a = mx.nd.ones((3,2))
b = mx.nd.ones((1,2))
c = a + b
c.asnumpy()









    Out[17]:





array([[ 2.,  2.],
       [ 2.,  2.],
       [ 2.,  2.]], dtype=float32)
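
The shape rule behind these examples can be written down in a few lines of plain Python. This is only a sketch: can_broadcast_to and broadcast_axes are hypothetical helpers, not MXNet functions, and they assume that source and target shapes have the same number of axes, as in every example in this section.

```python
def can_broadcast_to(src_shape, dst_shape):
    """Return True if src_shape can be stretched to dst_shape.

    Assumes equal numbers of axes: an axis may change length only if its
    source length is 1; every other axis must already match.
    """
    if len(src_shape) != len(dst_shape):
        return False
    return all(s == d or s == 1 for s, d in zip(src_shape, dst_shape))


def broadcast_axes(src_shape, dst_shape):
    """List the axes whose values would be duplicated."""
    return [i for i, (s, d) in enumerate(zip(src_shape, dst_shape))
            if s == 1 and d != 1]


print(can_broadcast_to((6, 1), (6, 4)))            # True
print(broadcast_axes((6, 1), (6, 4)))              # [1]
print(broadcast_axes((2, 1, 1, 3), (2, 2, 2, 3)))  # [1, 2]
print(can_broadcast_to((3, 2), (6, 4)))            # False
```

Under this rule, the addition of shapes (3,2) and (1,2) shown above works because the second operand can be stretched along axis 0.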

Copies

When assigning an NDArray to another Python variable, we copy a reference to the same NDArray. However, we often need to make a copy of the data, so that we can manipulate the new array without overwriting the original values.



In [18]:

    
a = mx.nd.ones((2,2))
b = a  
b is a









    Out[18]:





True

The copy method makes a deep copy of the array and its data:



In [19]:

    
b = a.copy()
b is a









    Out[19]:





False

The above code allocates a new NDArray and then assigns it to b. When we do not want to allocate additional memory, we can use the copyto method or the slice operator [] instead.



In [20]:

    
b = mx.nd.ones(a.shape)
c = b
c[:] = a
d = b
a.copyto(d)
(c is b, d is b)









    Out[20]:





(True, True)
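
The identity checks above do not show what happens to the data. The sketch below makes that visible, again using NumPy as a stand-in (an assumption; reference assignment, copy, and slice assignment are described above as behaving the same way for NDArray). The variable names are illustrative.

```python
import numpy as np

a = np.ones((2, 2), dtype=np.float32)

alias = a          # copies the reference only
deep = a.copy()    # allocates new memory and copies the data

alias[0] = 7       # writes through to a, because alias is a
deep[1] = 9        # leaves a untouched

target = np.zeros((2, 2), dtype=np.float32)
same_target = target
target[:] = a      # copies values into the existing buffer, no rebinding

print(a.tolist())             # [[7.0, 7.0], [1.0, 1.0]]
print(deep.tolist())          # [[1.0, 1.0], [9.0, 9.0]]
print(target is same_target)  # True
```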

Advanced Topics

MXNet's NDArray offers some advanced features that differentiate it from the offerings you'll find in most other libraries.

GPU Support

By default, NDArray operators are executed on CPU. But with MXNet, it's easy to switch to another computation resource, such as GPU, when available. Each NDArray's device information is stored in ndarray.context. When MXNet is compiled with flag USE_CUDA=1 and the machine has at least one NVIDIA GPU, we can cause all computations to run on GPU 0 by using context mx.gpu(0), or simply mx.gpu(). When we have access to two or more GPUs, the 2nd GPU is represented by mx.gpu(1), etc.



In [21]:

    
def f():
    a = mx.nd.ones((100,100))
    b = mx.nd.ones((100,100))
    c = a + b
    print(c)
# by default, mx.cpu() is used
f()  
# change the default context to the first GPU
with mx.Context(mx.gpu()):  
    f()









    



[[ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 ..., 
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]]
<NDArray 100x100 @cpu(0)>

[[ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 ..., 
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]]
<NDArray 100x100 @gpu(0)>

We can also explicitly specify the context when creating an array:



In [22]:

    
a = mx.nd.ones((100, 100), mx.gpu(0))
a









    Out[22]:





[[ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 ..., 
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]]
<NDArray 100x100 @gpu(0)>

Currently, MXNet requires two arrays to sit on the same device for computation. There are several methods for copying data between devices.



In [23]:

    
import mxnet as mx
a = mx.nd.ones((100,100), mx.cpu())
b = mx.nd.ones((100,100), mx.gpu())
c = mx.nd.ones((100,100), mx.gpu())
a.copyto(c)  # copy from CPU to GPU
d = b + c
e = b.as_in_context(c.context) + c  # same as above
{'d':d, 'e':e}









    Out[23]:





{'d': 
 [[ 2.  2.  2. ...,  2.  2.  2.]
  [ 2.  2.  2. ...,  2.  2.  2.]
  [ 2.  2.  2. ...,  2.  2.  2.]
  ..., 
  [ 2.  2.  2. ...,  2.  2.  2.]
  [ 2.  2.  2. ...,  2.  2.  2.]
  [ 2.  2.  2. ...,  2.  2.  2.]]
 <NDArray 100x100 @gpu(0)>, 'e': 
 [[ 2.  2.  2. ...,  2.  2.  2.]
  [ 2.  2.  2. ...,  2.  2.  2.]
  [ 2.  2.  2. ...,  2.  2.  2.]
  ..., 
  [ 2.  2.  2. ...,  2.  2.  2.]
  [ 2.  2.  2. ...,  2.  2.  2.]
  [ 2.  2.  2. ...,  2.  2.  2.]]
 <NDArray 100x100 @gpu(0)>}

Serialize From/To (Distributed) Filesystems

MXNet offers two simple ways to save (load) data to (from) disk. The first way is to use pickle, as you might with any other Python object. NDArray is pickle-compatible.



In [24]:

    
import pickle as pkl
a = mx.nd.ones((2, 3))
# serialize the array and write it to disk
with open('tmp.pickle', 'wb') as f:
    pkl.dump(a, f)
# read the file back and deserialize the array
with open('tmp.pickle', 'rb') as f:
    b = pkl.load(f)
b.asnumpy()









    Out[24]:





array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]], dtype=float32)

The second way is to directly dump to disk in binary format by using the save and load methods. We can save/load a single NDArray, or a list of NDArrays:



In [25]:

    
a = mx.nd.ones((2,3))
b = mx.nd.ones((5,6))               
mx.nd.save("temp.ndarray", [a,b])
c = mx.nd.load("temp.ndarray")
c









    Out[25]:





[
 [[ 1.  1.  1.]
  [ 1.  1.  1.]]
 <NDArray 2x3 @cpu(0)>, 
 [[ 1.  1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.  1.]]
 <NDArray 5x6 @cpu(0)>]

It's also possible to save or load a dict of NDArrays in this fashion:



In [26]:

    
d = {'a':a, 'b':b}
mx.nd.save("temp.ndarray", d)
c = mx.nd.load("temp.ndarray")
c









    Out[26]:





{'a': 
 [[ 1.  1.  1.]
  [ 1.  1.  1.]]
 <NDArray 2x3 @cpu(0)>, 'b': 
 [[ 1.  1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.  1.]]
 <NDArray 5x6 @cpu(0)>}

The load and save methods are preferable to pickle in two respects:

When using these methods, you can save data from within the Python interface and then use it later from another language's binding. For example, if we save the data in Python:
```
a = mx.nd.ones((2, 3))
mx.nd.save("temp.ndarray", [a,])
```
we can later load it from R:
```
a <- mx.nd.load("temp.ndarray")
as.array(a[[1]])
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
```

When a distributed filesystem such as Amazon S3 or Hadoop HDFS is set up, we can directly save to and load from it.

mx.nd.save('s3://mybucket/mydata.ndarray', [a,])  # if compiled with USE_S3=1
mx.nd.save('hdfs:///users/myname/mydata.bin', [a,])  # if compiled with USE_HDFS=1

Lazy Evaluation and Automatic Parallelization

MXNet uses lazy evaluation to achieve superior performance. When we run a=b+1 in Python, the Python thread just pushes this operation into the backend engine and then returns. There are two benefits to this approach:

The main Python thread can continue to execute other computations once the previous one is pushed. This is useful for frontend languages with heavy overheads.
It is easier for the backend engine to explore further optimizations, such as automatic parallelization.

The backend engine can resolve data dependencies and schedule the computations correctly. It is transparent to frontend users. We can explicitly call the method wait_to_read on the result array to wait until the computation finishes. Operations that copy data from an array to other packages, such as asnumpy, will implicitly call wait_to_read.



In [27]:

    
import time

def do(x, n):
    """push computation into the backend engine"""
    return [mx.nd.dot(x,x) for i in range(n)]
def wait(x):
    """wait until all results are available"""
    for y in x:
        y.wait_to_read()
        
tic = time.time()
a = mx.nd.ones((1000,1000))
b = do(a, 50)
print('Time to push all computations into the backend engine:\n %f sec' % (time.time() - tic))
wait(b)
print('Time to finish all computations:\n %f sec' % (time.time() - tic))



Time to push all computations into the backend engine:
 0.004316 sec
Time to finish all computations:
 0.200367 sec
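
The push-then-wait pattern is not specific to MXNet. As a rough analogy only, the sketch below reproduces it with Python's standard library: a thread pool stands in for the backend engine, submit plays the role of pushing an operation, and result plays the role of wait_to_read. The function slow_square is a hypothetical placeholder for an expensive operator.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for an expensive operator executed by the backend."""
    time.sleep(0.05)
    return x * x


with ThreadPoolExecutor(max_workers=4) as engine:
    tic = time.time()
    # "Push" work: submit() returns a future immediately, without waiting.
    futures = [engine.submit(slow_square, i) for i in range(4)]
    pushed = time.time() - tic

    # Block until every result is ready, analogous to wait_to_read.
    results = [f.result() for f in futures]
    finished = time.time() - tic

print(results)            # [0, 1, 4, 9]
print(pushed < finished)  # True: pushing returns before the work is done
```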

Besides analyzing data read and write dependencies, the backend engine is able to schedule computations with no dependency in parallel. For example, in the following code:

a = mx.nd.ones((2,3))
b = a + 1
c = a + 2
d = b * c

the second and third lines can be executed in parallel. The following example first runs on CPU and then on GPU:



In [29]:

    
n = 10
a = mx.nd.ones((1000,1000))
b = mx.nd.ones((6000,6000), mx.gpu(1))
tic = time.time()
c = do(a, n)
wait(c)
print('Time to finish the CPU workload: %f sec' % (time.time() - tic))
d = do(b, n)
wait(d)
print('Time to finish both CPU/GPU workloads: %f sec' % (time.time() - tic))



Time to finish the CPU workload: 0.075873 sec
Time to finish both CPU/GPU workloads: 0.809018 sec

Now we issue all workloads at the same time. The backend engine will try to parallelize the CPU and GPU computations.



In [30]:

    
tic = time.time()
c = do(a, n)
d = do(b, n)
wait(c)
wait(d)
print('Both finished in: %f sec' % (time.time() - tic))



Both finished in: 0.742493 sec
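
To make the scheduling idea in this section concrete, the sketch below groups the operations from the earlier a, b, c, d example into stages that have no mutual dependencies. It is a plain-Python illustration under stated assumptions: parallel_stages is a hypothetical helper, and MXNet's actual engine is not implemented this way as far as this tutorial says.

```python
def parallel_stages(deps):
    """Group operations into stages; ops in one stage do not depend on each other.

    deps maps each operation name to the names of the operations it reads from.
    """
    remaining = {op: set(inputs) for op, inputs in deps.items()}
    stages = []
    while remaining:
        # An operation is ready once all of its inputs have been computed.
        ready = sorted(op for op, inputs in remaining.items() if not inputs)
        if not ready:
            raise ValueError("dependency cycle")
        stages.append(ready)
        for op in ready:
            del remaining[op]
        for inputs in remaining.values():
            inputs.difference_update(ready)
    return stages


# a = ones; b = a + 1; c = a + 2; d = b * c
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(parallel_stages(deps))  # [['a'], ['b', 'c'], ['d']]
```

The middle stage contains both b and c, which is why those two lines can run in parallel.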

Current Status

Whenever possible, we try to keep the NDArray API as similar as possible to the NumPy API. But you should note that NDArray is not yet fully NumPy-compatible. Here we summarize the major differences, which we hope to resolve quickly. (If you'd like to contribute, fork us on GitHub.)

Slicing and indexing:
- NDArray can only slice along one dimension at a time. For example, we cannot use x[:, 1] to slice both dimensions.
- Only contiguous indices are supported; strided slices such as x[1:2:3] do not work.
- Boolean indices, e.g. x[y==1], are not supported.
Reduce functions such as max and min are missing.

Further Reading

NDArray API documents for all NDArray methods.
MinPy: an ongoing project that aims to be fully NumPy-compatible, with GPU and automatic differentiation support.


