IntArray is a data structure to store and manipulate an array of unsigned integers using as small space as possible. Each element in the IntArray{w} type is packed using w bits. So you can store integers between 0x00 and 0x03 with 2 bits per element while Array{UInt8} requires 8 bits: the required space is one-quarter!

Types


In [1]:
using IntArrays

Only one type is exported from this package: IntArray. The IntArray type takes three type parameters:

  • w: the width (or bits) of integers
  • T: the element type of an array
  • n: the number of dimensions of an array.

For example, IntArray{2,UInt8,1} is a one-dimensional array (or vector) of UInt8 integers packed in 2 bits.

Like the Vector{T} and Matrix{T} types in the standard library, IntVector{w,T} and IntMatrix{w,T} are type aliases of IntArray{w,T,1} and IntArray{w,T,2}, respectively.


In [2]:
srand(123456);

Constructions

You can construct an IntArray object from an existing array with given width:


In [3]:
# construct an array
vec = rand(0x00:0x03, 8)


Out[3]:
8-element Array{UInt8,1}:
 0x03
 0x00
 0x01
 0x02
 0x03
 0x03
 0x01
 0x03

In [4]:
# convert it to IntVector with 2-bit packing
ivec = IntArray{2}(vec)


Out[4]:
8-element IntArrays.IntArray{2,UInt8,1}:
 0x03
 0x00
 0x01
 0x02
 0x03
 0x03
 0x01
 0x03

The size is one-quarter of the original array:


In [5]:
sizeof(vec)


Out[5]:
8

In [6]:
sizeof(ivec)


Out[6]:
2

You can allocate an IntArray object with given width, element type, and size:


In [7]:
# allocate a vector object
ivec = IntArray{4,UInt8}(4)


Out[7]:
4-element IntArrays.IntArray{4,UInt8,1}:
 0x00
 0x00
 0x04
 0x0f

In [8]:
# allocate a matrix object
imat = IntArray{9,UInt16}(4, 5)


Out[8]:
4x5 IntArrays.IntArray{9,UInt16,2}:
 0x00d0  0x0000  0x00a5  0x0000  0x0174
 0x0096  0x0000  0x0143  0x0000  0x0083
 0x0067  0x0000  0x0042  0x0000  0x0000
 0x0021  0x0160  0x0000  0x0041  0x0000

The elements of the allocated matrix are uninitialized, so you should fill them if necessary:


In [9]:
fill!(imat, 0)


Out[9]:
4x5 IntArrays.IntArray{9,UInt16,2}:
 0x0000  0x0000  0x0000  0x0000  0x0000
 0x0000  0x0000  0x0000  0x0000  0x0000
 0x0000  0x0000  0x0000  0x0000  0x0000
 0x0000  0x0000  0x0000  0x0000  0x0000

You can also use the IntVector and IntMatrix type aliases to construct a one-/two-dimensional array:


In [10]:
ivec = IntVector{2,UInt8}(6)


Out[10]:
6-element IntArrays.IntArray{2,UInt8,1}:
 0x00
 0x00
 0x02
 0x03
 0x02
 0x01

In [11]:
imat = IntMatrix{2,UInt8}(3, 4)


Out[11]:
3x4 IntArrays.IntArray{2,UInt8,2}:
 0x00  0x01  0x00  0x00
 0x00  0x01  0x00  0x03
 0x01  0x01  0x02  0x01

Operations

The IntArray{w,T,n} type inherits from the AbstractArray{T,n} type. That means an IntArray object behaves like a common array in many ways. Let's create an IntVector object with 2-bit encoding and see available operations on it.


In [12]:
ivec = IntVector{2}([0x00, 0x01, 0x02, 0x03])


Out[12]:
4-element IntArrays.IntArray{2,UInt8,1}:
 0x00
 0x01
 0x02
 0x03

IntArray supports random access like normal arrays:


In [13]:
ivec[2]


Out[13]:
0x01

In [14]:
ivec[2] = 0x03
ivec


Out[14]:
4-element IntArrays.IntArray{2,UInt8,1}:
 0x00
 0x03
 0x02
 0x03

In [15]:
ivec[2] = 1
ivec


Out[15]:
4-element IntArrays.IntArray{2,UInt8,1}:
 0x00
 0x01
 0x02
 0x03

Please keep in mind that elements are truncated to w bits without any warnings:


In [16]:
# 0x05 cannot be encoded in 2 bits!
ivec[1] = 0x05
# in this case, 0x05 is truncated to 0x01 because 0x05 && 0b11 = 0x01
ivec


Out[16]:
4-element IntArrays.IntArray{2,UInt8,1}:
 0x01
 0x01
 0x02
 0x03

IntVector behaves like a Vector:


In [17]:
ivec = IntVector{2}([0x00])


Out[17]:
1-element IntArrays.IntArray{2,UInt8,1}:
 0x00

In [18]:
push!(ivec, 0x01)


Out[18]:
2-element IntArrays.IntArray{2,UInt8,1}:
 0x00
 0x01

In [19]:
append!(ivec, [0x02, 0x03, 0x00])


Out[19]:
5-element IntArrays.IntArray{2,UInt8,1}:
 0x00
 0x01
 0x02
 0x03
 0x00

In [20]:
pop!(ivec)


Out[20]:
0x00

In [21]:
reverse!(ivec)


Out[21]:
4-element IntArrays.IntArray{2,UInt8,1}:
 0x03
 0x02
 0x01
 0x00

In [22]:
minimum(ivec), maximum(ivec)


Out[22]:
(0x00,0x03)

In [23]:
extrema(ivec)


Out[23]:
(0x00,0x03)

In [24]:
findnext(ivec, 2)


Out[24]:
2

In [25]:
findmin(ivec)


Out[25]:
(0x00,4)

Performance

Packing and unpacking integers require several CPU clocks. Therefore, in general, random access operations over IntArray objects are slower:


In [42]:
vec = rand(0x00:0x0f, 100000)
ivec = IntVector{4}(vec);

In [43]:
sort(vec)
@time sort(vec);


  0.003387 seconds (8 allocations: 98.016 KB)

In [44]:
sort(ivec);
@time sort(ivec);


  0.012687 seconds (18 allocations: 49.672 KB)

In [45]:
vec = rand(0x00:0x03, 10000000)
ivec = IntVector{2}(vec);

In [46]:
sort(vec)
@time sort(vec);


  0.315935 seconds (8 allocations: 9.537 MB)

In [47]:
sort(ivec)
@time sort(ivec);


  1.392724 seconds (18 allocations: 2.385 MB)

But this performance gap may diminish when a specialized algorithm can be applied. For example, radix sort runs faster when the bit width is small enough:


In [48]:
radixsort(ivec)
@time radixsort(ivec);


  0.216997 seconds (16 allocations: 2.385 MB)


In [49]:
versioninfo()


Julia Version 0.4.0-dev+6759
Commit c4c9010* (2015-08-16 03:39 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.4.0)
  CPU: Intel(R) Core(TM) i5-4288U CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3