We have already encountered some collections in the previous chapter – arrays, tuples, dicts and sets.
This chapter examines what we can do with collections and how to best use them for maximum effectiveness. A collection may be indexable, associative or neither. This refers primarily to the way we access individual elements within the collection.
collection[index]
, e.g. shopping_list[5]
. Unusually for most programming languages and in sharp contrast to other languages like Python, indices begin with 1, rather than 0. [1, 3, 5, 7] == [3, 7, 1, 5]
yields, as one would expect, false
.
In [14]:
shopping_list = ["eggs","flour","jam","syrup","milk"]
Out[14]:
In [15]:
shopping_list[5]
Out[15]:
In [12]:
backup_shopping_list = sort(shopping_list,rev = true)
backup_shopping_list == shopping_list
Out[12]:
collection[index]
form but rather in the collection[key]
form.
In [82]:
phonebook = Dict("Mike" => 5551932, "Jane" => 5551933, "John" => 55519324) # Dictionaries use "" exclusively
Out[82]:
In [83]:
phonebook["Mike"] # This wouldn't work with 'Mike'
Out[83]:
In [84]:
phonebook[1] # Raises error since it's an associative collection; no index
push!()
and pop!()
. The latter, in case you were wondering, returns items in a random order, since the absence of an index means sets are not ordered; you could pop!()
any value off a set. In addition, collections may be mutable or non-mutable, which references their ability to be changed. Put quite simply, a mutable collection is one where you can change particular values after creation, while in an immutable collection, you cannot do so.
The typical immutable collections are, of course, tuples – once you have assigned a value to a tuple, you cannot change that value (although you can assign some other tuple or indeed an entirely different value to the variable that holds the tuple).
Sets again represent a special case – they are what could be best described as pseudo-immutable – there is no way to access values in a way that could change them, since you normally access an element of a set by its value (which is sufficient in a set thanks to the uniqueness constraint).
Mutable | Immutable | |
---|---|---|
Indexable | Arrays | Tuples |
Associative | Dicts | |
Non-indexable and non-associative | Sets |
In [85]:
prime_array = [2, 3, 5, 7, 11]
Out[85]:
In [86]:
prime_array[3] # Get elements at index 3
Out[86]:
In Julia, a range of numbers is written as start:end
or start:steps:end
. You can use a range to access a range of elements:
In [87]:
prime_array[2:3] # Get elements at index 2 to 3, with the starting index being 1
Out[87]:
A range always returns a collection, even if it has the length 1.
This is exemplified by the difference between prime_array[3]
, the call we made above, and prime_array[3:3]
, which returns:
In [34]:
prime_array[3:3]
Out[34]:
Which is not the same as the single value 5
we returned before from prime_array[3]
:
In [35]:
prime_array[3:3] == prime_array[3]
Out[35]:
You can access the last element of an indexable collection using [end]
:
In [36]:
prime_array[end] # Get me the last value
Out[36]:
Incidentally, end
behaves like a number – so prime_array[end-1]
returns the penultimate element of the collection.
In [37]:
prime_array[end - 2] # Get me the value 2 indexes before the last value
Out[37]:
If the indexable collection you are using is also mutable (e.g. an array), any of these methods will act as a pseudo-variable and will allow you to assign a value to it. Thus prime_array[end] = 12
would change the last element of prime_array
to 12. You also aren't restricted to a single element: calling prime_array[2:4] = 0
would result in
In [42]:
prime_array
Out[42]:
And, of course, you can use an array or another indexable collection to replace values:
In [43]:
prime_array[2:4] = [13,15,17] # Change values from index 3 to 4 to new primes
Out[43]:
In [44]:
prime_array # Show me the new array
Out[44]:
In [45]:
actors = ["Ian McKellen", "Martin Freeman", "Elijah Wood"] # Define an array of strings
Out[45]:
In [48]:
gandalf, bilbo, frodo = actors # Associate those strings to variables
Out[48]:
In [51]:
gandalf
Out[51]:
They don't even have to be the same length; we can associate as many variables as there are elements available:
In [59]:
a, b = actors
println(a)
println(b)
But you cannot have more; the collection does not wrap around:
In [58]:
a, b, c, d = actors
println(a)
println(b)
Unpacking can also be used to swap the contents of variables:
In [60]:
firstname = "Irving"
lastname = "Washington"
Out[60]:
In [62]:
firstname, lastname = lastname, firstname # Reassociate variables to other defitions in order
Out[62]:
In [63]:
lastname # Show the changed variable
Out[63]:
Note: Functions with an exclamation mark !
indicate that they modify their argument. Consider the following:
In [92]:
arr = [5,4,6]
println("With sort, the old array is unchanged: $(arr), but the new array is $(sort(arr))") # Create a copy of the array and sort it
println("With sort!, the old array is changed: $(sort!(arr)), and new array is $(sort!(arr))") # Change the array by sorting it
In [104]:
array = [1,2,3,4]
Out[104]:
In [105]:
push!(array, 5) # Push the new value into the end of the collection
Out[105]:
In [106]:
pop!(array) # Pop the last value off the collection
Out[106]:
In [107]:
array # Do we have the original array again?
Out[107]:
In [108]:
array2 = [5,6,7]
Out[108]:
In [109]:
append!(array, array2)
Out[109]:
In [110]:
append!(array, 1) # append cannot take single values; use push instead
In [111]:
append!(array, [1]) # append only takes arrays, even of length 1
Out[111]:
In [112]:
array
Out[112]:
In [113]:
shift!(array) # Shift the first element off the array
Out[113]:
In [114]:
unshift!(array, 7) # Unshift the array by adding a new element to it's start
Out[114]:
There's a set of functions starting with find — such as find()
, findfirst()
, and findnext()
— that you can use to get the index or indices of values within an indexable collection that match a specific value, or pass a test. These functions share three properties.
Their result is the index or indices of the value sought or tested for, with the nth element's index being n, not n-1 as you might be used to from other languages (Since we start from 1).
You can use these functions in two principal forms: you can test for a value or you can test for a function, in which case the results will be the values for which the function returns true
.
The find
functions' way of telling you they haven't found anything is returning zero, since there is no element of index zero.
Let's try to find things within the following Array
: primes_and_one = [1,2,3,5,7,11,13,17,19,23]
In [116]:
primes_and_one = [1,2,3,5,7,11,13,17,19,23]
Out[116]:
In [118]:
findfirst(primes_and_one, 5) # Where is the first 5 in my array?
Out[118]:
As noted above, we can feed the find
functions a function as well – it will return values for which the function would return a true
value.
We have not really discussed functions, but the general idea should be familiar to you. A function of the form x -> x == 13
results in true
of the value of x
is 13 and false
otherwise.
Let's try to see which prime number is the first to equal 13 (don't expect big surprises):
In [126]:
findfirst(x -> x == 13, primes_and_one) # find the first index that matches the function
Out[126]:
In [127]:
primes_and_one[7]
Out[127]:
In [128]:
findfirst(x -> x >= 3, primes_and_one) # If more than one value matches a function, returns index of the first match
Out[128]:
In [129]:
primes_and_one[3]
Out[129]:
You might have noticed that unlike in the case where you were searching for a particular value, where you're searching by a function, the function comes first (findfirst(array, value)
vs. findfirst(function, array)
)
This is a little idiosyncrasy, but has to do with the way Julia determines what to do based on what it is provided with. A lot more of this will be explored in a later Chapter. .
In [138]:
find(isinteger, primes_and_one) # Show me all the indices of primes that are integers
Out[138]:
In [139]:
find(x -> x >=3, primes_and_one) # Show me all the indices of primes that are greater than 3
Out[139]:
In [140]:
primes_and_one[find(x -> x >=3, primes_and_one)] # Wrap it up together and show me the values, not just the indices
Out[140]:
In [141]:
findnext(isodd, primes_and_one, 4) # Find me the next odd prime index from the 4th index onwards
Out[141]:
Wait, why 4
? As you might remember, Julia is 1-indexed, not 0-indexed, and inclusive. Therefore, an index 'begins before' the number.
The number after the first index is the first number in the sequence and so on. As such, the number after the third item in a collection is the item next to (= following) the index 4
, not 3
.
You might also have noticed that when you use a function as an argument, you do not use the parentheses you would normally use to call a function. That is because function()
means call this function
while function
is merely a reference to the function object itself.
So far, we've only been getting indices. How do we retrieve the actual elements? The answer is, of course, by using our magical []
(square brackets) syntax. We'll also use this as a good opportunity to introduce a very useful function, isprime()
, which returns true
for primes and false otherwise:
In [285]:
find(isprime, primes_and_one) # What indices of this array are prime?
Out[285]:
In [286]:
primes_and_one[find(isprime,primes_and_one)] # What are the values at the indices of this array that are prime?
Out[286]:
In [287]:
filter(isodd, primes_and_one) # What are the values of this array that are odd?
Out[287]:
filter()
can be used in-place, by using the !
after the name of the function (thereby changing the original variable).
Using filter!()
, altering the actual array rather than returning a filtered copy. Note, however, that functions ending with !
modify the object, so, obviously immutable type they act on must be mutable – you would, therefore, not be able to filter!()
a tuple, even though you would be able to filter()
it.
In [288]:
sort([-3, 2, 1, 7])
Out[288]:
You can specify the sort criterion with a function using by
– in this case, we will be using the absolute value function abs
(remember not to use the parentheses symbolising function call, just the name of the function):
In [289]:
sort([-3,2,1,7], by=abs) # Sort this aray by absolute value
Out[289]:
You can change the order of sorting using rev
:
In [149]:
sort([-3,2,1,7], by=abs, rev=true) # Sort this array by absolute value and reverse it.
Out[149]:
In [150]:
sort([-3,2,1,7], by=abs, rev=true, alg=MergeSort) # Sort the array by absolute value using MergeSort, and reverse it
Out[150]:
For mutable indexable collections, such as arrays, you can use sort!()
, which sorts 'in place'.
You can also sort non-numeric elements, or indeed anything for which the isless()
function is defined, which sorting uses internally:
In [152]:
sort!(["Bayes", "Laplace", "Poisson", "Gauss"]) # Sort this array in place
Out[152]:
In [156]:
count(isodd, primes_and_one) # How many values in this array match the isodd function?
Out[156]:
all()
and any()
implement two of the mathematical concepts known as quantifiers, with all()
representing the universal quantifier "for all" (∀), while any()
implements the existential quantifier "for any" (∃x).
These functions test whether all or any, respectively, of a collection satisfies a certain criterion, and return a single truth value.
In [159]:
all(x -> x >= 3, [1:10]) # Does f(x) apply for all values in the defined range (x >= 3 ∀ {1:10})?
Out[159]:
In [189]:
any(x -> x >= 3, [1:10]) # Does f(x) apply for any values in the defined range (x >= 3 ∃x {1:10})?
Out[189]:
In [168]:
in(3, primes_and_one) # Is there a 2 in this collection?
Out[168]:
Somewhat strangely, in the in()
syntax, the needle comes before the haystack, i.e. in(value, array)
, where value
denotes the value you are looking for, just like other function references.
Arrays (the ones we used in our examples so far in this section) are mutable indexable collections. The type Array{T,N}
indicates an N
-dimensional array which elements' types are subtypes of T
.
For instance, Array{Number, 2}
is an 2-dimensional array. Its elements' types descend from Number
(e.g.Int
, Float64
).
How do we access elements in a multidimensional array, a special form of indexable collection? Simple – in a multidimensional array, indexes go down each row, then from left to right. Therefore, this array
In [170]:
md_array = ["A" "B"; "C" "D"]
Out[170]:
would be indexed as follows:
In [171]:
println(md_array[1])
println(md_array[2])
println(md_array[3])
println(md_array[4])
This is a little counterintuitive and different from the usual row/column notation, where you would use array[row][column]
.
In Julia, to retrieve a cell by row and column, use array[row, column]
:
In [172]:
md_array[1,2]
Out[172]:
This generalizes for higher dimension arrays.
A tuple is an ordered sequence of elements, like an array. A tuple is represented by parentheses and commas, rather than the square brackets used by arrays. The important difference between arrays and tuples is that tuples are immutable: you can't change the elements of a tuple, or the tuple itself, after creating it.
Tuples are generally used for small fixed-length collections — they're ubiquitous across Julia, for example as argument lists. Where a function returns multiple values, which, as we see, is pretty often the case, the result is a tuple.
A corollary of immutability is that none of the !
functions work on tuples - but at the same time, using functions such as push()
is perfectly acceptable, since it returns a copy of the tuple with the added element, which does not alter the original tuple.
An associative collection (typically Dictionaries) is a kind of non-indexed collection that stores (usually) pairs of values.
The indexable collections you have encountered correspond to real-life examples such as a shopping list or a sequential list of train stations. Associative collections, on the other hand, have a key and a value (for this reason, they are sometimes referred to as key-value pairs).
Julia, like many other programming languages, implements associative collections in an object called a dict
(short for dictionary
), which corresponds to 'maps', 'hash tables' or 'dictionaries' in other programming languages.
A dict, as we have seen, is usually created using the dict literal:
In [200]:
dict = Dict("a" => 1, "b" => 2, "c" => 3)
Out[200]:
The key of a key-value pair is unique, meaning that while several keys might point at the same value (and a key might point at a collection as a value), you cannot have duplicate keys (in database terminology, you might have heard this referred to as a one-to-many relationship).
In [194]:
[i => sqrt(i) for i = 1:2:15] # The => defines the key -value relationship, and returns the result as a dict
Out[194]:
This creates a dict with the square root of every odd number from 1 to 15. In this case, i
can be any iterable – while ranges are the most frequently used, there is no reason why.
The following would not be equally valid:
In [198]:
Dict(i => sqrt(i) for i = [2, 5, 6, 8, 12, 64]) # Dict only takes key pairs,
Note: Comprehension syntax is generally applicable to most collections:
In [197]:
[sqrt(i) for i = 1:2:15] # This is just list comprehesnion; the key-pair can be dropped to return an array
Out[197]:
Secondly, you can also create an empty dictionary. Dict()
will construct an empty dictionary that permits any elements, while Dict{type1, type2}()
will create an empty dictionary that permits any elements with keys of type1
and values of type2
.
Finally, earlier versions of Julia used to support what is sometimes referred to as zip creation of a dict, namely entering two equal-length tuples, one for keys and one for values. This is now regarded as deprecated – it still works, but you should not use it. Instead, the correct syntax is Dict(zip(ks, vs))
:
In [201]:
ks = ("a", "b", "c")
vs = ("1", "2", "3")
Out[201]:
In [202]:
Dict(zip(ks,vs))
Out[202]:
In [205]:
statisticians = Dict("Gosset" => "1876-1937", "Pearson" => "1857-1936", "Galton" => "1822-1911")
Out[205]:
In [206]:
statisticians["Gosset"]
Out[206]:
One drawback of the bracket syntax is that if there is no entry for the key provided, Julia will raise an error:
In [207]:
statisticians["Kendall"]
An alternative form of accessing a dictionary is using the get()
function, which accepts a default value:
In [214]:
get(statisticians, "Pearson", "I'm sorry, I don't know when this person lived.")
Out[214]:
In [215]:
get(statisticians, "Kendall", "I'm sorry, I don't know when this person lived.")
Out[215]:
This is particularly helpful for key-error handling; any not found entries all inherit this default response.
Unlike in some other programming languages, a default is not optional for Julia:
In [216]:
get(statisticians, "Kendall")
In [211]:
get!(statisticians, "Kendall", "I'm sorry, I don't know when this person lived.")
Out[211]:
In [212]:
statisticians
Out[212]:
In [220]:
pop!(statisticians, "Kendall")
Out[220]:
In [221]:
statisticians
Out[221]:
In [222]:
statisticians["Kendall"] = "1907-1983"
Out[222]:
In [231]:
statisticians
Out[231]:
In [224]:
haskey(statisticians, "Galton") # Does this Dict have this key?
Out[224]:
In [230]:
haskey(statisticians, "Bayes")
Out[230]:
You can also check for the existence of a key-value pair in a dict using the in()
function you might be familiar with from arrays using the associative array symbol =>
:
In [242]:
statisticians
Out[242]:
In [243]:
in(("Galton" => "1822-1911"), statisticians) # Is this key-pair in this Dict?
Out[243]:
In [244]:
in(("Bayes" => "1702-1761"), statisticians)
Out[244]:
In [245]:
keys(statisticians)
Out[245]:
You can retrieve values, predictably, by using the values()
function:
In [246]:
values(statisticians)
Out[246]:
In [249]:
[i => sqrt(i) for i = 1:1:15]
Out[249]:
This is because dicts are not indexable, therefore there is no ordering that would make inherent sense.
However, sometimes, we like dictionaries sorted. Disappointingly, sorting dicts is not as easy as sorting arrays: sort(statisticians)
tells us that 'sort' has no method matching sort(::Dict{ASCIIString,ASCIIString})
.
Therefore, you have to write your own sort function that first converts statisticians
from a dict into an array of 2-element tuples. This is because the sort()
function has no defined methods for dicts, but it can sort arrays, including tuples, where it sorts by the first element in the tuple.
We can emulate this functionality with sort(collect(statisticians))
which converts this Dict into another (typically an Array:
In [261]:
sort(collect(statisticians))
Out[261]:
We then wrap this back up in a Dict:
In [262]:
sort(collect(statisticians))
Out[262]:
In [263]:
mathematicians = Dict("Gauss" => "1777-1855", "Leibniz" => "1646-1716", "Abel" => "1802-1829")
Out[263]:
In [264]:
merge(mathematicians, statisticians)
Out[264]:
Its bang counterpart, merge!()
, merges them in place, overwriting the first dict mentioned while leaving the second intact.
In [265]:
merge!(mathematicians, statisticians)
Out[265]:
In [266]:
mathematicians
Out[266]:
In [267]:
statisticians
Out[267]:
Sets are the primary collection of this type.
You might be familiar with the idea of sets from maths/set theory. A set is a non-indexable, non-associative and non-mutable collection that also has unique elements. No element may occur twice, so an element's value identifies it conclusively.
In [269]:
primes_set = Set()
Out[269]:
Or you can specify what sort of data types it would accept:
In [273]:
primes_set = Set{Int64}()
Out[273]:
You can create and fill sets in one go by listing elements surrounded by curly braces {}
, and if you surround the elements with square brackets []
instead of curly braces {}
Julia will guess the type:
mersenne_primes_set = Set([3, 7, 31, 127])
Sets have some unique functions that accommodate certain problems well-known from set theory: the functions union()
, intersect()
and setdiff()
each, respectively, implement the union, intersection and difference of sets. They are also immutable, so they cannot be changed; they can only generate new sets.
Let's see how we can use this to find some similarities between the cast of two blockbusters, The Lord of the Rings and The Matrix.
First, let's create two sets with some actors from each movie:
In [290]:
lotr_actors = Set(["Elijah Wood", "Ian McKellen", "Viggo Mortensen", "Hugo Weaving"])
matrix_actors = Set(["Keanu Reeves", "Lawrence Fishburne", "Hugo Weaving"])
Out[290]:
To find shared actors, we can use intersect()
:
In [291]:
intersect(lotr_actors, matrix_actors)
Out[291]:
To find actors who only starred in Lord of the Rings but not in The Matrix, we can use setdiff()
, which shows all elements that are in the first Set
but not the second:
In [280]:
setdiff(lotr_actors, matrix_actors)
Out[280]:
Finally, we can see actors who played in either of the movies, by using union()
:
In [281]:
union(lotr_actors, matrix_actors)
Out[281]:
Until now, we have generally created collections using literals, and with precious little regard to the types of information that go in them. While types will be discussed in quite a bit of detail later on, what we do know about types is that they are individual categories of data.
Julia operates what is called type inference: unless you tell it explicitly what type something is, it tries to figure it out best as it can. We see this in operation when we create a new collection. When a collection is created and Julia is not told that this is going to be a collection containing elements of only a particular kind or particular kinds of values, it makes an educated guess. The REPL tells us this much:
In [292]:
mersenne_primes = [3, 7, 31, 127, 8191, 131071]
Out[292]:
Upon creating the array, the REPL reports to us that it's an array consisting of six elements, all of type Int64
– a type of signed 64-bit integer (don't worry if that means quite little to you just yet, we will be discussing various integer types in Chapter [X]). It also, helpfully, reports to us that we've got a 1-dimensional array.
In [293]:
not_really_mersenne_primes = [3, 7, "potato", 127, π, "wallaby"]
Out[293]:
As you have guessed, Julia is at a loss as to what to do with this, since we've got a mix of integers, strings and a constant thrown in for good measure. Therefore, it tells us that it has inferred the type of the collection to be Any
– a type that applies to all objects.
In [284]:
empty_set = []
Out[284]:
In this chapter, we learned about the way Julia's handles collections, whether they are Indexable, Associative, or neither. An indexable array is referenced by its index value (starting from 1 in Julia), and provides a bunch of operations for changing, setting, modifying, finding, and even testing these collections. Associative arrays are based on the notion of key-value pairs, and are almost always dictionaries. They do not have an order, and rely on the notion of retrieving a value for a given key, not indices. Sets are neither indexable nor associative; they are a collection that deals exclusvily with unique values. In any case, they all share one important property – to act as 'envelopes' for multiple elements, each with their distinct type.