Sorting

The sort method takes a list and shuffles the data around such that the list is sorted into some order. The sort method has two optional arguments, reverse (which can take the value True or False) and key (which requires a function).

By default, Python sorts from smallest to largest, but if you set reverse=True the list will be ordered largest to smallest. Note that sort rearranges the list in place and returns None rather than a new list.

Let’s start with a simple example involving numbers:



In [1]:

    
import random
list_1 = list(range(1, 16))   # creates a list 1...15

random.shuffle(list_1)     # shuffles the list 
print( "shuffled list:    ", list_1)

list_1.sort()
print("sorted list:      ", list_1)

list_1.sort(reverse= True)
print("sorted (reverse): ", list_1)









    



shuffled list:     [10, 15, 6, 12, 11, 5, 3, 4, 9, 13, 7, 8, 1, 14, 2]
sorted list:       [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
sorted (reverse):  [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
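One detail worth spelling out: list.sort() rearranges the existing list in place and returns None, whereas the built-in sorted() leaves the original untouched and hands back a new list. A minimal sketch of the difference (the variable names are just for illustration):

```python
numbers = [3, 1, 2]

# sorted() builds and returns a new list; the original is left alone.
new_list = sorted(numbers)
print(new_list)   # [1, 2, 3]
print(numbers)    # [3, 1, 2]

# list.sort() reorders the list in place and returns None.
result = numbers.sort()
print(result)     # None
print(numbers)    # [1, 2, 3]
```

A common slip is writing something like numbers = numbers.sort(), which replaces the list with None.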

Sorting can get a bit more tricky when dealing with different data types. For example, how do you sort a list of strings? Or a list of lists? Well, Python has a default behaviour in each case (strings, for instance, are compared character by character), and if you ever need the details I'd recommend searching the official Python documentation.
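As a rough illustration of those defaults, here is a small sketch with made-up lists. Strings are compared character by character using Unicode code points, and lists are compared element by element:

```python
words = ["banana", "apple", "Cherry", "apple pie"]
# Uppercase letters have smaller code points than lowercase letters,
# so "Cherry" sorts before "apple".
sorted_words = sorted(words)
print(sorted_words)   # ['Cherry', 'apple', 'apple pie', 'banana']

pairs = [[2, 1], [1, 5], [1, 2], [2]]
# Lists are compared item by item; a shorter list that matches the start
# of a longer one counts as smaller.
sorted_pairs = sorted(pairs)
print(sorted_pairs)   # [[1, 2], [1, 5], [2], [2, 1]]
```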

As a quick word of caution, I'd be careful about sorting lists containing multiple data types, because your code is more likely to throw errors. Far worse still, instead of raising an error Python may quietly hand you an ordering that is junk!
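To illustrate the "throw errors" case first: in Python 3, integers and strings cannot be compared with each other, so trying to sort a list that mixes them fails. A small sketch (the list is made up for illustration):

```python
mixed = [3, "two", 1]
error_message = None

try:
    mixed.sort()
except TypeError as error:
    # Python 3 refuses to decide whether an int is "less than" a str.
    error_message = str(error)

print("Could not sort:", error_message)
```

An error like this is at least loud; the junk case is the one that slips through unnoticed.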

Simple example:



In [2]:

    
our_list = [True, False, -4, -3, -2, 2, 3, 4]
our_list.sort()
print(our_list)









    



[-4, -3, -2, False, True, 2, 3, 4]

In this case, we have a list containing booleans (more on those later) and integers. Before we even start to question what Python is doing it might be worth asking ourselves the following question:

“How would we sort a list like this?”

Does it really make sense to sort by size and then shove True, False in the middle? A more 'human' approach might be to sort by type and then by size.

[False, True, -4, -3, -2, 2, 3, 4]

In any case, I can explain Python's weird-looking decision to put True and False smack in the middle of the list by spending a few moments explaining that Booleans (i.e. True/False) are NOT ‘special’ in Python. As a matter of fact, True behaves exactly like the number 1 and False like the number 0. Here, let me show you:



In [8]:

    
print(True + True + True)   # 1 + 1 + 1 = 3
print(True + 20)            # 1 + 20  = 21
print((True + 20) * False)  # (20 + 1) * 0 = 0

Once you understand that True/False are actually numbers in Python (specifically 1 and 0) it's easy to understand why the sort method put these two values smack in the centre of our list of numbers.
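If you want to check this for yourself, the following sketch shows that bool is a subclass of int, which is why the comparisons made during a sort treat True and False as 1 and 0:

```python
# bool is a subclass of int, so True and False take part in arithmetic
# and comparisons exactly like 1 and 0.
is_int = isinstance(True, int)
print(is_int)                   # True
print(True == 1, False == 0)    # True True

# This is the ordering the sort relied on in the example above.
in_order = -2 < False < True < 2
print(in_order)                 # True
```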

So yeah, to protect yourself against "junk" output you have to be really careful about what you are doing; a good question to ask yourself is:

"Does this task I'm asking of Python even make sense?"

Alright, before wrapping up this section on sorting let's check out the optional argument 'key'. Basically, when given this optional argument, sort (and the built-in sorted function used below, which returns a new sorted list) will order the data according to the value the key function returns for each item. It's probably easiest if I just show you...



In [16]:

    
the_list = [8, 8, 8, 0, 0, 9, 1, 2, 3, 4, 5, 55, 55, 55, 55, 324343434, 40000, 50, 40, 1, 1, 1, 1, 1, 3, 33, 33, 98]

by_default    = sorted(the_list)
by_count      = sorted(the_list, key= lambda x: the_list.count(x)) # Lambda is explained below!
by_digits_num = sorted(the_list, key= lambda x: len(str(x)))


print("Default sort (small to large)...", by_default,  
      "\nSort by 'count' (i.e. how many times each number occurs)...", by_count,
      "\nSort by 'number of digits'...", by_digits_num, sep="\n")









    



Default sort (small to large)...
[0, 0, 1, 1, 1, 1, 1, 1, 2, 3, 3, 4, 5, 8, 8, 8, 9, 33, 33, 40, 50, 55, 55, 55, 55, 98, 40000, 324343434]

Sort by 'count' (i.e. how many times each number occurs)...
[9, 2, 4, 5, 324343434, 40000, 50, 40, 98, 0, 0, 3, 3, 33, 33, 8, 8, 8, 55, 55, 55, 55, 1, 1, 1, 1, 1, 1]

Sort by 'number of digits'...
[8, 8, 8, 0, 0, 9, 1, 2, 3, 4, 5, 1, 1, 1, 1, 1, 3, 55, 55, 55, 55, 50, 40, 33, 33, 98, 40000, 324343434]

So, you may notice that the 'sort by number of digits' list only looks at how many digits each number has and doesn't sort beyond that. In other words, the sort puts all the one-digit numbers before the two-digit numbers, but within each group the numbers are not sorted by size (e.g. the one-digit numbers are NOT ordered smallest-to-largest); they simply keep the order they had in the original list.
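That "ties keep their original order" behaviour is known as a stable sort, and Python guarantees it. A smaller sketch of the same idea (the list is a shortened, made-up version of the one above):

```python
values = [8, 0, 55, 9, 40, 1, 33]

# Every one-digit number has the same key (1), so those items keep the
# order they had in the original list; likewise for the two-digit ones.
by_digits = sorted(values, key=lambda x: len(str(x)))
print(by_digits)   # [8, 0, 9, 1, 55, 40, 33]
```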

If you want to sort by two categories then things get a bit more complicated. But here's a quick example:



In [2]:

    
the_list = [8, 8, 8, 0, 0, 9, 1, 2, 3, 4, 5, 55, 55, 55, 55, 324343434, 40000, 50, 40, 1, 1, 1, 1, 1, 3, 33, 33, 98]

by_len_then_size = sorted(the_list, key=lambda x: (len(str(x)), x))
by_len_then_size_large_first = sorted(the_list, key=lambda x: (-len(str(x)), -x))

print(by_len_then_size)
print(by_len_then_size_large_first)









    



[0, 0, 1, 1, 1, 1, 1, 1, 2, 3, 3, 4, 5, 8, 8, 8, 9, 33, 33, 40, 50, 55, 55, 55, 55, 98, 40000, 324343434]
[324343434, 40000, 98, 55, 55, 55, 55, 50, 40, 33, 33, 9, 8, 8, 8, 5, 4, 3, 3, 2, 1, 1, 1, 1, 1, 1, 0, 0]

This works by creating a small two-element tuple as our key. We sort by the first item and then, if there is a tie, by the second item. So for example the pair (2,0) comes before the pair (2,1) because the first items tie and 0 is less than 1. Likewise (6,-10) comes after both (2,0) and (2,1) because 6 is greater than 2; the second item is only consulted when the first items are equal.
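Python compares tuples one position at a time, which can be checked directly. A quick sketch using the same three pairs:

```python
first_check = (2, 0) < (2, 1)      # first items tie, so 0 < 1 decides
second_check = (2, 1) < (6, -10)   # 2 < 6 decides; second items are ignored
print(first_check, second_check)   # True True

pairs = [(6, -10), (2, 1), (2, 0)]
sorted_pairs = sorted(pairs)
print(sorted_pairs)                # [(2, 0), (2, 1), (6, -10)]
```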

If I want the values in the opposite order (large-to-small), then a quick trick I can employ is to flip the sign of the number when creating the key. This means that large numbers become very negative (i.e. small) numbers and small numbers become comparatively large. Thus, in this case Python is still sorting from small-to-large; we made it work by changing the numbers.

In the example above, I think it would have been better to write the following:

by_len_then_size_large_first = sorted(the_list, key=lambda x: (len(str(x)), x), reverse=True)

However, this "flip the sign" trick can be useful if you ever want to sort category X by largest first and category Y by smallest first (or vice-versa). For example, maybe you want to sort by age (youngest first) and then by salary (largest first).



In [3]:

    
## Each mini list is a person, salary first and age second. For example the first person earns £100 and is 23yrs old.
age_and_salaries = [[100, 23], [100, 19], [200, 27], [200, 19], [300, 33]]

# Youngest first; within the same age, highest salary first.
age_then_wealth = sorted(age_and_salaries, key=lambda x: (x[1], -x[0]))
# Highest salary first; within the same salary, youngest first.
wealth_then_age = sorted(age_and_salaries, key=lambda x: (-x[0], x[1]))

print(age_then_wealth)
print(wealth_then_age)


[[200, 19], [100, 19], [100, 23], [200, 27], [300, 33]]
[[300, 33], [200, 19], [200, 27], [100, 19], [100, 23]]
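Flipping the sign only works for numbers. When one of the categories is a string, one alternative is to rely on the fact that Python's sort is stable and sort in two passes: first by the secondary category, then by the primary one. A sketch with made-up data (the people list and its names are hypothetical):

```python
# Hypothetical data: (name, age) pairs.
people = [("Cara", 19), ("Abe", 23), ("Bea", 19), ("Dan", 23)]

# Goal: oldest first, and alphabetical by name within the same age.
# Pass 1: sort by the secondary key (name, A to Z).
by_name = sorted(people, key=lambda person: person[0])
# Pass 2: sort by the primary key (age, largest first). The sort is
# stable, so people of the same age stay in the name order from pass 1.
oldest_first = sorted(by_name, key=lambda person: person[1], reverse=True)

print(oldest_first)
# [('Abe', 23), ('Dan', 23), ('Bea', 19), ('Cara', 19)]
```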

As a quick final point, I'd like to stress that any function that takes one input ('x') and returns a value Python can compare (usually a number) can be used as our sort key.

And so, if we wanted to shuffle a list into a random order we could actually sort the list. That may sound counter-intuitive at first but, when you think about it, it makes some sense; all sorting does is arrange items according to some rule, and the rule itself could be anything we like.



In [6]:

    
import random

the_list = [8, 8, 8, 0, 0, 9, 1, 2, 3, 4, 5, 55, 55, 55, 55, 324343434, 40000, 50, 40, 1, 1, 1, 1, 1, 3, 33, 33, 98]
random_shuffle = sorted(the_list, key=lambda x: random.random())

print(the_list)
print(random_shuffle)









    



[8, 8, 8, 0, 0, 9, 1, 2, 3, 4, 5, 55, 55, 55, 55, 324343434, 40000, 50, 40, 1, 1, 1, 1, 1, 3, 33, 33, 98]
[2, 8, 1, 1, 0, 50, 33, 3, 9, 40000, 8, 40, 8, 55, 4, 1, 0, 1, 1, 1, 33, 55, 3, 324343434, 55, 5, 55, 98]

I mention this because it's a good 'brain-teaser' type exercise (in real code, random.shuffle does the job directly). In Computer Science, a large number of difficult problems can be solved once you change your perspective.
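Coming back to the point that any one-input function can serve as a key: the function does not have to be a lambda, and it does not have to return a number, only something Python knows how to compare. A small sketch using built-in functions (the sample lists are made up):

```python
numbers = [-7, 3, -1, 5, -4]
# abs is a built-in one-argument function, so it can be passed directly.
by_magnitude = sorted(numbers, key=abs)
print(by_magnitude)    # [-1, 3, -4, 5, -7]

words = ["pear", "Fig", "apple", "Kiwi"]
# str.lower returns a lowercase copy, making the comparison case-insensitive.
ignoring_case = sorted(words, key=str.lower)
print(ignoring_case)   # ['apple', 'Fig', 'Kiwi', 'pear']
```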

Lambda functions

There is nothing particularly special about lambda functions; they are just another way to build a function. Lambdas are mainly used when you want to create a simple function quickly and don't expect to need to call it again.

is_zero = lambda x: x == 0
add = lambda x, y: x + y

The first lambda function above is simply asking if x is equal to zero. In the syntax you are more familiar with it would be written like this:

def is_zero(x):
    return x == 0

Similarly, the second lambda adds x to y. In 'normal' function syntax:

def add(x, y):
    return x + y
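To tie this back to sorting, here is a short sketch showing that the two spellings behave identically and that either kind of function can be passed as a key. The names is_zero_lambda, is_zero_def and digit_count are made up for this illustration:

```python
is_zero_lambda = lambda x: x == 0

def is_zero_def(x):
    return x == 0

print(is_zero_lambda(0), is_zero_def(0))   # True True
print(is_zero_lambda(5), is_zero_def(5))   # False False

# A named function and a lambda work equally well as a sort key.
def digit_count(x):
    return len(str(x))

values = [100, 7, 42]
with_def = sorted(values, key=digit_count)
with_lambda = sorted(values, key=lambda x: len(str(x)))
print(with_def)      # [7, 42, 100]
print(with_lambda)   # [7, 42, 100]
```

The lambda form is handy for a throwaway key, while a def is easier to reuse and gives the function a name that shows up in error messages.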


