Column 1

Setup - Environment


In [42]:
import os
if not os.path.exists("data/column1"):
    os.makedirs("data/column1")

Setup - Producing random test data

The column (problem 4) challenges the reader to write its own test function that would generate the random test data used during the exercise. Those need to be random, positive integers.


In [13]:
from random import randint, random, randrange

INIT_NUMBER = 1
RANDOM_RANGE = 20
TOTAL_RECORD = 1000000

current_number = INIT_NUMBER
numbers = []

def randnum(): 
    return randint(1, randrange(2, RANDOM_RANGE))

for i in range(0, TOTAL_RECORD):
    current_number += randnum()
    numbers.append((random(), str(current_number)))

numbers.sort()

with open("data/column1/numbers.txt", "w") as f:
    for r, line in numbers:
        f.write(line + "\n")

This will generate 1 000 000 random unique numbers. You can tweak the values of the constants yourself to tune the variability and range of the test data.

First Iteration

The first exercise consists of ordering a file that contains phone numbers. These are uniques, positive integers that can go up from minimum 0 to 1 000 000 000 maximum. The use of a bitmap is encouraged as a solution as this can represent a wide range of sequential numbers with minimal space. The use of a bitmap data structure will also naturally order the numbers, which is the main purpose of this exercise.


In [14]:
from bitarray import bitarray

BITMAP_RANGE = INIT_NUMBER + RANDOM_RANGE * TOTAL_RECORD
bitmap = bitarray(BITMAP_RANGE)
bitmap.setall(False)

with open("data/column1/numbers.txt", "r") as f:
    for line in f:
        number = int(line)
        bitmap[number] = True

Let's get some information on the bitmap itself, to see how much space it takes in memory.


In [15]:
bitmap_size = bitmap.buffer_info()[1]
bitmap_unused = bitmap.buffer_info()[3]
bitmap_alloc = bitmap.buffer_info()[4]

print("The bitmap range is {0}.".format(BITMAP_RANGE))
print("There are {0} true values in the bitmap.".format(bitmap.count()))
print("The bitmap buffer is {0} long, {1} unused, {2} allocated.".format(bitmap_size, bitmap_unused, bitmap_alloc))


The bitmap range is 20000001.
There are 1000000 true values in the bitmap.
The bitmap buffer is 2500001 long, 7 unused, 2500001 allocated.

This is an optional step for if you want to generate the binary bitmap file and take a look at it.


In [12]:
with open("data/column1/numbers.bmp", "w") as f:
    bitmap.tofile(f)

Let's save in a file the sorted numbers stored in the bitmap with their positional number representation.


In [16]:
import itertools
irange = lambda stop: iter(itertools.count().next, stop)

with open("data/column1/sorted-numbers.txt", "w") as f:
    for i in xrange(BITMAP_RANGE):
        f.write(str(i) + "\n") if bitmap[i] is True else None

That should produce a file that contains the same 1 000 000 numbers as in numbers.txt but in sorted order.


In [ ]: