In [42]:
import os
if not os.path.exists("data/column1"):
os.makedirs("data/column1")
The column (problem 4) challenges the reader to write its own test function that would generate the random test data used during the exercise. Those need to be random, positive integers.
In [13]:
from random import randint, random, randrange
INIT_NUMBER = 1
RANDOM_RANGE = 20
TOTAL_RECORD = 1000000
current_number = INIT_NUMBER
numbers = []
def randnum():
return randint(1, randrange(2, RANDOM_RANGE))
for i in range(0, TOTAL_RECORD):
current_number += randnum()
numbers.append((random(), str(current_number)))
numbers.sort()
with open("data/column1/numbers.txt", "w") as f:
for r, line in numbers:
f.write(line + "\n")
This will generate 1 000 000 random unique numbers. You can tweak the values of the constants yourself to tune the variability and range of the test data.
The first exercise consists of ordering a file that contains phone numbers. These are uniques, positive integers that can go up from minimum 0 to 1 000 000 000 maximum. The use of a bitmap is encouraged as a solution as this can represent a wide range of sequential numbers with minimal space. The use of a bitmap data structure will also naturally order the numbers, which is the main purpose of this exercise.
In [14]:
from bitarray import bitarray
BITMAP_RANGE = INIT_NUMBER + RANDOM_RANGE * TOTAL_RECORD
bitmap = bitarray(BITMAP_RANGE)
bitmap.setall(False)
with open("data/column1/numbers.txt", "r") as f:
for line in f:
number = int(line)
bitmap[number] = True
Let's get some information on the bitmap itself, to see how much space it takes in memory.
In [15]:
bitmap_size = bitmap.buffer_info()[1]
bitmap_unused = bitmap.buffer_info()[3]
bitmap_alloc = bitmap.buffer_info()[4]
print("The bitmap range is {0}.".format(BITMAP_RANGE))
print("There are {0} true values in the bitmap.".format(bitmap.count()))
print("The bitmap buffer is {0} long, {1} unused, {2} allocated.".format(bitmap_size, bitmap_unused, bitmap_alloc))
This is an optional step for if you want to generate the binary bitmap file and take a look at it.
In [12]:
with open("data/column1/numbers.bmp", "w") as f:
bitmap.tofile(f)
Let's save in a file the sorted numbers stored in the bitmap with their positional number representation.
In [16]:
import itertools
irange = lambda stop: iter(itertools.count().next, stop)
with open("data/column1/sorted-numbers.txt", "w") as f:
for i in xrange(BITMAP_RANGE):
f.write(str(i) + "\n") if bitmap[i] is True else None
That should produce a file that contains the same 1 000 000 numbers as in numbers.txt but in sorted order.
In [ ]: