Exercise Solutions

Selected Exercise Solutions

3. Thinking in Binary


A simple solution:

In [1]:
def convert(number):
    return str(number), bin(number), hex(number)


('9', '0b1001', '0x9')

The exercise says that we should only output the two other formats. As far as I'm aware, it's not possible for a Python script to distinguish between integer literals (without hacking the Python parser), so we'll use a string as the input instead:

In [2]:
def convert2(string_number):
    if string_number[1] == "x":
        num = int(string_number, 16)
        return str(num), bin(num)
    elif string_number[1] == "b":
        num = int(string_number, 2)
        return str(num), hex(num)
        num = int(string_number)
        return bin(num), hex(num)

('9', '0x9')


There are several ways of doing this, but the fastest way I've found so far (that doesn't involve writing C code or anything crazy) uses bitwise operations. This encoder is based on code for a base32 encoder taken from the base64 module.

We first need to generate a base64 dictionary. To avoid mistakes, I've copied out the standard base64 dictionary, but you could generate this programmatically:

In [3]:
b64 = {0:"A", 1:"B", 2:"C", 3:"D", 4:"E", 5:"F", 6:"G", 7:"H", 8:"I", 9:"J", 10:"K", 11:"L", 12:"M", 13:"N", 14:"O", 15:"P", 16:"Q", 17:"R", 18:"S", 19:"T", 20:"U", 21:"V", 22:"W", 23:"X", 24:"Y", 25:"Z", 26:"a", 27:"b", 28:"c", 29:"d", 30:"e", 31:"f", 32:"g", 33:"h", 34:"i", 35:"j", 36:"k", 37:"l", 38:"m", 39:"n", 40:"o", 41:"p", 42:"q", 43:"r", 44:"s", 45:"t", 46:"u", 47:"v", 48:"w", 49:"x", 50:"y", 51:"z", 52:"0", 53:"1", 54:"2", 55:"3", 56:"4", 57:"5", 58:"6", 59:"7", 60:"8", 61:"9", 62:"+", 63:"/"}

We also need the struct module:

In [4]:
import struct

Remember that if the string's character length isn't a multiple of three, we'll have to pad the encoded string with =. Notice how we build the encoded string by analyzing it in chunks of three bytes (24 bits) at a time. The struct.unpack(3B, string) function is the same as calling ord on all three characters in string (but faster). The bitwise operations simply shift and concatenate each set of 24 bits to convert them from three sets of 8 bits to four sets of 6 bits.

In [5]:
def slow_b64encode(string):
    encode = []
    quanta, leftover = divmod(len(string), 3)

    if leftover:
        string += '\0' * (3 - leftover)
        quanta += 1
    for i in xrange(quanta):
        b0, b1, b2 = struct.unpack('3B', string[i*3:(i+1)*3])
        encode.extend([b64[b0 >> 2], 
                       b64[(b0 & 0b11) << 4 | b1 >> 4], 
                       b64[(b1 & 0b1111) << 2 | b2 >> 6], 
                       b64[(b2 &  0b111111)]])
    encoded = ''.join(encode)
    if leftover == 1:
        return encoded[:-2] + "=="
    elif leftover == 2:
        return encoded[:-1] + "="
    return encoded


In [6]:
from base64 import b64encode

string = "Please encode me"
slow_b64encode(string) == b64encode(string)


The decoder is left for you to try on your own!


In [7]:
def ROR(num, size=None):

    if size is None:
        size = num.bit_length()
        size = max(size, num.bit_length())
    return bin((num >> 1) | (num & 1) << (size - 1))

The function ROR takes an integer num and rotates it to right by one bit. If size is not given, the function assumes that the intended size of the integer is num.bit_length(). If you specify a different size, the function will use that size as the integer bit length (with max used to avoid truncation in case the given size is smaller than num.bit_length()). For example:

In [8]:


In [9]:
ROR(0b1001, size=8)


If you're having trouble understanding the bitwise operations, think of them this way:

1. Identify the rightmost bit with (num & 1).
2. Create a new number by shifting the rightmost bit to the left by (size - 1) spots. For example, if the last bit is 1 and size = 8, then we have 0b10000000.
3. Shift the original number to the right by one spot with (num >> 1) so the last bit "falls off" the end. 
4. Combine the numbers found in steps 2 and 3. The original rightmost bit is now the leftmost bit. 

The equivalent ROL function is slightly more complicated:

In [10]:
def ROL(num, size=None):
    if size is None:
        size = num.bit_length()
        size = max(size, num.bit_length())

    return bin((num << 1) & ((2**size) - 1) | ((num & (2**(size - 1))) >> (size - 1)))

2**X is a shortcut for getting the binary number with 1 at the (X + 1)th position, and zeros elsewhere. (2**X) - 1 returns the binary number of size X with 1s in all positions. We use these shortcuts to get masks for the leftmost bit, and for every bit except the leftmost bit, respectively.

In [11]:


In [12]:
ROL(0b1001, size=16)


Note that we could also write functions that efficiently shift more than one bit at the same time. For instance, see https://www.falatic.com/index.php/108/python-and-bitwise-rotation


One way to do this is to register a new encoding error option with the codecs module:

In [13]:
from __future__ import unicode_literals
import codecs
import unicodedata

def unicode_name(ex):
    """The object `ex` follows the 
    UnicodeError exception format. See:
    char = ex.object[ex.start:ex.end]
    return "{" + unicodedata.name(char, "unknown") + "}", ex.end

codecs.register_error("name", unicode_name)

unicode_name is an error handler that takes a UnicodeError object as an argument, and returns the name of the Unicode character that raised the encoding error. See https://docs.python.org/2/library/codecs.html#codecs.register_error for more information. The unicodedata module is used to produce the name of the Unicode character.

to_asicii encodes string using the ASCII encoding, handling errors with unicode_name, which we registered previously as an error option called name:

In [14]:
def to_ascii(string):
    return string.encode("ascii", errors="name")

In [15]:
print to_ascii(u"† and ►")
print to_ascii(u"\u00A5 and \u00A3")


If you're on a "narrow" Python build (i.e., UCS-2 under the hood), unicodedata.name will not support Unicode code points above \uffff, so this function won't work for, say, \U0001F330. You can find the highest-supported Unicode code point on your system by calling sys.maxunicode:

In [16]:
import sys
print hex(sys.maxunicode)


Python 3 has no such limitations, as most Python 3 versions (>=3.3) support code points up to 0x10FFFF (the maximum possible code point). Since our script is compatible with Python 3, we can run it with the cell magic %%python3 (you may need to install Python 3 on your system):

In [17]:
# from __future__ import unicode_literals
import codecs
import unicodedata

def unicode_name(ex):
    """The object `ex` follows the UnicodeError exception format. See:
    char = ex.object[ex.start:ex.end]
    return "{" + unicodedata.name(char, "unknown") + "}", ex.end

codecs.register_error("name", unicode_name)

def to_ascii(string):
    return string.encode("ascii", errors="name")

print(to_ascii(u"A \U0001F330"))


4. Cryptography


Given four sets of characters (lowercase, uppercase, digits, punctuation) there are 15 different combinations of character sets we can choose (assuming we have to choose at least one set). One way to indicate our choice is to use a binary mask, where a number like 0b1010 indicates that we'd like a random password with lowercase letters and digits only. Likewise, the number 0b11 indicates we only want digits and punctuation. Here's one way to implement this securely:

In [18]:
from itertools import compress
import random
import string

char_list = [string.ascii_lowercase, string.ascii_uppercase,
                 string.digits, string.punctuation + " "]

def gen_password(char_no, length):

    # 0b1010 -> [1, 0, 1, 0]
    bitmask = [(char_no >> bit) & 1 for bit in range(3, -1, -1)]

    # apply bitmask to char_list
    chars = "".join(compress(char_list, bitmask))

    # generate a random password
    return "".join(random.SystemRandom().choice(chars) for _ in range(length))

In [19]:
gen_password(0b1010, 20)


In [20]:
gen_password(0b1100, 60)


In [21]:
gen_password(0b11, 16)


We use a list comprehension to turn the binary char_no into a "bit array" (so 0b1010 becomes [1, 0, 1, 0]). The itertools.compress function applies the bitmask to the list of character sets.

If you need a quick command line function, you can use the following one-liner (with modifications depending on your desired password length and character set):

In [22]:
''.join(random.SystemRandom().choice(string.printable) for _ in range(20))



a. We need to generate a list of dictionaries of all possible permutations with 3-bit block size (so using numbers in the range 0 through 7):

In [23]:
from itertools import permutations

p = ({i:j for i, j in zip(range(8), i)} for i in permutations(range(8)))

b. Since we're using a 3-bit block size, we have to encrypt our data 3 bits at a time. We won't go through the trouble of building a function that regroups bytes (see exercise 1.5, which shows you how to make 6-bit groups out of regular bytes using bitwise operations). This encrypt function works with 3-bit numbers directly:

In [24]:
from itertools import islice

def encrypt(data, key):
    """data is a 3-bit number and key is 
    an integer in range(40321)"""
    p = ({i:j for i, j in zip(range(8), i)} for i in permutations(range(8)))
    return next(islice(p, key, None))[data]

In [25]:
encrypt(4, 2210)


c. Permutations grow extremely fast. A 3-bit block cipher has (2**3)! = 40320 possible mappings (dictionaries) but a 4-bit block cipher has around 2.1e13 mappings, which would put a significant strain on the computer's resources. Python's generators are helpful because they won't store the entire list of dictionaries in memory, but accessing arbitrary dictionaries is still a very slow operation for large indices (keys).


The modulus operation mod n only affects numbers that are larger than n. If m**e < n, the mod operation during encryption does nothing. For example:

In [26]:
m = 2
e = 3
n = 221

m**e < n


In [27]:
c = pow(m, e, n)
c == pow(m, e)


This means we can easily reverse the encryption process by taking the eth root of the ciphertext:

In [28]:
pow(c, 1./e) == m


5. Networking


An IPv4 address has the form X.X.X.X, where each X represents a byte. You can easily convert this to an integer:

In [29]:
def IPv4_to_int(address):
    return reduce(lambda x, y: (int(x) << 8) | int(y), address.split("."))

In [30]:


Likewise for IPv6:

In [31]:
def IPv6_to_int(address):
    addr = [int(chunk, 16) for chunk in address.split(":")]
    return reduce(lambda x, y: (x << 16) | y, addr)

In [32]:


For a module with IP utilities, see ipaddress, which is part of the Python 3 standard library, and is available for Python 2 with $ pip install ipaddress.


There are a few ways you could do this. I prefer to use the dig command because it tends to be the most informative of all DNS utilities. For the root servers:

In [33]:
! dig

; <<>> DiG 9.8.3-P1 <<>>
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21704
;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 14

;.				IN	NS

.			6145	IN	NS	i.root-servers.net.
.			6145	IN	NS	j.root-servers.net.
.			6145	IN	NS	k.root-servers.net.
.			6145	IN	NS	l.root-servers.net.
.			6145	IN	NS	m.root-servers.net.
.			6145	IN	NS	a.root-servers.net.
.			6145	IN	NS	b.root-servers.net.
.			6145	IN	NS	c.root-servers.net.
.			6145	IN	NS	d.root-servers.net.
.			6145	IN	NS	e.root-servers.net.
.			6145	IN	NS	f.root-servers.net.
.			6145	IN	NS	g.root-servers.net.
.			6145	IN	NS	h.root-servers.net.

j.root-servers.net.	3770	IN	A
k.root-servers.net.	13544	IN	A
l.root-servers.net.	3770	IN	A
m.root-servers.net.	9837	IN	A
m.root-servers.net.	2879	IN	AAAA	2001:dc3::35
a.root-servers.net.	1859	IN	A
a.root-servers.net.	8520	IN	AAAA	2001:503:ba3e::2:30
b.root-servers.net.	3769	IN	A
c.root-servers.net.	3769	IN	A
c.root-servers.net.	4860	IN	AAAA	2001:500:2::c
d.root-servers.net.	3770	IN	A
e.root-servers.net.	3769	IN	A
e.root-servers.net.	4860	IN	AAAA	2001:500:a8::e
f.root-servers.net.	3770	IN	A

;; Query time: 52 msec
;; WHEN: Mon May 22 13:06:33 2017
;; MSG SIZE  rcvd: 500

For the com. servers:

In [34]:
! dig com. NS

; <<>> DiG 9.8.3-P1 <<>> com. NS
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44692
;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 3

;com.				IN	NS

com.			7487	IN	NS	h.gtld-servers.net.
com.			7487	IN	NS	k.gtld-servers.net.
com.			7487	IN	NS	f.gtld-servers.net.
com.			7487	IN	NS	m.gtld-servers.net.
com.			7487	IN	NS	e.gtld-servers.net.
com.			7487	IN	NS	l.gtld-servers.net.
com.			7487	IN	NS	i.gtld-servers.net.
com.			7487	IN	NS	c.gtld-servers.net.
com.			7487	IN	NS	b.gtld-servers.net.
com.			7487	IN	NS	j.gtld-servers.net.
com.			7487	IN	NS	d.gtld-servers.net.
com.			7487	IN	NS	g.gtld-servers.net.
com.			7487	IN	NS	a.gtld-servers.net.

k.gtld-servers.net.	13982	IN	A
f.gtld-servers.net.	12137	IN	A
e.gtld-servers.net.	3178	IN	A

;; Query time: 51 msec
;; WHEN: Mon May 22 13:06:34 2017
;; MSG SIZE  rcvd: 293


In [35]:
import urlparse

def simplify_url(url, size=1):
    """ size is an integer that controls
    the complexity of the output URL:
    < 1 : no change
    1 : query removed
    2 : query and path removed
    > 2 : query, path, and scheme removed
    p = urlparse.urlparse(url)
    if size < 1:
        return url
    elif size == 1:
        return "{uri.scheme}://{uri.hostname}{uri.path}".format(uri=p)
    elif size == 2:
        return "{uri.scheme}://{uri.hostname}".format(uri=p)
    elif size > 2:
        return p.hostname

In [36]:
url = "http://www.example.com/path/to/page.html?arg1=a&arg2=b"

for i in range(4):
    print simplify_url(url, size=i)



In [37]:
import socket

# carriage return + line feed
CRLF = "\r\n"

# create a socket using the AF_INET (*IPv4*) address family
# and the SOCK_STREAM transport protocol (*TCP*)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# connect to port 80 (the HTTP port) on www.google.com
s.connect(('www.google.com', 80))

# print the remote IP address
print "IP Address:", s.getpeername()[0]

# form and send an HTTP GET request
s.send("GET / HTTP/1.1{n}Host: www.google.com{n}{n}".format(n=CRLF))

# receive data in two chunks of 4096 bytes, 
# which should be enough to get the <title>
response = s.recv(4096)
response += s.recv(4096)

# finally, close the TCP connection

IP Address:

In [38]:
print response[:2000]

HTTP/1.1 200 OK
Date: Mon, 22 May 2017 17:06:34 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: NID=103=eOEAA2srIUBDpggdKbh2QD6R59reK8LWb3h1UZHJKzgOMMqUnfbi1qyg24XfrmMmdEjRdy0x9XvKxVmEloJx5-glyLIX8SBTJPTHT-aqQcfxV5Tcx4jyYboriKkFFUFidBbeBQuk0wYyzkwxXQ; expires=Tue, 21-Nov-2017 17:06:34 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/logos/doodles/2017/richard-oakes-75th-birthday-5656537195347968.3-l.png" itemprop="image"><meta content="Richard Oakes&#8217; 75th Birthday" property="twitter:title"><meta content="Richard Oakes&#8217; 75th Birthday #GoogleDoodle" property="twitter:description"><meta content="Richard Oakes&#8217; 75th Birthday #GoogleDoodle" property="og:description"><meta content="summary_large_image" property="twitter:card"><meta content="@GoogleDoodles" property="twitter:site"><meta content="https://www.google.com/logos/doodles/2017/richard-oakes-75th-birthday-5656537195347968.3-l.png" property="twitter:image"><meta content="https://www.google.com/logos/doodles/2017/richard-oakes-75th-birthday-5656537195347968.3-l.png" property="og:
image"><meta content="500" property="og:image:width"><meta content="200" property="og:image:height"><title>Google</title><script>(function(){window.google={kEI:'mhojWbPsE4fqUuqBlIAC',kEXPI:'201761,3700

In [39]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(response, "lxml")
print soup.title.text