Profiling


In [ ]:
# EXERCISE:
# Execute the following command:
!python -m timeit '"-".join([str(n) for n in range(100)])'

# Now execute the following:
!python -m timeit '"-".join(map(str, range(100)))'

# Now execute:
!python -m timeit --setup 'func = lambda n: "-".join(map(str, range(n)))' 'func(100)'

# And finally:
!python -m timeit --setup 'func = lambda n: "-".join(map(str, xrange(n)))' 'func(100)'

timeit module:

  • Provides a simple way to time the execution of Python statements.
  • Provides both command-line and programmatic interfaces.

In [ ]:
import timeit
print timeit.timeit(stmt='func(100)', setup='func = lambda n: "-".join(map(str, xrange(n)))', number=10000)

In [ ]:
def fibonacci(n):
    """Return the nth Fibonacci number (naive recursive version)"""
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def fib_15():
    return fibonacci(15)

In [ ]:
print timeit.timeit(stmt=fib_15, number=100)

In [ ]:
# The timeit module also provides a Timer class, which can repeat the whole measurement several times

t = timeit.Timer(stmt=fib_15)
print t.repeat(repeat=3, number=100)
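
repeat() returns one total time per run. The timeit documentation suggests looking at the minimum of those values, since the higher ones mostly reflect interference from other processes. A minimal sketch of that idea follows; the helper name best_time is an assumption made for illustration, and the code is written with parenthesised single-argument print calls so that it is intended to run on both Python 2.7 and Python 3.

```python
import timeit


def fibonacci(n):
    """Return the nth Fibonacci number (naive recursive version)."""
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def best_time(func, repeat=3, number=100):
    """Return the fastest total time, in seconds, over `repeat` runs.

    Each run calls `func` `number` times. `best_time` is a hypothetical
    helper, not part of timeit.
    """
    timer = timeit.Timer(stmt=func)
    return min(timer.repeat(repeat=repeat, number=number))


best = best_time(lambda: fibonacci(15))
print("best of 3 runs: %.6f s per 100 calls" % best)
```

The absolute number depends on the machine, so only compare results measured on the same host.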

In [ ]:
# EXERCISE:
# Execute the following command:
!python -m cProfile fib_fac.py

# Now execute the following:
!python -m cProfile -s time fib_fac.py

# Now execute:
!python -m cProfile -s cumulative fib_fac.py

# And finally:
!python -m cProfile -s calls fib_fac.py

cProfile:

  • Deterministic profiling of Python programs.
  • C extension with reasonable overhead.
  • Provides both command-line and programmatic interfaces.

  • There is a pure Python alternative module with the same interface: profile
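
As a rough illustration of the programmatic interface, the sketch below profiles a single function call with a Profile object and its runcall() method, then prints the three most expensive entries. The fibonacci function is redefined here only to keep the example self-contained.

```python
import cProfile
import pstats


def fibonacci(n):
    """Return the nth Fibonacci number (naive recursive version)."""
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


# runcall() profiles exactly one call and returns its result.
profiler = cProfile.Profile()
result = profiler.runcall(fibonacci, 15)

# Stats accepts a Profile object directly, so no intermediate file is needed.
stats = pstats.Stats(profiler)
stats.strip_dirs().sort_stats('cumulative').print_stats(3)
print(result)
```

Profiling one call this way keeps the report free of import and start-up noise, which the command-line form always includes.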


In [ ]:
import cProfile
import pstats


filename = "cprofile_fib_fac.log"
max_num_lines = 3

In [ ]:
# Note: in a normal Python session the import inside the statement string is not needed;
# it is only required here because of an incompatibility with pydemo
cProfile.run(statement="from fib_fac import fib_fac; fib_fac()", filename=filename)

In [ ]:
stats = pstats.Stats(filename)
stats.strip_dirs().sort_stats('time').print_stats(max_num_lines)
stats.strip_dirs().sort_stats('cumulative').print_stats(max_num_lines)
stats.strip_dirs().sort_stats('calls').print_stats(max_num_lines)

Use pstats.Stats to parse and print cProfile output
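
print_stats() accepts several restrictions, applied in order: an integer keeps that many lines, and a string is treated as a regular expression matched against each entry. The sketch below saves a profile to a temporary file, loads it back and prints only the entries that mention 'factorial'. The workload and factorial functions are hypothetical stand-ins, since fib_fac.py is not shown here.

```python
import cProfile
import os
import pstats
import tempfile


def factorial(n):
    """Return n! computed iteratively."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def workload():
    """Hypothetical workload standing in for fib_fac()."""
    return [factorial(n) for n in range(50)]


# Profile the workload and dump the raw statistics to a temporary file.
fd, path = tempfile.mkstemp(suffix=".prof")
os.close(fd)
profiler = cProfile.Profile()
profiler.runcall(workload)
profiler.dump_stats(path)

# Load the file back; keep entries matching 'factorial', then at most 5 lines.
stats = pstats.Stats(path)
stats.strip_dirs().sort_stats('calls').print_stats('factorial', 5)
os.remove(path)
```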


In [ ]:
# EXERCISE: which option is faster? Time each one with timeit before guessing
def opc1():
    fruits = tuple(str(i) for i in xrange(100))
    out = ''
    for fruit in fruits:
        out += fruit + ':'
    return out

def opc2():
    format_str = '%s:' * 100
    fruits = tuple(str(i) for i in xrange(100))
    out = format_str % fruits
    return out

def opc3():
    format_str = '{}:' * 100
    fruits = tuple(str(i) for i in xrange(100))
    out = format_str.format(*fruits)
    return out

def opc4():
    # Note: unlike opc1, opc2 and opc3, the result has no trailing ':'
    fruits = tuple(str(i) for i in xrange(100))
    out = ':'.join(fruits)
    return out
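
One possible way to run the comparison is a small harness that times every candidate with timeit.repeat and keeps the best run. The sketch below is only an assumption about how to organise it: compare, concat_loop and concat_join are illustrative names, and the two candidates are simplified stand-ins for the options above. It first checks that the candidates return the same string, because timing functions that produce different results is not a fair comparison.

```python
import timeit


def concat_loop(items):
    """Build the string with repeated += concatenation."""
    out = ''
    for item in items:
        out += item + ':'
    return out


def concat_join(items):
    """Build the same string with a single join call."""
    return ''.join(item + ':' for item in items)


def compare(candidates, repeat=3, number=1000):
    """Return {name: best total seconds} for each zero-argument callable."""
    results = {}
    for name, func in candidates.items():
        results[name] = min(timeit.repeat(stmt=func, repeat=repeat, number=number))
    return results


items = tuple(str(i) for i in range(100))
candidates = {
    'loop': lambda: concat_loop(items),
    'join': lambda: concat_join(items),
}

# Make sure both candidates really compute the same thing before timing them.
assert concat_loop(items) == concat_join(items)

timings = compare(candidates)
for name in sorted(timings, key=timings.get):
    print("%-5s %.6f s" % (name, timings[name]))
```

In the notebook, the same harness could be fed with opc1 to opc4 directly, since they take no arguments.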

Networking

The standard library provides several modules for network operations:

  • socket: provides access to the low-level BSD socket interface; it includes a 'socket' class and some useful helper functions

  • urllib2: a library to perform HTTP requests (GET, POST, multipart...)

  • httplib: client-side implementation of the HTTP and HTTPS protocols, used by urllib2

  • urlparse: functions to parse URLs

  • Note that in Python 3, urlparse, urllib and urllib2 have been merged into the urllib package, and httplib has been renamed to http.client
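
As a small illustration of URL parsing, the sketch below splits a sample URL into its components and decodes the query string. The try/except import is one common way to cope with the module move between Python 2 and Python 3; the URL itself is only sample input and is never requested.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2

url = ("http://api.worldweatheronline.com/free/v1/weather.ashx"
       "?q=41.41,2.22&format=json&num_of_days=3")

# urlparse splits the URL into scheme, network location, path, query...
parts = urlparse(url)
print(parts.scheme)
print(parts.netloc)
print(parts.path)

# parse_qs turns the query string into a dict of lists (a key may repeat).
query = parse_qs(parts.query)
print(query["num_of_days"])
```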


In [ ]:
import socket


# In addition to the socket class itself, the module provides some useful helper functions
print socket.gethostname()
print socket.getfqdn()
print socket.gethostbyname(socket.getfqdn())
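
The socket class itself can be sketched with a tiny echo exchange over the loopback interface. This is a minimal example under a few assumptions: it uses 127.0.0.1 only, lets the operating system choose a free port, serves a single connection from a helper thread, and the function name echo_once is made up for illustration. Real servers need proper error handling and loops around recv().

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send back whatever the client sent."""
    conn, _addr = server_sock.accept()
    try:
        conn.sendall(conn.recv(1024))
    finally:
        conn.close()


# Bind to port 0 so the operating system picks a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.settimeout(5)  # do not wait forever if no client shows up
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# Serve the single connection from a background thread.
worker = threading.Thread(target=echo_once, args=(server,))
worker.daemon = True
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    client.connect(("127.0.0.1", port))
    client.sendall(b"ping")
    reply = client.recv(1024)
finally:
    client.close()

worker.join()
server.close()
print(reply)
```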

In [ ]:
# Let's see how to perform HTTP requests


import requests  # requests is a third-party library, much more convenient than the standard library alternatives

In [ ]:
location = "41.41,2.22"
key = "5nrhptjvus6gdnf9e6x75as9"
num_days = 3
url_pattern = "http://api.worldweatheronline.com/free/v1/weather.ashx?q={loc}&format=json&num_of_days={days}&key={key}"
r = requests.get(url=url_pattern.format(loc=location, days=num_days, key=key),
                 headers={'content-type': 'application/json'})  # It supports all HTTP methods, auth, proxies, post multipart...

In [ ]:
# Let's check the response
print r.status_code
print r.encoding
print r.text

In [ ]:
# And of course it parses the JSON
print type(r.json())  # Uses simplejson or std lib json

In [ ]:
from pprint import pprint
pprint(r.json()["data"]["current_condition"][0])

Compare it with the equivalent code written with urllib2:

https://gist.github.com/kennethreitz/973705

  • For low-level socket operations use 'socket'
  • Whenever possible, use 'requests' for HTTP operations
  • Use 'urllib2' or 'httplib' as a fallback when special behaviour is needed
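
To give an idea of what the standard library fallback looks like, the sketch below builds the same kind of GET request with a Request object (urllib2 in Python 2, urllib.request in Python 3). It only constructs the request and does not send it; the key value "YOUR_API_KEY" is a placeholder, and nothing is assumed about whether the service answers.

```python
try:
    from urllib.request import Request  # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import Request  # Python 2
    from urllib import urlencode

base_url = "http://api.worldweatheronline.com/free/v1/weather.ashx"
params = [
    ("q", "41.41,2.22"),
    ("format", "json"),
    ("num_of_days", 3),
    ("key", "YOUR_API_KEY"),  # placeholder, not a real key
]

# The query string has to be encoded and appended by hand.
request = Request(base_url + "?" + urlencode(params),
                  headers={"Content-type": "application/json"})
print(request.get_method())
print(request.get_full_url())

# Sending it would mean calling urlopen(request), reading the body
# and decoding it with json.loads(); requests does all of that for you.
```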

In [ ]:
# Implement a connection pool with requests
requestsSession = requests.Session()
httpAdapter = requests.adapters.HTTPAdapter(pool_connections=10,
                                            pool_maxsize=15)
requestsSession.mount('http://', httpAdapter)

In [ ]:
requestsSession.get(url=url_pattern.format(loc=location, days=num_days, key=key),
                    headers={'content-type': 'application/json'})
