Homework 2: Survival Driven Development

Survival Driven Development (SDD) is the newest software development fad. In this development framework, you specify what the software is supposed to do, then randomly generate source code to fulfill these requirements. Most of these attempts will fail, but hopefully one will succeed.

Your task is to use SDD to make a function to approximate x**2 + x.

Hint 1: Randomly generate lambda functions using a restricted vocabulary of source fragments.
vocab = ['x', 'x', ' ', '+', '-', '*', '/', '1', '2', '3']

Hint 2: Only evaluate x at a small-ish number of values and save the difference between those answers and the true value of x**2 + x.

Hint 3: SDD is error prone. Be sure to catch your errors!


In [1]:
import numpy
import random

In [2]:
vocab = ['x', 'x', ' ', '+', '-', '*', '/', '1', '2', '3']
n_chars = 10  # how many characters to try
n_tries = 10000  # how many trials before we give up

x = numpy.arange(-3, 3, 0.4)
known_y = x**2 + x

In [3]:
def build_exp(voc, n_chars):
    """Build a (str) expression of n_chars from the vocabular list voc"""
    exp = ''
    for n in range(n_chars):
        i = random.randint(0, len(voc)-1)
        exp += voc[i]
    return exp

In [4]:
def get_err(y, Y):
    """Compute the aggregate error between y and Y"""
    sq = numpy.abs(y - Y)
    return numpy.sum(sq)

In [8]:
err = None
exp = None
exp_log = []
TOL = 1.0e-6

for n in xrange(n_tries):
    exp = build_exp(vocab, n_chars)

    # build a string to define a lambda function
    temp = 'f = lambda x: ' + exp
    temp += '\n'
    # evaluate the new function with argument x, store the result as rez
    temp += 'rez = f(x)'

    try:
        exec(temp)
        err = get_err(known_y, rez)
    except Exception as e:
        # failed to compute function or error. move on to test try
        continue

    exp_log.append((err, exp))

    if err < TOL:
        print 'success'
        break
else:
    print 'failure'

exp_log = sorted(exp_log)
best_err, best_exp = exp_log[0]

print 'best error', err
print 'best exp  ', best_exp


failure
best error 50.4
best exp    1+x+x * x

If you'd rather, you can do this all within one function.


In [17]:
import numpy as np
from random import randint

def generate_function(X,Y, voc, max_try=1000000, max_chars=10):
    ''' find the analytic form that describes Y on X '''
    tries = []
    for n in xrange(max_try):
        ## make some random function using the vocabulary
        thefunc = "".join([voc[randint(0,len(voc)-1)] for x in range(randint(1,max_chars))])
        ## construct two python statement, declaring the lambda function and evaluating it at X
        mylam = "y = lambda x: " + thefunc + "\n"
        mylam += "rez = y(X)"
        try:
            ## this may be volitile so be warned!
            ## Couch everything in error statements, and
            ##  simply throw away functions that aren't reasonable
            exec(mylam)
        except:
            continue
        try: 
            tries.append( ( (abs(rez - Y).sum()) ,thefunc))
            if (abs(rez - Y)).sum() < 0.0001:
                ## we got something really close
                break
        except:
            pass
        del rez
        del y
        
    ### numpy arrays handle NaN and INF gracefully, so we put
    ### answer into an array before sorting
    a = np.array(tries,dtype=[('rez','f'),("func",'|S10')])
    a.sort()
    
    if a[0]["rez"] < 0.001:
        print "took us ntries = {0}, but we eventually found that '{1}' is functionally equivalent to f(X)".format(n,a[0]["func"])
    else:
        print "after ntries = {0}, we found that '{1}' is close to f(x) (metric = {2})".format(n,a[0]["func"],a[0]["rez"])
    
    return a[0]
    

voc = ["x","x"," ","+","-","*","/","1","2","3"]

x_array       = np.arange(-3,3,0.4)
real_function = x_array**2 + x_array
generate_function(x_array, real_function, voc, max_try=10000)


after ntries = 9999, we found that '++x*x' is close to f(x) (metric = 22.6000003815)
Out[17]:
(22.600000381469727, '++x*x')

In [ ]: