Genetic Algorithm (Example)

Problem Statement Family members from different places in the country wish to meet up in New York. They will arrive on the same day and leave on the same day, and they would like to share transportation to and from the airport. There are dozens of flights per day to NY from any of the family members locations, all leaving at different times. The flights also vary in price and duration.

Example taken from: "Building Smart Web 2.0 Applications: Programming Collective Intelligence" by Toby Segran


In [1]:
import collections
import time
import random
import math
import matplotlib.pyplot as plt

%matplotlib inline

The Family


In [2]:
# Family members and their point of origin
people = [
      ('Seymour', 'BOS'),
      ('Franny', 'DAL'),
      ('Zooey', 'CAK'),
      ('Walt', 'MIA'),
      ('Buddy', 'ORD'),
      ('Les', 'OMA'),
      ]

# Destination
destination = 'LGA'  # LaGuardia (New York)

Function to extract flight data

Data is placed into a dictionary where the keys = (origin,destination) and values are flight details.


In [3]:
# Get Flight information
def parseCsv(lines):
  flights = collections.defaultdict(list)
  for line in lines:
    origin, dest, depart, arrive, price = line.strip().split(',')
    
    #Add details to the list of possible flights
    flights[(origin, dest)].append( (depart, arrive, int(price)) )
  return flights

Flight Data


In [4]:
# Information about possible flights that can be taken
# "schedule.txt" contains, comma-separated, information about origin, destination, departure time, arrival time, and price.
flights = parseCsv(open('schedule.txt'))

Utility Functions


In [5]:
# Utility function to calculate how many minutes, into a day, a time is.
# This will be used to calculate flight times and waiting times
def getminutes(t):
    x = time.strptime(t,'%H:%M')
    return x[3]*60+x[4]

In [6]:
# This function takes the solution (list of numbers) and prints out a schedule that is easier to interpret.
# This function prints each persons name and origin, as well as the depature time, arrival time, and 
# price for outgoing and return flights.
def printschedule(r,dest):
  for d in range(len(r)/2):
    name = people[d][0]
    origin = people[d][1]
    out = flights[(origin, dest)][r[d]]
    ret = flights[(origin, dest)][r[d + 1]]
    print '%10s%10s %5s-%5s $%3s %5s-%5s $%3s' % (
        name, origin, out[0], out[1], out[2], ret[0], ret[1], ret[2])

We will be representing the solution as a list of numbers: The number represents a flight a person chooses to take (e.g., 0=first flight of day, 1=second flight of day, etc.).

Note: Since each person needs to take an inbound and outbound flight, the lenegth of the list is twice the number of people.


In [7]:
# Example
s = [1,4,3,2,7,3,6,3,2,4,5,3]
printschedule(s,'LGA')


   Seymour       BOS  8:04-10:11 $ 95 12:34-15:02 $109
    Franny       DAL 12:19-15:25 $342 10:30-14:57 $290
     Zooey       CAK 10:53-13:36 $189  9:15-12:14 $247
      Walt       MIA  9:15-12:29 $225 17:07-20:04 $291
     Buddy       ORD 16:43-19:00 $246 11:01-12:39 $260
       Les       OMA 11:08-13:07 $175 15:03-16:42 $135

Cost Function

The cost function is what is going to help you optimize. The goal is to find a set of inputs (in this case flights) that minimizes the cost function.

Here cost = total price + total wait


In [8]:
# This function takes into account the total trip and the total time spent waiting at the
# airport for the various members of the family.
#
# The logic is that total wait time assumes that all the family members will leave the airport
# together when the last person arrives, and will all go to the airport for the earliest 
# departure.
#
# The goal is to minimize the cost by choosing the correct set of numbers.
def costfunc(flights, dest):
  def schedulecost(sol):
    totalprice = 0
    latestarrival = 0
    earliestdep = 24*60

    for d in range(len(sol) / 2):
      origin = people[d][1]
      outbound = flights[(origin, dest)][int(sol[d])]
      returnf = flights[(dest, origin)][int(sol[d+1])]

      totalprice += outbound[2] + returnf[2]

      # Track the latest arrival and earliest departure 
      latestarrival = max(latestarrival, getminutes(outbound[1]))
      earliestdep = min(latestarrival, getminutes(returnf[0]))

    # Every person must wait until the last person arrives.
    # They must also arrive when the first flight leaves
    totalwait = 0
    for d in range(len(sol)/2):
      origin = people[d][1]
      outbound = flights[(origin, dest)][int(sol[d])]
      returnf = flights[(dest, origin)][int(sol[d+1])]
      totalwait += latestarrival - getminutes(outbound[1])
      totalwait += getminutes(returnf[0]) - earliestdep

    # One additional day of car rental fees?
    if latestarrival >= earliestdep: totalprice += 50

    return totalprice + totalwait
  return schedulecost

In [9]:
def geneticoptimize(population,costf, popsize=50, step=1,
    mutprob=0.2, elite=0.2, maxiter=100):
  
  # Make a small, simple random change to an existing solution  
  def mutate(vec):
 
    # Get a random integer      
    i = random.randint(0, len(population)-1)
 
    if random.random() < 0.5:
      # Decreasing value  
      return vec[0:i] + [vec[i] - step] + vec[i+1:]
    else:
      # Increasing value  
      return vec[0:i] + [vec[i] + step] + vec[i+1:]
    return vec

  def crossover(r1, r2):
    i = random.randint(1, len(population)-2)
    return r1[0:i] + r2[i:]

  # Starting population
  pop = [[random.randint(population[j][0],population[j][1])
    for j in range(len(population))]
    for i in range(popsize)]

  # This is the number of winners we want from each generation 
  topelite = int(elite*popsize)
  
  # Variable to hold cost history
  costhistory = [None]*maxiter

  for i in range(maxiter):
    
    # Sort the population by cost
    ranked = sorted(pop, key=costf)

    # add winners
    pop = ranked[0:topelite]
        
    # add mutated and bred forms of the winners
    while len(pop) < popsize:
      if random.random() < mutprob:
        # Randomly pick a winner and mutate    
        newIndividual = mutate(ranked[random.randint(0, topelite-1)])
      else:
        # Randomly pick 2 winners and perform a cross-over    
        newIndividual = crossover(ranked[random.randint(0, topelite-1)],ranked[random.randint(0, topelite-1)])
      
      pop.append(newIndividual)

    # Print current best score
    costhistory[i] = costf(ranked[0])

  plt.plot(costhistory)
  plt.xlabel('Iteration Number')
  plt.ylabel('Cost')
    
  # Return current best guy
  return ranked[0]# Random solutions

Run optimization


In [10]:
# Cost function that will be used to find "best" solution
f = costfunc(flights, destination)

# Initial Population
population = [(0, 8)] * (len(people) * 2)

# Run genetic algorithm
r = geneticoptimize(population, f)

# Print final schedule and cost
printschedule(r, destination)
print 'Final Cost = ' + str(f(r))


   Seymour       BOS 17:11-18:30 $108  6:17- 8:26 $ 89
    Franny       DAL  6:12-10:22 $230 10:30-14:57 $290
     Zooey       CAK 10:53-13:36 $189 13:40-15:38 $137
      Walt       MIA 14:01-17:24 $338  7:34- 9:40 $324
     Buddy       ORD  8:25-10:34 $157  9:42-11:32 $169
       Les       OMA  9:15-12:03 $ 99 18:12-20:17 $242
Final Cost = 1282

In [10]:


In [10]: