Evolutionary modeling


In [1]:
import research as r
import scipy
import pandas

# figure/plot/xlabel/ylabel/savefig are assumed to come from the notebook's
# pylab mode (e.g., %pylab inline), since they are not imported explicitly here.
def fitness_plots(path, treatment):
    """Load fitness.dat from every replicate of a treatment, plot per-trial
    and mean (CI) max fitness over time, and print the final dominant."""
    D = r.load_files(r.find_files(path, treatment+"_.*/fitness.dat"))

    # one max-fitness trajectory per trial
    figure()
    for i,g in D.groupby('trial'):
        plot(g['update'], g['max_fitness'])
    ylabel('Fitness')
    xlabel('Update')

    # mean max fitness with confidence interval across trials
    figure()
    r.quick_ciplot('update','max_fitness', D)
    ylabel('Fitness')
    xlabel('Update')
#    savefig("centroid.pdf")

    # report the best individual from the final update
    final = D[D['update']==D['update'].max()]
    print "\nDominant fitness:"
    print final.ix[final['max_fitness'].idxmax()]
    return D

4/10/2014: expr 002-joint & 003-varjoint

002-joint attempts to discover the parameters describing the joint of two known normal distributions, where the genome holds exactly 4 reals (2 per distribution). 003-varjoint does the same, but the genome is allowed to vary in size. In both cases, fitness is again based on the K-S test statistic. The joint target distributions were constructed via rejection sampling.
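
The notebook doesn't include the sampler or the fitness function for these treatments, so the following is only a rough sketch under assumptions: the joint target is treated here as a 50/50 mixture of two known normals, rejection sampling uses a uniform proposal with an explicit envelope constant, and fitness is again $1.0/D_{n,n'}$ from scipy's two-sample K-S test. The names and values (rejection_sample, joint_fitness, the means and sigmas, sample sizes) are illustrative, not the experiment's actual settings.

import numpy as np
from scipy import stats

def rejection_sample(target_pdf, n, lo=-5.0, hi=25.0, M=8.0):
    """Draw n samples from target_pdf with a Uniform(lo, hi) proposal;
    M must satisfy M*q(x) >= target_pdf(x) everywhere on [lo, hi]."""
    q = 1.0 / (hi - lo)                       # uniform proposal density
    samples = []
    while len(samples) < n:
        x = np.random.uniform(lo, hi)
        if np.random.uniform() < target_pdf(x) / (M * q):
            samples.append(x)
    return np.array(samples)

# Joint target built from two known normals; a 50/50 mixture is assumed here
# because the notebook does not spell out how the target was constructed.
target_pdf = lambda x: 0.5 * stats.norm.pdf(x, 5.0, 1.0) + 0.5 * stats.norm.pdf(x, 15.0, 2.0)
target_samples = rejection_sample(target_pdf, 1000)

def joint_fitness(genome, n=1000):
    """Fitness = 1/D for a 4-real genome [mu1, sigma1, mu2, sigma2]."""
    mu1, sigma1, mu2, sigma2 = genome
    candidate = np.concatenate([np.random.normal(mu1, sigma1, n // 2),
                                np.random.normal(mu2, sigma2, n // 2)])
    d, _ = stats.ks_2samp(target_samples, candidate)
    return 1.0 / d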

Results: both treatments work. The dominant individual for 002-joint was discovered at 415u/26g in replicate ta0_30, while for 003-varjoint the dominant was discovered at 473u/33g. In 003-varjoint, the dominant genome held exactly 4 parameters, even though all ancestors began with a genome of length 2. This suggests that we should be able to discover the decomposition of arbitrarily complex joint probability distributions.


In [4]:
D = fitness_plots('../var/002-joint','ta0')


Dominant fitness:
update                 500
mean_generation     26.039
min_fitness         1.8832
mean_fitness       34.9588
max_fitness           62.5
treatment              ta0
trial                   30
Name: 12023, dtype: object

In [5]:
D = fitness_plots('../var/003-varjoint','ta0')


Dominant fitness:
update                 500
mean_generation      33.76
min_fitness          1.845
mean_fitness       31.1983
max_fitness        58.8235
treatment              ta0
trial                   26
Name: 9518, dtype: object

4/7/2014: expr 001-normal

... where we try to match the parameters of a known normal distribution with a genetic algorithm. Fitness is $1.0/D_{n,n'}$, where $D_{n,n'}$ is the Kolmogorov-Smirnov test statistic. Smaller values of this test statistic indicate less distance between the EDFs of the known normal distribution and the distribution parameterized by the GA.

For this test, the known distribution has $\mu=10.0$ and $\sigma=1.0$. Each genome holds two evolvable real values, one each for $\mu$ and $\sigma$. During mutation, these values are drawn at random from the uniform distribution $[0.0,30.0)$.
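
A minimal sketch of one such fitness evaluation, assuming sample-based EDFs of a fixed size and scipy's two-sample K-S test; the names (fitness, mutate), the sample size of 1000, and the per-site mutation rate are illustrative assumptions, not the GA's actual implementation.

import numpy as np
from scipy import stats

known = np.random.normal(10.0, 1.0, 1000)    # samples from the known N(10, 1)

def fitness(genome, n=1000):
    # genome = [mu, sigma]; fitness = 1/D_{n,n'} from the two-sample K-S test
    mu, sigma = genome
    candidate = np.random.normal(mu, sigma, n)
    d, _ = stats.ks_2samp(known, candidate)
    return 1.0 / d

def mutate(genome, rate=0.1):
    # each real is replaced, at the assumed per-site rate, by a fresh draw
    # from the uniform distribution [0.0, 30.0)
    return [np.random.uniform(0.0, 30.0) if np.random.random() < rate else g
            for g in genome]

# e.g., fitness([9.99492, 0.997972]) evaluates the dominant genome reported below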

The dominant individual, discovered at 271u/16g in trial ta0_9, has genome $[9.99492, 0.997972]$.


In [2]:
D = fitness_plots('../var/001-normal','ta0')


Dominant fitness:
update                 500
mean_generation     16.835
min_fitness              1
mean_fitness       46.0897
max_fitness        83.3333
treatment              ta0
trial                    9
Name: 15029, dtype: object

In [3]:
# for convenience, here is inverted fitness, which is exactly D_{n,n'}:
D['dnn'] = 1.0/D['max_fitness']
figure()
r.quick_ciplot('update','dnn', D)
ylabel('D_{n,n\'}')
xlabel('Update')


Out[3]:
<matplotlib.text.Text at 0x1100aea10>