Phylogenetic Invariants


In [1]:
import numpy as np

In [2]:
pi_string = """\
AAAA AAAC AAAG AAAT  AACA AACC AACG AACT  AAGA AAGC AAGG AAGT  AATA AATC AATG AATT
ACAA ACAC ACAG ACAT  ACCA ACCC ACCG ACCT  ACGA ACGC ACGG ACGT  ACTA ACTC ACTG ACTT
AGAA AGAC AGAG AGAT  AGCA AGCC AGCG AGCT  AGGA AGGC AGGG AGGT  AGTA AGTC AGTG AGTT
ATAA ATAC ATAG ATAT  ATCA ATCC ATCG ATCT  ATGA ATGC ATGG ATGT  ATTA ATTC ATTG ATTT

CAAA CAAC CAAG CAAT  CACA CACC CACG CACT  CAGA CAGC CAGG CAGT  CATA CATC CATG CATT
CCAA CCAC CCAG CCAT  CCCA CCCC CCCG CCCT  CCGA CCGC CCGG CCGT  CCTA CCTC CCTG CCTT
CGAA CGAC CGAG CGAT  CGCA CGCC CGCG CGCT  CGGA CGGC CGGG CGGT  CGTA CGTC CGTG CGTT
CTAA CTAC CTAG CTAT  CTCA CTCC CTCG CTCT  CTGA CTGC CTGG CTGT  CTTA CTTC CTTG CTTT

GAAA GAAC GAAG GAAT  GACA GACC GACG GACT  GAGA GAGC GAGG GAGT  GATA GATC GATG GATT
GCAA GCAC GCAG GCAT  GCCA GCCC GCCG GCCT  GCGA GCGC GCGG GCGT  GCTA GCTC GCTG GCTT
GGAA GGAC GGAG GGAT  GGCA GGCC GGCG GGCT  GGGA GGGC GGGG GGGT  GGTA GGTC GGTG GGTT
GTAA GTAC GTAG GTAT  GTCA GTCC GTCG GTCT  GTGA GTGC GTGG GTGT  GTTA GTTC GTTG GTTT

TAAA TAAC TAAG TAAT  TACA TACC TACG TACT  TAGA TAGC TAGG TAGT  TATA TATC TATG TATT
TCAA TCAC TCAG TCAT  TCCA TCCC TCCG TCCT  TCGA TCGC TCGG TCGT  TCTA TCTC TCTG TCTT
TGAA TGAC TGAG TGAT  TGCA TGCC TGCG TGCT  TGGA TGGC TGGG TGGT  TGTA TGTC TGTG TGTT
TTAA TTAC TTAG TTAT  TTCA TTCC TTCG TTCT  TTGA TTGC TTGG TTGT  TTTA TTTC TTTG TTTT
"""

## print as string
print pi_string

## also save as indexable array
pi_arr = np.array(pi_string.split()).reshape(16,16)
print pi_arr[:4,:4]


AAAA AAAC AAAG AAAT  AACA AACC AACG AACT  AAGA AAGC AAGG AAGT  AATA AATC AATG AATT
ACAA ACAC ACAG ACAT  ACCA ACCC ACCG ACCT  ACGA ACGC ACGG ACGT  ACTA ACTC ACTG ACTT
AGAA AGAC AGAG AGAT  AGCA AGCC AGCG AGCT  AGGA AGGC AGGG AGGT  AGTA AGTC AGTG AGTT
ATAA ATAC ATAG ATAT  ATCA ATCC ATCG ATCT  ATGA ATGC ATGG ATGT  ATTA ATTC ATTG ATTT

CAAA CAAC CAAG CAAT  CACA CACC CACG CACT  CAGA CAGC CAGG CAGT  CATA CATC CATG CATT
CCAA CCAC CCAG CCAT  CCCA CCCC CCCG CCCT  CCGA CCGC CCGG CCGT  CCTA CCTC CCTG CCTT
CGAA CGAC CGAG CGAT  CGCA CGCC CGCG CGCT  CGGA CGGC CGGG CGGT  CGTA CGTC CGTG CGTT
CTAA CTAC CTAG CTAT  CTCA CTCC CTCG CTCT  CTGA CTGC CTGG CTGT  CTTA CTTC CTTG CTTT

GAAA GAAC GAAG GAAT  GACA GACC GACG GACT  GAGA GAGC GAGG GAGT  GATA GATC GATG GATT
GCAA GCAC GCAG GCAT  GCCA GCCC GCCG GCCT  GCGA GCGC GCGG GCGT  GCTA GCTC GCTG GCTT
GGAA GGAC GGAG GGAT  GGCA GGCC GGCG GGCT  GGGA GGGC GGGG GGGT  GGTA GGTC GGTG GGTT
GTAA GTAC GTAG GTAT  GTCA GTCC GTCG GTCT  GTGA GTGC GTGG GTGT  GTTA GTTC GTTG GTTT

TAAA TAAC TAAG TAAT  TACA TACC TACG TACT  TAGA TAGC TAGG TAGT  TATA TATC TATG TATT
TCAA TCAC TCAG TCAT  TCCA TCCC TCCG TCCT  TCGA TCGC TCGG TCGT  TCTA TCTC TCTG TCTT
TGAA TGAC TGAG TGAT  TGCA TGCC TGCG TGCT  TGGA TGGC TGGG TGGT  TGTA TGTC TGTG TGTT
TTAA TTAC TTAG TTAT  TTCA TTCC TTCG TTCT  TTGA TTGC TTGG TTGT  TTTA TTTC TTTG TTTT

[['AAAA' 'AAAC' 'AAAG' 'AAAT']
 ['ACAA' 'ACAC' 'ACAG' 'ACAT']
 ['AGAA' 'AGAC' 'AGAG' 'AGAT']
 ['ATAA' 'ATAC' 'ATAG' 'ATAT']]

Create a simple alignment for four taxa

Ordered so the first two rows have a more similar sequence than the latter two (e.g., see CATCAT vs. GATGAT).


In [3]:
seqalign = np.array([
    list("CATCATCATCAT")*10,
    list("CATCATGATCAT")*10,
    list("GATGATGATCCC")*10,
    list("GATGATGATCCC")*10])

print seqalign[:, :10], '...'


[['C' 'A' 'T' 'C' 'A' 'T' 'C' 'A' 'T' 'C']
 ['C' 'A' 'T' 'C' 'A' 'T' 'G' 'A' 'T' 'C']
 ['G' 'A' 'T' 'G' 'A' 'T' 'G' 'A' 'T' 'C']
 ['G' 'A' 'T' 'G' 'A' 'T' 'G' 'A' 'T' 'C']] ...

A trick for dealing with large seq arrays

We convert it to integers because they are easier to deal with than characters. The 'view' argument is used here to convert chars to 8bit integers. I then reassign the ints as 0, 1, 2, or 3, which can be used to quickly fill the the count array using a set of rules designated below.


In [4]:
## print as char
print seqalign[:, :10]

## get a copy with dtype=int8
seqint = np.copy(seqalign.view(np.int8))
print seqint[:, :10]

## convert ints to indexes of matrix
seqint[seqint == 65] = 0    ## As
seqint[seqint == 67] = 1    ## Cs
seqint[seqint == 71] = 2    ## Ts
seqint[seqint == 84] = 3    ## Gs

print seqint[:, :10]


[['C' 'A' 'T' 'C' 'A' 'T' 'C' 'A' 'T' 'C']
 ['C' 'A' 'T' 'C' 'A' 'T' 'G' 'A' 'T' 'C']
 ['G' 'A' 'T' 'G' 'A' 'T' 'G' 'A' 'T' 'C']
 ['G' 'A' 'T' 'G' 'A' 'T' 'G' 'A' 'T' 'C']]
[[67 65 84 67 65 84 67 65 84 67]
 [67 65 84 67 65 84 71 65 84 67]
 [71 65 84 71 65 84 71 65 84 67]
 [71 65 84 71 65 84 71 65 84 67]]
[[1 0 3 1 0 3 1 0 3 1]
 [1 0 3 1 0 3 2 0 3 1]
 [2 0 3 2 0 3 2 0 3 1]
 [2 0 3 2 0 3 2 0 3 1]]

Fill the invariants counts array

This records the number of times each of the patterns in this array are observed. We will also fill two "alternate arrays", representing the site patterns observed for other two possible resolutions of our quartet. This first array assumes taxa are ordered as 12|34. We can fill the two alternates (13|24 and 14|23) from the patterns observed here.


In [5]:
## create a 16x16 array of zeros
invcounts = np.zeros((16,16), dtype=int)

## fill the array according to a set of rules
for idx in xrange(seqalign.shape[1]):
    i = seqint[:, idx] 
    invcounts[(4*i[0])+i[1], (4*i[2])+i[3]] += 1
    
print invcounts


[[30  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0 10  0  0  0  0 20  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0 10  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0 10  0  0  0  0  0  0  0  0  0 30]]

Fill the three alternative matrices with site counts


In [6]:
## Let's create three arrays 
mats_ints = np.zeros((3, 16, 16), dtype=int)
mats_ints[0] = invcounts

## Fill the alternates
x = 0
for i in xrange(0, 16, 4):
    for j in xrange(0, 16, 4):
        mats_ints[1, i:i+4, j:j+4] = mats_ints[0, x].reshape(4, 4)
        mats_ints[2, i:i+4, j:j+4] = mats_ints[0, x].reshape(4, 4).T
        x += 1

print mats_ints[:, :, :]


[[[30  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0 10  0  0  0  0 20  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0 10  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0 10  0  0  0  0  0  0  0  0  0 30]]

 [[30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0 10  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0 20  0  0  0 10  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0 10  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 30]]

 [[30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0 10  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0 20  0  0  0 10  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0 10  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
  [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 30]]]

What does this look like as strings?

We print this just for our edification.


In [7]:
## What does it look like as character strings
mats_strs = np.zeros((3, 16, 16), dtype="S4")
mats_strs[0] = pi_arr

## fill alternates
x = 0
for i in xrange(0, 16, 4):
    for j in xrange(0, 16, 4):
        mats_strs[1, i:i+4, j:j+4] = mats_strs[0, x].reshape(4, 4)
        mats_strs[2, i:i+4, j:j+4] = mats_strs[0, x].reshape(4, 4).T
        x += 1

## print just first 10 columns so that it's more readable
print mats_strs[:, :, :10]


[[['AAAA' 'AAAC' 'AAAG' 'AAAT' 'AACA' 'AACC' 'AACG' 'AACT' 'AAGA' 'AAGC']
  ['ACAA' 'ACAC' 'ACAG' 'ACAT' 'ACCA' 'ACCC' 'ACCG' 'ACCT' 'ACGA' 'ACGC']
  ['AGAA' 'AGAC' 'AGAG' 'AGAT' 'AGCA' 'AGCC' 'AGCG' 'AGCT' 'AGGA' 'AGGC']
  ['ATAA' 'ATAC' 'ATAG' 'ATAT' 'ATCA' 'ATCC' 'ATCG' 'ATCT' 'ATGA' 'ATGC']
  ['CAAA' 'CAAC' 'CAAG' 'CAAT' 'CACA' 'CACC' 'CACG' 'CACT' 'CAGA' 'CAGC']
  ['CCAA' 'CCAC' 'CCAG' 'CCAT' 'CCCA' 'CCCC' 'CCCG' 'CCCT' 'CCGA' 'CCGC']
  ['CGAA' 'CGAC' 'CGAG' 'CGAT' 'CGCA' 'CGCC' 'CGCG' 'CGCT' 'CGGA' 'CGGC']
  ['CTAA' 'CTAC' 'CTAG' 'CTAT' 'CTCA' 'CTCC' 'CTCG' 'CTCT' 'CTGA' 'CTGC']
  ['GAAA' 'GAAC' 'GAAG' 'GAAT' 'GACA' 'GACC' 'GACG' 'GACT' 'GAGA' 'GAGC']
  ['GCAA' 'GCAC' 'GCAG' 'GCAT' 'GCCA' 'GCCC' 'GCCG' 'GCCT' 'GCGA' 'GCGC']
  ['GGAA' 'GGAC' 'GGAG' 'GGAT' 'GGCA' 'GGCC' 'GGCG' 'GGCT' 'GGGA' 'GGGC']
  ['GTAA' 'GTAC' 'GTAG' 'GTAT' 'GTCA' 'GTCC' 'GTCG' 'GTCT' 'GTGA' 'GTGC']
  ['TAAA' 'TAAC' 'TAAG' 'TAAT' 'TACA' 'TACC' 'TACG' 'TACT' 'TAGA' 'TAGC']
  ['TCAA' 'TCAC' 'TCAG' 'TCAT' 'TCCA' 'TCCC' 'TCCG' 'TCCT' 'TCGA' 'TCGC']
  ['TGAA' 'TGAC' 'TGAG' 'TGAT' 'TGCA' 'TGCC' 'TGCG' 'TGCT' 'TGGA' 'TGGC']
  ['TTAA' 'TTAC' 'TTAG' 'TTAT' 'TTCA' 'TTCC' 'TTCG' 'TTCT' 'TTGA' 'TTGC']]

 [['AAAA' 'AAAC' 'AAAG' 'AAAT' 'ACAA' 'ACAC' 'ACAG' 'ACAT' 'AGAA' 'AGAC']
  ['AACA' 'AACC' 'AACG' 'AACT' 'ACCA' 'ACCC' 'ACCG' 'ACCT' 'AGCA' 'AGCC']
  ['AAGA' 'AAGC' 'AAGG' 'AAGT' 'ACGA' 'ACGC' 'ACGG' 'ACGT' 'AGGA' 'AGGC']
  ['AATA' 'AATC' 'AATG' 'AATT' 'ACTA' 'ACTC' 'ACTG' 'ACTT' 'AGTA' 'AGTC']
  ['CAAA' 'CAAC' 'CAAG' 'CAAT' 'CCAA' 'CCAC' 'CCAG' 'CCAT' 'CGAA' 'CGAC']
  ['CACA' 'CACC' 'CACG' 'CACT' 'CCCA' 'CCCC' 'CCCG' 'CCCT' 'CGCA' 'CGCC']
  ['CAGA' 'CAGC' 'CAGG' 'CAGT' 'CCGA' 'CCGC' 'CCGG' 'CCGT' 'CGGA' 'CGGC']
  ['CATA' 'CATC' 'CATG' 'CATT' 'CCTA' 'CCTC' 'CCTG' 'CCTT' 'CGTA' 'CGTC']
  ['GAAA' 'GAAC' 'GAAG' 'GAAT' 'GCAA' 'GCAC' 'GCAG' 'GCAT' 'GGAA' 'GGAC']
  ['GACA' 'GACC' 'GACG' 'GACT' 'GCCA' 'GCCC' 'GCCG' 'GCCT' 'GGCA' 'GGCC']
  ['GAGA' 'GAGC' 'GAGG' 'GAGT' 'GCGA' 'GCGC' 'GCGG' 'GCGT' 'GGGA' 'GGGC']
  ['GATA' 'GATC' 'GATG' 'GATT' 'GCTA' 'GCTC' 'GCTG' 'GCTT' 'GGTA' 'GGTC']
  ['TAAA' 'TAAC' 'TAAG' 'TAAT' 'TCAA' 'TCAC' 'TCAG' 'TCAT' 'TGAA' 'TGAC']
  ['TACA' 'TACC' 'TACG' 'TACT' 'TCCA' 'TCCC' 'TCCG' 'TCCT' 'TGCA' 'TGCC']
  ['TAGA' 'TAGC' 'TAGG' 'TAGT' 'TCGA' 'TCGC' 'TCGG' 'TCGT' 'TGGA' 'TGGC']
  ['TATA' 'TATC' 'TATG' 'TATT' 'TCTA' 'TCTC' 'TCTG' 'TCTT' 'TGTA' 'TGTC']]

 [['AAAA' 'AACA' 'AAGA' 'AATA' 'ACAA' 'ACCA' 'ACGA' 'ACTA' 'AGAA' 'AGCA']
  ['AAAC' 'AACC' 'AAGC' 'AATC' 'ACAC' 'ACCC' 'ACGC' 'ACTC' 'AGAC' 'AGCC']
  ['AAAG' 'AACG' 'AAGG' 'AATG' 'ACAG' 'ACCG' 'ACGG' 'ACTG' 'AGAG' 'AGCG']
  ['AAAT' 'AACT' 'AAGT' 'AATT' 'ACAT' 'ACCT' 'ACGT' 'ACTT' 'AGAT' 'AGCT']
  ['CAAA' 'CACA' 'CAGA' 'CATA' 'CCAA' 'CCCA' 'CCGA' 'CCTA' 'CGAA' 'CGCA']
  ['CAAC' 'CACC' 'CAGC' 'CATC' 'CCAC' 'CCCC' 'CCGC' 'CCTC' 'CGAC' 'CGCC']
  ['CAAG' 'CACG' 'CAGG' 'CATG' 'CCAG' 'CCCG' 'CCGG' 'CCTG' 'CGAG' 'CGCG']
  ['CAAT' 'CACT' 'CAGT' 'CATT' 'CCAT' 'CCCT' 'CCGT' 'CCTT' 'CGAT' 'CGCT']
  ['GAAA' 'GACA' 'GAGA' 'GATA' 'GCAA' 'GCCA' 'GCGA' 'GCTA' 'GGAA' 'GGCA']
  ['GAAC' 'GACC' 'GAGC' 'GATC' 'GCAC' 'GCCC' 'GCGC' 'GCTC' 'GGAC' 'GGCC']
  ['GAAG' 'GACG' 'GAGG' 'GATG' 'GCAG' 'GCCG' 'GCGG' 'GCTG' 'GGAG' 'GGCG']
  ['GAAT' 'GACT' 'GAGT' 'GATT' 'GCAT' 'GCCT' 'GCGT' 'GCTT' 'GGAT' 'GGCT']
  ['TAAA' 'TACA' 'TAGA' 'TATA' 'TCAA' 'TCCA' 'TCGA' 'TCTA' 'TGAA' 'TGCA']
  ['TAAC' 'TACC' 'TAGC' 'TATC' 'TCAC' 'TCCC' 'TCGC' 'TCTC' 'TGAC' 'TGCC']
  ['TAAG' 'TACG' 'TAGG' 'TATG' 'TCAG' 'TCCG' 'TCGG' 'TCTG' 'TGAG' 'TGCG']
  ['TAAT' 'TACT' 'TAGT' 'TATT' 'TCAT' 'TCCT' 'TCGT' 'TCTT' 'TGAT' 'TGCT']]]

Calculate the matrix rank

In this case you can clearly see that the first matrix is the better fit, it has two values that vanish, as we expect invariants to do, and thus the rank is 4, where it is 6 for the other two topologies.


In [8]:
### Well, first we can just calculate the matrix rank
### b/c if they have different rank then we just choose 
### the lowest one
for mat in xrange(3):
    print "mat={}, rank={}".format(mat, np.linalg.matrix_rank(mats_ints[mat, :, :]))


mat=0, rank=4
mat=1, rank=6
mat=2, rank=6

Why no SVD?

We didn't need it because there were so many empty columns/rows. However, most of the time with real data, and especially large data sets, the matrices will have little or no empty columns, and so the matrices will all be of equal rank. In that case we need to use SVD to calculate which invariants approach most closely to zero from the SVD scores.

Here we calculate the squared sum of SVD scores above matrix rank 4, the minimum rank observed in the data set... (not sure why this is, exactly).


In [9]:
for idx in xrange(3):
    ## if all 
    np.linalg.matrix_rank(mats_ints[idx])

    ## calculate SVD
    scores = np.linalg.svd(mats_ints[idx])[1]
    
    ## sum square of X smallest scores
    print "mat={}, rank={}, sum={}"\
           .format(idx, 
                   np.linalg.matrix_rank(mats_ints[idx]),
                   np.sqrt(np.sum(scores[4:]**2)))


mat=0, rank=4, sum=2.08152798678e-15
mat=1, rank=6, sum=14.1421356237
mat=2, rank=6, sum=14.1421356237

There we have it, the first topology (12|34) is best.