Testing scipy.sparse matrices

The aim is to see how much space can be saved when working with sparse data.

Testing worst case database sizes

In the worst case the database will store a single value for every pair of proteins. Writing such a file is time-consuming, and even if it can be written it may be impossible to index.


In [1]:
cd ~/Documents/MRes/geneconversion/


/home/gavin/Documents/MRes/geneconversion

In [2]:
ls


gene2go  gene_info  human.Entrez.pickle  testgen.pickle

In [3]:
import pickle

In [4]:
# Open in binary mode and let the with-block close the file
with open("human.Entrez.pickle", "rb") as f:
    ids = pickle.load(f)

In [5]:
import itertools

In [7]:
import sys
# Make the local opencast-bio checkout importable
sys.path.append("/home/gavin/Documents/MRes/opencast-bio/")
import ocbio.extract

In [20]:
cd /mnt/external/remotes/geneontology/


/mnt/external/remotes/geneontology

In [ ]:
db = ocbio.extract.openpairshelf("test.db")
# Worst case: store one value for every unordered pair of protein IDs
for p in itertools.combinations(ids,2):
    db[p] = 1
db.close()
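
Before running the full loop it may be worth timing and sizing a small trial. The sketch below is a stand-in only: it uses the standard-library `shelve` module rather than `ocbio.extract.openpairshelf` (whose internals are not shown here), and the function name `trial_shelf_size`, the tab-joined key format, and the placeholder IDs are all assumptions made for illustration.

```python
import itertools
import os
import shelve
import shutil
import tempfile

def trial_shelf_size(ids, directory):
    """Store 1 for every unordered pair of ids; return (pair count, bytes on disk)."""
    path = os.path.join(directory, "trial.db")
    db = shelve.open(path)
    n = 0
    try:
        for a, b in itertools.combinations(ids, 2):
            # shelve keys must be strings, so join the pair (hypothetical key format)
            db["%s\t%s" % (a, b)] = 1
            n += 1
    finally:
        db.close()
    # The dbm backend may create one file or several, so sum everything in the directory
    total = sum(os.path.getsize(os.path.join(directory, name))
                for name in os.listdir(directory))
    return n, total

tmpdir = tempfile.mkdtemp()
sample_ids = [str(i) for i in range(1, 51)]  # placeholder IDs, not real Entrez IDs
n, size = trial_shelf_size(sample_ids, tmpdir)
shutil.rmtree(tmpdir)  # remove the trial files
print("%d pairs, %d bytes" % (n, size))
```

Dividing the byte count by the number of pairs gives a rough per-entry cost, which can then be scaled to the full set of IDs. The figure depends on the dbm backend in use, so it is only an order-of-magnitude guide.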

In [ ]:
print "something"

In [12]:
print len(ids)


18217
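
With 18217 IDs, the number of unordered pairs the loop would write follows from n(n-1)/2. A quick check, with the helper name `n_pairs` chosen here purely for illustration:

```python
import itertools

def n_pairs(n):
    # Number of unordered pairs among n items: n choose 2
    return n * (n - 1) // 2

# Sanity check against itertools.combinations on a small input
assert n_pairs(10) == len(list(itertools.combinations(range(10), 2)))

n_ids = 18217  # len(ids) as printed above
total_pairs = n_pairs(n_ids)
print(total_pairs)  # 165920436
```

That is roughly 166 million entries, so even a few tens of bytes per entry puts the worst-case database in the multi-gigabyte range.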