In [1]:
%matplotlib inline
from __future__ import division, print_function
import numpy as np
import os
In [64]:
import SHS_data
uris, ids = SHS_data.read_uris()
In [48]:
# Reload so that edits made to SHS_data since the first import take effect.
reload(SHS_data)
cliques_by_name, cliques_by_id = SHS_data.read_cliques()
In [4]:
reload(SHS_data)
train_cliques, test_cliques, val_cliques = SHS_data.split_train_test_validation(cliques_by_name)
First idea: get the URIs and a ground truth matrix.
But maybe that's not what we want: a dense 18K x 18K ground truth matrix has about 3.2e8 entries, which is roughly 2.4 GiB assuming 8-byte entries.
Therefore: just the URIs for now.
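The size estimate above can be checked with a little arithmetic. The sketch below assumes 18,000 tracks and 8 bytes per entry (both are assumptions; the exact track count and dtype are not stated here), and contrasts the dense matrix with storing only the URI pairs that share a clique. The helper names and `toy_cliques` are made up for illustration and are not part of `SHS_data`.

```python
from itertools import combinations


def dense_matrix_bytes(n_items, bytes_per_entry=8):
    """Bytes needed for a dense n_items x n_items matrix."""
    return n_items * n_items * bytes_per_entry


def cover_pairs(cliques):
    """Return the set of unordered URI pairs that share a clique.

    Assumes `cliques` maps a clique name to a list of URIs.
    """
    pairs = set()
    for uris in cliques.values():
        # Sort first so each pair is stored in one canonical order.
        pairs.update(combinations(sorted(uris), 2))
    return pairs


# Hypothetical toy data, for illustration only.
toy_cliques = {'song_a': ['uri1', 'uri2', 'uri3'],
               'song_b': ['uri4', 'uri5']}

# Dense size in GB (1e9 bytes), assuming 8-byte entries.
print(dense_matrix_bytes(18000) / 1e9)
# Number of positive pairs a sparse representation would need to store.
print(len(cover_pairs(toy_cliques)))
```

Only pairs within the same clique are positive, so a pair set (or a sparse matrix) grows with the clique sizes rather than with the square of the number of tracks.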
Should uris_from_clique_dict live in SHS_data or somewhere more general?
Probably somewhere more general, but no such module exists yet, so it stays in SHS_data for now.
In [44]:
reload(SHS_data)
train_uris = SHS_data.uris_from_clique_dict(train_cliques)
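For reference, a helper like `uris_from_clique_dict` could be as small as the sketch below. This is a guess at the behaviour, not the actual `SHS_data` code: it assumes a clique dict maps each clique name to a list of URIs, and the name `flatten_clique_dict` is hypothetical.

```python
def flatten_clique_dict(clique_dict):
    """Return all URIs in clique_dict as one flat list.

    Assumes clique_dict maps clique name -> list of URIs.
    Cliques are visited in sorted name order so the result is reproducible.
    """
    uris = []
    for name in sorted(clique_dict):
        uris.extend(clique_dict[name])
    return uris


# Hypothetical toy data, for illustration only.
toy_cliques = {'song_b': ['uri4', 'uri5'],
               'song_a': ['uri1', 'uri2', 'uri3']}
print(flatten_clique_dict(toy_cliques))
```

Iterating in sorted order is a design choice made here so that repeated runs give the same URI order; the real function may order its output differently.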