Step 3 - Back-project the embeddings to individuals

Written by R.A.I. Bethlehem, D. Margulies and M. Falkiewicz for the Autism Gradients project at Brainhack Cambridge 2017

In [1]:
# first import the input list from the csv file
import pandas as pd
# read in csv
df_phen = pd.read_csv('./data/SelectedSubjects.csv')
selected = list(df_phen.filename_1D)

In [2]:
len(selected)


Out[2]:
160
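Before loading, it may help to confirm that every selected subject actually has a saved embedding. This is a minimal sketch that assumes the directory and file-naming pattern used in the loading cell below (`./data/Outputs/Embs/<subject>_embedding_dense_res_veconly.npy`). The helper name `missing_embeddings` is hypothetical.

```python
import os

def missing_embeddings(subject_ids, emb_dir="./data/Outputs/Embs",
                       suffix="_embedding_dense_res_veconly.npy"):
    """Return the subject IDs whose embedding file is absent (hypothetical helper)."""
    # Build each expected path from the subject ID and keep the ones not on disk.
    return [sid for sid in subject_ids
            if not os.path.isfile(os.path.join(emb_dir, sid + suffix))]
```

For example, `missing_embeddings(selected)` should return an empty list before the loading loop is run. Any IDs it returns would otherwise make `np.load` raise partway through.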

Run the back projection using pySTATIS


In [4]:
%%capture
import numpy as np 
import os
from pySTATIS import STATISData, ANISOSTATIS

In [5]:
# Build one STATISData object per subject from its saved embedding vectors.
# STATISData parameters:
#   X: input variables for a single entity
#   ID: ID of the entity; can be a set
#   ev: eigenvalues of the X columns, in case X are principal components
#   col_names, row_names: labels for columns and rows
#   normalize: normalization method to use (None, 'zscore', 'double_center')
#   hdf5: reference to an HDF5 file
sbj_obj_list = []

for sbj_id in selected:
    sbj_file = os.path.join("./data/Outputs/Embs", sbj_id + '_embedding_dense_res_veconly.npy')
    sbj_data = np.load(sbj_file)
    sbj_obj_list.append(STATISData(sbj_data, ID=sbj_id))
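The `STATISData` parameter list includes a `normalize` option with two named schemes. As a rough illustration only (pySTATIS's own implementation may differ, for example in the standard-deviation convention), column-wise z-scoring and double centering of a single subject matrix can be written in NumPy as follows. The function names are illustrative, not part of pySTATIS.

```python
import numpy as np

def zscore_columns(X):
    """Center each column and scale it to unit (population) standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def double_center(X):
    """Subtract column and row means, then add back the grand mean."""
    X = np.asarray(X, dtype=float)
    return (X
            - X.mean(axis=0, keepdims=True)   # column means, shape (1, n_cols)
            - X.mean(axis=1, keepdims=True)   # row means, shape (n_rows, 1)
            + X.mean())                       # grand mean
```

After double centering, every row and every column averages to zero. Note that `zscore_columns` divides by zero if a column is constant.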

In [6]:
len(sbj_obj_list)


Out[6]:
160

In [7]:
abide_statis = ANISOSTATIS(n_comps=10)  # keep the first 10 components
abide_statis.fit(sbj_obj_list)


Stack datasets for GSVD...Done!
Getting indices... Done!
Observation masses: Done!
Computing ANISOSTATIS Criterion 1 weights...Done!
GSVD
GSVD: Weights... Done!
GSVD: SVD... Done!
GSVD: Factor scores and eigenvalues... Done!
Factor scores for observations... Done!
Calculating factor scores for datasets... Done!
Calculating contributions of observations... .Done!
Calculating contributions of variables... Done!
Calculating contributions of datasets... Done!
Calculating partial inertias for the datasets... Done!
STATIS finished successfully in 0.913 seconds
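The log above lists the main stages of the fit: table weights, a generalized SVD of the stacked data, and factor scores for the compromise and for each dataset. To make the idea of projecting individual tables back onto a shared space concrete, here is a minimal NumPy sketch of classical (isotropic) STATIS. This is a swapped-in, simpler technique, not ANISOSTATIS (which additionally weights individual variables) and not pySTATIS's code. Scaling conventions for partial factor scores also differ between implementations. The function name `statis` is illustrative.

```python
import numpy as np

def statis(tables, n_comps=2):
    """Classical STATIS on K tables that share the same I rows.

    Returns (alpha, F, F_partial): table weights (K,), compromise factor
    scores (I, n_comps) and partial factor scores (K, I, n_comps).
    """
    # Column-center each table and form its I x I cross-product matrix,
    # scaled to unit Frobenius norm so trace(S_k S_l) is the RV coefficient.
    S = []
    for X in tables:
        Xc = X - X.mean(axis=0)
        Sk = Xc @ Xc.T
        S.append(Sk / np.linalg.norm(Sk))
    S = np.stack(S)                              # (K, I, I)

    # RV matrix between tables; its leading eigenvector gives the weights.
    C = np.einsum("kij,lij->kl", S, S)           # trace(S_k S_l), S_k symmetric
    _, eigvecs = np.linalg.eigh(C)               # eigenvalues in ascending order
    u = np.abs(eigvecs[:, -1])                   # leading eigenvector, sign fixed
    alpha = u / u.sum()

    # Compromise: weighted sum of the tables, then its top eigenpairs.
    S_comp = np.einsum("k,kij->ij", alpha, S)
    lam, P = np.linalg.eigh(S_comp)
    order = np.argsort(lam)[::-1][:n_comps]
    lam, P = lam[order], P[:, order]

    F = P * np.sqrt(lam)                         # compromise factor scores
    F_partial = S @ P / np.sqrt(lam)             # each table projected onto the same axes
    return alpha, F, F_partial
```

With this scaling, the `alpha`-weighted sum of the partial factor scores reproduces the compromise scores exactly, because `S_comp @ P` equals `P * lam`. Applied to the subject embeddings, `F_partial` has the same (subjects, rows, components) layout as `partial_factor_scores_`, but the values will not match ANISOSTATIS.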

In [8]:
type(abide_statis.partial_factor_scores_)


Out[8]:
numpy.ndarray

In [9]:
abide_statis.partial_factor_scores_.shape


Out[9]:
(160, 392, 10)
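The array is indexed as (subject, row, component): 160 subjects, the 392 rows shared by every embedding, and the 10 requested components. The following minimal sketch shows how to pull out one subject's scores, a single component across subjects, and an unweighted group average. The random array only stands in for `abide_statis.partial_factor_scores_`. The sketch assumes that axis 0 follows the order of `sbj_obj_list`, and the plain mean is not necessarily the weighted STATIS compromise.

```python
import numpy as np

# Stand-in with the same layout as partial_factor_scores_: (subjects, rows, components).
rng = np.random.default_rng(0)
scores = rng.standard_normal((160, 392, 10))

# All components for the first subject (assumes axis 0 follows the subject list order).
first_subject = scores[0]             # shape (392, 10)

# First component for every subject and row.
comp1 = scores[:, :, 0]               # shape (160, 392)

# Unweighted average over subjects (not necessarily the STATIS compromise).
group_mean = scores.mean(axis=0)      # shape (392, 10)
```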

In [10]:
# Per-subject (partial) factor scores, shape (subjects, rows, components)
out_array = abide_statis.partial_factor_scores_

In [11]:
# Despite the file name, this saves the per-subject scores, not a mean across subjects
np.save("./data/Outputs/Mean_Vec.npy", out_array)
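Later steps can read the saved array back with `np.load`. The following small round-trip check writes to a temporary directory rather than `./data/Outputs` so that it runs anywhere. The random array is only a stand-in for `out_array`.

```python
import os
import tempfile
import numpy as np

# Stand-in with the same shape as the saved partial factor scores.
out_array = np.random.default_rng(1).standard_normal((160, 392, 10))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Mean_Vec.npy")
    np.save(path, out_array)
    reloaded = np.load(path)   # fully read into memory before the directory is removed

# The .npy format stores shape and dtype, so the array comes back unchanged.
print(reloaded.shape, reloaded.dtype)
```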


In [26]:
### OLD VERSION (Python 2 code, e.g. xrange; kept for reference only)
#%%capture
#from pySTATIS import statis
#import numpy as np 
#import os

##load vectors
#names = list(xrange(392))
#X = [np.load("./data/Outputs/Embs/"+ os.path.basename(filename)+"_embedding_dense_res_veconly.npy") for filename in selected]
#out = statis.statis(X, names, fname='./data/Outputs/statis_results.npy')
#statis.project_back(X, out['Q'], path = "./data/Outputs/Regs/",fnames = selected)
#np.save("./data/Outputs/Mean_Vec.npy",out['F'])
