Dimensionality reduction

The reduce function reduces the dimensionality of an array or list of arrays. The default is to use Principal Component Analysis to reduce to three dimensions, but a variety of models are supported and users may specify a desired number of dimensions other than three.

Supported models include: PCA, IncrementalPCA, SparsePCA, MiniBatchSparsePCA, KernelPCA, FastICA, FactorAnalysis, TruncatedSVD, DictionaryLearning, MiniBatchDictionaryLearning, TSNE, Isomap, SpectralEmbedding, LocallyLinearEmbedding, MDS and UMAP.

Import Hypertools


In [ ]:
import hypertools as hyp

%matplotlib inline

Load your data

First, we'll load one of the sample datasets. This dataset is a list of 2 numpy arrays, each containing average brain activity (fMRI) from 18 subjects listening to the same story, fit using Hierarchical Topographic Factor Analysis (HTFA) with 100 nodes. The rows are timepoints and the columns are fMRI components.

See the full dataset or the HTFA article for more info on the data and HTFA, respectively.


In [ ]:
geo = hyp.load('weights_avg')
weights = geo.get_data()

Reduce one array

Let's look at one array from the dataset above.


In [ ]:
print('Array shape: (%d, %d)' % weights[0].shape)

To reduce this array, simply pass the array to hyp.reduce, as below. We can see that the data has been reduced from 100 features to 3 features (the default when desired number of features is not specified).


In [ ]:
reduced_array = hyp.reduce(weights[0])
print('Reduced array shape: (%d, %d)' % reduced_array[0].shape)

Reduce list of arrays

A list or numpy array of multiple arrays can also be reduced into a common space. That is, the data can be combined, reduced as a whole, then split back into individual elements and outputted via hyp.reduce.

Here we show this with two arrays in the weights dataset. First, let's examine the arrays in the weights dataset (below).

Now, let's reduce both arrays at once (by passing in the whole of the weights data) and re-examine the data.


In [ ]:
reduced_arrays = hyp.reduce(weights)
print('Shape of first reduced array: ', reduced_arrays[0].shape)
print('Shape of second reduced array: ', reduced_arrays[1].shape)

We can see that each array has been reduced from 100 features to 3 features (the default when desired number of features is not specified), with the number of datapoints unchanged.

Reduce list of arrays (TSNE)

You can also opt to use different reduction methods. In the example below, we reduce multiple arrays at once, using TSNE. The data is reduced to three dimensions(the default when desired number of features not specified).


In [ ]:
reduced_TSNE = hyp.reduce(weights, reduce='TSNE')
print('Shape of first reduced array: ',reduced_TSNE[0].shape)
print('Shape of second reduced array: ',reduced_TSNE[1].shape)

Reduce to specified number of dimensions

You may prefer to reduce to a specific number of features, rather than defaulting the three dimensions. To achieve this, simply pass the number of desired features (as an int) to the ndims argument, as below.


In [ ]:
reduced_4 = hyp.reduce(weights, ndims = 4)
print('Shape of first reduced array: ', reduced_4[0].shape)
print('Shape of second reduced array: ', reduced_4[1].shape)

Reduce list of arrays with specific parameters

For finer control of parameters, a dictionary of model parameters may be passed to the reduce argument, in addition to the desired reduction method. See scikit-learn model docs for details on parameters supported for each model.

Supported models include: PCA, IncrementalPCA, SparsePCA, MiniBatchSparsePCA, KernelPCA, FastICA, FactorAnalysis, TruncatedSVD, DictionaryLearning, MiniBatchDictionaryLearning, TSNE, Isomap, SpectralEmbedding, LocallyLinearEmbedding, and MDS.

The example below will reduce to the default of three features, since the desired number of features is not specified.


In [ ]:
reduced_params = hyp.reduce(weights, reduce={'model' : 'PCA', 'params' : {'whiten' : True}})