The normalize function is a helper that z-scores your data: each value has the mean subtracted and is divided by the standard deviation. This is useful if your features (columns) are scaled differently within or across datasets. By default, hypertools normalizes the columns across all of the datasets passed, but it also lets you normalize the columns within each dataset separately. Alternatively, you can normalize each row. The function returns an array or a list of arrays (matching the input type) in which the columns or rows have been z-scored.
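As a rough illustration of what z-scoring does to a single array, the sketch below standardizes each column by subtracting the column mean and dividing by the column standard deviation. It is a minimal NumPy sketch, not hypertools' implementation: the helper name `zscore_columns` is hypothetical, and details such as the degrees of freedom used for the standard deviation may differ from the library.

```python
import numpy as np


def zscore_columns(x):
    """Hypothetical helper: z-score each column (subtract column mean, divide by column std)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)


rng = np.random.default_rng(0)
# Three columns with very different offsets and scales.
raw = rng.normal(loc=[0.0, 50.0, -3.0], scale=[1.0, 20.0, 0.1], size=(200, 3))

z = zscore_columns(raw)
print(np.round(z.mean(axis=0), 6))  # each column mean is ~0
print(np.round(z.std(axis=0), 6))   # each column std is ~1
```

After z-scoring, all three columns share a common scale, so no single feature dominates distances or plots simply because its raw units are larger.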
In [ ]:
import hypertools as hyp
import numpy as np
%matplotlib inline
First, we generate two sets of synthetic data. For each set, we draw 100 points from a multivariate normal distribution with its own random covariance matrix and its own mean (0 for the first set, 10 for the second), so the two sets have different statistical properties.
In [ ]:
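# Random 10x10 matrices used to build covariance matrices
x1 = np.random.randn(10, 10)
x2 = np.random.randn(10, 10)
# x @ x.T is symmetric positive semidefinite, so it is a valid covariance matrix
c1 = np.dot(x1, x1.T)
c2 = np.dot(x2, x2.T)
# Mean vectors: all zeros for the first set, all tens for the second
m1 = np.zeros([1, 10])
m2 = 10 + m1
# Draw 100 samples (rows) of 10 features (columns) from each distribution
data1 = np.random.multivariate_normal(m1[0], c1, 100)
data2 = np.random.multivariate_normal(m2[0], c2, 100)
data = [data1, data2]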
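Why this recipe works: for any real matrix `x`, the product `x @ x.T` is symmetric and positive semidefinite, which is what a covariance matrix passed to `multivariate_normal` must be, and the two mean vectors (all 0s vs. all 10s) push the sets apart. The sketch below checks both properties. It uses NumPy's seeded `default_rng` Generator instead of the legacy `np.random` functions so the run is reproducible; that swap is made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Build random covariance matrices the same way: c = x @ x.T
x1 = rng.standard_normal((10, 10))
x2 = rng.standard_normal((10, 10))
c1 = x1 @ x1.T
c2 = x2 @ x2.T

# A valid covariance matrix is symmetric ...
print(np.allclose(c1, c1.T))  # True
# ... and positive semidefinite (no negative eigenvalues, up to round-off).
print(np.linalg.eigvalsh(c1).min() > -1e-8)  # True

# Draw 100 points around a mean of 0 and 100 points around a mean of 10.
data1 = rng.multivariate_normal(np.zeros(10), c1, 100)
data2 = rng.multivariate_normal(np.full(10, 10.0), c2, 100)

# The overall sample means sit near 0 and 10, so the two clouds are offset.
print(data1.mean(), data2.mean())
```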
Before normalizing, we plot the raw data.
In [ ]:
geo = hyp.plot(data, '.')
To normalize the data, pass one of the following strings as the normalize argument: 'across' (the default), 'within', or 'row'. Each option is shown in the examples below.
When you normalize 'across', all of the datasets are stacked into a single array and each column of the combined array is z-scored. The result is then split back into separate arrays with the original shapes.
In [ ]:
norm = hyp.normalize(data, normalize='across')
geo = hyp.plot(norm, '.')
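To make the 'across' option concrete, here is a minimal NumPy sketch of the procedure described above: stack the arrays, z-score each column using the pooled mean and standard deviation, then split the result back into pieces with the original row counts. The function name `normalize_across` is hypothetical, and the sketch may differ from hypertools in details such as the standard-deviation convention.

```python
import numpy as np


def normalize_across(arrays):
    """Hypothetical sketch: z-score columns using statistics pooled over all arrays."""
    stacked = np.vstack(arrays)
    z = (stacked - stacked.mean(axis=0)) / stacked.std(axis=0)
    # Split back at the cumulative row boundaries of the original arrays.
    boundaries = np.cumsum([a.shape[0] for a in arrays])[:-1]
    return np.split(z, boundaries)


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(100, 3))   # centered near 0
b = rng.normal(10.0, 1.0, size=(50, 3))   # centered near 10

za, zb = normalize_across([a, b])
print(za.shape, zb.shape)  # (100, 3) (50, 3)
# The pooled columns are standardized, but each piece keeps its relative offset.
print(za.mean(axis=0), zb.mean(axis=0))
```

Because the statistics are pooled, the first array ends up with negative column means and the second with positive ones: 'across' preserves the differences between datasets while putting every feature on a common scale.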
When you normalize 'within', the columns of each array in the list are z-scored separately, using only that array's own means and standard deviations.
In [ ]:
norm = hyp.normalize(data, normalize='within')
geo = hyp.plot(norm, '.')
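For comparison with 'across', the sketch below z-scores each array using only its own column statistics. It is a minimal NumPy illustration with a hypothetical function name (`normalize_within`), assuming the population standard deviation; it is not taken from hypertools itself.

```python
import numpy as np


def normalize_within(arrays):
    """Hypothetical sketch: z-score the columns of each array using only its own statistics."""
    return [(a - a.mean(axis=0)) / a.std(axis=0) for a in arrays]


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(100, 3))   # centered near 0
b = rng.normal(10.0, 5.0, size=(50, 3))   # centered near 10, wider spread

za, zb = normalize_within([a, b])
# Each array is now centered at 0 with unit spread, so the offset between them is gone.
print(np.round(za.mean(axis=0), 6), np.round(zb.mean(axis=0), 6))
print(np.round(za.std(axis=0), 6), np.round(zb.std(axis=0), 6))
```

The design trade-off: 'within' removes differences in mean and scale between datasets, which helps when those differences are nuisance effects (for example, different recording setups), but it also hides them when they are the signal of interest.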
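When you normalize 'row', each row of each array is z-scored across its columns, independently of the other rows.
In [ ]:
norm = hyp.normalize(data, normalize='row')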
geo = hyp.plot(norm, '.')
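For 'row', the same arithmetic runs along the other axis: each row (observation) is z-scored across its own columns (features). A minimal NumPy sketch, with the hypothetical function name `normalize_rows` and assuming the population standard deviation:

```python
import numpy as np


def normalize_rows(arrays):
    """Hypothetical sketch: z-score each row of each array across its columns."""
    out = []
    for a in arrays:
        mean = a.mean(axis=1, keepdims=True)  # one mean per row
        std = a.std(axis=1, keepdims=True)    # one std per row
        out.append((a - mean) / std)
    return out


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(100, 5))
b = rng.normal(10.0, 5.0, size=(50, 5))

za, zb = normalize_rows([a, b])
print(np.round(za.mean(axis=1)[:3], 6))  # each row mean is ~0
print(np.round(zb.std(axis=1)[:3], 6))   # each row std is ~1
```

Note that this sketch divides by zero for a row whose values are all identical; a production version would need to decide how to handle such rows.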