Manual one-hot encoding

Create the lookup dictionary and convert the string into integer indices in a single pass:


In [1]:
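import numpy as np

s = 'abcdabc'
mapping = {}
# setdefault returns the existing index for a known character; for a new
# character it stores len(mapping) (the next free index) and returns it.
mapped_s = np.array([mapping.setdefault(c, len(mapping)) for c in s])
print(mapping)
print(mapped_s)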


{'d': 3, 'a': 0, 'b': 1, 'c': 2}
[0 1 2 3 0 1 2]
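
For comparison, the same index array can probably be obtained with `np.unique` and `return_inverse=True`. This is an alternative sketch, not the approach used above: `np.unique` assigns indices in sorted character order rather than in order of first appearance, and the two only coincide here because the characters of `s` first appear in alphabetical order. The name `chars` is introduced purely for illustration.

```python
import numpy as np

s = 'abcdabc'
# np.unique returns the sorted unique characters; with return_inverse=True it
# also returns, for each input character, its index in that sorted array.
chars, mapped_s = np.unique(list(s), return_inverse=True)
mapping = {str(c): i for i, c in enumerate(chars)}
print(mapping)
print(mapped_s)
```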

Build the dense one-hot matrix (one row per character, one column per distinct character):


In [2]:
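# One row per character in s, one column per distinct character.
x = np.zeros((len(s), len(mapping)))
# Fancy indexing: for each row i, set column mapped_s[i] to 1.
x[np.arange(x.shape[0]), mapped_s] = 1
print(x)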


[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]
 [ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]]
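
As a sanity check, the matrix can be built a second way and then decoded back into the original string. The sketch below is an assumption-laden variant, not part of the original notebook: it indexes rows of an identity matrix (`np.eye`) instead of filling a zero matrix, and uses `argmax` to recover the indices. The names `inverse` and `decoded` are hypothetical helpers introduced here.

```python
import numpy as np

s = 'abcdabc'
mapping = {}
mapped_s = np.array([mapping.setdefault(c, len(mapping)) for c in s])

# Row i of the identity matrix is the one-hot vector for index i,
# so indexing with mapped_s stacks the matching rows.
x = np.eye(len(mapping))[mapped_s]

# Decode: argmax finds the position of the single 1 in each row,
# and the inverted dictionary maps that index back to its character.
inverse = {i: c for c, i in mapping.items()}
decoded = ''.join(inverse[i] for i in x.argmax(axis=1).tolist())
print(decoded)
```

If the round trip works, `decoded` should equal `s`, which confirms that each row carries exactly the information of one index.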