Reduce the amount of memory the softmax has to attend over by organizing the memory hierarchically.
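To make the hierarchical idea concrete, here is a minimal sketch, assuming a simple two-level layout: the memory is split into equal blocks, one softmax picks a block from per-block mean summaries, and a second softmax attends only inside that block. The names `hierarchical_read` and `block_size` and the mean-summary choice are assumptions for illustration, not the method of any particular paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def hierarchical_read(memory, query, block_size):
    """Two-level read: choose a block first, then attend only inside it.

    Assumption: blocks are summarized by their mean vector.
    """
    n_slots, dim = memory.shape
    assert n_slots % block_size == 0, "memory must split evenly into blocks"
    blocks = memory.reshape(n_slots // block_size, block_size, dim)
    summaries = blocks.mean(axis=1)              # (n_blocks, dim)
    block_probs = softmax(summaries @ query)     # softmax over n_blocks entries
    best = int(np.argmax(block_probs))           # hard choice of one block
    slot_probs = softmax(blocks[best] @ query)   # softmax over block_size entries
    read = slot_probs @ blocks[best]             # weighted read vector
    return read, best, slot_probs

memory = np.eye(8)            # 8 slots, each a one-hot vector
query = 10.0 * memory[5]      # query that matches slot 5
read, best, slot_probs = hierarchical_read(memory, query, block_size=4)
```

With 8 slots and blocks of 4, the two softmaxes here see 2 and 4 entries instead of a single softmax over all 8.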
Hard attention trained with REINFORCE is used in combination with softmax-based soft attention. I do not really understand this part.
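As a rough aid for the REINFORCE part, the sketch below shows the standard score-function gradient for a single hard-attention choice sampled from a softmax: the gradient with respect to the logits is `(reward - baseline) * (one_hot(action) - probs)`. The function name `reinforce_grad`, the reward values, and the logits are hypothetical; how the hard and soft attention are combined in the original model is not covered here.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def reinforce_grad(logits, action, reward, baseline=0.0):
    """Score-function gradient estimate w.r.t. the logits for one sampled choice."""
    probs = softmax(logits)
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    # d log p(action) / d logits = one_hot - probs for a softmax policy.
    return (reward - baseline) * (one_hot - probs)

rng = np.random.default_rng(0)
logits = np.array([0.5, -1.0, 2.0])      # hypothetical attention scores
rewards = np.array([0.0, 1.0, 0.2])      # hypothetical reward for reading each slot
probs = softmax(logits)

action = rng.choice(len(probs), p=probs)  # hard attention: sample one slot
grad = reinforce_grad(logits, action, rewards[action])
```

Averaged over many samples, this estimate matches the gradient of the expected reward, which is why a non-differentiable hard choice can still be trained.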
Addressing with a Lie group lets the head move naturally.
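A minimal sketch of what Lie group addressing could look like, assuming the simplest case: the head is a point on the unit circle, movement is the action of a 2-D rotation (the group SO(2)), and read weights come from a softmax over negative squared distances to the memory keys. The key layout, the `temperature` parameter, and the distance-based weighting are illustrative assumptions.

```python
import numpy as np

def rotation(theta):
    # Element of SO(2): rotation by angle theta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def move_head(head, theta):
    # Group action: rotating the head keeps it on the unit circle.
    return rotation(theta) @ head

def read_weights(head, keys, temperature=0.1):
    # Keys closer to the head get more weight (assumed weighting scheme).
    d2 = np.sum((keys - head) ** 2, axis=1)
    logits = -d2 / temperature
    logits -= logits.max()
    w = np.exp(logits)
    return w / w.sum()

# Four memory keys placed at 0, 90, 180 and 270 degrees on the circle.
angles = np.arange(4) * (np.pi / 2)
keys = np.stack([np.cos(angles), np.sin(angles)], axis=1)

head = np.array([1.0, 0.0])         # start at key 0
head = move_head(head, np.pi / 2)   # rotate a quarter turn toward key 1
weights = read_weights(head, keys)
```

Because rotations compose and have inverses, a move by `theta` followed by a move by `-theta` brings the head back to where it started, which is the sense in which the movement is natural.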