Papers and readings:

- EMNLP
- ACL 2018
- arXiv:1412.3555
- seq2seq with attention
- SIGIR 2018 NLP tutorial
- Pointer networks (handling out-of-vocabulary words)
- PCNN
- Neural Relation Extraction with Selective Attention over Instances
- Complex Embeddings for Simple Link Prediction
- Caruana, 1997
- Khapra and Chandar, 2016
- Isomorphism between embedding spaces, Mikolov et al., 2013
- Artetxe et al., 2016
- Smith et al., 2017
- Ammar et al.
- Smith et al., 2017: inverted softmax
- Conneau et al., 2018
- Joulin et al.: improved CSLS
- Lazaridou et al., 2016: max-margin
- Hermann and Blunsom, 2014
- Gouws et al., 2015
- Chandar et al., 2014
- Smith et al., 2017: Procrustes with inverted softmax
- github/babylonpartners/fasttext_multilingual
- Hoshen and Wolf, 2018: enhancements
- Wasserstein Procrustes (Zhang et al.; Grave et al.)
- ai.iisc.ac.in
- NELL knowledge graph: rtw.ml.cmu.edu / @cmunell
- ELDEN, NAACL 2018
- GCN, ICLR 2017
- GCN for NLP, EMNLP
- Document timestamping with GCN, ACL 2018
- Bahdanau et al., 2015: embed, encode, attend, decode
- Minimal parameter sharing
- All-shared architecture, Johnson et al., 2017
- Shared encoder: Lee et al., 2017; Nguyen et al., 2017; Gu et al., 2018
- Preprocess sentences: Ponti et al., 2018
- Data selection: Rudramurthy et al., 2017

Lecture 1 - Embeddings and Neural Text Analysis

  • A lot of text analysis is representation learning.
  • Distributional vectors have long been used to obtain text embeddings.
  • The representation usually depends on the topic, the context, and how much context you are looking at.
  • Word embeddings: each word in the vocabulary is associated with two embeddings, one used when the word is the focus word and one used when it appears as context (a skip-gram sketch follows this list).
  • Sentence and passage representation: the sum or the (per-word weighted) average of word vectors is a good benchmark (an averaging sketch follows this list).
  • CNNs for text: learning sentence embeddings by composing continuous representations without parses. CNN filters are feature detectors that are not position or area dependent, so the notion of position in the sentence is lost. The input embeddings can be word2vec or GloVe. The resulting feature maps are then pooled. At some level these networks can be thought of as n-gram detectors: they capture n-gram-level semantics and local structure, but not much beyond that (a CNN sketch follows this list).
  • RNNs and LSTMs
  • RNNs as language models and translators: machine translation, code documentation, rewriting complex sentences as simple ones.
  • BiLSTMs: you want to capture the sentence from left to right and vice versa.
  • Attention weights can give you word alignments, i.e., which word corresponds to which word in the other language.
  • seq2seq with attention
  • Open problems:
    1. It is not well understood how to build representations for large pieces of text.
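
A minimal sketch of the two-embedding idea (focus vs. context vectors), using a toy skip-gram objective with negative sampling in PyTorch; the vocabulary size, dimensions, and example word indices are made up for illustration.

```python
import torch
import torch.nn as nn

# Toy skip-gram: every word gets a "focus" (input) and a "context" (output) embedding.
vocab_size, dim = 1000, 50                    # hypothetical sizes
focus_emb = nn.Embedding(vocab_size, dim)     # used when the word is the focus word
context_emb = nn.Embedding(vocab_size, dim)   # used when the word appears as context

def skipgram_loss(focus_ids, context_ids, negative_ids):
    """Negative-sampling loss for (focus, context) pairs."""
    f = focus_emb(focus_ids)                  # (batch, dim)
    c = context_emb(context_ids)              # (batch, dim)
    n = context_emb(negative_ids)             # (batch, k, dim)
    pos = torch.log(torch.sigmoid((f * c).sum(-1)))
    neg = torch.log(torch.sigmoid(-(n @ f.unsqueeze(-1)).squeeze(-1))).sum(-1)
    return -(pos + neg).mean()

# One (focus=5, context=7) pair with two sampled negative context words.
loss = skipgram_loss(torch.tensor([5]), torch.tensor([7]), torch.tensor([[3, 42]]))
loss.backward()
```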
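
A quick sketch of the averaging baseline for sentence/passage embeddings, assuming a dict `vectors` mapping words to numpy arrays and an optional per-word weight dict (e.g. inverse frequency); all names here are illustrative.

```python
import numpy as np

def sentence_embedding(tokens, vectors, weights=None, dim=300):
    """Weighted average of word vectors; a simple but strong baseline."""
    vecs, ws = [], []
    for tok in tokens:
        if tok in vectors:
            vecs.append(vectors[tok])
            ws.append(weights.get(tok, 1.0) if weights else 1.0)
    if not vecs:
        return np.zeros(dim)
    return np.average(np.stack(vecs), axis=0, weights=ws)
```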
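
A sketch of a CNN sentence encoder viewed as an n-gram detector: convolutions over (pretrained or learned) word embeddings followed by max-over-time pooling, which throws away position. The filter sizes and dimensions are assumptions, not values from the lecture.

```python
import torch
import torch.nn as nn

class CNNSentenceEncoder(nn.Module):
    """Convolutions over word embeddings act as n-gram feature detectors;
    max-over-time pooling discards position, keeping only 'was the feature seen?'."""
    def __init__(self, vocab_size=1000, dim=100, n_filters=64, ngram_sizes=(2, 3, 4)):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)   # could be initialised with word2vec/GloVe
        self.convs = nn.ModuleList(
            [nn.Conv1d(dim, n_filters, kernel_size=n) for n in ngram_sizes]
        )

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)    # (batch, dim, seq_len)
        feats = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)             # (batch, n_filters * len(ngram_sizes))

enc = CNNSentenceEncoder()
print(enc(torch.randint(0, 1000, (2, 12))).shape)  # torch.Size([2, 192])
```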

Relation Extraction

  • It is a multi-label problem: multiple relation labels can hold between the same pair of words.
  • Using distant supervision (a weak-supervision method), it becomes an MIL (multi-instance learning) problem with bag-level labelling (a bag of sentences): you get a large amount of labelled data, but the labels are noisy.
  • Neural networks for distant supervision:
    1. PCNN works at the sentence level. It is the same as a CNN, but the pooling is piecewise around the entity positions; the question is whether within-sentence structure is relevant, and if not, you would pool over the whole sentence (a piecewise-pooling sketch follows this list).
    2. NRE with selective attention: give importance to sentences in a soft rather than hard way and learn bag-level representations (an attention sketch follows this list).
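
A minimal sketch of piecewise max-pooling as used in PCNN, assuming the convolutional feature map and the two entity positions are given; tensor names, shapes, and the exact segment boundaries are illustrative rather than the original implementation.

```python
import torch

def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    """conv_out: (n_filters, seq_len) feature map for one sentence.
    The sentence is split into three segments by the two entity positions, and each
    segment is max-pooled separately (instead of one max over the whole sentence)."""
    left, mid = sorted((e1_pos, e2_pos))
    segments = [conv_out[:, : left + 1],
                conv_out[:, left + 1 : mid + 1],
                conv_out[:, mid + 1 :]]
    pooled = [seg.max(dim=1).values if seg.size(1) > 0
              else torch.zeros(conv_out.size(0))
              for seg in segments]
    return torch.cat(pooled)   # (3 * n_filters,)

# Example: 8 filters over a 10-token sentence, entities at positions 2 and 6.
print(piecewise_max_pool(torch.randn(8, 10), 2, 6).shape)  # torch.Size([24])
```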
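
A sketch of selective attention over a bag of sentences: each sentence representation gets a soft weight from its compatibility with a relation query vector, and the bag representation is the weighted sum. The dot-product scoring used here is a simplifying assumption.

```python
import torch

def bag_representation(sentence_reprs, relation_query):
    """sentence_reprs: (n_sentences, dim) encodings of each sentence in the bag.
    relation_query:   (dim,) vector for the candidate relation.
    Each sentence gets a soft weight instead of a hard keep/drop decision."""
    scores = sentence_reprs @ relation_query   # (n_sentences,)
    alphas = torch.softmax(scores, dim=0)      # attention over the bag
    return alphas @ sentence_reprs             # (dim,) bag-level representation

bag = torch.randn(5, 64)   # 5 noisy sentences mentioning the same entity pair
rel = torch.randn(64)
print(bag_representation(bag, rel).shape)      # torch.Size([64])
```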

Continuous Knowledge Representation

  • The goal is to embed not only words but also relations, entities, etc.
  • TransE: translational embedding, the simplest graph-level embedding method. A triple is scored by how close head + relation is to tail, and positive tuples are ranked higher than negative tuples. It cannot model many-to-one, many-to-many, or one-to-many relations (a sketch follows this list).
  • Holographic embeddings: state of the art.
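
A minimal sketch of TransE's scoring and margin-ranking objective in PyTorch; the entity/relation counts, dimensions, and the single corrupted-tail negative are illustrative.

```python
import torch
import torch.nn as nn

n_entities, n_relations, dim = 100, 10, 32   # hypothetical sizes
ent = nn.Embedding(n_entities, dim)
rel = nn.Embedding(n_relations, dim)

def score(h, r, t):
    """TransE: a triple is plausible if head + relation lands close to tail."""
    return (ent(h) + rel(r) - ent(t)).norm(p=2, dim=-1)

def margin_loss(pos, neg, margin=1.0):
    """Rank positive triples above corrupted (negative) ones by a margin."""
    return torch.relu(margin + score(*pos) - score(*neg)).mean()

# Positive triple (h=3, r=1, t=7) vs. a corrupted tail (t=9).
pos = (torch.tensor([3]), torch.tensor([1]), torch.tensor([7]))
neg = (torch.tensor([3]), torch.tensor([1]), torch.tensor([9]))
margin_loss(pos, neg).backward()
```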

Lecture 2 - Multilingual Learning

  • Build NLP Applications that can work on different languages.
  • Facets of an NLP application:
    1. First generation: algorithms (rule-based systems, parsers, finite-state transducers, etc.), knowledge (production rules), and data.
    2. Pre-deep-learning ML: supervised classifiers and probabilistic parsers as algorithms, complex rules and feature engineering on the knowledge side, and data.
    3. Deep learning systems: FCNs, recurrent networks, CNNs, and seq2seq learning as algorithms; AutoML and representation learning on the knowledge side; annotated data.
  • Multilingual learning scenarios: joint learning (analogous to multi-task learning, where each task is a language), transfer learning, and zero-shot learning.
  • Deep learning helps here because it provides a powerful framework for multi-task learning; word embeddings are also important.
  • A typical multilingual NLP pipeline: text -> tokens -> token embeddings -> text embedding -> application-specific DNN layers -> output.
  • Learning cross-lingual embeddings: there are offline and online methods. Offline mapping is commonly posed as the orthogonal Procrustes problem (a sketch follows this list).
  • Hubness problem in nearest-neighbour search; solutions include inverted rank, inverted softmax, and CSLS (a CSLS sketch follows this list).
  • Online methods for multilingual learning: using a parallel corpus only.
  • A general framework for cross-lingual embeddings.
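
A minimal sketch of the closed-form orthogonal Procrustes solution for mapping a source embedding space onto a target space using a seed dictionary of aligned rows; the matrix sizes and random data are placeholders.

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Find the orthogonal W minimising ||X W - Y||_F, where X holds source-language
    vectors and Y the corresponding target-language vectors (one aligned pair per row).
    Closed form: W = U V^T from the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy seed dictionary: 500 aligned word pairs in 300-dimensional spaces.
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((500, 300)), rng.standard_normal((500, 300))
W = orthogonal_procrustes(X, Y)
print(np.allclose(W @ W.T, np.eye(300)))   # W is orthogonal
```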
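
A sketch of the CSLS hubness correction (from the Conneau et al. reading): each cosine similarity is penalised by the average similarity of the query and the candidate to their own k nearest neighbours in the other space. The dense-matrix implementation and k=10 are assumptions.

```python
import numpy as np

def csls(src, tgt, k=10):
    """src: (n_src, d), tgt: (n_tgt, d), both rows L2-normalised.
    CSLS(x, y) = 2*cos(x, y) - r_tgt(x) - r_src(y), where r_*(.) is the mean cosine
    similarity to the k nearest neighbours in the other space."""
    sims = src @ tgt.T                                   # cosine similarities
    r_tgt = np.sort(sims, axis=1)[:, -k:].mean(axis=1)   # per source word
    r_src = np.sort(sims, axis=0)[-k:, :].mean(axis=0)   # per target word
    return 2 * sims - r_tgt[:, None] - r_src[None, :]

# Nearest target word for each source word, with and without the hubness correction.
rng = np.random.default_rng(0)
S = rng.standard_normal((50, 300)); S /= np.linalg.norm(S, axis=1, keepdims=True)
T = rng.standard_normal((80, 300)); T /= np.linalg.norm(T, axis=1, keepdims=True)
plain = (S @ T.T).argmax(axis=1)
corrected = csls(S, T).argmax(axis=1)
```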

Lecture 3 - Representing Knowledge

  • Background knowledge is key to intelligent decision making
  • NELL KG
  • Two views of knowledge: knowledge graphs (shallow) and dense representations.
  • Two types of knowledge graphs: ontological KGs (high precision, canonicalized, require supervision) and open KGs (ontology-free, easy to build, tools available, but fragmented).

Relation Schema Induction (RSI)

  • Problem: given a sentence, find the underlying relationships between the entities in it.
  • RSI as tensor factorization: first form (subject, relation, object) triples with OpenIE, then factorize the resulting tensor (a sketch follows this list).
  • SICTF
  • CESI (canonicalization) - KG embedding
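
A rough sketch of packing OpenIE triples into a 3-mode (subject, relation, object) tensor and factorizing it; the toy triples, count-based entries, and the plain gradient-descent CP factorization are illustrative stand-ins, not the SICTF algorithm.

```python
import numpy as np

# Toy OpenIE triples: (subject noun phrase, relation phrase, object noun phrase).
triples = [("paris", "is capital of", "france"),
           ("delhi", "is capital of", "india"),
           ("paris", "located in", "france")]

subjects = sorted({s for s, _, _ in triples})
relations = sorted({r for _, r, _ in triples})
objects = sorted({o for _, _, o in triples})

# 3-mode count tensor X[subject, relation, object].
X = np.zeros((len(subjects), len(relations), len(objects)))
for s, r, o in triples:
    X[subjects.index(s), relations.index(r), objects.index(o)] += 1

# Tiny rank-2 CP-style factorization by gradient descent, just to show the idea;
# the relation factors B group relation phrases into induced schemas.
rng = np.random.default_rng(0)
rank, lr = 2, 0.02
A = rng.standard_normal((len(subjects), rank)) * 0.1   # subject factors
B = rng.standard_normal((len(relations), rank)) * 0.1  # relation factors
C = rng.standard_normal((len(objects), rank)) * 0.1    # object factors
for _ in range(2000):
    E = np.einsum("ir,jr,kr->ijk", A, B, C) - X        # reconstruction error
    A -= lr * np.einsum("ijk,jr,kr->ir", E, B, C)
    B -= lr * np.einsum("ijk,ir,kr->jr", E, A, C)
    C -= lr * np.einsum("ijk,ir,jr->kr", E, A, B)
```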

Graph Convolution Nets

Lecture 4 - Related Languages and Multilingual Learning

  • Minimal parameter sharing: separate vocabularies and embeddings, embeddings learnt during training, and each minibatch contains data from a single language pair.
  • All-shared architecture: a minibatch contains data from all language pairs, and source embeddings are projected into a common space.
  • Shared encoder (a sketch follows this list).
  • Shared encoder with adversarial training: machine translation not done.
  • Training transfer-learning systems
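
A rough sketch of the separate-embeddings / shared-encoder idea for multilingual NMT: each language keeps its own embedding table, a per-language projection maps it into a common space, and the encoder parameters are shared. The module layout, GRU encoder, language codes, and sizes are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class SharedEncoderNMT(nn.Module):
    """Per-language embeddings + projection into a common space + one shared encoder."""
    def __init__(self, vocab_sizes, emb_dim=128, hidden=256):
        super().__init__()
        self.embs = nn.ModuleDict({lang: nn.Embedding(v, emb_dim)
                                   for lang, v in vocab_sizes.items()})
        self.proj = nn.ModuleDict({lang: nn.Linear(emb_dim, emb_dim)
                                   for lang in vocab_sizes})      # to a common space
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)  # shared across languages

    def encode(self, token_ids, lang):
        x = self.proj[lang](self.embs[lang](token_ids))   # (batch, seq, emb_dim)
        outputs, _ = self.encoder(x)
        return outputs                                     # fed to a (shared) decoder

model = SharedEncoderNMT({"hi": 8000, "mr": 8000, "en": 10000})
print(model.encode(torch.randint(0, 8000, (4, 15)), "hi").shape)  # torch.Size([4, 15, 256])
```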