First, let's initialize BIDMach again.

```
import BIDMat.{CMat,CSMat,DMat,Dict,IDict,Image,FMat,FND,GMat,GIMat,GSMat,HMat,IMat,Mat,SMat,SBMat,SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{FM,GLM,KMeans,KMeansw,LDA,LDAgibbs,Model,NMF,SFA}
import BIDMach.datasources.{DataSource,MatDS,FilesDS,SFilesDS}
import BIDMach.mixins.{CosineSim,Perplexity,Top,L1Regularizer,L2Regularizer}
import BIDMach.updaters.{ADAGrad,Batch,BatchNorm,IncMult,IncNorm,Telescoping}
import BIDMach.causal.{IPTW}
Mat.checkMKL
Mat.checkCUDA
if (Mat.hasCUDA > 0) GPUmem
```

Check the GPU memory again, and make sure you don't have any dangling processes.

This time, you will build up the datasource and learner from scratch. You may want to open a browser tab or window to MLscalePart3.ipynb as a reference.

A **topic model** is a representation of a bag-of-words corpus as several factors, or topics. Each topic should represent a theme that recurs in the corpus. Concretely, the output of the topic model will be an (ntopics x nfeatures) matrix we will call `tmodel`. Each row of that matrix represents a topic, and the elements of that row are word probabilities for the topic (i.e. the rows sum to 1). There is more about topic models on Wikipedia.

The **element tmodel(i,j) holds the probability of word j under topic i**. Later we will examine the topics directly and try to make sense of them.
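To make the shape of `tmodel` concrete, here is a minimal plain-Scala sketch (no BIDMach involved) that turns made-up per-topic word counts into rows of word probabilities. The object name `ToyTopicModel`, the helper `normalizeRows`, and the counts are all hypothetical and serve only as illustration.

```scala
object ToyTopicModel {
  // Hypothetical word counts: 2 topics (rows) x 4 vocabulary words (columns).
  val counts: Array[Array[Double]] = Array(
    Array(8.0, 1.0, 1.0, 0.0),
    Array(1.0, 1.0, 4.0, 4.0)
  )

  // Divide each row by its sum so it becomes a probability distribution over words.
  def normalizeRows(m: Array[Array[Double]]): Array[Array[Double]] =
    m.map { row =>
      val s = row.sum
      row.map(_ / s)
    }

  // Toy stand-in for the (ntopics x nfeatures) model matrix.
  val tmodel: Array[Array[Double]] = normalizeRows(counts)

  def main(args: Array[String]): Unit =
    tmodel.zipWithIndex.foreach { case (row, i) =>
      println(s"topic $i: ${row.mkString(", ")} (sum = ${row.sum})")
    }
}
```

In this toy matrix, `tmodel(0)(0)` is the probability of word 0 under topic 0, and every row sums to 1.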

Now define a mixture class to hold all the options. The learner should use an SFilesDS, an LDA model and an IncNorm updater. Add the appropriate code here:

```
class xopts ...
val opts = new xopts
```

Next come the options for the data source. Fill these in using the previous data sources as templates.

You need an SFilesDS (sparse files datasource), based on the files `/data/uci/pubmed_parts/partNN.smat.lz4` for NN = 00 to 09. This datasource uses just this one group of files, and each matrix has 141043 rows.
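As an illustration of the `partNN` naming scheme only, the following plain-Scala sketch expands the pattern into the ten file names. The object `PartFileNames` and the helper `partNames` are hypothetical; this is not how SFilesDS itself is configured, which you should take from the earlier notebooks.

```scala
object PartFileNames {
  // Hypothetical helper: expands NN into zero-padded two-digit indices 00, 01, ...
  def partNames(dir: String, n: Int): Seq[String] =
    (0 until n).map(i => f"${dir}part$i%02d.smat.lz4")

  def main(args: Array[String]): Unit =
    partNames("/data/uci/pubmed_parts/", 10).foreach(println)
}
```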

```
val mdir = "../data/uci/pubmed_parts/"
```

```
val ds = {
  implicit val ec = threadPool(4) // make sure there are enough threads (more than the lookahead count)
  new SFilesDS(opts)              // the datasource
}
opts.autoReset = false // Don't reset the GPU after the training run, so we can use a GPU model for prediction
```

Next define the main learner class, which is built up from the classes mentioned above: SFilesDS, LDA and IncNorm. For the Learner class, make sure you match the argument positions.

LDA is a popular topic model, described on Wikipedia.

We use a fast version of LDA which uses an "incremental multiplicative update". That's what the IncNorm updater does.

```
val nn =
```

Add tuning options for minibatch size (say 100k), number of passes (4) and dimension (`dim = 256`).

```
opts...
```

```
nn.train
```

```
plot(nn.results(0,?))
```

```
val tmodel = FMat(nn.modelmat)
val dict = Dict(loadSBMat(mdir+"../pubmed.term.sbmat.gz"))
```

You can query the dictionary by integer index, e.g. `dict(1000)`, by string representation, e.g. `dict("book")`, and by matrices of these, e.g. `dict(ii)` where `ii` is an IMat. Try a few such queries to the dict here:

```
In [ ]:
```

Next we evaluate the entropy of each topic in the model. Recall that the entropy of a discrete probability distribution is $E = -\sum_{i=1}^n p_i \ln(p_i)$. Each row of `tmodel` holds the word probabilities $p_i$ for one topic.
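As a small worked example of the formula, here is a plain-Scala sketch (independent of BIDMach) that computes the entropy of two toy distributions over four words. The object `TopicEntropy` and both distributions are hypothetical. The sketch also assumes the convention $0 \cdot \ln(0) = 0$ and skips zero entries.

```scala
object TopicEntropy {
  // Entropy in nats. Zero-probability terms are skipped (convention: 0 * ln(0) = 0).
  def entropy(p: Array[Double]): Double = {
    val s = p.filter(_ > 0.0).map(x => x * math.log(x)).sum
    -s
  }

  // Hypothetical distributions over a 4-word vocabulary.
  val uniform: Array[Double] = Array(0.25, 0.25, 0.25, 0.25) // spread evenly
  val peaked: Array[Double]  = Array(0.97, 0.01, 0.01, 0.01) // dominated by one word

  def main(args: Array[String]): Unit = {
    println(s"uniform: ${entropy(uniform)}")
    println(s"peaked:  ${entropy(peaked)}")
  }
}
```

The uniform distribution reaches the maximum possible value for four outcomes, $\ln 4$, while the peaked one has a much smaller entropy.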

Compute the entropies for each topic:

```
val ent = -(tmodel dotr ln(tmodel))
ent.t // put them in a horizontal line
```

Get the mean value (should be positive)

```
mean(ent)
```

Find the indices of the topics with the largest and smallest entropy, and save them in `elargest` and `esmallest`.

```
In [ ]:
```

Next, sort the word probabilities within each topic in descending order. `bestp` gets the sorted values and `besti` gets the sorted indices, which are the feature indices.
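To illustrate what "sorted values plus sorted indices" means, here is a plain-Scala sketch on a toy matrix. It is not BIDMat's `sortdown2` implementation. The object `SortDownSketch` and the helper `sortRowsDown` are hypothetical names that only mimic the behavior described above.

```scala
object SortDownSketch {
  // For each row, return the values in descending order together with the
  // original column index of each value, in the same order.
  def sortRowsDown(m: Array[Array[Double]]): (Array[Array[Double]], Array[Array[Int]]) = {
    val sorted = m.map(row => row.zipWithIndex.sortBy { case (v, _) => -v })
    (sorted.map(_.map(_._1)), sorted.map(_.map(_._2)))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical 1-topic, 3-word model row.
    val (vals, idx) = sortRowsDown(Array(Array(0.1, 0.6, 0.3)))
    println(vals(0).mkString(", "))
    println(idx(0).mkString(", "))
  }
}
```

The index array is what lets you map the strongest entries of a topic back to words through the dictionary.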

```
val (bestp, besti) = sortdown2(tmodel,2)
```

Now examine the 100 strongest terms in each topic:

```
dict(besti(elargest,0->100))
```

```
dict(besti(esmallest,0->100))
```

Do you notice any difference in the coherence of these two topics?

TODO: Fill in your answer here

```
// words for 2nd lowest entropy topic
```

```
// words for 3rd lowest entropy topic
```

What would you expect to happen to the average topic entropy if you run fewer topics?

TODO: answer here

Rerun the training with `dim=64` and put the new mean entropy below:

dim | mean entropy |
---|---|
64 | ... |
256 | ... |

You're done! You now have the keys to the fastest machine learning toolkit on earth. Have fun!