Mid-Level Feature Coding

This work is a study exploring different types of codebooks and coding strategies for representing SD-OCT information, applied to retinopathy. This notebook is an extension that explores some of the elements encountered during the development of that study.

Some of the ideas explored here can be found in Koniusz et al., 2013 and in this seminar given by Désiré Sidibé.

Ideas

Useless word suppression

BoW comes from text analysis. Roughly speaking, the idea is to find keywords and then analyse them. Words like *and*, *the*, *for*, etc. don't contribute much; therefore, during TF-IDF weighting, these words are eliminated and the remaining keywords are used for the analysis. In the original application of BoW to image retrieval, image keypoints are extracted, described with some descriptor, and those descriptors are used as the words to build the visual dictionary. A TF-IDF study is therefore not necessary, since all the words come from keypoints and thus already qualify as keywords. However, later applications take advantage of a dense description of the images. In those cases we believe that a TF-IDF study should be carried out, so that visual words present in all the classes are detected and eliminated from the dictionary, and yet (to our knowledge) this study is never carried out.
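
As a rough illustration of what such a study could look like, the sketch below re-weights per-image visual-word histograms with TF-IDF and drops words that occur in (nearly) every image. The histogram matrix `H`, its sizes, and the `0.05` threshold are all placeholder assumptions, not choices taken from the study itself.

```python
import numpy as np

# H: per-image visual-word histograms (n_images x K), e.g. obtained by
# hard-coding densely extracted descriptors; random placeholder data here.
rng = np.random.default_rng(0)
H = rng.integers(0, 5, size=(100, 32)).astype(float)

# Document frequency: in how many images does each visual word appear?
df = (H > 0).sum(axis=0)
idf = np.log(H.shape[0] / np.maximum(df, 1))

# Words present in (nearly) every image carry little discriminative
# information; one possible rule is to drop words with a low IDF.
keep = idf > 0.05                     # arbitrary threshold
H_tfidf = H[:, keep] * idf[keep]      # TF-IDF re-weighted histograms
```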

Dictionary class, clustering convergence, etc.

Having a dictionary of the right size matters, no matter what they might say.

BoW formalization

BoW can be defined as $X \approx DA$, where $X$ is the raw representation of the image (raw meaning the representation of the image in the original space), $D$ is the dictionary expressed in this original space, and $A$ is the representation of the image in terms of the dictionary $D$. In this manner, a feature describing the image can be seen as $f(A)$.
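
A minimal numerical sketch of this notation (random placeholder data, a toy dictionary whose atoms are drawn from $X$ itself, and a plain least-squares code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, K = 128, 500, 64      # descriptor dim., n. of samples, dictionary size

X = rng.standard_normal((d, N))            # raw descriptors, one per column
D = X[:, rng.choice(N, K, replace=False)]  # toy dictionary: K atoms drawn from X

# Least-squares code: A = argmin_A ||X - D A||_F^2
A, *_ = np.linalg.lstsq(D, X, rcond=None)

print(A.shape)                                         # (K, N)
print(np.linalg.norm(X - D @ A) / np.linalg.norm(X))   # relative residual
```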

The BoW formulation can be seen as a four-step process (a minimal end-to-end sketch follows the list).

1. **Sampling strategy** to compute $X$. The feature space of $X$ can be generated either from keypoints only or in a dense-extraction manner. *Look at TF-IDF*.
2. **Quantification** to hard-quantify the space of $X$.
3. **Coding** to determine $A$.
4. **Pooling** to determine the final descriptor $f$.
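
The following is a minimal end-to-end sketch of the four steps, assuming a toy grayscale image, dense patch sampling, a k-means dictionary, hard coding, and histogram pooling; every size and parameter is a placeholder.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# 1. Sampling: dense extraction of 8x8 patches from a toy grayscale image
image = rng.random((256, 512))
p, step = 8, 8
patches = np.array([image[i:i + p, j:j + p].ravel()
                    for i in range(0, image.shape[0] - p + 1, step)
                    for j in range(0, image.shape[1] - p + 1, step)])

# 2. Quantification: learn a K-word dictionary by clustering the patches
K = 32
kmeans = MiniBatchKMeans(n_clusters=K, n_init=3, random_state=0).fit(patches)

# 3. Coding: hard assignment of every patch to its nearest visual word
assignments = kmeans.predict(patches)

# 4. Pooling: a normalized word histogram is the final descriptor f
f, _ = np.histogram(assignments, bins=np.arange(K + 1))
f = f / f.sum()
```

Swapping any single step (e.g. soft coding in step 3, or pyramid pooling in step 4) leaves the rest of the pipeline untouched, which is what makes the grid of experiments below tractable.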

Experimental Set-up (or stuff to try, compare and then try to make sense out of it)

Results combining all those configurations might be explored.

 1. Sampling
     1.1. Keypoints
     1.2. No-Kp (dense sampling)
     1.3. No-Kp + TF-IDF

 2. Quantification
     2.1. to take into account
         2.1.1. Dictionary size (number of clusters, bandwidth, PCA info, etc.)
         2.1.2. K vs d
             2.1.2.1. K << d
             2.1.2.2. K <  d
             2.1.2.3. K ~  d
             2.1.2.4. K >  d
             2.1.2.5. K >> d
     2.2. Dictionary building strategy
         2.2.1. Random
         2.2.2. Clustering
             2.2.2.1 K-means
             2.2.2.2 hierarchical
             2.2.2.3 mean-shift
         2.2.3. PCA
         2.2.4. ICA
         2.2.5. NMF
         2.2.6. sparse
3. Coding (a toy implementation of 3.1–3.3 is sketched after this list)
    3.1. Hard coding (one sample is associated with one word)
    3.2. Soft coding (i.e. a weight derived from the distance to all words in the dictionary)
    3.3. Bounded soft coding (only give a weight to the n closest words in the dictionary)
    3.4. Sparse coding of A
4. Pooling (a pyramid-pooling sketch also follows the list)
    4.1. Histogram
    4.2. Pyramid
    4.3. other...
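
As referenced in item 3, a toy implementation of coding variants 3.1–3.3 could look as follows; the Gaussian weighting, `beta`, and `n_nearest` are placeholder choices. For 3.4, a ready-made option is `sklearn.decomposition.SparseCoder`.

```python
import numpy as np
from scipy.spatial.distance import cdist

def code(X, D, mode="hard", beta=1.0, n_nearest=5):
    """Code descriptors X (N x d) against a dictionary D (K x d)."""
    dist = cdist(X, D)                            # N x K distance matrix
    A = np.zeros_like(dist)
    if mode == "hard":                            # 3.1: one word per sample
        A[np.arange(len(X)), dist.argmin(axis=1)] = 1.0
    elif mode == "soft":                          # 3.2: a weight for every word
        A = np.exp(-beta * dist ** 2)
        A /= A.sum(axis=1, keepdims=True)
    elif mode == "soft_bounded":                  # 3.3: only the n nearest words
        idx = np.argsort(dist, axis=1)[:, :n_nearest]
        w = np.exp(-beta * np.take_along_axis(dist, idx, axis=1) ** 2)
        np.put_along_axis(A, idx, w / w.sum(axis=1, keepdims=True), axis=1)
    return A
```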
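
And for item 4.2, a sketch of spatial-pyramid pooling: per-cell word histograms over increasingly fine grids, concatenated into one descriptor. The `positions` array (patch coordinates) and the number of levels are assumptions.

```python
import numpy as np

def spatial_pyramid(assignments, positions, image_shape, K, levels=2):
    """Concatenate per-cell word histograms over 2^l x 2^l grids, l = 0..levels."""
    feats = []
    for l in range(levels + 1):
        n = 2 ** l
        # grid cell of each patch, clipped to the last cell
        cy = np.minimum(positions[:, 0] * n // image_shape[0], n - 1)
        cx = np.minimum(positions[:, 1] * n // image_shape[1], n - 1)
        for y in range(n):
            for x in range(n):
                inside = (cy == y) & (cx == x)
                h, _ = np.histogram(assignments[inside], bins=np.arange(K + 1))
                feats.append(h)
    f = np.concatenate(feats).astype(float)
    return f / max(f.sum(), 1.0)
```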
