Using the results of adaptive masking (see notebooks in ../masking), we can assign relative importance to different inputs (pixels) in the dataset (images). This could also be done with fancier methods like CAMs and other saliency-map techniques. But hey, we gotta do it this way!
The proposed model will take two inputs: a (partially) masked image and its corresponding mask.
Its aim will be to learn to work on any combination of masked image and corresponding mask. For example, if it gets just half the pixels (and the mask reflects the same), it should do its best to produce a result. On the other hand, if the image is complete and the mask is empty, it should behave like any ordinary model. The point is this: if we can prioritize the pixels somehow and feed them to the network incrementally, we should obtain a pixels-used vs. accuracy curve and read off the optimal number of pixels from it.
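One way the two inputs could be combined is to stack the censored image and its mask as channels, so the network always sees both the pixel values and which of them are valid. This is only a sketch under our own assumptions (the function name and the convention that mask = 1 marks a censored pixel are ours, chosen so that an empty mask with a complete image reduces to the usual single-input case):

```python
import numpy as np

def combine_inputs(image, mask):
    # Hypothetical input-preparation step: mask = 1 marks censored pixels,
    # so an empty (all-zero) mask with a complete image reduces to the
    # usual single-input case described above.
    censored = image * (1.0 - mask)             # zero out hidden pixels
    return np.stack([censored, mask], axis=-1)  # shape: (H, W, 2)

rng = np.random.default_rng(0)
img = rng.random((28, 28))

# Half the pixels hidden, with the mask reflecting the same:
half_mask = np.zeros((28, 28))
half_mask[:, 14:] = 1.0
x_half = combine_inputs(img, half_mask)

# Complete image, empty mask -> behaves like a plain model input:
x_full = combine_inputs(img, np.zeros((28, 28)))
```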
Note that this is not dimensionality reduction of the dataset; it is simply data censoring.
There are essentially two models. The first uses adaptive ppn masks to prioritize pixels (we will compare this against priorities generated by, say, mean-difference or variance-based techniques). The second will consume an incremental stream of data + masks and try to figure out the output class.
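The mean-difference and variance baselines mentioned above could be sketched as follows (the function names and exact scoring rules are assumptions for illustration, not the actual notebook code):

```python
import numpy as np

def variance_priority(images):
    # Rank pixels by their variance across the dataset: high-variance
    # pixels are assumed to carry more discriminative signal.
    return images.var(axis=0)

def mean_difference_priority(images, labels):
    # Rank pixels by the spread of their per-class mean intensities:
    # pixels whose average value differs most between classes rank first.
    class_means = np.stack([images[labels == c].mean(axis=0)
                            for c in np.unique(labels)])
    return class_means.max(axis=0) - class_means.min(axis=0)

def top_k_mask(priority, k):
    # Binary mask keeping only the k highest-priority pixels visible.
    flat = priority.ravel()
    idx = np.argsort(flat)[::-1][:k]
    mask = np.zeros_like(flat)
    mask[idx] = 1.0
    return mask.reshape(priority.shape)
```

Sweeping k in `top_k_mask` from 0 to the full pixel count is what would trace out the pixels-vs-accuracy curve for each priority scheme.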
The implementation of the second model will need a training-batch generator that randomly masks out data from images, with a control parameter for the spatial clustering of masked pixels and another parameter for the fraction of pixels masked.
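Such a generator might look like the sketch below. Everything here is an assumption about how those two parameters could work, not the actual implementation: `fraction` is the share of pixels masked, and `cluster` in [0, 1] controls clumping via a simple seed-and-grow scheme (0 scatters masked pixels uniformly; values near 1 grow the mask outward from a few random seeds).

```python
import numpy as np

def random_mask(shape, fraction, cluster=0.0, rng=None):
    # Generate a binary mask (1 = censored) covering ~`fraction` of pixels.
    if rng is None:
        rng = np.random.default_rng()
    h, w = shape
    n_masked = int(round(fraction * h * w))
    mask = np.zeros(shape, dtype=float)
    if n_masked == 0:
        return mask
    if cluster <= 0:
        # No clustering: scatter masked pixels uniformly at random.
        idx = rng.choice(h * w, size=n_masked, replace=False)
        mask.ravel()[idx] = 1.0
        return mask
    # Seed-and-grow: fewer seeds -> larger contiguous clusters.
    n_seeds = max(1, int(round((1 - cluster) * n_masked)))
    seeds = rng.choice(h * w, size=n_seeds, replace=False)
    frontier = [(s // w, s % w) for s in seeds]
    placed = 0
    while frontier and placed < n_masked:
        y, x = frontier.pop(rng.integers(len(frontier)))
        if mask[y, x]:
            continue
        mask[y, x] = 1.0
        placed += 1
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                frontier.append((ny, nx))
    return mask

def masked_batches(images, batch_size, fraction, cluster=0.0, rng=None):
    # Yield (masked_images, masks) pairs for training the second model.
    if rng is None:
        rng = np.random.default_rng()
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        masks = np.stack([random_mask(batch.shape[1:], fraction,
                                      cluster, rng) for _ in batch])
        yield batch * (1.0 - masks), masks
```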