What if you could complete an entire image pixel by pixel, by just supplying a single pixel?
For modelling the distribution of natural images we require an image model which is expressive, tractable and scalable all at once.
This paper tries to present a dnn which sequentially predicts the pixels in an image along the two spatial dimensions. Instead of considering the pixel intensities as a continuous distribution, this method models the discrete probability of raw pixels and encodes all the dependencies in the image. This is done by using a multinomial distribution with a vanilla soft-max layer.