Hiding Images using AI — Deep Steganography

Harshvardhan Gupta
Feb 11, 2018 · 6 min read

Deep learning keeps producing new kinds of results. In areas ranging from style transfer to unsupervised translation, it is constantly pushing the boundaries of what computers can do. We do not yet appear to have reached an upper bound, and new papers with strong results appear often. This post discusses one such paper: Deep Steganography.

This post is based on the NIPS 2017 Paper Hiding Images in Plain Sight: Deep Steganography.

At the end of this article, I will provide links to my TensorFlow implementation and a demo that you can access on the web.

What is Steganography

Steganography is the practice of hiding data within other data, for example hiding one image inside another. The key difference between cryptography and steganography is that in steganography the image carrying the hidden data looks unchanged, and is therefore unlikely to be scrutinised or analysed by middlemen.

Figure 1.0: Example of Steganography for Images

Figure 1.0 shows a general steganography framework. It takes 2 inputs, a secret image and a cover image. The secret image is the image you want to hide. The cover image is the image that should ‘cover’ the secret image. These two inputs are passed through a hiding algorithm to generate the output image. The output should look exactly like the cover image, but passing it through a revealing algorithm recovers the secret image.

Thus, to an unsuspecting eye, the output looks like an ordinary image, even though it also contains a secret image.

Problems with Existing Methods

Methods for hiding images inside other images already exist, but they have a few problems.

  1. They are very easy to decode, because the way the information is encoded is fixed.
  2. The amount of information that can be hidden is generally small. Hiding an image of the same size as the cover will probably lose a fair bit of information.
  3. In the case of images, the algorithms don’t exploit the structure of images. They don’t use the patterns found in natural images.
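
To make the first two points concrete, here is a minimal sketch of one classical fixed scheme, least-significant-bit (LSB) substitution. This post does not name a specific existing method, so treat LSB as an illustrative assumption; the function names `hide_lsb` and `reveal_lsb`, the 4-bit split, and the pixel values are all hypothetical. Pixels are plain 8-bit integers.

```python
def hide_lsb(cover, secret, bits=4):
    """Overwrite the low `bits` bits of each cover pixel with the
    top `bits` bits of the matching secret pixel (8-bit values)."""
    keep_mask = (0xFF << bits) & 0xFF  # keeps only the cover's high bits
    return [(c & keep_mask) | (s >> (8 - bits)) for c, s in zip(cover, secret)]


def reveal_lsb(container, bits=4):
    """Read the low `bits` bits back and shift them into the high position."""
    low_mask = (1 << bits) - 1
    return [(p & low_mask) << (8 - bits) for p in container]


# Made-up pixel values, purely for illustration.
cover = [200, 13, 77, 255]
secret = [123, 250, 5, 64]

container = hide_lsb(cover, secret)
recovered = reveal_lsb(container)

print(container)  # [199, 15, 64, 244]
print(recovered)  # [112, 240, 0, 64]
```

Two things stand out. Anyone who knows the scheme can call `reveal_lsb` without any key, because the encoding is fixed. And the recovered secret has lost its low 4 bits per pixel, which is the information loss described in point 2.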

The Solution — A Neural Network

Convolutional neural networks have been shown to learn structures that correspond to logical features, and these features become more abstract deeper in the network. Using a ConvNet addresses all of the problems mentioned above. First, the ConvNet has a good model of the patterns in natural images, so it can decide which areas are redundant and hide more of the secret there. Second, by saving space in redundant areas, the amount of hidden information can be increased. Finally, because the architecture and the weights can be randomised, the exact way the network hides the information cannot be known to anybody who doesn’t have the weights.

The Architecture

The overall network architecture is surprisingly similar to an autoencoder. In general, autoencoders are trained to reproduce their input after a series of transformations. By doing this, they learn about the features of the input distribution.

In this case, the architecture is slightly different. Instead of merely reproducing images, it has to hide one image while reproducing another.

Figure 2.0: Network Architecture

The whole framework consists of 3 parts: the Prep Network, the Hide Network, and the Reveal Network.

The Prep Network takes in the secret image and ‘prepares’ it. The Hide Network takes in the output of the Prep Network as well as the cover image. These two inputs are first concatenated along the channel axis. The Hide Network outputs an image, which is the hidden image. This is the image that contains the secret but looks like the cover.
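
The concatenation step can be sketched as follows. Images here are small nested lists of shape height x width x channels, and `concat_channels` is a hypothetical helper with made-up values; a real implementation would use a tensor library's concatenate operation instead.

```python
def concat_channels(a, b):
    """Concatenate two H x W x C images along the channel (last) axis."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must share height and width")
    return [
        [pixel_a + pixel_b for pixel_a, pixel_b in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


# A 2x2 RGB cover and a 2x2 'prepared' secret with 2 feature channels
# per pixel (the channel count is an assumption for illustration).
cover = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
         [[0.7, 0.8, 0.9], [1.0, 0.0, 0.5]]]
prepared_secret = [[[0.9, 0.1], [0.8, 0.2]],
                   [[0.7, 0.3], [0.6, 0.4]]]

hide_input = concat_channels(cover, prepared_secret)
channels = len(hide_input[0][0])  # 3 cover channels + 2 prepared channels

print(channels)  # 5
```

The spatial size is unchanged; only the per-pixel channel count grows, so every pixel of the Hide Network's input sees both the cover and the prepared secret at that location.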

To get the secret image back, the hidden image is passed to the Reveal Network, which outputs an image that looks like the secret.

The architectures of the three networks are roughly similar, and there is a lot of room for experimentation. I applied 4 convolutions with each of the 3x3, 4x4, and 5x5 kernel sizes to the input (50 feature maps), before concatenating the results. Then I applied another 3 convolutions to the concatenated feature maps. After that, I applied a 1x1 convolution to produce 3 channels. You can find the full details in the implementation code and the diagram in my repo.
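
The final 1x1 convolution is a per-pixel linear map across channels, which is what lets it collapse many feature maps into a 3-channel image. Below is a minimal sketch in plain Python, with a hypothetical helper `conv1x1` and made-up weights. It omits any bias or activation, which the actual implementation may well include.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map: H x W x C_in nested lists.
    weights:     C_out x C_in nested lists (one row per output channel).
    """
    return [
        [[sum(w * x for w, x in zip(w_row, pixel)) for w_row in weights]
         for pixel in row]
        for row in feature_map
    ]


# A 1 x 2 feature map with 4 channels per pixel (illustrative values).
features = [[[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]]

weights = [
    [1.0, 0.0, 0.0, 0.0],      # output channel 0 copies input channel 0
    [0.0, 1.0, 1.0, 0.0],      # output channel 1 sums input channels 1 and 2
    [0.25, 0.25, 0.25, 0.25],  # output channel 2 averages all four
]

image = conv1x1(features, weights)
print(image)  # [[[1.0, 5.0, 2.5], [0.0, 1.0, 0.5]]]
```

Each output pixel depends only on the channels of the input pixel at the same position, so the spatial layout produced by the earlier convolutions is preserved.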

The Network Losses

The Loss is fairly straightforward. It is:

Figure 3.0 Network Loss

where c is the input cover image and c’ is the covered image (the output of the Hide Network), and s and s’ are the secret image and the revealed secret image, respectively.

The loss is the sum of the standard MSE between the cover image and the covered image, and β times the MSE between the secret image and the revealed image. β is a hyperparameter that weights how much reconstructing the secret matters relative to keeping the cover intact. Thus the loss optimizes for the following statement.

“The covered image should look very close to the cover image, and when revealed, the revealed image should look very close to the secret image”.
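
As a sketch of the loss described above, the snippet below computes the two MSE terms on flattened pixel lists in plain Python. The helper names `mse` and `stego_loss` are hypothetical and the pixel values are made up; a real training loop would compute the same quantity on tensors so that gradients can flow through it.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists of pixel values."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def stego_loss(cover, covered, secret, revealed, beta=0.75):
    """MSE(cover, covered) + beta * MSE(secret, revealed)."""
    return mse(cover, covered) + beta * mse(secret, revealed)


cover = [0.0, 0.5, 1.0, 0.5]
covered = [0.0, 0.5, 0.5, 0.5]    # one pixel off by 0.5
secret = [1.0, 0.0, 1.0, 0.0]
revealed = [0.0, 0.0, 1.0, 0.0]   # one pixel off by 1.0

loss = stego_loss(cover, covered, secret, revealed, beta=0.75)
print(loss)  # 0.25  (0.0625 for the cover term + 0.75 * 0.25 for the secret term)
```

Raising `beta` makes errors in the revealed secret more expensive relative to visible changes in the cover, and lowering it does the opposite.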

Since the function is differentiable, the entire network can be trained end to end.

Results


The paper reports results that are substantially better than existing methods. A tool called StegExpose can detect whether or not an image has something hidden in it. When an image is hidden using existing methods, it is fairly easy to detect that the carrier image has been tampered with. However, this method is able to fool StegExpose.

Figure 4.1: Results from the Paper
Figure 4.2: Results from my Implementation (with β=0.75)

Why is this Useful?

You may wonder what the point of hiding images is. Apart from its uses by intelligence services, it has a use that I find more appealing: digital copyrighting. A copyright image can be hidden inside an image. If the image is stolen, the original author can reveal the copyright. Systems that make this copyright hard to detect or remove make it harder to steal digital media and get away with it.

Conclusion


We looked at a new method that improves on state-of-the-art results in steganography. This opens up a lot of new possibilities. The same could possibly be done for other media such as audio and video.

Also, using smaller secrets on larger covers should allow the Prep and Hide networks to achieve even higher quality results.
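
A rough way to see why is to compare how many secret values must be packed into each cover value. The sketch below is back-of-the-envelope arithmetic only; `payload_ratio` is a hypothetical helper and the image sizes are made up.

```python
def payload_ratio(secret_shape, cover_shape):
    """Number of secret values per cover value, for (height, width, channels) shapes."""
    def count(shape):
        height, width, channels = shape
        return height * width * channels
    return count(secret_shape) / count(cover_shape)


same_size = payload_ratio((64, 64, 3), (64, 64, 3))
small_secret = payload_ratio((32, 32, 3), (64, 64, 3))

print(same_size)     # 1.0
print(small_secret)  # 0.25
```

With a secret a quarter the size of the cover, each cover pixel has to carry a quarter as much hidden information, which leaves the networks more slack to keep both images clean.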
