Python Bytes!

Vera Worri
Aug 22, 2017 · 3 min read

Working with Bytes Using Python | Steganography

Python is extremely versatile. Every day I am amazed by a program or a package being built with Python. I have used it for a lot of different things but had never worked with file data in bytes. While it is true that I have used network tools like Scapy and sockets, I had never actually manipulated the data or even really looked at it. If I did, it was in Wireshark.

I will be using Python 3.6 on a Mac.

For this article, I want to find hidden data inside a file. This is an example of steganography. I will be using only standard-library modules such as re, Python’s regex module (not perfect, but it’s what I know).

First things first: we need to find a file with hidden data. Luckily for us, I found one on Wikipedia.

The file looks like this:

We then read the file into Python:

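This is the first thing that is different. Notice the mode flag “rb+”. The “b” tells Python to open the file in binary mode, so reads return bytes instead of text (the “r+” part opens it for both reading and writing).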
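A minimal sketch of the read step. The helper name `read_bytes` and the stand-in file are assumptions made so the snippet runs anywhere; swap in the path of the real image. It uses “rb” because nothing is written back, whereas “rb+” would also allow writes.

```python
import os
import tempfile


def read_bytes(path):
    """Return the whole file at `path` as a bytes object (hypothetical helper)."""
    # "rb" = read-only, binary mode: no text decoding is applied.
    with open(path, "rb") as f:
        return f.read()


# Stand-in file so the sketch is self-contained; it is NOT the real image.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.png")
with open(demo_path, "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n" + b"not real image data")

data = read_bytes(demo_path)
print(type(data), len(data))
```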

Now, let’s look at the data we have just read in.

What you are looking at is the first ten bytes of the PNG file. If you compare them with what a standard PNG file looks like, this looks right.
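One way to make that comparison concrete is to check the data against the fixed 8-byte signature that every PNG file starts with. This is only a sketch: `data` here is a stand-in built by hand, not the bytes of the actual image.

```python
# The 8-byte signature at the start of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Stand-in for the file contents: the signature plus the start of an IHDR chunk.
data = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"

first_ten = data[:10]
looks_like_png = data.startswith(PNG_SIGNATURE)

print(first_ten)
print(looks_like_png)
```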

At a glance, there is nothing odd here, but we know that there is a hidden file. So we make a list of common file identifiers (in bytes).

Again, notice the “b” prefix on each literal, which makes it a bytes object rather than a string.
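A list like this might look as follows. The name `SIGNATURES`, the labels, and the particular formats chosen are assumptions for illustration; the byte values are the well-known leading bytes of each format.

```python
# Hypothetical list of (label, leading bytes) pairs for a few common formats.
SIGNATURES = [
    ("zip", b"PK\x03\x04"),
    ("pdf", b"%PDF"),
    ("gif", b"GIF8"),
    ("jpeg", b"\xff\xd8\xff"),
    ("rar", b"Rar!"),
]

for label, magic in SIGNATURES:
    print(label, magic)

# The b prefix gives a bytes object; encoding a str produces the same thing.
same_bytes = "PK".encode("ascii") == b"PK"
print(same_bytes)
```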

Now, we can use a list comprehension to find out what type of file is hidden and where the file identifier is in our data.

To do this, it is better to create a function that checks whether the identifier is in the data and returns either its index or a message. We will use this inside the list comprehension.
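Here is a sketch of that function and the list comprehension. It uses `bytes.find` rather than regex to keep the example short, and `find_signature`, `SIGNATURES`, and the stand-in `data` are hypothetical names and values, not the original code.

```python
def find_signature(signature, data):
    """Return the index of the first match of `signature` in `data`, or a message."""
    index = data.find(signature)  # bytes.find returns -1 when there is no match
    return index if index != -1 else "not found"


SIGNATURES = [b"PK\x03\x04", b"%PDF", b"Rar!"]

# Stand-in data: PNG signature, 20 filler bytes, then the start of a ZIP header.
data = b"\x89PNG\r\n\x1a\n" + b"\x00" * 20 + b"PK\x03\x04" + b"\x14\x00"

results = [(sig, find_signature(sig, data)) for sig in SIGNATURES]
print(results)
```

Searching for the full four-byte ZIP signature instead of just “PK” makes a chance match inside the compressed image data less likely, though it cannot rule one out.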

It seems that the hidden file is a “PK” file. “PK” is the signature that starts ZIP archives, so the hidden file is a compressed .zip file.

Now that I have found the index of the hidden file, I have to find out how many bytes are in the file so I can extract it. When I print out the first four bytes I see this monstrosity.

I am not quite sure what b'\x17j' is. It looks like hex, but I am not sure where the j is coming from.


Research


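Well, it is hex. When Python prints a bytes object, it shows any byte that maps to a printable ASCII character as that character and everything else as a \x escape, which is where the j comes from (it is the byte 0x6a). So, the best way to get to know the data is to use the hexlify function from the binascii module. This lets you print the data as plain hex.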
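A small sketch of the difference, using the two bytes 0x17 and 0x6a as an illustrative value (the variable name `raw` is an assumption):

```python
import binascii

raw = b"\x17j"  # two bytes: 0x17 (not printable) and 0x6a (ASCII "j")

print(raw)                    # default repr mixes an escape with a character
print(binascii.hexlify(raw))  # every byte as two hex digits, returned as bytes
print(raw.hex())              # same digits as a str (Python 3.5+)

hex_bytes = binascii.hexlify(raw)
```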

Now my code looks like this:

The 50 at the start is the “P” of “PK” (0x50 is “P” and 0x4b is “K”). Now that I have this fixed, I can use the header offset to find the size of the file. There is a chart of the file structure on the Wikipedia page.

The compressed-size field starts 18 bytes into the header and is 4 bytes long. When I print out the result, I get:

b'c727df7a'
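The slicing involved can be sketched like this. The header below is a hand-built stand-in with a made-up size of 1234 bytes, not the one from the real file, so the hex it prints differs from the value above; `index` and `size_field` are hypothetical names.

```python
import binascii
import struct

# Stand-in ZIP local file header (30 bytes, little-endian). Field order:
# signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
header = struct.pack(
    "<4sHHHHHIIIHH", b"PK\x03\x04", 20, 0, 0, 0, 0, 0, 1234, 1234, 0, 0
)
data = b"\x89PNG\r\n\x1a\n" + b"\x00" * 12 + header

index = data.find(b"PK\x03\x04")           # where the hidden archive starts
size_field = data[index + 18:index + 22]   # 4 bytes at offset 18 in the header

print(index)
print(binascii.hexlify(size_field))
```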

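The hex is in little-endian order. This basically means that the least significant byte comes first, so the bytes read backwards compared to how we normally write a number. Python has little-endian parsing built in, for example int.from_bytes or the struct module.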
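As a sketch, here are two standard-library ways to turn those four bytes into an integer. The variable names are assumptions, and the input is simply the hex string printed above.

```python
import struct

size_field = bytes.fromhex("c727df7a")

# Option 1: int.from_bytes with an explicit byte order.
size_from_int = int.from_bytes(size_field, byteorder="little")

# Option 2: struct.unpack, where "<I" means little-endian unsigned 32-bit int.
(size_from_struct,) = struct.unpack("<I", size_field)

# Reading the same bytes as big-endian gives a different number.
as_big_endian = int.from_bytes(size_field, byteorder="big")

print(hex(size_from_int), hex(as_big_endian))
```

Both options give the same result; struct becomes more convenient once several header fields need to be unpacked in one go.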

After this, all I need to do is write the bytes to a file with a .zip extension… in theory. But that’s another article altogether, so I’ll leave this one here. If you have any suggestions or corrections, don’t hesitate to comment.


Vera Worri

github: https://github.com/Vworri

website: vworri.github.io
