UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 1: invalid start byte

Encoding and decoding in Python

Encoding and decoding values can get confusing.

This sample code might help you out. I named the variables to show whether each value is encoded (bytes), decoded (a string), base64 encoded, or base64 decoded.

import base64

value = "Decoded string value"
value_encode = value.encode('utf-8')  # str -> bytes
value_base64encode = base64.b64encode(value_encode)  # bytes -> base64 bytes
value_decode = value_base64encode.decode('utf-8')  # base64 bytes -> str
print(value_decode)

value_encode = value_decode.encode('utf-8')  # base64 str -> bytes
value_base64decode = base64.b64decode(value_encode)  # base64 bytes -> original bytes
value_decode = value_base64decode.decode('utf-8')  # original bytes -> original str

Sometimes you encode values so they don’t accidentally get processed incorrectly. For example, if a value contains code and you don’t want it to mess up parsing and processing, you might encode it so that special characters aren’t interpreted as code and don’t break something.
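Here is a rough sketch of that idea. The comma-separated record and the variable names are made up for illustration; the point is only that base64 output contains no commas or newlines, so the value cannot break the simple parsing around it.

```python
import base64

# Hypothetical value containing characters that would break a comma-separated record.
snippet = 'print("a,b")\nprint("c")'

# Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so it is safe as a field.
safe_field = base64.b64encode(snippet.encode("utf-8")).decode("ascii")
record = ",".join(["42", safe_field, "done"])

# Splitting on commas yields exactly three fields, and the snippet round-trips.
fields = record.split(",")
restored = base64.b64decode(fields[1]).decode("utf-8")
print(len(fields), restored == snippet)
```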

Encoding values properly can help stop some security problems where attackers try to inject code into a process to take malicious actions.
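As one illustration, assuming the value ends up in an HTML page (an assumption for this example, not something the rest of this post depends on), Python's standard html.escape turns markup characters into harmless entities. The input string is hypothetical.

```python
import html

# Hypothetical untrusted input that tries to inject a script tag.
user_input = '<script>alert("hi")</script>'

# html.escape replaces &, <, > and (by default) quote characters with HTML entities.
escaped = html.escape(user_input)
print(escaped)  # &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Which encoding is appropriate depends on where the value is used; this only covers the HTML case.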

Attackers also may encode values to bypass security tools that inspect data.

At any rate, if you want to base64 encode a value, you first need to convert the string into bytes, then base64 encode those bytes. The result is still bytes, so decode it to turn it into a string that you can pass into functions that expect a string.
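Those three steps can be wrapped in a small function. This is just a sketch; the name to_base64_text is made up, and it assumes the original text should be treated as UTF-8.

```python
import base64

def to_base64_text(text: str) -> str:
    """Hypothetical helper: str -> bytes -> base64 bytes -> str."""
    raw_bytes = text.encode("utf-8")         # string to bytes
    b64_bytes = base64.b64encode(raw_bytes)  # bytes to base64 bytes
    return b64_bytes.decode("ascii")         # base64 output is plain ASCII

encoded = to_base64_text("hello")
print(encoded)  # aGVsbG8=
```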

When you want to decode the base64 encoded value, turn it back into bytes, then base64 decode it, then convert it back to a string.
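The reverse direction looks like this. Again, a sketch only: from_base64_text is a made-up name, and it assumes the bytes behind the base64 value are UTF-8 text.

```python
import base64

def from_base64_text(b64_text: str) -> str:
    """Hypothetical helper: str -> bytes -> base64-decoded bytes -> str."""
    b64_bytes = b64_text.encode("ascii")     # base64 string to bytes
    raw_bytes = base64.b64decode(b64_bytes)  # base64 bytes to original bytes
    return raw_bytes.decode("utf-8")         # original bytes to string

decoded = from_base64_text("aGVsbG8=")
print(decoded)  # hello
```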

There are different character encodings, such as ASCII and UTF-8, so you will want to understand what character set you need to support or work with in order to use the proper encoding and decoding.
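A quick way to see the difference, using a sample string chosen for illustration: a character outside the ASCII range encodes fine as UTF-8 (taking more than one byte) but fails as ASCII.

```python
text = "caf\u00e9"  # "café", where the last character is outside ASCII

# UTF-8 can represent it; the accented character takes two bytes.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 4 5

# ASCII cannot represent it, so encoding raises UnicodeEncodeError.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```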

One of the challenges is when you don’t know how something was encoded in the first place and you have to decode it. You’ll probably want to refer to the source code or the documentation to make sure you decode it correctly.
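Here is one way to reproduce an error shaped like the one in the title. The data is hypothetical: it assumes some other system wrote the text as Windows-1252 (cp1252), where the per mille sign is the single byte 0x89, and that byte is not valid where UTF-8 expects the start of a character.

```python
# Hypothetical bytes produced by another system using cp1252.
data = "5\u2030".encode("cp1252")  # b'5\x89'

try:
    text = data.decode("utf-8")    # wrong guess about the encoding
except UnicodeDecodeError as exc:
    failed_at = exc.start          # index of the offending byte
    reason = exc.reason
    text = data.decode("cp1252")   # decode with the encoding actually used

print(failed_at, reason)  # 1 invalid start byte
```

Guessing the encoding only works if you already know the answer, which is why checking the source or documentation matters.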

