z/OS UNIX Character Encoding Best Practices

Leonard Carcaramo Jr
Published in Theropod
Oct 29, 2021


Introduction

If you are a typical software developer on Linux or Windows systems, you’ve likely never given much thought to character encoding. Maybe you took an assembly language course in college where assignments required you to work with ASCII characters in their numeric representation. Some may also recall the boilerplate in HTML pages that tells the web browser how the page is encoded, and you might run into issues with Windows and Unix line endings from time to time. Other than these few cases, most people who work on Linux- and Windows-based x86 systems almost never need to worry about encoding.

Why Worry About Encoding on z/OS UNIX?

The need to worry about encoding on z/OS UNIX comes from the history of z/OS. The platform was initially developed to use the EBCDIC (Extended Binary Coded Decimal Interchange Code) encoding scheme, an extension of BCDIC (Binary Coded Decimal Interchange Code), which was the encoding used by IBM peripherals attached to computers. EBCDIC was used instead of ASCII because ASCII wasn’t standardized in time for the release of the S/360 in 1964. Even today, z/OS UNIX has an affinity to EBCDIC, and MVS still requires datasets to be encoded in EBCDIC. As a result, developers working on z/OS UNIX always need to be thinking about encoding, since z/OS UNIX programs often have problems when files are not encoded or tagged properly.

Character Encoding Schemes That Are Supported on z/OS UNIX

z/OS UNIX supports the encoding schemes that one is likely used to working with on Unix and Windows-based x86 environments. In general, z/OS UNIX supports EBCDIC, ASCII, UTF-8, and other ASCII-based encoding schemes. To see all the character encoding schemes supported on z/OS UNIX, run iconv -l.
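To make the difference concrete, here is a minimal Python sketch (Python 3.8 or later, for bytes.hex with a separator) that encodes the same word in ASCII and in EBCDIC and prints the resulting bytes. It assumes Python's built-in cp037 codec (EBCDIC for US/Canada) as a stand-in for IBM-1047, which the Python standard library does not include; the two code pages assign the same values to letters and digits. hex_bytes is a hypothetical helper for illustration.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and return its bytes as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")


# Same characters, different byte values under each encoding scheme.
for enc in ("ascii", "cp037"):  # cp037 (EBCDIC) stands in for IBM-1047
    print(f"{enc:>5}: {hex_bytes('Hello', enc)}")
# ascii: 48 65 6c 6c 6f
# cp037: c8 85 93 93 96
```

A program that assumes the wrong scheme reads the right bytes but maps them to the wrong characters, which is exactly the problem file tagging exists to prevent.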

Note that many more encoding schemes are supported than what is shown in the image below.

How Does File Tagging Work on z/OS UNIX?

File tagging is a mechanism that tells programs how a file is encoded. If a file is not tagged, programs have to make assumptions about its encoding, and in many cases they assume an ASCII-based encoding scheme. So, if a file is encoded in EBCDIC but is not explicitly tagged, there is a good chance a program will have problems reading it. The symptom might be Python raising a UnicodeDecodeError, or cURL simply reporting that CA certificate validation failed, with no indication that the real problem is how the CA certificate bundle is encoded. Either of these can result from supplying programs with untagged files. Similar issues can arise when a file is encoded in one encoding scheme but tagged for a different one. To avoid these issues, it is best practice to always tag files and to make sure every file is encoded to match its tag.
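The Python symptom mentioned above is easy to reproduce. The sketch below, again assuming cp037 as a stand-in for IBM-1047, decodes the same EBCDIC bytes once under an ASCII-based assumption (UTF-8) and once with the encoding they were actually written in. decode_or_none is a hypothetical helper for illustration.

```python
# EBCDIC bytes for a short line of text (cp037 stands in for IBM-1047,
# which Python's standard library does not include).
EBCDIC_TEXT = "Hello, z/OS".encode("cp037")


def decode_or_none(data: bytes, encoding: str):
    """Decode data with the given encoding; return None if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        print(f"{encoding}: {exc.reason}")
        return None


# Assuming an ASCII-based encoding fails, much like reading an untagged file.
print(decode_or_none(EBCDIC_TEXT, "utf-8"))
# Decoding with the encoding the bytes were actually written in succeeds.
print(decode_or_none(EBCDIC_TEXT, "cp037"))
```

Decoding the same bytes as ISO8859-1 would not raise an error at all, because every byte value is valid Latin-1; the result would simply be garbled text. That is why a mis-tagged file sometimes fails loudly and sometimes silently.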

How to View z/OS UNIX File Tags

On z/OS UNIX, ls -T shows file tags: whether each file is tagged and, if it is, which encoding scheme it is tagged for.

How to Tag a z/OS UNIX File

Regardless of whether a z/OS UNIX file is already tagged, chtag can be used to tag it for any supported encoding scheme. For example, chtag -tc ISO8859-1 file.txt tags file.txt as text encoded in ISO8859-1. The chtag command can also remove a file's tag (chtag -r file.txt) if one wishes to do so.

Tagging a z/OS UNIX file
Un-tagging a z/OS UNIX file

How to Tell When a File is Not Tagged/Encoded Properly

If the file is supposed to contain plain text, the easiest way to verify that it is tagged and encoded properly is to display it with the cat command or open it in the vi editor. If the contents look like binary data rather than readable text, the file is probably not tagged or encoded properly. For a binary file, one check is to execute it: if it crashes catastrophically, that may indicate that the file is not tagged properly.

Check tagging for plain text using the cat command.
Check tagging for plain text using the vim editor.
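The eyeball test can also be approximated in code. The following is a rough, hypothetical heuristic, not a feature of cat or vi: it decodes a file's bytes with a candidate encoding and measures what fraction of the characters are printable ASCII or common whitespace. It assumes the file is mostly English text, and cp037 again stands in for IBM-1047.

```python
def ascii_text_ratio(data: bytes, encoding: str) -> float:
    """Fraction of decoded characters that are printable ASCII or whitespace.

    Hypothetical heuristic: mostly-English text decoded with the right
    encoding scores near 1.0; decoded with the wrong one, it scores lower.
    """
    text = data.decode(encoding, errors="replace")  # bad bytes become U+FFFD
    if not text:
        return 1.0
    good = sum(" " <= ch <= "~" or ch in "\n\r\t" for ch in text)
    return good / len(text)


sample = "Hello, z/OS".encode("cp037")  # cp037 stands in for IBM-1047
for candidate in ("latin-1", "cp037"):
    print(candidate, round(ascii_text_ratio(sample, candidate), 2))
```

EBCDIC text read as ISO8859-1 scores low because EBCDIC lowercase letters fall on Latin-1 control characters and accented letters, which is the same garbage a reader sees when cat prints a mis-tagged file.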

How to Convert the Encoding of a z/OS UNIX File

To convert a file from one encoding scheme to another, use the iconv command.

The example below uses iconv to convert the contents of a file from ISO8859-1 (Latin-1) to IBM-1047 (EBCDIC), and then uses output redirection to write the converted bytes to a new file.
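For reference, that conversion is typically run as iconv -f ISO8859-1 -t IBM-1047 input.txt > output.txt (the file names are placeholders), and it is a good idea to follow it with chtag -tc IBM-1047 output.txt so that the new file's tag matches its contents. The Python sketch below performs the same decode-then-re-encode step in-process; convert_file is a hypothetical helper, and cp037 stands in for IBM-1047.

```python
import tempfile
from pathlib import Path


def convert_file(src: Path, dst: Path, from_enc: str, to_enc: str) -> None:
    """Re-encode src (read as from_enc) into dst (written as to_enc), like iconv."""
    text = src.read_bytes().decode(from_enc)  # bytes -> characters
    dst.write_bytes(text.encode(to_enc))      # characters -> new bytes


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "input.txt")   # placeholder file names
    dst = Path(tmp, "output.txt")
    src.write_bytes("Café".encode("latin-1"))
    convert_file(src, dst, "latin-1", "cp037")  # cp037 stands in for IBM-1047
    print(src.read_bytes().hex(" "), "->", dst.read_bytes().hex(" "))
```

Decoding to characters first and then re-encoding is what keeps the text intact: the byte values change, but the characters they represent do not.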

How to Auto Convert the Encoding of Files on z/OS UNIX

As mentioned earlier, z/OS UNIX has an affinity to EBCDIC, so by default it writes text to files in EBCDIC unless told otherwise. Setting the _BPXK_AUTOCVT environment variable to ON enables automatic conversion of text written to a tagged file into the encoding that file is tagged for. For example, when _BPXK_AUTOCVT is set to ON and Bash output redirection is used to echo text to an ISO8859-1 tagged file, the echoed text is automatically converted from IBM-1047 to ISO8859-1. Conversely, if _BPXK_AUTOCVT is set to OFF or is unset, no conversion occurs, and IBM-1047 encoded text is written to the ISO8859-1 tagged file.

If auto conversion is not enabled in your z/OS UNIX environment, it is highly recommended to add export _BPXK_AUTOCVT=ON to .profile or .bashrc to avoid these encoding issues.

Note that _BPXK_AUTOCVT=ON may already be set in your z/OS UNIX environment. You can check by running echo $_BPXK_AUTOCVT. If _BPXK_AUTOCVT is already set to ON, no action needs to be taken.

_BPXK_AUTOCVT=OFF or unset
_BPXK_AUTOCVT=ON
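The behavior described above can be modeled in a few lines of Python. This is a hypothetical simulation, not how z/OS implements auto conversion: bytes_on_disk assumes the program produces EBCDIC (cp037 standing in for IBM-1047) and converts to the file's tagged encoding only when auto conversion is on, while autocvt_enabled mirrors the echo $_BPXK_AUTOCVT check. Both function names are illustrative.

```python
import os


def autocvt_enabled(env=None) -> bool:
    """Mirror of `echo $_BPXK_AUTOCVT`: True only when the variable is ON."""
    env = os.environ if env is None else env
    return env.get("_BPXK_AUTOCVT", "") == "ON"


def bytes_on_disk(text: str, file_tag: str, autocvt: bool,
                  program_enc: str = "cp037") -> bytes:
    """Hypothetical model of echoing text into a file tagged file_tag.

    The program emits EBCDIC bytes (cp037 stands in for IBM-1047). With
    auto conversion on, they are converted to the tagged encoding; with it
    off, the EBCDIC bytes are written unchanged.
    """
    raw = text.encode(program_enc)
    if autocvt:
        return raw.decode(program_enc).encode(file_tag)
    return raw


print("auto conversion enabled here:", autocvt_enabled())
for setting in (False, True):
    data = bytes_on_disk("Hello", "latin-1", setting)
    # Read the ISO8859-1 tagged file back the way an ASCII-based tool would.
    print(f"_BPXK_AUTOCVT={'ON' if setting else 'OFF'}:", repr(data.decode("latin-1")))
```

With the setting off, the ISO8859-1 tagged file ends up holding EBCDIC bytes, so any tool that trusts the tag reads garbage.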

Using Git on z/OS UNIX

The general rule of thumb when working with Git on z/OS UNIX is to make sure the encoding of every file in the Git repository is explicitly set in the .gitattributes file at the root of the repository. See the Medium article “Git on z/OS” for more detail on using Git on z/OS UNIX.

Closing Remarks

Dealing with encoding issues on z/OS UNIX can be difficult, especially when one is not aware of the importance of encoding and file tagging on the platform. However, once one is aware of these nuances, identifying and solving encoding issues becomes simple. A good rule of thumb in z/OS UNIX is to check encoding and file tagging whenever an unusual or elusive issue occurs, so that encoding-related problems are caught early. This can save hours of debugging, since encoding-related issues are often solved by simply tagging a file.

Proudly, enabling DevOps, open source, and containers on z/OS.