The Saga of PunchPy and ATOCS

Published in Theropod · Aug 16, 2022

Written by Samantha Wareing & Andrew Wilt

When I came to IBM as a co-op intern, I joined the development team for a 37-year-old z/OS product. I started on the test side of the house, knowing nothing of z/OS, JCL, or green screens. One of my first jobs was to run the regression bucket as needed. A regression bucket is a set of JCL tests that covers all existing functions, with automatic checking of the output to ensure that new code doesn’t break things that currently work.

The regression bucket consisted of a complex web of pieces that allowed us to send the regression test cases to a target test system and verify that the output was what we expected. All of the regression JCL jobs were stored on a disk attached to a z/VM ID dedicated to the regression work. We would use a REXX tool called PUNCH to send each JCL file to the target z/OS system. The PUNCH tool would add a ROUTE statement so that the output would be routed back to a special z/VM service ID, which would gather the JOBLOG output and verify it with ATOCS.

ATOCS stands for Automated Testcase Output Checker System. It provided a really powerful service that let us verify the output was correct, using FIND and NOFIND rules in the ATOCS section of the JCL. This had the effect of encoding the knowledge of the test case writer into the JCL job, so that others didn’t have to manually examine the output and determine whether everything worked as expected (or be experts in whatever function was being tested). ATOCS allowed for global NOFIND statements (e.g., don’t find JCL ERROR anywhere in the output), as well as range-based checking with START and STOP sections, so that FIND and NOFIND statements could apply only within a particular STEP’s output.

As time went by and I crossed over from co-op intern to full time employee, the organization I was a part of started to adopt more modern tools for development such as GitHub Enterprise and Jenkins. This was a great move in the right direction, but I couldn’t help but feel that regression and testing in general were being left behind. All our tests were still stored on a z/VM disk and run from that system. With that being their home, version controlling those tests with git and running them in a pipeline seemed like a far-off dream.

I sat with this gnawing feeling for some time, feeling like there were better options than what we were using. One day, I finally decided to dig into making these modern tools a viable option. To do this, three big things needed to happen. First, we needed the test cases to be on our local machines. While we could simply transfer the test cases from z/VM using FTP and then use git to version control them, that created a lot of tedious, repetitive work. What we really needed was to be able to write and execute our tests directly from our local machines. Second, we needed a way to get the JOBLOG output back to our local machine. Submitting test cases from our local machines but then having to switch over to our test system to get the output was more tedious, repetitive work. Lastly, for any of this to be meaningful, we needed to be able to continue to use ATOCS, because all of our existing test cases were checked by ATOCS. These three things were key, and they needed to be easy to do, with little to no change to the existing test cases.

Digging into it, I found that all of this was entirely possible. To submit JCL jobs from a local machine, we could transfer them to our test systems using SCP and invoke the UNIX 'submit' command via SSH. The UNIX submit command takes JCL contained in a UNIX file and submits it for execution. To get the output back, I found that SDSF has a super cool interface in the REXX language. Using the JOBID returned from the UNIX submit command, we could use the SDSF REXX APIs to grab that job’s output and print it to a UNIX file. That UNIX file could then be transferred back to the local machine using SCP.
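To make the submit side of that flow concrete, here is a rough sketch in Python using the paramiko package. It is only an illustration: the host, user, and file names are made up, SFTP stands in for a separate scp call, and the exact text the z/OS UNIX submit command returns may differ from what the regex below assumes.

# Sketch only: all names are illustrative, not PunchTool/PunchPy code.
import re
import paramiko

def submit_jcl(host, user, local_jcl, remote_jcl):
    """Copy a JCL file to the test system, run the UNIX 'submit' command, and return the JOBID."""
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=user)

    # Transfer the JCL to the z/OS UNIX file system (real tooling also has to
    # worry about ASCII/EBCDIC conversion here).
    sftp = ssh.open_sftp()
    sftp.put(local_jcl, remote_jcl)
    sftp.close()

    # Invoke the z/OS UNIX submit command and capture its reply.
    _stdin, stdout, _stderr = ssh.exec_command(f"submit {remote_jcl}")
    reply = stdout.read().decode()
    ssh.close()

    # The reply names the assigned job (e.g. JOB01234); pull the ID out.
    match = re.search(r"JOB\d+", reply)
    if not match:
        raise RuntimeError(f"Could not find a JOBID in: {reply!r}")
    return match.group(0)

# jobid = submit_jcl("testsys.example.com", "ibmuser", "tstjob.jcl", "/u/ibmuser/tstjob.jcl")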

Checking the output with ATOCS was the trickiest part. I didn’t want to reinvent the wheel and we needed the automatic checking to work with existing test cases. With some sleuthing I found the ATOCS source code and, to my shock, I found arcane z/VM XEDIT incantations and other REXX code. ATOCS, as it was and with its maintainers long retired, could not easily be exported to our local machines. Fortunately, I found that I could place the ATOCS source code on a shared disk and execute it via another REXX script. Once the output of the job was retrieved, I could run the ATOCS tool from the test system and print the results to another UNIX file.

At that point I had everything I needed, but it was all in pieces that had to be executed manually. Again, it was important that this process be easy; otherwise we might as well have continued using our existing methods. So, I wrote a bash script. We called it PunchTool, and it was a single bash script that orchestrated each of these individual steps, from transferring and submitting the JCL job to writing and executing the REXX scripts that retrieved and checked the output. With a little setup, it could be invoked from any bash shell on a local machine. We could now store our test cases in GitHub, and it was possible to instantiate a Jenkins agent to run those test cases using PunchTool.

Fast forward a few years and I had now joined a new team for an equally old z/OS product. There were still a handful of people using PunchTool, but it had remained largely unchanged since its initial implementation. While it worked well enough, I often thought about how it could be better. Maybe it could have better command line options, maybe it could also capture the syslog, maybe (and this weighed heavily on my mind) it could have its own version of ATOCS so that the tool could be more self-contained with fewer dependencies.

With these thoughts in mind, I embarked on a new mission to rewrite PunchTool and this time do it right. Not entirely from scratch: I rewrote the foundation in Python, lovingly referring to it as PunchPy. Fundamentally, PunchPy works the same way as PunchTool. It uses an SSH connection to transfer the JCL test cases to the test system and invoke the UNIX submit command. Again, REXX scripts are used to collect the JOBLOG output and, now, a snapshot of the SYSLOG. This is all accomplished with a collection of Python and REXX scripts installed on the test system (the z/OS host) and the local machine (the client). Initially, this was all PunchPy did, but with better command-line parsing, logging, and a more secure connection, thanks to the many great Python packages available.

Once I got PunchPy working well, I started using it to submit a bunch of tests for the coding project I was working on. My mind went back to ATOCS as I looked through the JOBLOG and SYSLOG of the fifth test case for the umpteenth time, trying to remember what was important to see and not see. I spent some time looking for existing text-verification tools, but everything seemed focused on JUnit or other higher-level packages, when I just wanted to verify that specific text was found or not found. I also wanted to avoid using the original ATOCS because it created another dependency (i.e., having access to a special disk), and it was important that PunchPy had as few dependencies as possible. (Remember, easy to use was the theme.)

I remembered that Python has great regular expression (regex) support, and that might be a really easy way to perform ATOCS-type checking on the output. ATOCS checking would save my time and my eyes, just returning a SUCCESS! or FAILURE! message for the tests run. It was time to reinvent the wheel.
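The core idea really is that small: a FIND rule passes if its regex matches some line of the output, and a NOFIND rule passes if it matches no line at all. A minimal Python illustration (the file name and patterns here are just examples, not PunchPy code):

import re

# Hypothetical JOBLOG file pulled back from the test system.
with open("tstjob.joblog") as f:
    output_lines = f.read().splitlines()

# FIND: the pattern must appear on some line of the output.
find_ok = any(re.search(r"STEP001S.*IDCAMS.*0000", line) for line in output_lines)

# NOFIND: the pattern must appear on no line of the output.
nofind_ok = not any(re.search(r"IEF453I.*JOB FAILED.*JCL ERROR", line) for line in output_lines)

print("SUCCESS!" if find_ok and nofind_ok else "FAILURE!")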

Inspired by the original ATOCS, and with the help of my amazing mentor, I wrote a new version of ATOCS and added a command-line option to PunchPy to invoke it. PunchPy ATOCS builds rule objects based on the FIND/NOFIND statements in an ATOCS LISTING or ATOCS SYSLOG section located in the comments of the JCL. I extended it to allow for a STEP identifier, which starts a discrete section check. Each STEP expects START and STOP statements with regular expressions that define the boundaries of the section.

Example ATOCS section in JCL:

//* EXPECTED RESULTS (ATOCS LISTING)
//*
//* NOFIND ".*IEA995I SYMPTOM DUMP OUTPUT.*"
//* NOFIND ".*IEF453I.*JOB FAILED.*JCL ERROR.*"
//* NOFIND ".*IEF272I.*STEP WAS NOT EXECUTED.*"
//*
//*
//* STEP CONDCODE /* CHECK CONDITION CODES */
//* START ".*HASP373.*TSTJOB.*"
//* STOP ".*HASP395.*"
//* FIND ".*STEP001S.*IDCAMS.*0000.*"
//* FIND ".*STEP001A.*IEBDG.*0000.*"
//* FIND ".*STEP001U.*IEBGENER.*0000.*"
//* FIND ".*STEP001V.*IEBCOMPR.*0000.*"
//*
//* STEP STEP001V /* Verify data step */
//* START ".*COMPARE UTILITY.*"
//* STOP ".*END OF JOB.*"
//* FIND ".*TOTAL NUMBER OF RECORDS COMPARED = 00102400.*"
//* NOFIND ".*IEB221I.*RECORDS ARE NOT EQUAL.*"
//*
//* END OF EXPECTED RESULTS (ATOCS LISTING)
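To give a feel for how a section like that turns into rule objects, here is a simplified parser sketch in Python. The class name and parsing details are my own, and it only covers the LISTING layout shown above; PunchPy's actual implementation differs.

import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Step:
    """One set of FIND/NOFIND rules, optionally bounded by START/STOP regexes."""
    name: str
    start: Optional[str] = None
    stop: Optional[str] = None
    finds: List[str] = field(default_factory=list)
    nofinds: List[str] = field(default_factory=list)

def parse_atocs_listing(jcl_text: str) -> Tuple[Step, List[Step]]:
    """Collect global and per-STEP rules from the ATOCS LISTING comment block."""
    global_rules = Step("GLOBAL")
    steps: List[Step] = []
    current = global_rules
    in_section = False
    for line in jcl_text.splitlines():
        stripped = line.removeprefix("//*").strip()
        if "EXPECTED RESULTS (ATOCS LISTING)" in stripped:
            in_section = not stripped.startswith("END OF")  # on at the header, off at the footer
            continue
        if not in_section or not stripped:
            continue
        keyword, _, rest = stripped.partition(" ")
        pattern = rest.strip().strip('"')
        if keyword == "STEP":
            current = Step(rest.split("/*")[0].strip())      # drop the trailing JCL comment
            steps.append(current)
        elif keyword == "START":
            current.start = pattern
        elif keyword == "STOP":
            current.stop = pattern
        elif keyword == "FIND":
            current.finds.append(pattern)
        elif keyword == "NOFIND":
            current.nofinds.append(pattern)
    return global_rules, steps

# global_rules, steps = parse_atocs_listing(open("tstjob.jcl").read())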

I wanted to be able to create a Jenkins pipeline that would submit my group of JCL test cases each time I made a code change, and tell me if the tests all succeeded or not. I needed the ATOCS processing to give me a resulting return code, as well as some indication of which FIND/NOFIND statements passed or failed the regex checking.

So I have PunchPy ATOCS create a <testcase>.<jobid>.results file describing each regular expression, whether it was found, and which line it was found on. Lastly, if all regular expressions pass, it sets the return code to zero so Jenkins can pick that up. If any expression fails, or any other error occurs, PunchPy returns a non-zero return code.
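Continuing the parser sketch from earlier, the checking and reporting side can look roughly like this. It is again a sketch under my own assumptions; in particular, the layout of the lines written to the results file is my guess, not PunchPy's actual format.

import re

def check_rules(step, lines, results):
    """Apply one Step's FIND/NOFIND rules to output lines, recording pass/fail per pattern."""
    ok = True
    for kind, patterns, want_hit in (("FIND", step.finds, True), ("NOFIND", step.nofinds, False)):
        for pattern in patterns:
            hit = next((n for n, line in enumerate(lines, 1) if re.search(pattern, line)), None)
            passed = (hit is not None) == want_hit
            results.append(f"{'PASS' if passed else 'FAIL'}  {kind:6} {pattern}  line={hit}")
            ok = ok and passed
    return ok

def extract_section(lines, start_pat, stop_pat):
    """Keep only the lines between the step's START match and its STOP match."""
    inside, section = False, []
    for line in lines:
        if not inside and start_pat and re.search(start_pat, line):
            inside = True
        elif inside and stop_pat and re.search(stop_pat, line):
            break
        elif inside:
            section.append(line)
    return section

def run_atocs(testcase, jobid, joblog_lines, global_rules, steps):
    results = []
    ok = check_rules(global_rules, joblog_lines, results)
    for step in steps:
        ok = check_rules(step, extract_section(joblog_lines, step.start, step.stop), results) and ok
    with open(f"{testcase}.{jobid}.results", "w") as f:
        f.write(("SUCCESS!" if ok else "FAILURE!") + "\n")
        f.write("\n".join(results) + "\n")
    return 0 if ok else 1   # a non-zero return code tells Jenkins the tests failed

# raise SystemExit(run_atocs("tstjob", "JOB01234", joblog_lines, global_rules, steps))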

Example ATOCS results file

Luckily, we have a local zLinux system in our lab where I was able to set up a Jenkins agent. In the beginning, I was using a Jenkins agent on my local PC, which works the same. Since PunchPy is written in Python, I can have the Jenkins agent run it as long as Python 3.9 is installed in that environment.

Additionally, since I have the test cases stored in GitHub, I have a stage in the pipeline that does a git pull to get the test cases to the local environment before submitting them to the remote system.
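Put together, the pipeline can be quite small. Here is a hedged sketch of what such a Jenkinsfile might look like; the agent label, repository URL, and PunchPy invocation are placeholders rather than the real pipeline:

pipeline {
    agent { label 'zlinux-punchpy' }            // lab zLinux node with Python 3.9 installed
    stages {
        stage('Get test cases') {
            steps {
                // Pull the JCL test cases from GitHub into the agent's workspace
                git url: 'https://github.example.com/myteam/regression-tests.git', branch: 'main'
            }
        }
        stage('Run tests') {
            steps {
                // Hypothetical PunchPy invocation: submit the bucket and run ATOCS checking.
                // A non-zero exit code from PunchPy fails this stage.
                sh 'python3 punchpy.py --bucket testcases/ --atocs'
            }
        }
    }
}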

With all this work, I can easily create new test case buckets (groups of test cases) for my own development projects and include them in modern interfaces like Jenkins. I get all the benefit of continuous testing, along with easy visualization of success or failure like the old method provided. Moreover, with the Jenkins pipeline integration, I can easily hand off the test bucket to become a regression bucket for my project, minimizing the effort needed by someone else to add it to the product regression bucket.

I’m an IBMer & Software Developer for the IBM z/OS product DFSMShsm.