Mainframe Batch 101 — Concepts & Why it Matters

Published in Modern Mainframe · 10 min read · Feb 11, 2020

Where do your monthly credit card statements come from? Quarterly stock statements? Pension statements? Retail nightly sales reports? Supply chain delivery reports? Where is most of this heavy lifting done?

z/OS Batch Processing.

Bank statements, retail inventory reports, and stock statements are often processed by z/OS batch

This may not be the most exciting stuff, but think about what would happen if it stopped working. This is truly business-critical workload that we take for granted. Delays or errors in processing these statements and reports can be disastrous. Batch disruptions can affect real-time interactions as well, as seen in 2018 outages or 2012 delays.

Fun fact: The precursor to JES2 was HASP (Houston Automatic Spooling Priority), which was developed for NASA by IBM. This is why JES2 messages, even today, use the $HASP message prefix.

Mainframes, z/OS and JES (Job Entry Subsystem of z/OS), excel at processing terabytes of data to produce meaningful output due to the nature of their architecture. This is one of the reasons z/OS continues to be the platform of choice for batch workload. If you’d like to learn more about how JES works, check out Steve Warren’s GSE UK presentation (PDF).

Survey

I ran a survey on z/OS batch applications on SurveyMonkey, promoting it through social media and various mainframe watering holes like the Open Mainframe Project Slack and the mainframe subreddit. I am thankful to the 22 mainframe users who responded to the survey within 2 days.

This blog entry and survey are part of a technical session I’m delivering at the SHARE Fort Worth event.

“Are you really doing this for SHARE or to gather marketing info?” — r/mainframe user

I don’t know if that’s Reddit being Reddit or if it’s indicative of the mainframe community’s surprise at a survey about z/OS batch applications!

Industries

No surprises here — banking and finance represent the majority of the respondents.

Roles

All roles associated with the lifecycle of batch applications are well represented in the survey.

Prevalence of Batch on z/OS

The survey’s median response shows that 30–50% of the workload on the mainframe is z/OS batch workload. So, what’s the rest of it?

OLTP — Online Transaction Processing workload.

This could be CICS, IMS or z/OS Connect EE, but essentially this is the interactive stuff: people checking their bank balances or insurance claims, using their passports at airports, or out shopping with their credit cards. Even these interactive applications rely on batch processing for continued operation. For example, nightly batch jobs are typically what cause your “pending transactions” on credit card or bank accounts to go from pending to confirmed. If these batch jobs don’t function properly, they will ultimately cause the interactive portion of the app to stop functioning.
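The pending-to-confirmed example can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real banking system posts transactions: the `Transaction` class, the `run_nightly_posting` function, and the account names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    account: str
    amount_cents: int          # negative for a purchase, positive for a deposit
    status: str = "pending"    # stays "pending" until the nightly batch confirms it


def run_nightly_posting(transactions, balances):
    """Confirm every pending transaction and apply it to its account balance.

    Returns the number of transactions posted in this run.
    """
    posted = 0
    for txn in transactions:
        if txn.status != "pending":
            continue  # already confirmed by an earlier run
        balances[txn.account] = balances.get(txn.account, 0) + txn.amount_cents
        txn.status = "confirmed"
        posted += 1
    return posted


if __name__ == "__main__":
    # Hypothetical data accumulated by the interactive (OLTP) side during the day.
    todays_transactions = [
        Transaction("ACC1", -2500),
        Transaction("ACC1", 10000),
    ]
    balances = {"ACC1": 5000}
    count = run_nightly_posting(todays_transactions, balances)
    print(f"posted {count} transactions, balances now {balances}")
```

If the nightly run never happens, every transaction stays pending and the balances the interactive side shows drift further from reality, which is the dependency described above.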

The interactive stuff usually gets all the love and attention, but we’ll be focusing on understanding batch workload.

What is a Batch Application?

Linux cron (batch scheduler) sample

Batch workload is prevalent on many platforms. There’s JES for z/OS, and Slurm for Linux. Even cloud service providers have dedicated batch platforms like AWS Batch and Azure Batch.

A batch job is a computer program or set of programs processed in batch mode. This means that a sequence of commands to be executed by the operating system is listed in a file (often called a batch file, command file, job script, or shell script) and submitted for execution as a single unit. — “About batch jobs” Indiana University
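To make that definition concrete, here is a minimal Python sketch of "a sequence of commands submitted as a single unit". It is an illustration only: `run_batch` is a hypothetical helper, the Python interpreter stands in for real programs, and stopping at the first failing command is an assumption of this sketch rather than a rule every batch system follows.

```python
import subprocess
import sys


def run_batch(commands):
    """Run each command in order as one unit; stop at the first non-zero exit code.

    Returns a list of (command, return_code) pairs for the commands that ran.
    """
    results = []
    for argv in commands:
        completed = subprocess.run(argv, capture_output=True, text=True)
        results.append((argv, completed.returncode))
        if completed.returncode != 0:
            break  # assumption: later commands depend on the earlier ones
    return results


if __name__ == "__main__":
    # Hypothetical three-command job; each command is just a tiny Python one-liner.
    job = [
        [sys.executable, "-c", "print('extract')"],
        [sys.executable, "-c", "print('transform')"],
        [sys.executable, "-c", "print('report')"],
    ]
    for argv, return_code in run_batch(job):
        print(argv[-1], "->", return_code)
```

Nobody sits at a terminal between the commands. The whole list is handed over once and the results are inspected afterwards.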

Batch Applications on z/OS

z/OS batch applications are application software whose business logic runs on the z/OS operating system at periodic intervals to produce meaningful output from accumulated data.

JCL

Multi-step batch job’s JCL to copy a load library and create USS directory

On Linux, a batch job is likely written as a shell script. On z/OS, batch jobs are written in Job Control Language (JCL).

How do z/OS Batch Applications work?

Structure

z/OS batch apps (Purple in the “Batch Application Program Structure” diagram) are made up of multiple batch jobs (Yellow). The execution order of these jobs, whether parallel or serial, is managed by a third-party job scheduler such as CA 7, CA ESP or BMC Control-M (Green). This logical grouping of jobs in a scheduler is what defines the batch application.

Each job can contain one or more job steps (Blue). Each step executes a program (Pink) that is traditionally a COBOL, HLASM or PL/I program. These programs may call other programs during their execution. Collectively, these programs are the business logic that transforms the input into output.
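The application, job, step, and program hierarchy, plus the scheduler's serial-or-parallel ordering, can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is a toy model and not how CA 7, CA ESP or Control-M are implemented. The job names, program names, and the `execution_waves` helper are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    name: str
    program: str  # the program the step executes, e.g. a COBOL load module


@dataclass
class Job:
    name: str
    steps: list
    depends_on: set = field(default_factory=set)  # jobs that must finish first


def execution_waves(jobs):
    """Group job names into waves.

    Jobs in the same wave do not depend on each other, so a scheduler could
    run them in parallel; the waves themselves run one after another.
    """
    sorter = TopologicalSorter({job.name: job.depends_on for job in jobs})
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Hypothetical statement-generation application made up of four jobs.
    application = [
        Job("EXTRACT", [Step("STEP1", "TXNEXTR")]),
        Job("VALIDATE", [Step("STEP1", "TXNVALID")], depends_on={"EXTRACT"}),
        Job("STMTPDF", [Step("STEP1", "STMTFMT")], depends_on={"VALIDATE"}),
        Job("STMTARCH", [Step("STEP1", "STMTARC")], depends_on={"VALIDATE"}),
    ]
    for number, wave in enumerate(execution_waves(application), start=1):
        print(f"wave {number}: {wave}")
```

In this model the dependency sets are the only thing that ties the jobs together, which mirrors the point that the grouping in the scheduler is what makes a set of jobs an application.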

Input & Output

Each step (blue in the “Transformation and Passing of Input & Output between Jobs and Job Steps” diagram) typically includes input (red), which could be in-stream data in the JCL (SYSIN), physical sequential data sets, partitioned data sets or VSAM data sets.

Each step also typically includes output (orange), which could be dynamic JES spool data sets, physical sequential data sets, partitioned data sets or VSAM data sets.

Spool space is limited and shared by all address spaces executing on the system, so when large amounts of data are written to spool, that output often needs to be offloaded elsewhere.
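Because one step's output is frequently the next step's input, a useful sanity check is to confirm that every input either exists before the job starts or is produced by an earlier step. The sketch below illustrates that check under simplifying assumptions: steps are plain tuples, and the data set names and the `find_unsatisfied_inputs` helper are hypothetical.

```python
def find_unsatisfied_inputs(steps, preexisting):
    """Return (step_name, data_set) pairs for inputs nothing has provided yet.

    steps: ordered list of (step_name, input_data_sets, output_data_sets).
    preexisting: data sets that already exist before the first step runs.
    """
    available = set(preexisting)
    missing = []
    for step_name, inputs, outputs in steps:
        for data_set in inputs:
            if data_set not in available:
                missing.append((step_name, data_set))
        # Outputs of this step become available to every later step.
        available.update(outputs)
    return missing


if __name__ == "__main__":
    # Hypothetical two-step job: sort the day's transactions, then report on them.
    steps = [
        ("SORT", ["PROD.TXN.DAILY"], ["WORK.TXN.SORTED"]),
        ("REPORT", ["WORK.TXN.SORTED", "WORK.RATES"], ["PROD.STMT.REPORT"]),
    ]
    for step_name, data_set in find_unsatisfied_inputs(steps, {"PROD.TXN.DAILY"}):
        print(f"step {step_name} reads {data_set}, but nothing provides it")
```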

Output Management

The sheer volume and variety of reports and statements generated by z/OS batch applications can be so enormous that output management tools such as CA Deliver (blue in the “Output Management for Batch Applications” diagram), CA View (Purple) and CA Spool (Green) are used to capture, view, manage, and export the batch output. These tools even offer access control to modern GUIs and a Zowe CLI plugin for automation purposes. The combination of these tools are used to convert the output into popular formats like PDF to be made available online or printed and delivered via snail mail. Some output is also exported to bulk storage mediums like tape for archival and compliance purposes.

Production

On production systems, there may be multiple batch applications competing for resources on the same z/OS LPAR with other OLTP applications. Workload Manager (WLM), a base component of z/OS, is what systems programmers use to tune and manage the allocation of system resources for the running workload. Performance management tools like CA SYSVIEW or IBM OMEGAMON (no, not the Digimon) are often used to set thresholds and alerts for z/OS workload. You don’t want your resource-hungry batch workload to drain the system of the resources needed to process an interactive OLTP user’s request for information, do you? That can result in delays and a bad user experience for someone desperately attempting to use an ATM during your batch window at 2 AM.
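The idea of protecting interactive work from resource-hungry batch can be shown with a deliberately simple toy. This is not WLM's algorithm; WLM works from goal-based service definitions that are far richer than this. The `allocate_capacity` function, the workload names, and the abstract capacity units are all assumptions made for illustration.

```python
def allocate_capacity(capacity, workloads):
    """Hand out capacity in importance order (1 is the most important).

    workloads: list of (name, importance, demand) tuples.
    Returns a dict mapping each workload name to the capacity it was granted.
    """
    remaining = capacity
    granted = {}
    for name, _importance, demand in sorted(workloads, key=lambda w: w[1]):
        share = min(demand, remaining)
        granted[name] = share
        remaining -= share
    return granted


if __name__ == "__main__":
    # Hypothetical 2 AM snapshot: batch wants most of the machine, ATM traffic wants some.
    workloads = [
        ("NIGHTLY-BATCH", 3, 80),
        ("ATM-OLTP", 1, 40),
    ]
    print(allocate_capacity(100, workloads))
```

In this toy the ATM workload always receives its full demand first and batch gets whatever is left, which is the outcome the paragraph above argues for.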

What can change in a z/OS Batch Application?

Changes to batch applications are usually made by means of a change request that requires various levels of approvals before the change is deployed into the production environment. Let’s understand what parts of a z/OS Batch Application can change as part of a change request.

In order of frequency of change:

Programs (COBOL, HLASM or PL/I) change — controls business logic that processes the data and generates output.
Job script (JCL) change — controls what programs should be executed and what the inputs and outputs are.
Job Sequence change in Scheduler — controls what order jobs that are part of an application should execute in. It also tells the scheduler which jobs can be executed in parallel and which ones must be serial.
Output Management change — controls what output from jobs is moved and transformed.

z/OS Batch Application Lifecycle

The lifecycle of a batch application involves various personas and responsibilities. Their methods seem very waterfall, but they have immense trust in their tried and tested application lifecycle philosophies given the importance of the workload they manage. However, agile and DevOps culture is starting to permeate and the status quo is being questioned.

Personas

Let’s understand the people who generally work on z/OS Batch Applications. The following personas related to batch applications are common, but may differ in title or role at different organizations.

Application Analyst: serves as the bridge between the business and the developers. They perform risk analysis and estimate the time required to complete projects. In agile teams, this role goes by Product Owner.

Application Developer: tasked with enhancing or fixing defects in the business logic to add value or obtain compliance. As part of changing the programs in COBOL, HLASM or PL/I, they may sometimes need to tweak JCL in the job scripts as well.

Quality Assurance (QA) or Testing Engineer: responsible for testing applications to approve change requests before they make it into pre-production environments. In the mainframe world, QA is typically not embedded into the development team and operates as a separate team. However, this is changing as more mainframe teams adopt agile.

Batch Manager: responsible for loading the batch workload onto schedulers and managing it. Batch Managers approve change requests being moved into pre-production and eventually deploy the change into production. Batch Managers typically have a holistic understanding of how the entire batch application is strung together.

Production Operator: responsible for maintaining the SLA for applications by monitoring the health of systems.

Workload Manager: responsible for helping the Operations team meet the SLA of the apps by tuning z/OS WLM rules.

Lifecycle

Let’s look at how changes are developed, tested and deployed today.

Gary | Application Analyst — Michelle | Application Developer — Susan | QA Testing Engineer — Fred | Batch Manager — Ashley | Production Operator — Brian | Workload Manager

The application analyst is asked to evaluate a new project. Based on their analysis, the business approves or rejects the project. Complexity, risk, cost, time, and required skills are among the factors that play a role. Once approved, the application analyst introduces the project to the development team.

The analyst tasks the development team with building an enhancement or fixing a defect. As part of this, they change code in a few programs. Once the code changes are made, they manually unit test the changes by running the changed programs individually using custom batch jobs.

On occasion, the JCL in the job script needs changes in order to add input or output files as part of an enhancement. The dev team makes these changes in consultation with the batch management team, who are typically the “keepers” of the production JCL. In even rarer cases, the sequence of jobs that are part of the batch application executed by the scheduler needs to change in order to introduce a new job into the sequence or remove/replace an existing one.

The dev team uses static test data while testing manually by submitting highly customized batch jobs. Once the test job(s) complete, they verify the output by scanning through it manually and ensuring the results match expectations. If all is good from the simple dev unit test, an official change request is created and the changes are promoted to the QA stage.
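The manual scan of job output is the kind of step that lends itself to automation. As a rough sketch of what that could look like, the Python below compares captured output against an expected baseline with the standard `difflib` module, skipping lines that legitimately change from run to run. The `compare_output` helper, the sample report lines, and the idea of ignoring lines by prefix are assumptions for illustration, not a description of any team's actual process.

```python
import difflib


def compare_output(expected, actual, ignore_prefixes=()):
    """Return unified-diff lines between expected and actual job output.

    Lines starting with any of ignore_prefixes (for example a run date
    header) are dropped from both sides before comparing. An empty list
    means the outputs match.
    """
    prefixes = tuple(ignore_prefixes)

    def keep(line):
        return not line.startswith(prefixes)

    expected_lines = [line for line in expected.splitlines() if keep(line)]
    actual_lines = [line for line in actual.splitlines() if keep(line)]
    return list(
        difflib.unified_diff(
            expected_lines, actual_lines,
            fromfile="expected", tofile="actual", lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical baseline and a new run that differs only in its date header.
    baseline = "RUN DATE 2020-02-10\nACCOUNTS PROCESSED 3\nTOTAL 100\n"
    new_run = "RUN DATE 2020-02-11\nACCOUNTS PROCESSED 3\nTOTAL 100\n"
    differences = compare_output(baseline, new_run, ignore_prefixes=("RUN DATE",))
    print("output matches" if not differences else "\n".join(differences))
```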

The QA stage takes it much further than the basic unit testing that was done as part of development. The change request is evaluated to see what parts of the application should be holistically tested. Then comes the time-consuming process of regression testing where pages of documentation are followed to manually test the series of jobs.

In some cases, the QA team may have access to a sandbox environment with a batch scheduler (CA ESP, CA 7, BMC Control-M, etc.) and an output management tool (CA View/Deliver, etc.). The QA stage is used to functionally test the batch application in a simulated production-like environment. Performance tests are not typically part of this stage of testing. If all is well, the change request is promoted from QA to a pre-prod environment.

Pre-prod is used as a staging environment before the change request makes it into production. Change windows are short and rare in production environments, so pre-prod serves as the environment from which changes can quickly be copied to production. As part of the move from pre-prod to production, several teams like the batch management team, the WLM team and the production operations team get involved and various approvals are solicited. The batch management team will deploy the application changes, while the WLM team may need to tweak some resource allocation rules if the application changes are at risk of affecting system performance. The production operations team will need to be aware of the changes so that they can respond appropriately if things go wrong.
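The promotion path described in this section (development, QA, pre-prod, production) behaves like a small state machine with approval gates. The Python sketch below models it under assumptions of my own: the stage names, the `ChangeRequest` class, and especially the set of roles required at each gate are an illustrative reading of the narrative above, since every organization defines its own approvals.

```python
from dataclasses import dataclass, field

STAGES = ("DEV", "QA", "PRE-PROD", "PROD")

# Assumption: roles that must sign off before a change may ENTER each stage.
REQUIRED_APPROVALS = {
    "QA": {"application developer"},
    "PRE-PROD": {"qa engineer", "batch manager"},
    "PROD": {"batch manager", "workload manager", "production operator"},
}


@dataclass
class ChangeRequest:
    title: str
    stage: str = "DEV"
    approvals: set = field(default_factory=set)

    def approve(self, role):
        """Record a sign-off for the next promotion."""
        self.approvals.add(role)

    def promote(self):
        """Move to the next stage if every required role has approved."""
        position = STAGES.index(self.stage)
        if position == len(STAGES) - 1:
            raise ValueError("change is already in production")
        target = STAGES[position + 1]
        missing = REQUIRED_APPROVALS[target] - self.approvals
        if missing:
            raise PermissionError(f"cannot enter {target}; missing: {sorted(missing)}")
        self.stage = target
        self.approvals.clear()  # each gate needs fresh sign-offs
        return target


if __name__ == "__main__":
    change = ChangeRequest("Add new input file to statement job")
    change.approve("application developer")
    print("promoted to", change.promote())
    try:
        change.promote()
    except PermissionError as error:
        print(error)
```

Encoding the gates as data makes the process auditable: the record of who approved what at each stage is exactly the kind of traceability a change request is meant to provide.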

Opportunity for DevOps

Batch applications on z/OS are mission critical in many of today’s industries. Any type of outage could have a severe impact on the business operations of a company.

A majority of survey respondents indicate that z/OS batch issues are of high business severity.

The z/OS batch application lifecycle, while time-tested, is highly manual and could stand to gain in quality, speed of delivery, and audit traceability by adopting modern DevOps principles. There are many facets to adopting DevOps such as automating the testing, using developer-friendly IDEs, creating quality gates using code scans and CI/CD pipelines, and much more. I’ll be exploring some of these possibilities in my future blog entries.