Defusing COBOL Bombs with Smart Automation

Published in

The Technical Archaeologist

9 min read · Sep 14, 2018

As I continue this series on the challenge of legacy systems generally, and COBOL specifically, I’ve come to feel that the answer to our COBOL woes is not actually getting rid of COBOL, but simply making it easier for modern programmers to work with.

Our society is loaded with COBOL bombs — applications that perform critical functions and would be difficult to fix if something were to go wrong. Much is made of this reality. COBOL programmers are retiring out of the workforce and not being replaced. When they go they typically take huge amounts of institutional knowledge with them, making applications difficult to decipher even when one does know COBOL.

But neither of these problems is inherent to COBOL itself. Rather, they are indicative of the design anti-patterns that governed how these applications were built, and how functionality is added to them.

Why can’t young people just learn COBOL?

As part of building a talent pipeline for TRU I’ve been developing technical screens that focus on code reading exercises. I write a small function (generally a common algorithm like a sort) in a language that would be foreign and obscure to most programmers (my favorite right now is TLA+, shout out to Hillel Wayne who has put in a lot of work to ruin my fun by making it less obscure!). I put it in front of them and ask them to talk to me about it. What is it doing? Is there anything wrong with its approach? How would you improve it?

Running these kinds of exercises has taught me that in programming, size matters. When the function is small it doesn’t matter that the programmer has zero experience with the language in question. It doesn’t matter that they can’t even identify which language it is. They start digging into it and gradually they figure out what it’s doing. Then they connect it to patterns and approaches they know from the languages they do write code in and we’re able to have a sensible conversation about the program where they offer a lot of insight.

Even better: one time I decided to do this exercise with people who had no programming experience at all and — though their analysis lacked the CS specific commentary — they were ALSO able to figure out what it was doing.

But there’s a tipping point. The longer the function is, the harder it becomes for the programmer to relate it back to her own experiences. The 20 line program was a breeze, but the 40 line program was much harder for both engineers and non-engineers. They get overwhelmed by the foreignness of it and stop looking at the individual lines of code. Of course, once you’ve crossed over to condensing a huge number of steps into a single line using piping or method chaining, the code goes back to being difficult to read. It’s really about right-sizing code so that the patterns of it can be detected, not making things as small as possible.

There is a rich pool of neurological research and theory around this effect. Some studies indicate that different parts of the brain are used when solving large and small problems, and that small problems might involve more activity in long term memory. There’s research that indicates that when the brain receives too much information the area that controls both decision making and emotional regulation just shuts down, leaving you frustrated, demoralized and unable to make rational decisions. (Google query of the year: load balancing the prefrontal cortex)

But this is not a neurology blog so really the only thing we need to accept is that code that is completely foreign can still be readable when it’s organized into small chunks. The problem with many of the COBOL bombs is that they are spaghetti. COBOL was designed to be loaded into machines with punch cards. It does not have the best code organization by default. It is verbose, with mandatory header sections that add length to programs and a syntax designed to map as closely as possible to complete English sentences.

In 2002, Object-Oriented COBOL became a thing. But contrary to popular belief, OOP is about code reuse, not organization. Plenty of OOP applications pile so many layers of inheritance on top of each other, across so many different files, that untangling what a given object is doing can be just as daunting as COBOL spaghetti.

(BTW, the opposite of spaghetti code is apparently “ravioli code,” or code that is so tightly packed into objects and separated from each other that it resembles ravioli... you are welcome.)

As a friend of mine was fond of saying “there’s a lot of POOP in OOP.”

The Modernization Strategy

My preferred modernization strategy is always incremental improvement. Rewriting massive systems from scratch almost never works. The original systems were not built in this manner, so their replacements cannot be built in this manner. We don’t usually think of systems from the 1960s as being Agile, but these systems were in fact built starting with an MVP and adding more functionality as time went on. They were just built that way for social/economic reasons instead of as an intentional project management strategy.

Therefore, when facing down a large monolith of COBOL, I prefer to break it into services. This means gradually moving functionality out of the monolith and into smaller, better documented applications that might still be written in COBOL or might be written in something else if there’s no particular advantage to COBOL.
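One way to picture this gradual move is a thin routing layer that sits in front of the monolith. The sketch below is only an illustration: `legacy_monolith`, `new_address_service`, and the operation names are hypothetical stand-ins, not part of any real system described here.

```python
from typing import Callable, Dict


def legacy_monolith(operation: str, payload: dict) -> dict:
    """Hypothetical stand-in for a call into the COBOL monolith."""
    return {"handled_by": "legacy", "operation": operation, **payload}


def new_address_service(payload: dict) -> dict:
    """Hypothetical replacement service for one piece of functionality."""
    return {"handled_by": "new", "operation": "validate_address", **payload}


# Operations that have been moved out of the monolith so far.
# Each migration adds one entry here; nothing else has to change.
MIGRATED: Dict[str, Callable[[dict], dict]] = {
    "validate_address": new_address_service,
}


def route(operation: str, payload: dict) -> dict:
    """Send migrated operations to their new service, everything else to legacy."""
    handler = MIGRATED.get(operation)
    if handler is not None:
        return handler(payload)
    return legacy_monolith(operation, payload)
```

Callers only ever talk to `route`, so each piece of functionality can move (or move back) without the rest of the system noticing.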

This is not the approach that gets pitched and funded in the government. After all we have automation! We can convert all our COBOL into something else (usually Java) using a transpiler.

Except COBOL bombs are not a problem because they are written in COBOL; they are a problem because the number of engineers who understand what they are doing well enough to maintain them is shrinking. Transpiling poorly documented, poorly organized COBOL into Java just creates poorly documented, poorly organized, atypical Java code. The output of the transpiler is generally loaded with constructs and workarounds that are not intuitive to a good Java developer and may create hidden performance issues.

In other words: you’ve taken one problem and turned it into two or three problems. But we shouldn’t give up on using automation to assist in modernization based on this hard lesson. We just need to be smarter about what we automate.

How to Cut It Up

Success with COBOL modernization relies on a heavy investment in metawork — that is work that lays the foundation for other work. Before one can reengineer, one has to excavate. What dialect of COBOL is this written in? What are the business rules that govern the old system? What is it doing and how is it doing it?

This is where most organizations cut corners, often with the encouragement of their engineers. The first step in modernization is building a team that will be excited about the archaeology. Automation misapplied can double or triple the amount of work needed to produce a functioning application. If your team sees the metawork as something to just get through, they are unlikely to apply automation smartly.

Types of Metawork

Control Flow Graphs

What it does: A Control Flow Graph (CFG) is basically a map that traces paths through the application. CFGs are useful when you want to build tools that help refactor code (particularly when looking for smart optimizations, dead code, dead variables, and so forth) because they highlight dead ends or duplicate pathways better than just the source text on the screen.

Tools to automate it: IN-COM Data Systems, IBM’s z/OS Program Control Flow, DMS Software Reengineering Toolkit

Trade-offs: With large and complex programs you need to understand enough of how the application is structured to divide it into sensible units. Otherwise you end up with graphs that are unreadable and therefore unusable. Done right, though, they provide a strong reference point that all other metawork can be based on. They can help clarify business rules and shave some time off the process of studying the source code.
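To make the dead code idea concrete, here is a minimal sketch in Python. It assumes the graph has already been extracted by some tool; the paragraph names in `CFG` are made up for illustration and do not come from a real program.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical control flow graph: each key is a paragraph (or basic block)
# and each value lists the blocks that control can move to next.
CFG: Dict[str, List[str]] = {
    "MAIN": ["READ-RECORD"],
    "READ-RECORD": ["VALIDATE", "END-OF-FILE"],
    "VALIDATE": ["WRITE-RECORD", "REPORT-ERROR"],
    "WRITE-RECORD": ["READ-RECORD"],
    "REPORT-ERROR": ["READ-RECORD"],
    "END-OF-FILE": [],
    "OLD-TAX-CALC": ["WRITE-RECORD"],  # nothing ever transfers control here
}


def reachable(cfg: Dict[str, List[str]], entry: str) -> Set[str]:
    """Breadth-first walk collecting every block reachable from the entry point."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in cfg.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def dead_blocks(cfg: Dict[str, List[str]], entry: str) -> List[str]:
    """Blocks that appear in the graph but can never be reached from the entry."""
    return sorted(set(cfg) - reachable(cfg, entry))


print(dead_blocks(CFG, "MAIN"))  # ['OLD-TAX-CALC']
```

The real tools do far more than this, but the core payoff is the same: once the paths are a data structure instead of source text, questions like "can anything ever reach this?" become simple graph queries.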

Business Rules

What it does: Business rules are booleans that describe expected behavior and constraints. They are not specific to technology, but software often depends on a clear understanding of business rules. Therefore, legacy software is also a written record of the evolution of business rules.

Tools to automate it: Rational Asset Analyzer, D*Code, CM EvolveIT

Trade-offs: Much of the work defining business rules is better off done the old-fashioned way: interviewing the people who actually use and depend on the system. Automation in this space (sometimes called “business rule mining”) can be tricky. The output of such programs is not always sensible or comprehensible (see inline image). That being said, many of the tools available for this give you an interface that allows you to essentially navigate through the code from the generated business rules. Some even have CFG functionality built in.
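As a rough illustration of what "business rules are booleans" means once a rule has been recovered, the sketch below writes one as a plain predicate. The rule itself (eligible at 65 or older, or with 20 or more years of service) and the `Applicant` type are invented for this example, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    age: int
    years_of_service: int


def is_eligible(applicant: Applicant) -> bool:
    """Hypothetical rule: eligible at 65 or older, or with 20+ years of service."""
    return applicant.age >= 65 or applicant.years_of_service >= 20
```

Written this way, the rule is independent of the technology underneath it, and it can be read back to the people you interviewed to confirm it matches what they actually do.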

Something I get asked a lot about is the other side of automating business rules. That is: not automating the mining of the rules themselves but using business rules to automatically generate a new application. So-called Business Process Management (BPM) tools can be useful but it really depends on the scale of the application and your tolerance for vendor lock-in. Automatically generated code is rarely efficiently written or performance optimized. If the application in question is internal, with a few hundred users and unlikely to see activity spikes, I think this shortcut could be relatively benign. But understand that applications generated through BPM solutions are more expensive to run, less resource efficient and very difficult to scale.

Formal Specification

What it does: A mathematically based way to both describe what a program does and verify that it does it. Formal specification is especially useful for concurrent and highly complex systems. It can be seen as a form of testing, but with tests that are run before you write any code.

Another shout out to Hillel Wayne, who can help you figure this shit out

Tools to automate it: TLA+/PlusCal, Z notation, among many others

Trade-offs: Automation in this context means something a bit different. It’s very hard to automatically generate formal specs, so best practice is to have engineers write them. But once they are developed they can be used to run simulations similar to functional testing. The core difference is that you’re doing this testing before you’ve written (or changed) any application code. Formal specs ultimately pay dividends in that they can find bugs in your design and architecture before you’ve invested time and money spinning up servers (or buying mainframes). But the languages and tools necessary to write formal specs are pretty obscure and intimidating. Experts are few and far between and the training burden to get your own engineering team familiar with the necessary information is not an insignificant investment.
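For a feel of what "finding bugs in the design" looks like, here is a toy sketch in Python. It is not TLA+ and not how a real model checker is implemented; it just brute-forces every interleaving of a made-up design in which two workers each read a shared counter and then write back the value plus one. All names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional, Tuple

Worker = Tuple[int, int]            # (program counter, locally read value)
State = Tuple[int, Worker, Worker]  # (shared counter, worker 0, worker 1)

INITIAL: State = (0, (0, 0), (0, 0))


def next_states(state: State) -> Iterator[State]:
    """Every state reachable by letting exactly one worker take one step."""
    counter, *workers = state
    for i, (pc, local) in enumerate(workers):
        if pc == 0:      # step 1: read the shared counter
            updated, new_counter = (1, counter), counter
        elif pc == 1:    # step 2: write back the (possibly stale) value plus one
            updated, new_counter = (2, local), local + 1
        else:
            continue     # this worker has finished
        new_workers = list(workers)
        new_workers[i] = updated
        yield (new_counter, new_workers[0], new_workers[1])


def invariant(state: State) -> bool:
    """Design goal: once both workers are done, the counter must equal 2."""
    counter, w0, w1 = state
    finished = w0[0] == 2 and w1[0] == 2
    return (not finished) or counter == 2


def find_violation(initial: State) -> Optional[List[State]]:
    """Breadth-first search over all interleavings; returns a trace to a bad state."""
    parents: Dict[State, Optional[State]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace: List[State] = []
            current: Optional[State] = state
            while current is not None:
                trace.append(current)
                current = parents[current]
            return trace[::-1]
        for nxt in next_states(state):
            if nxt not in parents:
                parents[nxt] = state
                queue.append(nxt)
    return None
```

Calling `find_violation(INITIAL)` returns a step-by-step trace ending with both workers finished and the counter stuck at 1: the classic lost update. No application code exists yet, but the design flaw is already visible, which is the point of doing this work up front.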

Building a Bomb Squad

With all this in mind, what should automating the modernization process look like?

If you know what the application does and how it does it: Select one set of functionality that can be separated out from the rest of the application. Choose the language that best suits that process, factoring in the size of the available workforce, hosting environment requirements, and the family of specialized libraries, modules, and open source projects (this might still be COBOL). Figure out integration with the legacy system and develop an API strategy. Develop formal specs and write tests using observable inputs and outputs from the legacy system. Build. Migrate. Repeat with another set of functionality.
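The "tests using observable inputs and outputs" step can be sketched roughly like this. The recorded cases, the payroll rule, and the function names are all hypothetical; in practice the recordings would come from watching the legacy system handle real (or realistic) traffic.

```python
from typing import Callable, List, Tuple

Case = Tuple[dict, dict]  # (input sent to the legacy system, output it produced)

# Hypothetical recordings captured from the legacy system.
RECORDED_CASES: List[Case] = [
    ({"hours": 40, "rate_cents": 2000}, {"gross_cents": 80000}),
    ({"hours": 45, "rate_cents": 2000}, {"gross_cents": 95000}),
    ({"hours": 0, "rate_cents": 2000}, {"gross_cents": 0}),
]


def new_gross_pay(record: dict) -> dict:
    """Hypothetical replacement service: time and a half for hours over 40."""
    regular = min(record["hours"], 40)
    overtime = max(record["hours"] - 40, 0)
    gross = regular * record["rate_cents"] + overtime * record["rate_cents"] * 3 // 2
    return {"gross_cents": gross}


def mismatches(
    implementation: Callable[[dict], dict], cases: List[Case]
) -> List[Tuple[dict, dict, dict]]:
    """Return (input, expected, actual) for every case where behavior differs."""
    failures = []
    for given, expected in cases:
        actual = implementation(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures
```

An empty list from `mismatches(new_gross_pay, RECORDED_CASES)` means the replacement agrees with the legacy system on everything observed so far. That says nothing about cases nobody recorded, which is why the user research and the formal specs still matter.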

If you understand the general shape of the code: Use CFGs and/or business rules extraction to start learning how different parts of the code interact with one another. Look for places where internal interactions can easily be replaced with an API.

If you have millions of lines of code you know nothing about: Use Hercules or rent time on a mainframe to spin up a testing environment for personal use. Spend a lot of time with application users observing their process (what they do and how they do it). Select prioritization criteria (some groups will want to focus on modernizing the most important systems first, some groups will prefer to start with the lowest risk first). Pick one set of functionality based on your user research and figure out how it is represented in the code base. It may not actually be one set of functionality, even if it appears that way to the user. Trace the edges of just that set of features and use the exercise to understand the general shape of the code.