Why We Should Implement Machine Learning For Legacy Products

satyabrata pal

Published in ML and Automation, Oct 12, 2019

Even in 2019, more than 90% of the world’s software and hardware runs on legacy code.

As reported by Reuters in 2017, 95% of ATM swipes still relied on COBOL code.

Infographic from Reuters showing how much ATM activity relies on COBOL

The same report pointed out that in 2017 the average age of COBOL developers was 40–55 years, so most of the developers covered by that study will soon retire.

Average age of COBOL developers as reported by Reuters

This shows that the volume of legacy code in software products around the world is huge, while the pool of talent needed to maintain that software shrinks day by day.

This is where Machine Learning techniques can be leveraged to overcome some of the challenges of maintaining, correcting, testing and documenting legacy code, and thus to save money. Here are some use cases.

Code Review

The amount of legacy code in use today is enormous. For instance, as of 2017 an estimated 220 billion lines of COBOL code were in use.

With so much legacy code and so few developers working on it, code review becomes a daunting task for the developers who remain.

Machine Learning can offload some of these code review tasks.

An algorithm designed for code review can learn the review patterns from existing code repositories. This algorithm can then suggest review changes on new code.
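To make the idea concrete, here is a minimal sketch. It assumes a hypothetical review history in which each COBOL-style source line is labeled 1 if a past reviewer flagged it and 0 if it was accepted. The sketch uses a multinomial naive Bayes classifier over tokens, which is just one simple learning technique chosen for illustration. It is not how any particular product works, and a real system would learn from far richer context than single lines.

```python
import math
import re
from collections import Counter

# Hypothetical review history: 1 = a past reviewer flagged the line, 0 = accepted.
HISTORY = [
    ("MOVE WS-AMT TO WS-TOTAL", 0),
    ("GO TO 9999-EXIT", 1),
    ("PERFORM 2000-PROCESS THRU 2000-EXIT", 0),
    ("GO TO 1000-READ", 1),
    ("ALTER 3000-PARA TO PROCEED TO 4000-PARA", 1),
    ("ADD 1 TO WS-COUNT", 0),
]


def tokenize(line):
    """Split a source line into upper-cased word tokens."""
    return re.findall(r"[A-Z0-9-]+", line.upper())


class ReviewModel:
    """Toy multinomial naive Bayes model of which lines reviewers tend to flag."""

    def fit(self, examples):
        self.counts = {0: Counter(), 1: Counter()}  # token counts per label
        self.docs = Counter()                       # number of lines per label
        for line, label in examples:
            self.counts[label].update(tokenize(line))
            self.docs[label] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def log_score(self, line, label):
        """Log prior plus log likelihood of the line's tokens under one label."""
        total = sum(self.counts[label].values())
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in tokenize(line):
            # Laplace smoothing so unseen tokens do not zero out the probability.
            score += math.log((self.counts[label][tok] + 1) / (total + len(self.vocab)))
        return score

    def flag_probability(self, line):
        """Estimated probability that a reviewer would flag this line."""
        s1, s0 = self.log_score(line, 1), self.log_score(line, 0)
        return 1 / (1 + math.exp(s0 - s1))


model = ReviewModel().fit(HISTORY)
for new_line in ["GO TO 5000-END", "ADD 1 TO WS-TOTAL"]:
    print(f"{model.flag_probability(new_line):.2f}  {new_line}")
```

With this toy history, the new `GO TO` line scores high because similar lines were flagged before. The `ADD ... TO` line scores low.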

In fact, a similar tool already exists: DeepCode.

DeepCode learns code review patterns from an existing code repository and can then point out places in new code that may need improvement or correction.

Such a system reduces the burden of maintaining legacy code in an organization or team where there is a shortage of developers.

Refactoring

Code developed 15–20 years ago was most often monolithic, because microservices were not yet in vogue. Refactoring such monolithic code is difficult.

Because these legacy systems are tightly coupled, even a small change in one place can set off a chain reaction of failures in unintended places.

To make sure that everything still works after a change, the typical development cycle is: Develop → Test → Repeat.

Machine Learning can play a vital role here by automating the testing of the changed component and checking that changes made in one place do not break code elsewhere.

Using Machine Learning to do the testing can free developers to focus on refactoring the code rather than on building a test automation suite.
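Here is a minimal sketch of one way to automate part of this. It assumes a hypothetical change history that records which files each past change touched and which regression tests then failed. From that history it estimates how often each test failed when a given file changed, and it runs the most failure-prone tests first. This is a simple frequency-based stand-in for the learned test-selection models used in practice, not any specific tool's algorithm. The file and test names are invented.

```python
from collections import Counter, defaultdict

# Hypothetical history: (files touched by a past change, regression tests that then failed).
HISTORY = [
    ({"billing.cbl", "rates.cpy"}, {"test_invoice_totals"}),
    ({"billing.cbl"}, {"test_invoice_totals", "test_tax_rounding"}),
    ({"customer.cbl"}, {"test_address_update"}),
    ({"rates.cpy"}, {"test_tax_rounding"}),
    ({"customer.cbl", "billing.cbl"}, set()),
]


def learn_failure_rates(history):
    """Estimate P(test fails | file changed) from co-occurrence counts."""
    file_changes = Counter()
    co_failures = defaultdict(Counter)
    for files, failed in history:
        for f in files:
            file_changes[f] += 1
            co_failures[f].update(failed)
    return {
        f: {test: n / file_changes[f] for test, n in co_failures[f].items()}
        for f in file_changes
    }


def prioritize(changed_files, rates):
    """Order tests by their highest failure rate among the files in the new change."""
    scores = Counter()
    for f in changed_files:
        for test, rate in rates.get(f, {}).items():
            scores[test] = max(scores[test], rate)
    return [test for test, _ in scores.most_common()]


rates = learn_failure_rates(HISTORY)
print(prioritize({"billing.cbl"}, rates))  # → ['test_invoice_totals', 'test_tax_rounding']
```

Running likely-to-fail tests first gives developers early feedback after each refactoring step. A change to a file with no history returns an empty list, which in practice would mean falling back to the full suite.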

Debugging

A study by Cambridge University suggests that developers spend 49.9% of their time on debugging. For legacy code, this figure may be much higher.

Here, once again, Machine Learning can help by finding the likely cause of a bug and suggesting fixes for it. This can save developers a huge amount of time and reduce project costs.

SapFix, a Machine Learning based tool from Facebook, does exactly that. Facebook’s Engineering blog has a detailed article on how the tool works.
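To illustrate how a tool might locate the likely cause of a bug, the sketch below uses Ochiai spectrum-based fault localization. This is a statistical technique swapped in here for simplicity, and SapFix itself works differently. The sketch assumes hypothetical per-test coverage data: which lines (or COBOL paragraphs) each test executed, and whether the test passed. Lines that are executed mostly by failing tests are ranked as most suspicious.

```python
import math

# Hypothetical coverage: lines (or COBOL paragraphs) each test executed, and whether it passed.
COVERAGE = {
    "test_a": ({1, 2, 3, 4}, True),
    "test_b": ({1, 2, 5}, False),
    "test_c": ({1, 3, 5}, False),
    "test_d": ({1, 2, 3}, True),
}


def ochiai_ranking(coverage):
    """Rank lines by Ochiai suspiciousness: failed(s) / sqrt(total_failed * executed(s))."""
    total_failed = sum(1 for _, passed in coverage.values() if not passed)
    lines = set().union(*(covered for covered, _ in coverage.values()))
    scores = {}
    for line in lines:
        failed = sum(1 for cov, passed in coverage.values() if line in cov and not passed)
        executed = sum(1 for cov, _ in coverage.values() if line in cov)
        denom = math.sqrt(total_failed * executed)
        scores[line] = failed / denom if denom else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for line, score in ochiai_ranking(COVERAGE):
    print(f"line {line}: {score:.3f}")
```

Here line 5 ranks first with a score of 1.0, because both failing tests execute it and no passing test does. A fix-suggestion step would then focus on the top-ranked lines.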

Automation Testing

Testing a legacy application is challenging because test automation is not always effective and is sometimes not even possible. Legacy systems often lack a web-based GUI, so off-the-shelf automation tools like Selenium can’t be used to create automated tests for them.

In such cases, automation engineers end up building custom test tools, which again takes considerable time and effort on top of their day-to-day deliverables.

One solution to this problem is to design Machine Learning based testing tools that can carry out automated tests on such legacy products.

Such Machine Learning tools can crawl through the output files generated by the legacy code and flag anomalies in them.
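Here is a minimal sketch of such a check. It assumes hypothetical comma-separated output files whose last field is an amount. Each file is summarized by its record count and its total amount. A new file is flagged when one of those features lies more than three standard deviations from the baseline runs. A z-score test is a simple statistical stand-in; a production tool might instead learn a richer model of what normal output looks like.

```python
import statistics


def features(lines):
    """Summarize one output file as its record count and the sum of its amounts."""
    # Hypothetical format: comma-separated records whose last field is a numeric amount.
    amounts = [float(line.rsplit(",", 1)[-1]) for line in lines if line.strip()]
    return {"records": len(amounts), "total": sum(amounts)}


def find_anomalies(baseline_files, new_file, threshold=3.0):
    """Return the features of new_file that deviate from the baseline runs."""
    baseline = [features(f) for f in baseline_files]
    anomalies = []
    for name, value in features(new_file).items():
        history = [b[name] for b in baseline]
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)  # needs at least two baseline files
        if stdev == 0:
            # A feature that never varied before is anomalous if it changes at all.
            if value != mean:
                anomalies.append(name)
        elif abs(value - mean) / stdev > threshold:
            anomalies.append(name)
    return anomalies


# Each file is a list of lines, e.g. open(path).read().splitlines().
baseline = [
    ["ACC001,100.00", "ACC002,250.50", "ACC003,75.25"],
    ["ACC001,110.00", "ACC002,240.00", "ACC003,80.00"],
    ["ACC001,105.00", "ACC002,245.75", "ACC003,78.50"],
]
print(find_anomalies(baseline, ["ACC001,106.00", "ACC002,245.00", "ACC003,79.00"]))  # → []
print(find_anomalies(baseline, ["ACC001,108.00", "ACC002,-99999.00"]))  # → ['records', 'total']
```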

Another use case is to have such tools watch a batch of background processes, detect anomalies in them, and thus carry out tests automatically without extensive coding on the automation engineer’s part.
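For batch processes, a similar sketch can watch per-run metrics. It assumes hypothetical metrics collected from past nightly runs: run time in minutes, records read and records rejected. A new run is flagged when a metric deviates strongly from that history. The check uses the median-based modified z-score with the commonly used 3.5 cutoff, which past outliers distort less than they would a mean-based score.

```python
import statistics


def robust_z(history, value):
    """Modified z-score (Iglewicz and Hoaglin) based on the median and MAD."""
    median = statistics.median(history)
    mad = statistics.median([abs(x - median) for x in history])
    if mad == 0:
        # No spread in history: any change at all is treated as extreme.
        return 0.0 if value == median else float("inf")
    return 0.6745 * (value - median) / mad


def check_run(history, run, threshold=3.5):
    """Return the names of the run's metrics that look anomalous against past runs."""
    return [
        metric
        for metric, value in run.items()
        if abs(robust_z([past[metric] for past in history], value)) > threshold
    ]


# Hypothetical metrics from past nightly runs of a legacy batch job.
history = [
    {"minutes": 42, "records_read": 10_000, "rejected": 12},
    {"minutes": 45, "records_read": 10_250, "rejected": 9},
    {"minutes": 40, "records_read": 9_900, "rejected": 15},
    {"minutes": 44, "records_read": 10_100, "rejected": 11},
    {"minutes": 43, "records_read": 10_050, "rejected": 10},
]
tonight = {"minutes": 95, "records_read": 10_020, "rejected": 13}
print(check_run(history, tonight))  # → ['minutes']
```

Because the check only needs metrics the batch job already produces, it requires little custom automation code. Here the unusually long run time is flagged, while the record counts stay within their normal range.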

Conclusion

The use cases listed here are some of the scenarios where machine learning can make life simpler for people and organizations that have to deal with legacy software.

There are many more possible uses of machine learning in legacy software maintenance, development and testing. If you think of any such use cases, please list them in the comments section.