Legacy software series, Part 1: What is legacy software?
In this article series I explore legacy software: the software life cycle, software aging, the causes of software aging, and modernization methods. These articles are based on my master’s thesis, which discussed microservice architecture in the context of legacy software modernization. Feel free to read the whole master’s thesis here. The articles are short and easy to read. Microservices are not discussed in this series.
This first article describes what legacy software is and the different generations of legacy software that exist.
Definition of legacy software
The digitalization of companies started in the 1960s and has continued to this day. Old software systems were built using the languages, hardware and supporting software available at the time, and support for these is often long gone. Most of these systems have had to evolve because of changes in business models and user requirements, as well as technological advancements (Liu, et al., 1998). These old software systems are business critical and long lived. They are often referred to as legacy software systems (Dayini-Fard, et al., 1999).
One dictionary defines legacy systems as “systems that are in one way or another outdated, but still in use” (YourDictionary, 2020). Many signs can indicate that a system is a legacy system (Pressman, 2015). For example:
- Outdated hardware.
- Missing or lacking documentation.
- Missing source code.
- Non-existent version control.
- Original developers have left the company.
A legacy system can be divided into system hardware, support software, application software, application data, business processes and business rules. Replacing one piece of this complex system can have unforeseen consequences.
The hardware that the system was developed for can be out of date and expensive to maintain. Special skills may be required to maintain it, and it may not be compatible with other hardware used in the system. The support software, such as the operating system, hardware drivers and the development environments used to build the system, is often out of date and may contain security vulnerabilities.
The application software is the main application that provides the business services to clients. It is often made up of many applications that were created at different times by many different teams. Old legacy systems also often hold substantial amounts of accumulated data. The data can be partly corrupted or inconsistent because of changes made to the application during years of maintenance.
Business processes are often built around legacy systems, which can both enable and constrain those processes. The business rules define how a company does its business (Sommerville, 2016). For example, a bank can embed its rules for granting loans in its application. Sometimes companies lose their documentation of the business rules, and the only place the rules still exist is the application.
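To make the point about embedded business rules concrete, here is a minimal sketch of how a bank's loan rules might end up living only in application code. Everything in it is hypothetical: the `LoanApplication` fields, the `approve_loan` function and both thresholds are invented for illustration and do not come from any real banking system or from the cited sources.

```python
from dataclasses import dataclass


@dataclass
class LoanApplication:
    """Hypothetical applicant data; the field names are illustrative only."""
    annual_income: float
    existing_debt: float
    requested_amount: float


# Hypothetical thresholds. In a legacy system, values like these are often
# undocumented constants whose business meaning nobody remembers.
MAX_DEBT_TO_INCOME = 0.4
MAX_LOAN_TO_INCOME = 5.0


def approve_loan(application: LoanApplication) -> bool:
    """Return True if the application passes both embedded business rules."""
    if application.annual_income <= 0:
        return False

    # Rule 1: existing debt must not exceed 40% of annual income.
    debt_ratio = application.existing_debt / application.annual_income
    if debt_ratio > MAX_DEBT_TO_INCOME:
        return False

    # Rule 2: the requested loan must not exceed five times annual income.
    loan_limit = MAX_LOAN_TO_INCOME * application.annual_income
    return application.requested_amount <= loan_limit
```

If the written policy behind the two constants is lost, this function becomes the only remaining statement of the rule, and anyone replacing the system has to recover the rule by reading the code.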
Amount of legacy code
It is very difficult to estimate how much legacy software is still in use. Some indication can be drawn from the estimate that in the year 2000 there were still 200 billion lines of COBOL in use (Kizior, et al., 2000). According to a survey by Computerworld (2012), 60% of the companies surveyed used COBOL. COBOL is an old programming language that was used mainly from the 1960s to the 1990s, especially in the financial industry. Many software systems built with COBOL are still running, and as long as they keep working there is no immediate need to replace them. However, the risk is increasing because support for old systems is running out. COBOL is no longer a mainstream language, and few new developers learn it. In the coming decades, old COBOL systems will need to be replaced with modern versions (Kizior, et al., 2000). COBOL is just one of many examples of legacy software systems.
Types of legacy systems
According to Langer (2016), “a legacy system is an existing application system in operation”. This means that there can be several generations of legacy systems within one organization. The generations of legacy systems closely follow the generations of programming languages, of which there are five (Langer, 2016).
1. First generation
Machine code. Instructions are written as binary symbols that the machine executes directly. It is very unlikely that you will encounter a legacy system written in a first generation programming language anymore.
2. Second generation
Assembly languages. These languages use symbolic mnemonics for machine instructions, and an assembler translates them into machine code that the hardware can understand. Some mainframe systems still run assembly code.
3. Third generation
Higher-level symbolic languages such as COBOL, FORTRAN and BASIC. These languages use English keywords and are often specialized. For example, FORTRAN is specialized for mathematics and scientific work, while COBOL was designed for business applications.
4. Fourth generation
Even higher-level languages that use English keywords and focus more on the output of a program than on how statements need to be written. Because of this, these languages were easier for less technical people to learn. Examples of fourth generation programming languages are Visual Basic, C++ and Delphi. These languages have features such as database querying, code generation and graphic screen generation.
5. Fifth generation
These are known as rule-based code generation languages and are associated with artificial intelligence software. Such software uses knowledge-based programming, where developers do not tell the program how to solve the problem; instead, the program learns on its own.
In most cases, a legacy system is written in either a third or fourth generation programming language (Langer, 2016). Different tools and practices are involved in modernizing and replacing the different generations of legacy systems.
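The difference between the third and fourth generation styles can be sketched with a small example. It is only an illustration under stated assumptions: Python stands in for a procedural third generation language, and SQL (run through Python's standard `sqlite3` module) stands in for the database querying features attributed to fourth generation languages. The `orders` table and the sample data are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (customer, order amount).
ORDERS = [("alice", 120.0), ("bob", 40.0), ("alice", 30.0), ("carol", 75.0)]


def totals_procedural(orders):
    """Procedural style: spell out every step of the computation."""
    totals = {}
    for customer, amount in orders:
        totals[customer] = totals.get(customer, 0.0) + amount
    return sorted(totals.items())


def totals_declarative(orders):
    """Query style: describe the wanted result and let the engine work it out."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        connection.executemany("INSERT INTO orders VALUES (?, ?)", orders)
        rows = connection.execute(
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer ORDER BY customer"
        ).fetchall()
    finally:
        connection.close()
    return rows
```

Both functions return the same per-customer totals for `ORDERS`. The first states how to accumulate them step by step, while the second only states which result is wanted.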
Dayini-Fard, H. (1999). Legacy Software Systems: Issues, Progress, and Challenges. IBM. Retrieved from www.cas.ibm.com/toronto/publications/TR-74.165/k/legacy.html
Kizior, R. J., Carr, D., & Halpern, P. (2000). Does COBOL Have a Future? Retrieved from https://web.archive.org/web/20160817115437/http://proc.isecon.org/2000/126/ISECON.2000.Kizior.pdf
Langer, A. (2016). Guide to Software Development: Designing and Managing the Life Cycle (2nd ed.). Springer. doi:10.1007/978-1-4471-6799-0
Liu, K. (1998). Report on the First SEBPC Workshop on Legacy Systems. Retrieved from www.dur.ac.uk/CSM/SABA/legacy-wksp1/report.html
Pressman, R. (2015). Software Engineering: A Practitioner’s Approach. New York, NY: McGraw-Hill Education. Retrieved from http://ce.sharif.edu/courses/98-99/2/ce474-2/resources/root/Roger%20S.%20Pressman_%20Bruce%20R.%20Maxin%20-%20Software%20Engineering_%20A%20Practitioner%E2%80%99s%20Approach-McGraw-Hill%20Education%20(2014).pdf
Sommerville, I. (2016). Software Engineering Tenth Edition. Pearson Education.
YourDictionary. (2020). Legacy Software. Retrieved from https://www.yourdictionary.com/legacy-software