Efficient Process to Extract Legacy Data from Mainframe System

kaushik M
Mindboard
May 24, 2018

Objective: Migrate legacy data by implementing Mindboard’s automation framework

Background and Solution: The Department of Human Services for the State of Maryland (MD-DHS) needed to move approximately 800 data sources (including DB2 tables and fixed- and variable-length files) from a legacy mainframe system to a newly designed and provisioned Hadoop-based system (Hive). When Mindboard joined the project, each data source required at least 10 hours of migration effort: data conversion, data transformation, creating the Hive schema, and various other transformation steps before the data could be loaded from the mainframe into Hive.

Mindboard’s data engineering SME analyzed and documented the fundamental issues posed by the mainframe data. These included:

· Understanding the data format: mainframe data was stored in EBCDIC, while the preferred format for Hadoop is ASCII

· Understanding and cataloging the data types: the mainframe used a variety of binary data types (such as packed decimal) that need to be converted before use on Hadoop

· Understanding, clarifying, and amending the cataloged metadata: metadata was stored in COBOL copybooks and contained a variety of nested clauses
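To make the first two items concrete, here is a minimal Python sketch of decoding an EBCDIC text field and a packed-decimal (COBOL COMP-3) field. It is not Mindboard’s implementation. It assumes EBCDIC code page 037 (Python’s built-in `cp037` codec) because the source does not name the code page, and the helper names `ebcdic_to_str` and `unpack_comp3` are hypothetical.

```python
from decimal import Decimal


# Assumption: text fields use EBCDIC code page 037 (US/Canada). The source
# does not say which EBCDIC code page MD-DHS's data used.
def ebcdic_to_str(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes into a Python (Unicode) string."""
    return raw.decode(codepage)


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COBOL COMP-3) field.

    Each byte holds two 4-bit digits; the last nibble is the sign
    (0xB or 0xD negative; 0xA, 0xC, 0xE or 0xF positive/unsigned).
    `scale` is the number of implied decimal places (digits after V in the PIC).
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)    # high nibble
        nibbles.append(byte & 0x0F)  # low nibble
    sign_nibble = nibbles.pop()  # the final nibble is the sign, not a digit
    if sign_nibble < 0xA or any(d > 9 for d in nibbles):
        raise ValueError(f"not a valid packed-decimal field: {raw.hex()}")
    sign = 1 if sign_nibble in (0xB, 0xD) else 0
    # Decimal((sign, digits, exponent)) keeps the value exact, with no float rounding.
    return Decimal((sign, tuple(nibbles), -scale))


if __name__ == "__main__":
    print(ebcdic_to_str(b"\xc8\xc5\xd3\xd3\xd6"))   # HELLO
    print(unpack_comp3(b"\x12\x34\x5c", scale=2))  # 123.45
    print(unpack_comp3(b"\x12\x34\x5d", scale=2))  # -123.45
```

A real conversion would apply these per field, using offsets taken from the copybook. Decoding an entire record with `cp037` would garble the packed-decimal bytes, which is why binary fields have to be handled separately from text.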

The Mindboard solution was built around its automation framework, with Hive as the target schema. This framework handles the data challenges posed by mainframe systems and provides a user-accessible bridge for building, blending, cleansing, and transforming mainframe data before it is ingested into Hadoop clusters. It has built-in support for EBCDIC data, complex COBOL copybooks, and mainframe file and data formats (VSAM, fixed- and variable-length records, packed decimal, etc.). It also lands mainframe data in HDFS in its native format, which removes the need for pre-processing, and translates data before transmission and staging.
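The copybook-to-schema step can be illustrated with a deliberately simplified sketch: it reads flat elementary items from a COBOL copybook, maps their PIC clauses to Hive column types, and computes each field’s byte width within a fixed-length record. The sample copybook, field names, type mapping, and helpers (`parse_copybook`, `copybook_to_hive_ddl`) are all hypothetical. The parser ignores group structure, OCCURS, REDEFINES, SIGN SEPARATE, and binary (COMP) fields, all of which a production framework would have to handle.

```python
import re
from dataclasses import dataclass

# Matches elementary items such as "05 CLIENT-ID PIC 9(9)." or
# "05 BENEFIT-AMT PIC S9(7)V99 COMP-3." (hypothetical, simplified grammar).
FIELD_RE = re.compile(
    r"^\s*(\d{2})\s+([A-Z0-9-]+)\s+PIC\s+(\S+?)(\s+COMP-3)?\s*\.\s*$",
    re.IGNORECASE,
)


@dataclass
class Field:
    name: str
    pic: str
    comp3: bool

    def expanded_pic(self) -> str:
        # Expand repeat counts, e.g. "S9(7)V99" -> "S9999999V99".
        return re.sub(r"(.)\((\d+)\)",
                      lambda m: m.group(1) * int(m.group(2)),
                      self.pic.upper())

    def hive_type(self) -> str:
        pic = self.expanded_pic()
        if pic.startswith("X"):
            return "STRING"
        int_part, _, frac_part = pic.lstrip("S").partition("V")
        scale = frac_part.count("9")
        precision = int_part.count("9") + scale
        if scale:
            return f"DECIMAL({precision},{scale})"
        if precision <= 9:
            return "INT"
        if precision <= 18:
            return "BIGINT"
        return f"DECIMAL({precision},0)"

    def byte_length(self) -> int:
        pic = self.expanded_pic()
        if pic.startswith("X"):
            return len(pic)
        digits = pic.count("9")
        # Packed decimal stores two digits per byte plus a trailing sign nibble;
        # zoned (display) numerics use one byte per digit (any sign is
        # overpunched on the last digit, assuming no SIGN SEPARATE clause).
        return digits // 2 + 1 if self.comp3 else digits


def parse_copybook(text: str) -> list[Field]:
    # Group items (no PIC clause) fail to match and are skipped.
    return [
        Field(m.group(2), m.group(3), m.group(4) is not None)
        for m in map(FIELD_RE.match, text.splitlines())
        if m
    ]


def copybook_to_hive_ddl(text: str, table: str) -> str:
    # COBOL names use hyphens; Hive column names here use lowercase + underscores.
    cols = ",\n".join(
        f"  {f.name.lower().replace('-', '_')} {f.hive_type()}"
        for f in parse_copybook(text)
    )
    return f"CREATE TABLE {table} (\n{cols}\n);"


SAMPLE = """
       01  CLIENT-REC.
           05  CLIENT-ID      PIC 9(9).
           05  CLIENT-NAME    PIC X(30).
           05  BENEFIT-AMT    PIC S9(7)V99 COMP-3.
"""

if __name__ == "__main__":
    print(copybook_to_hive_ddl(SAMPLE, "client_rec"))
    print(sum(f.byte_length() for f in parse_copybook(SAMPLE)), "bytes per record")
```

Mapping unscaled PIC 9 fields of up to 9 digits to INT and up to 18 digits to BIGINT keeps every value within those Hive types’ ranges, while implied-decimal fields become DECIMAL(precision, scale) so no precision is lost to floating point.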

Mindboard’s solution automated the previously manual migration process, reducing the project effort from 8,000 hours to just under 3,000 hours. The observed cost savings of 60% allowed MD-DHS to draw on Mindboard’s expertise in several other areas, especially data migration projects, because Mindboard’s solution worked without depending on mainframe SME knowledge or resources.
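The reported figures are roughly self-consistent. Assuming the 8,000-hour baseline corresponds to about 800 data sources at 10 hours each (the source gives 10 hours as a minimum per source), the reduction in hours works out as follows:

```latex
800 \times 10\ \text{h} = 8000\ \text{h}, \qquad
\frac{8000\ \text{h} - 3000\ \text{h}}{8000\ \text{h}} = 0.625 = 62.5\%
```

Because the final effort was just under 3,000 hours, the reduction in hours is slightly above 62.5%. This is close to, though not the same measure as, the reported 60% cost savings.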
