How to Do a Data Conversion

Dec 3, 2019 · 10 min read
Photo by Markus Spiske on Unsplash

So much has changed since the 1980s, one would think data conversion techniques from that era would have no use in 21st-century tech, but you would be mistaken. The longevity of this workflow is surprising. In today’s world of extreme complexity, their utility is invaluable. The workflow is relatively simple, but it does depend on engagement with the users which can be challenging.

The user’s role in this process has not changed much. Engagement with the user to help validate the data conversion has been a lifesaver over and over again for me. The durability of this process has its roots in leveraging business users of the data to ensure a successful data conversion. The confidence of knowing one is in possession of an assembly line methodology to convert data from one system to another can make the people portion of the methodology easier to tackle.

Data is everything. Frankly, without data, there is nothing. There exists a lot of complexity in the user interface, but that is the window into the data. There can be quite a bit of logic to assemble data into reporting and dashboards which are useful to decision makers, but that is a snapshot of the data. There is the moving of the data to the user interface/dashboard, but that is the transportation layer. Of course, there is also the storage and organization of the data, but again you need some data for this to matter.

So there it is no argument, data IS everything! Without data, you need to build a system to collect data or buy data OR CONVERT the data you have. If you have data and you choose to throw it away rather than convert it, then that data is presumably crap OR you are unaware of how to get at the information in the data or … However, since data IS everything, you will most likely need to convert it. And there is a relatively simple workflow to get you through the data conversion.

This data conversion checklist primarily revolves around the business world. It can be adapted, but the methodology presumes the data is organized to use in business systems. These presumptions are specifically about money flow. There will be entities that pay money to the business, customers, and entities receiving money from the business, vendors. These balances will be stored in some kind of accounts receivable and payable data structure.

The methodology defined here is largely about the project management necessary to do a successful data conversion. The specific code and storage mechanisms will not be addressed here. Whether the conversion is from Oracle to SQL Server or Access to MongoDB, business data still revolves around money, customers, and vendors. It is then skinned with some user interface.

If you get the customers, the vendors and the money correct most everything else just falls in place, which is not to say there are no additional data structures that will need to be converted. A medical clinic would have patients, doctors and insurance providers along with claims adding complexity to accounting for money with multiple responsible parties to bill and manage. Modern-day ERP (enterprise resource planning) systems slice and dice the data in many different ways. Ultimately though, capitalism is about the money divided between customers,incoming, and vendors, outgoing.

Whatever the business model, the data is about the universe of money coming in and going out. Getting this right is job one in the data conversion. All else will flow from getting this correct. If you get every single iota of data converted correctly from the original system to the new system, EXCEPT do not get the money correct, it will not be enough. Almost everyone involved will consider out of balance money as a failed data conversion or at best a flawed one. Get the money spot on and other flaws get overlooked. You will learn that getting the money right usually means getting the rest of it right due to the following workflow.

The system being converted, henceforth to be known as the “source” system, should be in some state of working order to migrate to the “target” system. You need some kind of functionality to validate the conversion. The source system is where you are, so you cannot know how well the transition has gone without the yardstick of where the true starting point is. Systems degrade which is why one converts them, but they have to be in some working order for the business to continue. There should be functional reports on the counts of customers, vendors, and account balances at the very least.

If there is not a good starting point, one must be created. Go back to the last paper reports that balanced, if necessary and use those. It should not be more than one accounting quarter that one is going back, otherwise, a “perfect” conversion is likely not possible. When we are talking about multiple closes of the month beyond one quarter with accounting structures not in balance, then one has to ask oneself about why a system that the entire business needs to operate has been neglected for over 100 days.

This level of data management dysfunction can be a problem. A starting data problem can be too great for any particular methodology to surmount. Also, even if we deliver a good conversion to the new target system the people dysfunction will likely persist. However, this dysfunction can also be the reason you landed this contract to do the conversion, so deal with it as best you can.

In short, a starting point needs to be established one way or another so that things can be measured, if the data conversion is going to be validated. One can do a data conversion and then HEAVILY audit all of the user interfaces to establish whether an okay conversion has been done. However, certainty in the conversion requires some certainty of the starting point. My experience has been there IS a starting point and data yardsticks that can be established in most scenarios, but there is a lack of “cooperation” somewhere in the data’s chain of custody. I call these “hostile” conversions.

Hostile conversions like this can be a minefield because they indicate other dysfunction in the business that may be difficult to navigate. This workflow cannot account for these things and presumes a certain level of cooperation between all the parties. The strategies and tactics to successfully pull off a “hostile” conversion are too broad for this essay. Notwithstanding the aforementioned, the following will be true of the source system. The source system will have basic functional reports on the counts of customers, vendors, and account balances and preferably with aging.

If possible, spending time with the users of the source system and interviewing them on how data is organized can be useful. Users know their data to some extent, so this can be helpful. You will interview them about what to expect in the data, especially the free form data fields, but you will also need to write some code to interrogate the data and verify that what the users tell you is correct. Inevitably, there are bits of information in the data that are unexpected. By writing code to pull out pieces of data that you are concerned about it will help you know EXACTLY what is out there, so your code is more bulletproof. This also can help gain user’s trust by identifying outlier scenarios they were unaware of or forgot about. The fact that you find them increases trust that you are doing your due diligence.

The users are needed to be auditors and testers to validate the outcome. If you are unable to enlist the users, you will have to become a user yourself at some level to audit the data conversion. Human eyes and validation are part of a successful data conversion. The users will determine in the end whether things are successful, so get them involved early.

It is a “given” once the source system has been mapped to the target system’s data structures that code is going to be written. Depending on the source and the target this could be a lot of code or just a little. Nonetheless, this code makes up the data conversion. Comparing the source system through its user interfaces to the new target system’s user interfaces will be a part of the process to validate this code’s accuracy. This audit process will also validate the new transportation layer by having users making requests of the data store.

To exercise the new code a day is selected for a pre-conversion test using data from the month-end close. This is the jumping off point for the conversion, the closing of the month. This prepares everyone for the “backlog” situation with this timeline. All conversions will have some backlog after the data is converted.

Backlog is the amount of time the live data conversion will take. This is the shadow time between the old and the new systems. Month ends lend themselves to this procedure as they are a natural cut-off. During this time the business is potentially still operating and everything is being done on paper and/or the old system. It is most likely only about a day or two, but it can be as long as a week depending upon complexity. Any data entry that was done instead of using paper will need to be entered into the new system.

However, the pre-conversion month end is for testing. There is no need for a backlog, because this is a pre-conversion test. Everything that will be done to the data to convert it and validate it will be done at this time. Also, if there are sufficient resources running the two systems in parallel for a time can help expose problems with the new workflows that can be addressed BEFORE going live.

It is imperative that you treat the pre-conversion test as if it was live. The most important part of this is exercising the system and checking on the conversion data. Check on the customer counts by running reports in the new system and comparing them to the old system month end reports. Check on the vendor counts the same way. Finally, look at all the financial reports and make sure the new accounts receivable matches the old and do the same with the accounts payable. If everything balances then you are on your way to a successful conversion. Whether or not everything balances, act like the numbers do NOT balance and start to look for discrepancies in ancillary data structures. By doing this, the actual live production conversion will go smoother.

Users can help identify what accounts should be reviewed. Ask the users, what customer is their oldest. This customer will have the most history to be converted and so a lot of data will have been run through the conversion process. Which customer buys the most in a given month, because this is another customer who will be pushing the limits of possible code boundaries like number of invoices and balances. There are always some credit balances so go look at those as well, because these are usually oddball accounts that can trip up your conversion code.

If this is all good, then do the exact same thing with the vendors. Check the oldest vendor for validity and check the vendor that generally receives the most monthly orders from the business. Again these accounts are the ones most likely to trip up the conversion code by bringing in outlier data sets. Finally, do some random checking as well. Spread it out over the breadth of the customers in some kind of random way, but make sure to do it. A one percent audit sounds extreme, but one thousand accounts only requires you to check TEN to achieve a one percent audit.

While you are doing these audits, verifying things like addresses and other demographics become easier. Concentrating on the numbers of customers, vendors and balances and physically verifying the data forces the users into the new system. This can make the post-conversion go better as well, because the users are being trained by using the system. There is no better way to learn a new system.

Finally, make sure the old system is completely retired. Using the old system for look ups may seem like a way to save money, but the costs in productivity can be large. The users are going to have a learning curve with the new system. This learning curve can be made steeper by going into the old systems and “reminding user brains” of the previous interfaces. New systems should have hooks in them to support “look-ups” of data that was not necessarily completely converted into the new system, like old transaction histories or retired vendors et al.

By making sure the new system is the one the users are using exclusively post-conversion really helps with training and acceptance. The above methodologies may seem a bit archaic, but the fact is that the users are the ones that will be “buying off” on the new system and the data conversion. Having them involved from the start means they become stakeholders in the successful data conversion. Nobody purposely wants to be a part of a failure, so they become more motivated to find problems early on.

Once there is buy-off, the backlog will need to be entered. This should not be too painful, especially if the users have been in the new system auditing and validating. They will know their way around in the new system and it will be further reinforced by the backlog data entry.

This methodology has worked for me on innumerable data conversion projects. The attention to details like money and customer counts forces scrutiny. The engagement with the users to audit and validate, first in the test conversion month and then in the live conversion month cast them in the role of “allies” rather than adversaries in the data conversion. This might pay more dividends then any other aspect of the methodology, since people are the soul of any successful business.

The Old Coder

Ruminations on tech from a dinosaur.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store