Migration to Cloud and Legacy Systems: Challenges and Opportunities

ODSC - Open Data Science
4 min read · Dec 11, 2020

The emergence of cloud computing has revolutionized the way data is stored and accessed. Users no longer have to keep their files on private servers. Cloud computing provides a virtual space that connects users all over the world. By using the Cloud alongside their legacy systems, companies can share resources, software, and data over the Internet.

With COVID-19 accelerating virtual working trends, moving to the Cloud has become a baseline requirement for digital transformation. It’s now more important than ever to collaborate and innovate on a unified, Cloud-native platform.

As organizations move to the Cloud, deciding what to do with legacy systems that aren’t Cloud-native becomes a priority. More often than not, an organization’s target Cloud environment won’t support its legacy systems. Companies have a few options for these systems, including:

  • Rehosting aka “lift and shift”
  • Replatforming aka “lift and reshape”
  • Refactoring aka “re-writing / decoupling”
  • Repurchasing aka “replace / drop & shop”

Regardless of the option selected, you can count on it being costly in time, effort, and dollars. In fact, in a recent survey of CIOs conducted by Logicalis, more than half said they spent 40% to 60% of their time managing legacy IT instead of focusing on strategic activities. From this alone, we can deduce that legacy technology is a significant barrier to digital transformation and innovation.

Many companies continue using outdated systems, following the old adage “if it’s not broke, why fix it?” In reality, there are quite a lot of reasons to modernize your legacy systems. The cost of running legacy software is the foremost reason, especially with free open-source alternatives available. Legacy systems also require a specific technical environment, including hardware, so long-term infrastructure maintenance spending remains high compared to modern cloud-based solutions. Beyond that, maintaining legacy systems often carries hidden costs that are overlooked, including:

Security: legacy systems often lack modern security features, which leaves them less resistant to cyberattacks, harmful programs, and malware. This is only logical: if a software solution has been around for years, attackers have most likely had enough time to get familiar with the code and find its vulnerabilities.

Integration/Compliance: modern technologies are integration-ready by default, and API vendors typically provide support for most programming languages and frameworks out of the box. Connecting legacy software to a third-party tool or service, however, often requires a significant amount of custom code.

Lost business opportunities: when maintaining legacy systems, you leave less room for innovation because your talent stagnates in the “same old, same old” day-to-day use of technology. Instead of adopting new technologies and business models, you are stuck with your old software, letting new opportunities in your industry go unnoticed. How might this leave you open to competitive threats from those outperforming you or looking to take over your market share?

This is why we advocate adopting open-source Apache Spark as the fastest, easiest way to migrate to the Cloud. When it comes to migrating legacy systems to the Cloud, your organization is looking at millions of lines of code to convert in order to transition your current analytics models and workflows, alongside the actual data migration. In our view, Apache Spark is the tool best able to handle these volumes in a timely, efficient manner. Also, when it comes to data science modernization, we see Apache Spark as the only toolset comparable to legacy SAS.

Even once they understand Spark’s capabilities, organizations still struggle to decide what to migrate to the Cloud, when, and how. We recommend a lift and reshape (replatform) as a first step, followed if necessary by a refactoring to optimize your data structure. The most successful migrations we’ve seen start by migrating code to a Cloud-compatible technology. For example, converting SAS code to native PySpark code takes priority over optimizing your code or your analytics models. You’ll want to automate the code conversion process so that it finishes on time and your staff doesn’t continue to code in SAS, which would force a never-ending migration state. Once the code is Cloud-compatible, migrating your data from various sources to a unified, centralized target architecture, such as a data lake, becomes easier.
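
Before converting anything, it can help to know how much SAS code there is and which constructs it uses. The following is a minimal sketch of such an inventory in plain Python, not part of any particular migration tool. The function name `inventory_sas`, the regular expressions, and the sample program are all hypothetical, and the patterns only recognize simple statement starts (they would miss macros and be fooled by keywords inside comments or strings).

```python
import re
from collections import Counter

# Hypothetical patterns for illustration: they match a statement that begins
# a line and do not handle macros, comments, or keywords inside strings.
PROC_RE = re.compile(r"^[ \t]*proc\s+(\w+)", re.IGNORECASE | re.MULTILINE)
DATA_RE = re.compile(r"^[ \t]*data\s+[\w.]+", re.IGNORECASE | re.MULTILINE)


def inventory_sas(source: str) -> Counter:
    """Count DATA steps and PROC invocations in the text of one SAS program."""
    counts = Counter()
    counts["DATA step"] = len(DATA_RE.findall(source))
    for proc_name in PROC_RE.findall(source):
        counts[f"PROC {proc_name.upper()}"] += 1
    return counts


# Made-up SAS program used only to exercise the sketch.
sample = """
data work.sales_clean;
    set raw.sales;
    where amount > 0;
run;

proc sort data=work.sales_clean;
    by region;
run;

proc means data=work.sales_clean;
    class region;
    var amount;
run;
"""

# Print each construct with the number of times it appears.
for construct, n in inventory_sas(sample).most_common():
    print(f"{construct}: {n}")
```

Run across a whole code base, counts like these give a rough picture of which SAS constructs dominate, which may help decide what to convert first.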

With your data now in the Cloud, you can retire the legacy workflows and build modernized processes in your faster, more performant environment. Spark’s in-memory processing, often cited as up to 100X faster than disk-based Hadoop MapReduce for some workloads, has made it the de facto option.

As you go through these steps, your people will be adopting the new open-source toolsets, and innovating and collaborating like never before. Where it once took your data scientists hours to run a model, it can now be done in seconds with Apache Spark. Imagine the innovation power of leveraging both structured and unstructured data in data modeling that now has no barriers between data domains.

To accelerate code migration with automation, organizations can benefit from a look at innovations such as SPROCKET, Wise With Data’s SAS to PySpark automated migration solution. As the world’s only fully automated solution, SPROCKET is fast, simple, and accurate, producing native PySpark code at 95% automation. We’d love for you to see SPROCKET in action; contact us at hello@wisewithdata.com for a demo.
