How To Get Rid of Data Integration Option Fatigue in Just 10 Minutes

Kelly Kohlleffel
Hashmap, an NTT DATA Company
9 min read · Aug 26, 2020

Decisions. Decisions. Decisions.

We make them every moment of the day, but the possibilities can simply be overwhelming when you start down the path of deciding on a modern data stack.

I recently led another data integration enablement workshop for a client that had decided to move their data products and data applications to a cloud data warehouse but was struggling to select a data integration tool — a crucial part of the equation.

Based on the number of requests we have received over the last couple of years and continue to receive, this client’s challenge is not a one-off. Everyone we speak with is voicing similar questions and concerns, but at Hashmap, we can only do a limited number of enablement workshops (we do them remotely, they are complimentary, and we schedule them based on receiving qualified requests).

So in the interest of scaling some of the concepts, considerations, questions, and approaches that we discuss in the workshop, I thought it would be useful to walk you through a Hashmap Data Integration Workshop.

Only 435,897 Possible Combinations

Competition in the data integration space has ramped up, and our clients get a never-ending stream of vendors knocking on their door and pitching their wares. After a while, they all start to sound the same. How do you possibly make sense out of the chaos?

Here’s how I think about the level of complexity for this decision:

  • 12 possible data integration vendors (we could probably all name at least this many).
  • 10 unique data sources (a really low estimate for most organizations).
  • 7 use cases that each vendor says they can solve for.
  • 5 client use case patterns.
  • 3 distinct data integration skillsets.

You can do the math yourself or hit the easy button (my route) using this combination calculator.

And yeah, 435,897 possible combinations is pretty daunting, but it’s not as bad as it looks.
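If you’d rather skip the combination calculator entirely, the headline number also falls out of a few lines of Python. The sketch below assumes the calculator treats the 37 items above as one pool and counts the ways to choose 5 of them, i.e. C(37, 5); the dictionary keys are just labels for illustration:

```python
from math import comb

# The five dimensions from the list above and their counts
# (37 items in total).
dimensions = {
    "data integration vendors": 12,
    "unique data sources": 10,
    "vendor-claimed use cases": 7,
    "client use case patterns": 5,
    "data integration skillsets": 3,
}

total_items = sum(dimensions.values())  # 37

# Choosing 5 of the 37 items, order not mattering: C(37, 5)
print(comb(total_items, 5))  # 435897
```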

Pete and Repeat

One of the first workshop activities is sharing the types of questions clients across industries commonly ask us, along with any that are specific to the client at hand. Here are some examples from a recent workshop:

  • What do you recommend for getting data to the cloud and into a cloud data platform?
  • How do I distinguish the subtle differences between the various tools and vendors?
  • We are impressed with “name your flavor of cloud data platform,” but can you help us with data acquisition, data movement, data integration, data transformation, data engineering, or data pipelining from our on-premises source systems?
  • Can you recommend that one “super-tool” that does it all?
  • What about data virtualization solutions, how do they fit into the mix?
  • How should I approach a bulk migration to the cloud that is 20TB+?
  • What is the best way to acquire incremental loads and ensure a reliable change data capture process?
  • What about my ever-increasing cloud-based data sources?
  • How do I automate my environment and ensure a sustainable DevOps and DataOps process?
  • What are the best practices for managing the associated source code?
  • How do we incorporate orchestration, governance, lineage, and security, which are top considerations?
  • We are already an “abctool.com” shop and have skills in that area, can we and should we use that solution?

As we talk through the questions above, we review several foundational concepts in the workshop, including “ELT over ETL” and “Get to Know Your Cloud Data Warehouse.” I wrote about these in a previous post, and I recommend checking them out later on.

Active Workshop Participation is Critical

A critical team activity during the workshop is exploring a range of use cases that are linked directly to the data integration questions and requirements. Everyone is encouraged to participate actively, and we typically map out several key dimensions of each use case together, including:

  • Description — Patterns, Data Flows, Acquisition, Transformation, Persistence, Consumption, End User Examples, SLAs, etc.
  • In use today or net new?
  • What works well?
  • What are you most struggling with?
  • Data Source/s — What, Location, Number, Volume, etc.
  • File Types
  • Target/s (cloud/s and data platform/s)
  • Bulk Load Migration and/or Incremental with CDC
  • Focus — Innovation, Cost, etc.
  • Perceived Risk — People, Process, Technology
  • Priority and Urgency
  • Expected Time to Value
  • Business Impact & Expected ROI

Demystifying the Data Integration Tool Landscape

Once we’ve completed the use case team activity and determined the top business outcomes and drivers as well as common patterns, we dive into the data integration tool market itself. Importantly, we do this from a “what will work best for you” perspective, since we are 100% vendor-neutral and do NOT resell anyone’s products or cloud services.

Importantly though, we have significant hands-on, practitioner-based experience across all categories and tools. And believe me, we’ve taken our share of “lumps” along the way trying to force-fit a “square data integration tool peg” into a “round use case hole” — simply because a client had already selected a tool and required that it be used for a project.

For any engagement, we bring that guided expertise, but you also get some “tough love” when needed. We’ll tell you when there are better approaches or tool options than what you may have pre-selected. Sometimes it’s welcomed, sometimes it’s not, but you can count on getting that perspective along with pros and cons. Ultimately, the call is yours to make.

We’ve subjectively grouped the data integration tool landscape into 10 major buckets (expanded from a previous 8) and provided several examples of vendors, tools, or approaches for each category. Note that this is not an exhaustive list of every vendor, but a representative sample (the infographic needed to fit on the page).

Also, these categories may appear to be mutually exclusive, but in reality they are not. There can be a great deal of overlap between categories, and vendors may span multiple categories. A good example of this is Talend, which provides traditional ETL tooling and can push down transformations to a cloud data warehouse. They also have a cloud-centric solution, Stitch, in their portfolio.

As we talk through the options in the context of the client’s use cases, we tend to gravitate towards 5 guiding themes:

  1. Don’t get locked into a “unicorn” tool mindset.
  2. Stay outcome-focused and keep it simple until you can’t.
  3. Determine what the correct balance of technical and business fit is for your organization.
  4. Be realistic about your delivery capabilities and get help if you need it.
  5. Think in terms of a 5S model: Simple, Speedy, Sustainable, Secure, and Self Serve.

Mapping and Alignment | Example Artifacts

Our clients are rarely able to find that “one tool that does it all.” The reality is that your situation will likely vary significantly even from another company in your industry based on use case priorities, organizational readiness, skillsets, executive management directives, etc. Not to mention the fact that bringing a brand-new tool into a large enterprise could take 3–9 months depending on the number of gates that the vendor is required to navigate (technical, business, contracting, pricing, support, infosec, etc.).

Below are some examples of workshop output based on the use case activity, data integration tooling review, and overall priorities within the client’s organization.

Requirements Mapping for Highest Priority Use Cases

Here’s an example with just three use cases, but you can start getting an idea of how the final output shapes up. These requirements ultimately drive how we think about personalized data integration tool recommendations.

Example of Use Case Mapping

Use Case Value / Risk Matrix

Anything that falls in the upper right quadrant has a decent executive “wow factor” and is also lower on the risk scale. Risk is all-encompassing and takes into consideration data availability, established patterns and processes, skillsets, anticipated timing to MVP and production, and any other factors that are unique to an individual use case.

Example of the Value / Risk Matrix

Alignment to Data Integration Tool Categories

In the example below, you can quickly get an idea of how these use cases aligned with the data integration tool categories that we’d discussed earlier.

Use Case & Data Integration Tool Heat Map

I’ll reiterate that this map rarely looks the same from workshop to workshop and customer to customer; many things cause these recommendations to shift including existing skillsets, previous decisions to standardize on a tool, length of time to get a new tool “approved”, the introduction of new product features for an individual data integration tool, or perceived company viability.

Individual Use Case Detail

We also provide a detailed use case one-pager to bring everything together for the top 1–2 use cases. This allows for quick socialization of the use case across the organization for anyone who was not directly involved in the workshop.

Netezza to Snowflake Migration Use Case Detail

Final Thoughts and Next Steps

I hope this initial workshop walkthrough gave you a sense of how we think about the data integration space and helped demystify it a bit. Whether you are new to data integration or have been doing Informatica mappings for 25 years, I’d encourage you to always be outcome-focused, to be realistic about your organization’s skills and capabilities, and not to think of the cloud as a lift-and-shift of your current ETL environment. Instead, use the opportunity to modernize and simplify as much as possible.

3 Suggested Next Steps

  1. Listen in to our recent Hashmap on Tap podcast on the same topic: #34 Remedying Data & Cloud Technology Option Fatigue with Hashmap Recommender
  2. Also, try out our free Hashmap Recommender. It’s a collection of quizzes designed to be a starting point for your data architecture journey. While these quizzes will never replace a tailored approach from an actual expert, they’ll give you an idea of the tools and options that are right for you and your teams. We have a Data Replication Tool Recommender and a Cloud Data Warehouse Recommender.
  3. If you’d like to request a personalized Data Integration Workshop for your organization, feel free to submit a conditional request here, and we’ll do our best to schedule a session with you.

Feel free to share on other channels, and be sure to keep up with all new content from Hashmap here. To listen in on a casual conversation about all things data engineering and the cloud, check out Hashmap’s podcast, Hashmap on Tap, on Spotify, Apple, Google, and other popular streaming apps.

Kelly Kohlleffel is responsible for sales, marketing, and alliances at Hashmap and delivers outcome-based data and cloud consulting services across 20 industries. He also co-hosts Hashmap on Tap, a podcast where special guests explore technologies from diverse perspectives while enjoying a drink of choice. He enjoys helping customers “keep it simple” while “reducing option fatigue” and delivering high-value solutions with a stellar group of technology partners. You can connect with Kelly on LinkedIn and follow him on Twitter.
