How Coalesce Makes Snowflake More Amazing to Work With
This article was inspired by a post from John Gontarz, an excellent Snowflake colleague I worked with for years, about why he joined Coalesce. Part of my focus on Data Thought Leadership is to “try” to keep up with the endless innovations happening. It is becoming more and more difficult to stay abreast of the latest “real” data concepts, tools, and service trends. It is even harder to evaluate which of these tools, technologies, and trends actually add true value to organizations.
Late last year I heard that some of the previous WhereScape team, whom we at IT Strategists partnered with in 2018 and 2019, had formed a new startup named Coalesce.io. Once they came out of stealth and we went through a hands-on demonstration of the product, I was blown away, much as I was when I first discovered Snowflake back in early 2018 and actually put it to the test.
If you have not heard of WhereScape, back in 2018 and earlier it was a completely different kind of solution, built around full data pipeline management with automated documentation and full automation at the object level. Unlike the large, dominant ETL data pipeline players of the time, such as Informatica, Talend, and Pentaho, WhereScape took an approach that worked more at the object level than at the pipeline level. This was unique and truly different, and it came with pros and cons. Coalesce, in turn, differs from other products on the market that are either on-prem only or cloud-enabled: it is built as a Cloud First product, taking full advantage of the Cloud’s scalability, performance, and ease of use. Not to mention, it takes just a few clicks to get started with a free trial.
Then in September 2019 (right after we at Fairway/ITS were acquired by Accenture), WhereScape was acquired, and sadly many of their incredibly knowledgeable experts left in the days, weeks, and months that followed. This basically crippled WhereScape’s product roadmap and innovation, and from our viewpoint the product never really grew after that. Acquisition announcement: https://www.businesswire.com/news/home/20190919005152/en/Idera-Inc.-Acquires-WhereScape-Advancing-Portfolio-of-Cross-Platform-Database-Tools-with-Data-Infrastructure-Automation
John mentioned the following, which I think most of us Snowflake experts and Snowflake Data Superheroes observe all the time: many, many customers still struggle with migrating complex data pipelines from truly legacy systems not built for the cloud. The ETL/ELT market is relatively fragmented and littered with overpromises of cloud solutions that really just do not work effectively. There remains massive complexity and confusion around how to migrate and build complex data pipelines on Snowflake. John did a great job of articulating these challenges in his post, Why I Joined Coalesce from Snowflake.
The main problematic approaches we agree on are:
- Creating and managing SQL (or Python, or insert language of choice) scripts manually
- Implementing a full stack ETL tool
- Cobbling together a set of open-source transformation tools
From my view, and based on our Coalesce usage internally and in implementations, Coalesce is the first true game-changing, holistic data pipeline solution for Data Processing across Data Warehouses, Data Lakes, and more. While there are many other solutions out there, in our testing none of them is as easy to use or as easy to maintain as Coalesce.
John also outlined these key automations that come out of the box with Coalesce. This is HUGE, and a real move forward toward more automation around the data pipeline; I have not seen it in any other solution.
- Want to build a Type 2 SCD following data warehouse best practices? There is a node for that.
- Want to create a streaming pipeline using Streams and Tasks but don’t know how to code it? There is a node for that.
- Want to implement a Deferred Merge (Lambda Architecture) to reduce ingestion cost and data latency? There is a node for that.
- Want to quickly implement the Deferred Merge logic across 5,000 tables? Coalesce automates that.
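For readers who have never hand-built the first item in the list above, here is a minimal sketch of what Type 2 SCD logic does, written in plain Python. It is an illustration only, not the code a Coalesce node generates: the names `DimRow` and `apply_scd2`, the single-tuple attribute layout, and the choice to ignore deletes are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical schema)."""
    key: str                          # business key, e.g. a customer id
    attrs: tuple                      # tracked attributes, e.g. (city,)
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(dim: list[DimRow], incoming: dict[str, tuple], as_of: date) -> list[DimRow]:
    """Return a new dimension table with Type 2 history applied.

    - unchanged keys: the current row is kept as-is
    - changed keys: the current row is closed at `as_of` and a new version opened
    - new keys: a first version is opened
    Keys missing from `incoming` are left untouched (no delete handling).
    """
    result: list[DimRow] = []
    current_keys: set[str] = set()
    for row in dim:
        if row.is_current:
            current_keys.add(row.key)
            new_attrs = incoming.get(row.key)
            if new_attrs is not None and new_attrs != row.attrs:
                result.append(replace(row, valid_to=as_of))       # close old version
                result.append(DimRow(row.key, new_attrs, as_of))  # open new version
                continue
        result.append(row)
    for key, attrs in incoming.items():
        if key not in current_keys:
            result.append(DimRow(key, attrs, as_of))
    return result


# Usage: one existing customer moves, one new customer arrives.
dim = [DimRow("C1", ("Austin",), date(2022, 1, 1))]
dim = apply_scd2(dim, {"C1": ("Denver",), "C2": ("Boston",)}, date(2022, 6, 1))
for row in dim:
    print(row)
```

Even this toy version has to distinguish unchanged, changed, and new keys. Repeating that by hand across thousands of tables is the kind of work the node-based automation described above is meant to remove.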
I also view Coalesce and Snowflake as being similar in that they both have revolutionized the automation of getting Data to Value. Snowflake had core feature and technology differentiators that no other solution had: write forward micro-partitions, time-travel, zero-copy cloning, and separation of storage from compute. Most of these industry-changing features were related to the core concepts of micro-partitioning combined with the separation of storage and compute. Coalesce is similar in having differentiators that none of the other incumbent full stack ETL/ELT tools have, including:
- the flexibility to work from objects or code structures (I have not seen any other tool do this, and it is a critical component of ease of use).
- column awareness.
- automated data pipeline history and documentation.
Also, similar to Snowflake, Coalesce was “born” and built from the ground up to be Cloud First with a fully cloud-based architecture. Unlike WhereScape, Coalesce has no software to install and no infrastructure to manage, so you can be up and running within minutes.
Similar to the separation of Data Storage and Compute that enabled many of Snowflake’s features, Coalesce separates the BUILD and the DEPLOYMENT of data pipelines. While some other tools use this concept, I have not seen this level of separation elsewhere: it provides complete control over, and testing of, both the individual data objects within a pipeline and the overall pipeline BUILD before any actual deployment. Another major feature of Coalesce is in some ways similar to Snowflake’s write forward Micro-Partitions (which provide the architecture and tech for features like time-travel and zero-copy cloning): each change within the objects and data pipelines is committed into Git, and therefore you can apply the commit to any target state. Column-Awareness tracking is similar as well. It allows a concept of state and history within a column (field) of data. This is a really important concept that had been missing from data integrity and pipelines until this implementation.
These technical cloud architecture structures allow some key data pipeline/test functionality such as:
- A mechanism to replay any of the metadata changes within the data objects and data pipelines.
- The ability to spin up a new test environment and test a modification against it very easily compared to any other tools I’ve seen.
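To make the replay idea in the list above more concrete, here is a toy Python model of rebuilding object metadata by replaying committed changes up to a chosen commit. It is a sketch under stated assumptions, not Coalesce's internal format or API: the `Change` record, the three operation names, and the `replay` function are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One committed metadata change (hypothetical format)."""
    commit: str          # commit id the change belongs to
    op: str              # "add_column", "drop_column" or "rename_column"
    table: str
    column: str
    new_name: str = ""   # only used by rename_column


def replay(history: list[Change], upto: str | None = None) -> dict[str, list[str]]:
    """Rebuild a table -> ordered column list mapping by replaying changes in order.

    Stops after the last change belonging to commit `upto`;
    replays the full history when `upto` is None.
    """
    state: dict[str, list[str]] = {}
    seen_target = False
    for change in history:
        if seen_target and change.commit != upto:
            break  # everything up to and including `upto` has been applied
        cols = state.setdefault(change.table, [])
        if change.op == "add_column":
            cols.append(change.column)
        elif change.op == "drop_column":
            cols.remove(change.column)
        elif change.op == "rename_column":
            cols[cols.index(change.column)] = change.new_name
        else:
            raise ValueError(f"unknown op: {change.op}")
        if change.commit == upto:
            seen_target = True
    return state


# Usage: every call builds a fresh, independent state, which is
# loosely how a throwaway test environment could be derived.
history = [
    Change("c1", "add_column", "customer", "id"),
    Change("c1", "add_column", "customer", "name"),
    Change("c2", "rename_column", "customer", "name", "full_name"),
    Change("c3", "drop_column", "customer", "id"),
]
print(replay(history, upto="c1"))  # {'customer': ['id', 'name']}
print(replay(history, upto="c2"))  # {'customer': ['id', 'full_name']}
print(replay(history))             # {'customer': ['full_name']}
```

Because state is derived from the ordered history rather than stored separately, any past commit can be reconstructed on demand, which is the property that makes testing a modification against a fresh environment cheap.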
Conclusion:
I love working with Snowflake, and I love finding where we can really take the Snowflake Data Cloud to the next level. This is what I see in Coalesce (similar to what I saw back in 2018 and 2019 with our other partners like Snowflake, Fivetran, Matillion, and Sigma Computing). Coalesce provides its customers the next level of data automation and automated processing of data sets. This is a continued major trend I see across the entire Data Nation and articulate in my Data to Value series. Similar to how companies migrate to Snowflake due to its industry-changing features, I see huge advantages to migrating to the combination of Coalesce and Snowflake, as well as to our FULLY automated solution for Snowflake Optimization of Costs, Performance, and Security: Snoptimizer™.
If you want to see how you can truly move to the next level of automation with your data pipelines, without cumbersome and complex tools that do not scale well, then click below and we will show you the path to the Automated Modern Data Processing promised land!
Find out how Coalesce helps Snowflake customers use Snowflake so much better.
Find out how Snoptimizer™ helps Snowflake customers optimize their Snowflake Accounts.