Essential Tips for Success in the Cloud with Snowflake in Azure

Perspectives on Cloud Adoption within Midstream Oil & Gas

Published in Hashmap, an NTT DATA Company, October 25, 2019

by David Clausen and Kelly Kohlleffel

I recently had the pleasure of spending some time with David Clausen, Lead Data Architect at a midstream oil and gas company in Oklahoma City. He is launching a project that is going to take advantage of the cloud for data and analytics using Snowflake’s Cloud Data Warehouse in Azure as well as data integration and orchestration solutions such as Attunity and Azure Data Factory.

He talks about eliminating “data islands”, aligning technology with business value versus doing tech for tech’s sake, why he avoided Hadoop and traditional Big Data, cost monitoring and value measurement, and the value that Snowflake and the cloud bring to his organization.

Kohlleffel: David, thanks for spending a few minutes with me today. Can you please take a moment and describe your current role and your primary focus?

Clausen: Sure, I’m the Lead Data Architect responsible for ensuring that we establish and adhere to principles that follow best practices for data integration, data pipelines, data warehousing, and overall architecture related to everything to do with data. I’ve been with the company for 7 years.

Kohlleffel: To kick things off, take me back to when you started taking a look at doing something different for your data, new data warehousing approaches, and data pipelining in order to deliver business outcomes. What stood out to you and what were the key drivers that you were aiming for?

Clausen: In various interviews and meetings with business stakeholders, we were getting continual requests for access to more datasets and data combinations and to essentially democratize the data — make it available. Like most companies, we were also seeing a number of what I would call “data islands”, where the data is persisted for its first use, its primary use, but never really gets used beyond that and never gets meshed with other datasets.

We looked across the technology landscape and kept an eye on advances with Big Data, Hadoop, and those types of platforms, but were hesitant to start down those paths. The business drivers along with cloud technological advances brought us to a decision-making point where we said, we need to help the business and the tools are ready for prime time.

Another driving factor was the concern of having users connecting directly to production systems and back end databases — performance is a concern. Due to size, a number of our databases require some strong architectural consideration about the way that the users will access the data. The decision was made to move the data to a secondary platform, in our case the Microsoft Azure Cloud, and try to give the users more unfettered access to data while at the same time not affecting production systems.

Kohlleffel: You mentioned keeping an eye on the technology landscape in the Big Data market, but you didn’t go down the Hadoop path. It looked interesting, it had the hype, but for some reason, you didn’t do it. What was it that caused you to pause there and not pursue a traditional Big Data approach?

Clausen: Well, a couple of things — for the longest time, it appeared to me to be a case of doing technology for technology’s sake. The tech was really cool, you had deep technical folks, and everyone was excited about standing up all the various pieces that make up the Big Data puzzle, but we weren’t really seeing tangible results from all the effort. It seemed like a lot of work for minimal results, and certainly not proven results yet. Also, I was waiting for those types of technologies to mature to the point where our team’s existing skills could be used more effectively.

It seemed to start coming of age with solutions like Databricks in Azure, where we could economically spin up an environment without having a team of infrastructure engineers managing all the backend pieces, and that gave us a chance to really start exploring that world.

So the maturity level was improving, the toolsets were making it economically feasible to start processing and exploring larger datasets in Azure for new use cases, and it seemed that the time was right to start building out a modern data warehouse in the cloud with Snowflake.

Kohlleffel: What caused you to pivot from exploring traditional Big Data and Hadoop type technologies into something that has some Big Data characteristics such as accommodating multiple data types in a single data store, but still provides a true SQL experience, is 100% elastic with independent compute clusters, doesn’t require management overhead, and is consumption-based? Specifically, what stood out about Snowflake?

Clausen: First we were attracted to the low cost from an administrative support perspective. Snowflake has a proven architecture with pieces that work together that didn’t require a team to go explore that, train up, and then build that skillset over many years. It becomes an economy of scale issue — I don’t need a separate team to stand up servers, build out the integrations and connectivity, start writing code to make everything come together. Someone else is already doing that very efficiently and this is an overall theme of the cloud in general.

Kohlleffel: For me, when I take a look back at the traditional Hadoop market, there were a couple of things that kept it from succeeding the way that everyone had hoped. Number one, it was just too difficult and took too many people and too many specialized skill sets, not only to run, operate, and maintain, but also because you had to learn Scala or some other language that was not SQL. On top of that, my BI tools were always hit or miss, both from a connectivity standpoint and from a concurrent performance perspective. It was really tough to feel comfortable putting BI users into the system and giving them access as data warehouse users, knowing that chances were they were not going to be happy. It’s a big constituent base to disappoint, and you have to get that right.

Clausen: Remember that in our particular case, we were driving towards a platform not for all of our end users, but at least for our advanced data analysts. So you are right: asking those folks, who are very business savvy and have technical knowledge but are not necessarily technologists at heart, to go learn new languages and concepts right out of the gate is a big ask. With Snowflake, if you know basic SQL, you are going to be able to get around immediately, plus it’s a first-class citizen in PowerBI, Tableau, and the other BI tools that everyone already uses.

Kohlleffel: Yes, in fact, the OKC Snowflake User Group Meetup in a couple of weeks is focused on BI tooling. We just demoed Snowflake for another customer and showcased four BI tools during the demo, and while each tool has some differences, they all connected and worked essentially the same way with Snowflake. That is a big shift that classic Big Data customers really needed and were asking for daily. When you were going through the evaluation process, you had a lot of cloud options in addition to Snowflake, such as Azure Data Warehouse, Redshift, and BigQuery. What was the key advantage of Snowflake that was most impactful for you?

Clausen: I would say that there were two key points that stood out about Snowflake. We have a lot of transactions, and our users expect the data in our warehouse to be up to date and refreshed frequently. The ability to keep loading data into the Snowflake data warehouse, running updates and inserts while users are querying data, without affecting their performance at all, is a huge factor. So the separation of compute, with independent compute for each workload, was a big factor.

Another key advantage that we really like in Snowflake is the autoscaling and the controls around that feature. We looked at other platforms that had a few similarities, but they didn’t autoscale the way that Snowflake does. When we turn on a warehouse in Snowflake, it runs until it completes its task, and then after the configured idle time it automatically suspends so that we aren’t being charged, and that is a huge, huge cost-saving advantage. Snowflake is a true consumption-based pricing model. Those are two of the primary factors for us, in addition, of course, to the low administrative overhead that I discussed earlier.
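To make the auto-suspend savings concrete, here is a minimal Python sketch comparing an always-on warehouse with one that suspends after a configured idle period. It assumes Snowflake's per-second billing with a 60-second minimum each time a warehouse resumes, and the standard credits-per-hour rates by warehouse size (X-Small 1 through X-Large 16). It ignores cloud services charges and any lag between the configured idle time and the actual suspend. The workload (ten five-minute loads a day on a Medium warehouse) and all function names are hypothetical.

```python
# Sketch: credits for an always-on warehouse vs. one that auto-suspends.
# Assumptions: per-second billing with a 60-second minimum per resume,
# standard credits-per-hour by size, no cloud services charges.
# The workload below is hypothetical.

CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
MINIMUM_BILLED_SECONDS = 60


def billed_seconds(active_seconds: int) -> int:
    """Seconds billed for one resume-to-suspend session (60-second minimum)."""
    return max(active_seconds, MINIMUM_BILLED_SECONDS)


def auto_suspend_credits(sessions: list[int], size: str, idle_seconds: int) -> float:
    """Credits for busy sessions, each followed by `idle_seconds` before suspending."""
    total_seconds = sum(billed_seconds(busy + idle_seconds) for busy in sessions)
    return total_seconds * CREDITS_PER_HOUR[size] / 3600


def always_on_credits(size: str, hours: float = 24.0) -> float:
    """Credits for a warehouse left running for `hours`."""
    return hours * CREDITS_PER_HOUR[size]


if __name__ == "__main__":
    daily_loads = [300] * 10  # hypothetical: ten 5-minute loads per day
    print("always on:   ", always_on_credits("M"))
    print("auto-suspend:", auto_suspend_credits(daily_loads, "M", idle_seconds=60))
```

Under these assumptions the always-on Medium warehouse uses 96.0 credits a day, while the auto-suspending one uses 4.0 for the same loads. Real numbers will depend on the workload, warehouse size, and contract pricing.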

Kohlleffel: When you look at the other cloud options out there, most are hampered by a significant amount of legacy architectural baggage from the platforms they were originally built on. That makes the cloud-native capabilities you described in Snowflake difficult, if not impossible, for those solutions to match.

Clausen: The cloud-native architecture is a big differentiator for Snowflake. The way that storage and compute are separated to provide scalability would be almost impossible without a cloud-native architecture. The other platforms that we were considering all had some legacy technology issues, and who knows, they may catch up at some point, but Snowflake has been a big game-changer for the market.

Kohlleffel: I’m curious: how did you originally hear about Snowflake?

Clausen: I was doing some personal research in and around the data market, and I started hearing some buzz around Snowflake. We had recently made a strategic IT decision to use PowerBI. I was at an event and talking with one of your architects and I mentioned looking to go to Snowflake, but I was concerned with how it would work with PowerBI. The Hashmap architect took me through how well PowerBI and Snowflake work together and that encouraged me to look deeper into Snowflake.

Kohlleffel: I’ve done a couple of conferences over the last few weeks, and it seems like everyone comes up to us saying that Snowflake is really the easy part; it’s all the data integration, data engineering, and data movement that is causing what we call “option fatigue”. What do we do for our SQL workloads? What about our semi-structured datasets and workloads? How do we get large datasets to the cloud? Everyone seems confused by the number of options available. What was the process that you went through to narrow down your choices in the data integration space?

Clausen: We did an MVP integrating data from source systems to Azure using a few different source database platforms, and we were able to experiment with different methodologies to marshal that data from our source systems to the cloud. We quickly realized that trying to do that with custom code was not the right answer for us. We are primarily a COTS (commercial off-the-shelf) environment and don’t have a team of developers sitting around writing code. So it became an exercise in looking at the available toolsets, matching them to our primary use cases, and selecting the right replication and integration tools for source system connectivity and CDC to enable our move to Snowflake and Azure.

Kohlleffel: Yes, I think that the days of having a single data integration tool are numbered — there are just too many varying data sources and business requirements to realistically think that one tool can solve every data integration and transformation problem.

Switching gears, we get asked a lot about value measurement, and it seems a lot easier to measure value on a more granular basis with something like Snowflake, where I have a consumption-based pricing model and can clearly see my month-to-month costs. I’m spending this much every month in the cloud with Snowflake and my other services, and I’m getting this result or value back.

How are you thinking about cost monitoring and value measurement and what are your current thoughts in this area?

Clausen: Monitoring your costs in any of the cloud environments is a must and we do that — we monitor costs and look for spikes and honestly it was difficult for us at first. Now we have enough experience in the cloud that we know what our expected expenses are and so now we are looking for anomalies — why did something spike or for that matter why did something decrease dramatically and what activity is causing that. Through that learning experience of working through the MVP, we learned a lot about cost management in the cloud. For someone starting out new in the cloud, I would highly encourage them to put that as the number one concern — know and understand it well right in the beginning.
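The practice of watching for both spikes and sharp drops can be sketched as a simple rule: compare each day's spend with the average of the preceding days and flag large relative deviations in either direction. This is a minimal Python sketch, not Hashmap's or Snowflake's method. The seven-day window, the 50% tolerance, and the sample figures are arbitrary assumptions, and in practice the daily costs would come from billing or usage data (for example, Snowflake's `WAREHOUSE_METERING_HISTORY` account usage view) rather than a hard-coded list.

```python
from statistics import mean


def flag_cost_anomalies(daily_costs, window=7, tolerance=0.5):
    """Flag days whose cost deviates from the trailing-window average by more
    than `tolerance` (a fraction, e.g. 0.5 = 50%).

    Returns a list of (day_index, cost, "spike" | "drop") tuples.
    Window and tolerance are illustrative defaults, not recommendations.
    """
    flagged = []
    for day in range(window, len(daily_costs)):
        baseline = mean(daily_costs[day - window:day])
        if baseline == 0:
            continue  # no meaningful baseline to compare against
        change = (daily_costs[day] - baseline) / baseline
        if change > tolerance:
            flagged.append((day, daily_costs[day], "spike"))
        elif change < -tolerance:
            flagged.append((day, daily_costs[day], "drop"))
    return flagged


if __name__ == "__main__":
    # Hypothetical daily spend: a steady week, a one-day spike, then a sudden drop
    costs = [100] * 7 + [102, 250, 98, 100, 100, 100, 100, 30]
    for day, cost, kind in flag_cost_anomalies(costs):
        print(f"day {day}: {cost} ({kind})")
```

With these sample figures the sketch flags day 8 as a spike and day 14 as a drop. Treating the drop as worth investigating, rather than as good news, reflects the point that a sudden decrease can mean something expected to run has stopped running.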

What I am seeing is that the cloud is a huge change for developers also — they are asked to go through certain activities and are somewhat, if not directly, responsible for a portion of the costs and expenses in the cloud. The way that they develop and deliver with those cloud resources will dramatically affect cost. In the old legacy world, we would stand up a server and the developers would go after development activities with that server running 24/7 for the next several years. It had been capitalized, so there was no concern about how much the server was being used — use it or don’t use it — it didn’t matter. If a developer got pulled off of a project that was using that server, the server would sit idle for a few months, and it didn’t really matter to us.

In the cloud, that scenario opens up the potential for significant cost savings. If the developer is writing code the way they should be and deploying code in the right way, then that environment goes idle or goes away so you can minimize your costs, but you also have to be careful as you can drive up your costs if developers are not using best practices.

Kohlleffel: That’s a great point — the cost controls and elasticity really work both ways in Snowflake. On one hand, I can ensure that the Snowflake compute clusters that I’m using and paying for are delivering value and are cost-controlled and actively managed.

On the other hand, you have immediate upward elasticity if you want Snowflake to accommodate new demands such as more BI users on Monday morning or to ensure that your data science teams or power analysts can do exploratory data analysis which could require additional Snowflake compute power — you want those teams to be able to explore, build, develop, and experiment quickly and easily because that’s where you could see net new value being delivered.

You mentioned working through MVPs and comparing the old world to the cloud, and you discussed the need for visibility into why something spiked, but also why something decreased dramatically. We’ve seen the same thing: my cost went down, but why? Is my application, approach, or solution not optimal? In other words, I think someone should be using this, but they’re not now. You may want to look at those decreases too and ask, wait a minute, are we not delivering what we think we’re delivering to this user base?

Clausen: Yeah, exactly. Maybe an alert configuration wasn’t set right or a particular resource went idle when we didn’t expect it to and it’s traced back to a change in security or a variety of variables and something that is expected to run is not running anymore. The ability to pinpoint and isolate those types of situations quickly and clearly becomes yet another value point when you are using cloud services and have true visibility into the costs associated with them.

Kohlleffel: You talked earlier about how certain demands on the overall team may go down quite a bit. At the same time, you are also picking up some new technologies, such as Attunity. Can you spend a minute on that aspect of the shift to the cloud?

Clausen: I think our work effort shifts. Once the data is in the cloud, it shifts to more of a training and enablement effort with the user community — the actual data consumers. How to best use the platform in its entirety and how to get the most out of our data that is in the cloud.

We expect a lot of effort in that regard, but also significant demand from additional business areas where we can then bring in those other data islands that I mentioned earlier. It can be difficult to build out a use case and justification just based on gathering and collecting data, but if you have a success with clear associated business value, then it becomes easier for another business area to imagine themselves getting similar results and getting on board.

Kohlleffel: Are there a few top tips or lessons learned that you would give to others going down this path?

Clausen: Sure — one is, don’t underestimate the amount of change that you are going to have to manage. I hear organizations talking about venturing into the cloud as if it’s just another data center, and while there are some similarities, it is decidedly different.

If you approach the cloud just like you do an on-premise data center, then you aren’t going to realize the true advantages of the cloud. Most of the failed cloud efforts that I’ve seen tend to revolve around unrealized cost savings, where someone was attempting a lift and shift with VMs. Treating the cloud like it’s your on-premise data center, to me, is a high-risk, low-reward proposition.

Also, I’d say, start with training and control your costs from the start. Bring your security teams onboard, as well as the network infrastructure and connectivity group — each of those is a must.

Lastly, go after something small — if you are going to try a data pipeline, then find something achievable in a reasonable timeframe — you’ll get quicker turnaround and faster iterations and you’ll learn more to help improve on the next use case.

Kohlleffel: I completely agree. If you can pick out something achievable with your business stakeholders and demonstrate an ability to move quickly and show value, versus a six-month to year-long project where half the time is spent standing up infrastructure and getting it working properly, then I think your business really appreciates that — get me to a meaningful outcome sooner rather than later.

On that note, what would you say is your top goal related to the outcomes that you are driving towards personally or as an organization that you hope to showcase?

Clausen: Our ability to deliver new data sources and high-value datasets to Snowflake in a timely fashion is a big win for us — everything that we are looking to do going forward from an analytics perspective is based on having that data consolidated, timely, and available.

Kohlleffel: It sounds like you have some plans for other business segments in the near term.

Clausen: I don’t think that we are different from a lot of enterprises our size, especially those that are as geographically dispersed as we are. We have a lot of data in the field that never makes it back into the enterprise and a centralized data store. It definitely serves its primary use case well, but there is also a lot of value that we can’t realize if we don’t have the tools and processes in place to combine that data into an enterprise dataset. That is one of the primary opportunities that Snowflake and the cloud provide us.

Kohlleffel: Well David, this has been a great conversation and I’m sure that the data and cloud community is going to enjoy hearing about your experience and learnings and importantly, your perspective on why you did or didn’t do certain things.

I think that individuals like yourself who embrace the community to help everyone learn and grow together tend to do very well, and your overall perspective is really appreciated. It would be great to follow up in a quarter or so to see how things are progressing. Thank you again; I really appreciate the time today and your ongoing partnership and collaboration.

Clausen: Thanks Kelly, I’d be up for another visit in the future and I enjoyed the time today.


David Clausen is the Lead Data Architect for a midstream Oil & Gas company in Oklahoma City continuing to help capture real business value from data and leveraging cloud technologies to reduce cost and expand capabilities.

His passion is bringing the highest quality, most innovative solutions to stakeholders and also mentoring and coaching up and coming technologists. Be sure and connect with David on LinkedIn and reach out with any comments or questions regarding his perspective on the cloud.

Kelly Kohlleffel is responsible for sales, marketing, and alliances at Hashmap, a company of innovative technologists and domain experts accelerating the value of Data, Cloud, IIoT/IoT, and AI/ML for customers across industries and the community while providing outcome-based consulting services, managed services, and accelerator services.

He enjoys helping customers “keep it simple” while “reducing option fatigue” and delivering the highest value solutions with technology partners like Snowflake. Be sure and connect with Kelly on LinkedIn.
