How IONOS leverages Snowpipe Streaming and Dynamic Tables and has ready-to-go data for analytics in Snowflake within the hour
Authors
Bastian Hoffmann, Data Engineer at IONOS
Thomas Krug, Head of Data Management & Campaign Solutions at IONOS
Whether it’s an entrepreneur taking their first steps online to scaling large companies, IONOS’ web hosting services play a critical role in our customer’s successes. As the largest hosting company in Europe and the leading European digitalisation partner for small and medium-sized businesses. with over 6.2 million customers we offer a portfolio of solutions for the digital space.
To keep that long term relationship, we offer a dedicated customer service agent to every customer at no extra cost. This personal consultant will take the time to get to know a customer’s business, helping them make the most of their online presence. We’re now innovating to provide relevant data and insights to our customer service agents in near real time for operational efficiencies
In this blog, you’ll learn about the following:
- How we leveraged Snowflake’s Snowpipe Streaming and Dynamic Tables features to abstract away complexity and have data available in Snowflake, ready for analytics, within the hour.
- How we can easily monitor performance of Dynamic Tables at a glance.
This has been transformative for how the business has been able to provide customer support and guidance as early as possible.
Our Snowflake Journey
A few years ago, we relied on solutions such as SQL Server and PostgreSQL DBMS. This resulted in data silos and became increasingly challenging for the centralized analytics needed to support reporting for business units, such as sales operations and customer service. We were early adopters of Snowflake and saw the immense value of having a central data platform. However, the ingestion and transformation process was still quite complex, and we needed job orchestration to flatten and enrich the data.
To address this, we implemented Snowflake’s Snowpipe Streaming and utilized Dynamic Tables to transform and prepare the raw ingested data into ready-for-analytics datasets. And depending on the use case, typically available within 1 hour.
- Snowpipe Streaming API removes the need to create files to load data into Snowflake tables, and enables the automatic, continuous loading of data streams into Snowflake as the data becomes available.
- Dynamic Tables simplify data engineering in Snowflake by providing a reliable, cost-effective, and automated way to transform data. Instead of managing transformation steps with tasks and scheduling, you define the end state using Dynamic Tables and let Snowflake handle the pipeline management.
Putting it together: To ingest logstash events, we created a logstash plugin and leveraged Snowpipe Streaming to write into specific Snowflake tables and used Dynamic Tables to flatten them. The data is instantly available in Snowflake and considerably streamlines our process, eliminating the need for S3, pipes and jobs executing SQL. Below, we share examples of this in action.
Example Dynamic Table. IONOS_ETL.EVENTS.MACHINELEARNING is a target table of Snowpipe Streaming Logstash Plugin
create or replace dynamic table IONOS_ETL.PUBLIC.MACHINELEARNING_FRAUD_MR(
PROB,
FS_ID,
CUSTOMER_ID,
AGE,
IS_HOLD_OUT,
LOWER_THRESHOLD,
HIGHER_THRESHOLD,
DECISION,
WRITE_TO_FS,
ANCHOR,
TIMESTAMP,
ANCHOR_ID,
AGE_CLASS,
CUSTOMER_GROUP,
CUSTOMER_GROUP_DESC,
MODEL_NAME,
MODEL_VERSION
) lag = '1 hour' refresh_mode = AUTO initialize = ON_CREATE warehouse = LOAD_WH
as
select
event:event_context:prob::float,
event:affected_entity_id::string,
event:customer_id::string,
event:event_context:age::int,
event:event_context:is_hold_out::boolean,
event:event_context:lower_thres::float,
event:event_context:higher_thres::float,
event:event_context:decision::string,
event:event_context:write_to_fs::boolean,
event:event_context:anchor::boolean,
event:event_timestamp::timestamp,
event:event_context:anchor_id::string,
event:event_context:age_class::string,
event:event_context:customer_group::string,
event:event_context:customer_group_desc::string,
event:event_context:model_name::string,
event:event_context:model_version::string
from IONOS_ETL.EVENTS.MACHINELEARNING;
For our central data team, the greatest impact for us has been reducing complexity and risk. With this new architecture, leveraging Snowpipe Streaming and Dynamic Tables, we no longer need to manually put together orchestration, pipelines and SQL scripts to ingest and flatten data. It’s all done for us with an almost zero ETL approach and nearly no maintenance required.
A classic streaming problem is when an event is delivered twice. If the duplicate was stuck in the stream, it could not be deleted.
Here, the all-round unpopular “alter session set ERROR_ON_NONDETERMINISTIC_MERGE = false;” has to be used and the stream has to be consumed manually. Typically this would have taken me 10–15 minutes to fix.
With Snowflake’s streaming capabilities, I would simply need to delete the duplicate event and do a refresh on the dynamic table. And fortunately, this rarely happens. It just works! And in terms of cost, compared to our previous solution, we’re saving around 3,000 euros per month. This doesn’t include the 2–3 days of work saved per month our team would have previously spent.
Through Snowsight, we’re able to easily monitor how our Dynamic Tables are performing at a glance. If for some reason a Dynamic Table failed or not meeting SLAs, we can easily triage the reason in a couple of steps.
Results and Future
Having the near real time data available in Snowflake opens up many business use cases without having us having to change the architecture. For example, when onboarding customers and there is a challenge with initial account set up, they will typically call customer support. With data already available within the customer service system, the agent has the proper information to quickly triage and ensure the best experience possible. And for the self service opportunities, we have created dynamic FAQ sites that show information relevant to the customer and their potential questions.
With how easy and efficient it has been to adopt native features, as shown by Dynamic Tables and Snowpipe Streaming in this blog, we’re looking forward to leveraging Snowflake’s AI capabilities through their Cortex feature. We are currently evaluating multiple use cases in the campaign operations to enhance target group selections using forecasting to optimize conversions of campaigns, anomaly detection to optimize the amount of emails customers receive, and classification for churn detection.