<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[GetYourGuide Tech Blog - Medium]]></title>
        <description><![CDATA[GetYourGuide is the marketplace to book the best tours and activities globally. Meet our tech team here and see our open jobs on careers.getyourguide.com. - Medium]]></description>
        <link>https://medium.com/tech-getyourguide?source=rss----592d711c7926---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>GetYourGuide Tech Blog - Medium</title>
            <link>https://medium.com/tech-getyourguide?source=rss----592d711c7926---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 11 May 2026 16:53:06 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/tech-getyourguide" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Migrating our self-service BI tool Looker from Hive (Apache Spark) to Snowflake]]></title>
            <link>https://medium.com/tech-getyourguide/migrating-our-self-service-bi-tool-looker-from-hive-apache-spark-to-snowflake-492441bca934?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/492441bca934</guid>
            <category><![CDATA[apache-spark]]></category>
            <category><![CDATA[bi-tools]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[hive]]></category>
            <category><![CDATA[tech]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Wed, 28 Jul 2021 15:38:39 GMT</pubDate>
            <atom:updated>2021-07-28T15:38:39.504Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7_J7ZUOM87WuPVl5bqpXiA.jpeg" /></figure><h4>Robert Bemmann, data engineer, and Augusto Elesbao, senior data engineer, share the steps they’ve taken to migrate all of our Looker projects from an Apache Spark/Hive connection (Databricks) to Snowflake.</h4><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2021/6/30/migrating-our-self-service-bi-tool-looker-from-hive-apache-spark-to-snowflake"><em>here</em></a><em>.</em></p><p><em>The </em><a href="https://inside.getyourguide.com/blog/2019/12/11/how-we-built-our-new-modern-etl-pipeline-part-1"><em>Data Platform</em></a><em> team is responsible for providing the company with quick access to data. To improve this experience, at the beginning of 2020, the team conducted an analysis to choose and migrate our Data Warehouse in Hive to a new and more robust platform. After an extensive evaluation of the most relevant players in the market, the team chose </em><a href="https://www.snowflake.com/"><em>Snowflake</em></a><em> for its simplicity, cost, and performance.</em></p><p><em>This post is one of a two-part series. In this post, Robert Bemmann, data engineer, and Augusto Elesbao, senior data engineer, share the steps they’ve taken to migrate all of our Looker projects from an Apache Spark/Hive connection (Databricks) to Snowflake. In the next post, Robert will describe what led to this choice and how they conducted it to a successful rollout only six months later.</em></p><p>Our main goal with the database migration was to improve the query performance and decrease the average query time. There were other reasons why we decided to use Snowflake as our reports querying engine instead of Spark:</p><ul><li>Data caching for repetitive queries</li><li>Auditing (table and database usage (check <a href="https://docs.snowflake.com/en/sql-reference/account-usage.html">Snowflake docs</a>), who queried what and when) &amp; Access control (user management with roles)</li><li>Maintenance overhead with manual cluster management on Databricks for the Looker workload to improve performance</li><li>Need of manual partitioning of the datasets, yet the most effective partition pruning is still not guaranteed</li></ul><p>In our Data Platform setup, the data stored in Hive still stays our single source of truth, so most of our data pipelines are writing to two places: first to the S3 buckets for the tables used in Hive, and then to Snowflake. However, any query executed through Looker is using the Snowflake DWH only. Having the data copied to Snowflake is just the first step: in order to successfully roll it out company-wide with no service disruption, there are many more aspects to be taken care of.</p><blockquote><strong>Set yourself targets to measure the success of the migration (e.g. 90% of queries are running without an error and the average query run time is less than 10s). It keeps you accountable and gives a target in case you need to make amendments to the Snowflake infrastructure in terms of query performance.</strong></blockquote><p>This first blog post focuses on two main parts: the SQL syntax differences and the work related to the Looker migration, like some clean-up tasks and automated testing. 
Splitting it up into subtasks led to the following sections:</p><ul><li><strong>Main SQL syntax differences between Spark SQL and Snowflake</strong></li><li><strong>Looker migration</strong></li><li>Looker cleanup</li><li>find all tables in Looker projects currently in use</li><li>find unused views / unreferenced views</li><li><strong>How to query the Looker API</strong></li><li><strong>Leverage the Looker API to automate testing and find SQL syntax errors</strong></li></ul><h4><strong>Main SQL syntax differences between Spark SQL and Snowflake</strong></h4><p>The Spark SQL syntax follows Hive SQL standard closely as Spark is also leveraging pieces from Hive such as Hive metastore and Hive tables. You can find a <a href="https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/sql/index.html">comprehensive list </a>of all Spark SQL functions. On the other hand, Snowflake supports standard SQL, including a subset of ANSI SQL:1999 and the SQL:2003 analytic extensions. They also have very <a href="https://docs.snowflake.com/en/sql-reference/functions-all.html">detailed documentation</a> about each of their functions.</p><p>Without going through an extensive list of examples, let me provide you a few takeaways:</p><figure><img alt="sql-syntax-difference-sparksql-snowflake-robert-bemmann.jpg" src="https://cdn-images-1.medium.com/proxy/0*CmssiYpFpqsw59SW" /></figure><p>¹ Data type mapping <a href="https://docs.snowflake.com/en/user-guide/spark-connector-use.html#from-spark-sql-to-snowflake">Spark SQL to Snowflake</a></p><p>² Exploding nested activities from a single shopping cart into multiple rows</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/fefafbff7bc47265f8a49ef43be2172f/href">https://medium.com/media/fefafbff7bc47265f8a49ef43be2172f/href</a></iframe><p>³UDF on Snowflake (You need to GRANT USAGE on the function for the Snowflake role that queries it via Looker)</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/370f53f84116b2703a16f3552a448134/href">https://medium.com/media/370f53f84116b2703a16f3552a448134/href</a></iframe><p>Now that we know the major differences in the SQL syntax, let’s take a closer look at the steps taken to migrate our self-service BI tool Looker.</p><h4><strong>Looker migration</strong></h4><p>The database migration to Snowflake was also a good opportunity to clean up all of our Looker projects. A Looker project consists of models (see the <a href="https://docs.looker.com/data-modeling/getting-started/model-development">Looker docs</a> for reference). Usually, you would want to group certain reports/queries related to a certain topic (like marketing reports) in a model.</p><p>A model then can consist of multiple <em>explores </em>(a generic equivalent of a report). The explores in turn usually represent the table after the FROM statement and can be joined with other tables which are defined in so-called views. All measures and dimensions are defined in the view code. In a star schema, you could think of an explore as the fact table that gets joined with dimension tables (other views).</p><p>So our first task was to compile a list of all tables used in the Looker queries because we had to get a full picture of any table that had to be present on Snowflake.</p><blockquote><strong>Investigate and collect findings regarding the SQL syntax differences in advance. 
These were really helpful to onboard colleagues fast to the new syntax and share it with analysts.</strong></blockquote><p>Additionally, we had to investigate which views (tables) and explores were not referenced or used in the Looker code. The views could be safely removed as no one was using them. We didn’t have to move tables that were not used in Looker models.</p><p>Finally, we also had to ensure that all the SQL syntax was properly adjusted. We could leverage the Looker API to automate the testing process by running queries and catching the errors.</p><p><strong>Preparation part 1: Find all tables in Looker projects currently in use</strong></p><p>First, we had to get a comprehensive view of all tables we needed to load from Hive to Snowflake, so we had to retrieve all tables that were used in the LookML code of our Looker projects. This step was also important because we had to find all the dependencies in airflow — we could only load the data to Snowflake once the write step to the Hive tables was finished.</p><p>I simply used regex patterns to parse a list of all of these tables. I downloaded all Looker projects to my local machine and wrote a python script to extract the table names from the Looker view files.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ca97b4898044c59743a6d985734abac3/href">https://medium.com/media/ca97b4898044c59743a6d985734abac3/href</a></iframe><p><strong>The high-level logic is:</strong></p><ul><li>download/clone all Looker repositories to a local folder</li><li>write a function that can search the “.view.lkml” files within each folder for a regex pattern and append the matches to a list (I can highly recommend <a href="https://regex101.com/">Regular Expressions</a> for debugging/creating your regex patterns)</li><li>the regex pattern is matching any used table after the typical Looker syntax for the</li><li>“sql_table_name” parameter</li><li>“from” keyword</li><li>“join” keyword</li><li>clean the list from obvious mismatches and export to csv</li></ul><h4><strong>Preparation part 2: Find unused/unreferenced explores and views</strong></h4><p>The next exercise was to identify views and explores that were not used anymore and views that were not referenced in any other view or explore. This clutter can easily occur over time if you operate a team of BI engineers that develop models independently from each other.</p><p>Identifying unused views and explores can be achieved through a tool from Looker which is called <a href="https://community.looker.com/open-source-projects-78/henry-a-command-line-tool-for-looker-instance-cleanup-9273">Henry</a>. My experience was that it took very long to execute the Henry queries via command line, probably I would use the <a href="https://docs.looker.com/admin-options/tutorials/i__looker">system activity data</a> now in Looker if I would have to do the task again.</p><p>To find the unreferenced views we could again use our Searcher code from the previous section.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5b5afc69e66380a2685412e74766e683/href">https://medium.com/media/5b5afc69e66380a2685412e74766e683/href</a></iframe><p><strong>The high-level logic of the script is:</strong></p><ul><li>First, we need to get all Looker view file names in a list: loop through all Looker projects/repositories and append all file names with occurrences of “view.lkml” to a list (list_view_names). 
We only need the names of the views and not the full path.</li><li>Now, we need to find all view_names that are USED (i.e. referenced in the LookML code) and we can reuse the Searcher class from the previous section: for every view name in the list_view_names loop through all views files and check the LookML code for regex matches of a Looker keyword (from, explore, extends, join and view_name) plus the view name and the following patterns. Append all matches to a list list_used_views.</li></ul><figure><img alt="regularexpression.png" src="https://cdn-images-1.medium.com/proxy/0*ezUalI4nAgOIphsP" /></figure><p>Next, we just need both lists’ differences: each item in the list_view_names which is NOT in the list_used_views is not used and can be added to a final list for deletion.</p><ul><li>Now we can again use the Searcher class to get the full file paths of each unused view/explore. We need the full paths to remove these views from our local GitHub branches of the Looker projects. We export the results to a dataframe and then csv.</li></ul><figure><img alt="ext_dt.png" src="https://cdn-images-1.medium.com/proxy/0*pRc9eRem0cJix49i" /></figure><ul><li>With the csv of files that can be removed from the Looker repositories, you can delete all files from your local machine/GitHub branch.</li><li>Communicate with other Looker developers of your company because you don’t want to delete code that somebody else may still be using. I shared the list of views and explores I wanted to remove and gave everyone 3–5 days time to respond to my email in case they wanted to keep a view or explore. A potential fallback solution: in the worst case that someone just finds out later that some deleted code is needed, you still have everything logged in git so you can recover the code easily.</li></ul><h4><strong>Leverage the Looker API to automate testing and find SQL syntax errors</strong></h4><p>A quick intro to the python looker-SDK to retrieve data from the Looker API will make it easier to understand the next steps. For authentication, add your credentials to a looker.ini file (<a href="https://github.com/looker-open-source/sdk-codegen/tree/main/python#configuring-the-sdk">instructions</a>). Import the looker_sdk and you can use the feature-rich SDK. For example, you can run queries or looks by the id which is used in the Looker UI URL. Multiple output options exist: you can either get the data in csv or json format (you could parse this into pandas dataframes for further use) as a result, the generated SQL code in text format, or even a picture (png/jpeg) of the visualization of the executed query.</p><figure><img alt="sdk-run-query.png" src="https://cdn-images-1.medium.com/proxy/0*g7lIiwBb7nHCYFtn" /></figure><p>The last feature I want to mention is that you can switch from the production workspace to the development workspace, which is a crucial option for the <strong>Looker SQL syntax migration</strong> — having a branch in development mode with the new Snowflake connection in Looker enabled us to find errors in the SQL syntax by running queries with the looker-SDK in the development workspace. Eventually, I was also able to compare the results of the queries against both connections:</p><figure><img alt="prod-dev-enviro.png" src="https://cdn-images-1.medium.com/proxy/0*Aez8ibYLUh8pXuq_" /></figure><p>This code returns two json objects which were queried each from the respective connection separately. 
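</p><p><em>The snippet itself appears only as an image above, so here is a minimal, hypothetical sketch of the same idea with the Python looker_sdk. The method names (init40, update_session, run_look), the dev-workspace switch, and the Look id are assumptions based on the SDK documentation, not the exact code from the post:</em></p><pre>import json<br>import looker_sdk<br>from looker_sdk import models40<br><br># authenticate with the credentials stored in looker.ini<br>sdk = looker_sdk.init40(&quot;looker.ini&quot;)<br><br>LOOK_ID = &quot;1234&quot;  # hypothetical Look id taken from the Looker UI URL<br><br># production workspace: the existing Spark/Hive connection<br>prod_rows = json.loads(sdk.run_look(LOOK_ID, result_format=&quot;json&quot;))<br><br># switch the API session to the development workspace, where the branch<br># with the Snowflake connection is checked out<br>sdk.update_session(body=models40.WriteApiSession(workspace_id=&quot;dev&quot;))<br>dev_rows = json.loads(sdk.run_look(LOOK_ID, result_format=&quot;json&quot;))<br><br># compare the two result sets row by row<br>print(prod_rows == dev_rows)</pre><p>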
It will come in handy when we want to compare the results of the queries once the SQL syntax has been changed.</p><h4><strong>Running automated tests to catch SQL syntax errors</strong></h4><p>My approach for validating all the SQL syntax changes in the development branch for each project with the new Snowflake connection was to first get the top 100 (after fixing them I did another round with 500) most often queried queries and looks from the History explore in the system activity data.</p><p>You can automate the whole error spotting by running the queries/looks in the dev workspace with the python looker-SDK (see section Query the Looker API). If the response contains either “looker_error” or “SDKError” just add the id and the error message to a dataframe.</p><figure><img alt="qid.png" src="https://cdn-images-1.medium.com/proxy/0*WfPQFPaUQ3orJhA4" /></figure><p>I did the same for the most important dashboards and reports we send out to our data users each day. You can loop through the dashboard ids, get the query ids as response data in the json and use them on the fly to execute and catch errors.</p><h4><strong>Conclusion</strong></h4><p>After around four months of preliminary work, we were able to conduct the final rollout of the database migration in less than a week due to good preparation. Also, we could spot and fix the majority of SQL syntax errors in advance thanks to the automated testing. We did a partial rollout in Looker which means that we didn’t change the Looker connection for all projects at once. Here are a few learnings we made along the way:</p><ul><li><strong>Investigate and collect findings regarding the SQL syntax differences in advance. </strong>These were really helpful to onboard colleagues fast to the new syntax and share it with analysts.</li><li><strong>Automate your SQL query testing</strong> before the final migration and run it against the most frequently used queries.</li><li><strong>Set yourself targets to measure the success of the migration</strong> (e.g. 90% of queries are running without an error and the average query run time is less than 10s). It keeps you accountable and gives a target in case you need to make amendments to the Snowflake infrastructure in terms of query performance.</li><li><strong>Track errors. </strong>We also created a Looker dashboard to monitor the rollout, which showed us the latest error messages, the amount of failed queries, and the average query time, grouped by project, model and explore. This enabled us to trace errors down and identify the projects with the most syntax errors. Luckily we didn’t need it — but it also makes sense to think about a rollback solution in case major breakages appear. Our fallback solution was simply to keep the Spark SQL branch in GitHub for recovery for a few days, so in case the error rate started to increase, we only had to merge the branch back to master.</li><li><strong>You won’t capture all SQL syntax error</strong>s — once you flip the switch and deploy everything to production the first requests which report broken queries/reports will certainly come in. Think about a procedure to handle these requests and distribute them smartly among the team. 
We used a simple spreadsheet to capture the issues reported to us in a Slack channel and assigned owners for the requests.</li></ul><p><strong><em>If you’re interested in joining our engineering team,</em></strong><a href="https://careers.getyourguide.com/departments/engineering/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><strong><em> check out our open roles.</em></strong></a></p><hr><p><a href="https://medium.com/tech-getyourguide/migrating-our-self-service-bi-tool-looker-from-hive-apache-spark-to-snowflake-492441bca934">Migrating our self-service BI tool Looker from Hive (Apache Spark) to Snowflake</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How we redesigned our search experience]]></title>
            <link>https://medium.com/tech-getyourguide/how-we-redesigned-our-search-experience-68fc7c3498ea?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/68fc7c3498ea</guid>
            <category><![CDATA[tech]]></category>
            <category><![CDATA[redesign]]></category>
            <category><![CDATA[search-engine-marketing]]></category>
            <category><![CDATA[search]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Thu, 15 Jul 2021 13:45:31 GMT</pubDate>
            <atom:updated>2021-07-15T13:45:31.895Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vv21OiDg7U74SecEQpO6YA.jpeg" /></figure><h4>Bhavna Banerji, senior full stack engineer, shares how the Search team made it easier for users to find activities they love.</h4><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2021/6/22/how-we-redesigned-our-search-experience"><em>here</em></a><em>.</em></p><p><em>Bhavna Banerji, senior full stack engineer, gives a step-by-step look at how the Search team made it easier for users to narrow down to their preferred activities and experiences on</em> <em>the GetYourGuide search page. Before the big redesign, the team first migrated to a </em><a href="https://inside.getyourguide.com/blog/2020/8/15/how-we-migrated-a-microservice-out-of-our-monolith"><em>new tech foundation</em></a><em> — a fundamental step to ensuring faster and smoother development.</em></p><p>Search is a critical phase of any booking funnel. At GetYourGuide, the Search team is striving to enhance the user experience for our customers. We’ve been aware for quite some time that there were some usability issues with our Search and Discovery experience. We conducted multiple user testing sessions and found some common themes</p><ul><li>Too many filters that were not organized.</li><li>Many duplicate activities.</li><li>Inability to differentiate between activities in search results.</li><li>Filters were hard to find on mobile web.</li></ul><p>When we decided to tackle these challenges, the Search team was formed. We were tasked with helping customers discover what’s possible in a destination and create refinement tools to help find the perfect activity.</p><p><em>You may also be interested in: </em><a href="https://inside.getyourguide.com/blog/2020/9/1/3-skills-that-will-elevate-your-influence-as-a-ux-researcher"><em>3 skills to elevate your influence as a UX researcher</em></a></p><p>Before we started this project, we decided that if we wanted to place bigger bets, our starting point would be to rework our codebase to ship changes and get quicker learnings. At the time, our codebase was a legacy monolith, and it consisted of Backbone with Smarty templates and PHP. This was a much slower developer experience, and features took longer to build and ship. We decided to migrate to a new tech stack while improving the user experience.</p><p><strong>Preparing the foundation of the new Search experience:</strong></p><ul><li><a href="https://inside.getyourguide.com/blog/2020/2/24/how-to-split-a-monolith-into-services">Tech migration</a><strong>:</strong> We meticulously planned this migration to the new tech stack with VueJS and Node on frontend and Java and Elasticsearch on the backend service. We listed our dependencies on our old monolith and decided to start with an MVP version where we didn’t migrate all the components all at once. 
We called this the ‘Tech Foundation’ project.</li><li><strong>New Search experience:</strong> We gathered and prioritized our most important bets via user research and brainstorming sessions between <a href="https://inside.getyourguide.com/blog/our-ux-product-design-process">UX designers and engineers</a>.</li></ul><p>We invested a quarter on migrating and thoroughly testing out the tech foundation, and then we started redesigning and experimenting with the different search components in the following manner:</p><figure><img alt="Image courtesy: our designer Roxanne Krumm" src="https://cdn-images-1.medium.com/proxy/0*jX4Rcr1neQpfNnHV" /></figure><p><em>Image courtesy: our designer Roxanne Krumm</em></p><p><strong>Redesigned filters: </strong>By flattening the hierarchy in the filters, we helped customers find relevant filters more quickly. We also introduced filtering capabilities based on interests to address feedback from customers.</p><p><strong>Old nested filters:</strong></p><figure><img alt="image6.png" src="https://cdn-images-1.medium.com/proxy/0*_EdG8JAvsTAcRmB6" /></figure><p><strong>New Flat Filters:</strong></p><figure><img alt="image4.png" src="https://cdn-images-1.medium.com/proxy/0*-d3a_PMO250EoNuw" /></figure><p><strong>Applied filters and result counts: </strong>We added an applied filters section to allow users to quickly see applied filters. On mobile web, we also added a result count to provide feedback on the impact of applying filters.</p><figure><img alt="image2.png" src="https://cdn-images-1.medium.com/proxy/0*0-QeWcNp_-46QrI2" /></figure><figure><img alt="image5.png" src="https://cdn-images-1.medium.com/proxy/0*qqiKVKRl-cO5bSP8" /></figure><p><strong>Quick filters for <em>Today </em>and <em>Tomorrow</em>: </strong>Many of our customers search within a day of booking. To make this easier we added <em>Today </em>and <em>Tomorrow</em> buttons to quickly find available activities.</p><figure><img alt="image1.png" src="https://cdn-images-1.medium.com/proxy/0*lWWQ0MWsVkUrG9pW" /></figure><p><strong>Redesigned activity cards: </strong>In the new design, we utilized the real estate on the cards more effectively, adding the most important labels and attributes (as collated from our user research sessions). This new design helped our customers differentiate better between activities.</p><p><strong>Old cards:</strong></p><figure><img alt="image8.png" src="https://cdn-images-1.medium.com/proxy/0*JH5mLFVzzb9Dbe4j" /></figure><p><strong>New Cards:</strong></p><figure><img alt="image7.png" src="https://cdn-images-1.medium.com/proxy/0*_6EHnT45GGgGmVPo" /></figure><p><strong>Overall, we have already seen a 3.5% uplift in engagement on the Search page with the redesign project. Many more feature improvements are actively being worked on.</strong></p><p><strong>What’s next for the Search team?</strong></p><p>By now you’ve had an insight into our principles and approach towards redesigning the Search experience. Since we migrated a year ago, we have run many experiments, gathered learnings and reiterated with more improvements. This framework has helped us make data-driven decisions while keeping customer needs at the core of all of our decisions.<br><br>We are currently redesigning the Datepicker and the Search autocomplete and also redesigning our most used filters based on user data. 
Additionally, we are preparing to introduce Kafka and Postgres to our tech stack.</p><p><strong>Acknowledgements</strong>:<br>The redesign and Tech Migration projects resulted from the effective collaboration within the Search Engineering team, <a href="https://inside.getyourguide.com/blog/2021/4/7/s2e2-an-inside-look-at-the-data-analytics-hiring-process">Data Analytics team</a>, and <a href="https://inside.getyourguide.com/blog/category/Product%20&amp;%20Design">UX, Design, and Product team</a> members. From ideation to shipping these projects, everybody proactively carried out brainstorming sessions, provided valuable insights, and helped us gather learnings effectively as a team.</p><p><strong><em>If you’re interested in joining our engineering team,</em></strong><a href="https://careers.getyourguide.com/departments/engineering/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><strong><em> check out our open roles.</em></strong></a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=68fc7c3498ea" width="1" height="1" alt=""><hr><p><a href="https://medium.com/tech-getyourguide/how-we-redesigned-our-search-experience-68fc7c3498ea">How we redesigned our search experience</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Meet Mira: What it’s like to be an associate engineer]]></title>
            <link>https://medium.com/tech-getyourguide/meet-mira-what-its-like-to-be-an-associate-engineer-8c5733434b84?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/8c5733434b84</guid>
            <category><![CDATA[tech]]></category>
            <category><![CDATA[product-optimization]]></category>
            <category><![CDATA[scrum-life]]></category>
            <category><![CDATA[engineering-career]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Mon, 28 Jun 2021 12:33:19 GMT</pubDate>
            <atom:updated>2021-06-28T12:33:18.881Z</atom:updated>
            <content:encoded><![CDATA[<h4>Mira Vogt, associate full stack engineer on the Product Optimization team gets candid about what she loves about work and how she connects with her team from remote.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3D_xzNE-wkBevMPMh0p_1w.jpeg" /></figure><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2021/6/23/what-its-like-to-be-an-associate-engineer"><em>here</em></a><em>.</em></p><p><em>Mira Vogt is an associate full stack engineer on the </em><a href="https://inside.getyourguide.com/blog/coding-dojo-teams"><em>Product Optimization </em></a><em>team. She tells us about the ins and outs of her role, the exciting projects that she works on, and what kind of learning and development opportunities are available to her. The associate engineer gets candid about what she loves about work and how she connects with her team from </em><a href="https://inside.getyourguide.com/blog/2021/6/21/joining-a-team-fully-remote-survival-guide"><em>remote</em></a><em>.</em></p><p><strong>Can you tell us a bit about yourself?</strong></p><p>I joined half a year ago after completing a coding boot camp last autumn. Before that, I worked in several other roles, starting my career as a business lawyer. I’m currently part of the Product Optimization team.</p><p>Our primary mission is to constantly improve the pages customers see when they come to your websites, such as our <a href="https://www.getyourguide.com/">homepage</a> or <a href="https://www.getyourguide.com/london-l57">location-specific</a> pages like city and country. What impressed me about the company was its strong <a href="https://inside.getyourguide.com/blog/2017/9/19/getyourguides-core-values">core values</a> and focus on growth. And who wouldn’t want to help people enjoy their travels and enable them to get to know other cultures?</p><p><strong>Can you tell us a bit about your role as an associate engineer?</strong></p><p>As an associate engineer, the main expectation is to learn how to be autonomous and improve our skills. I’ve been involved in many projects from the beginning, working on features independently or with support. I enable my team to iterate our pages faster and to improve the overall customer experience.</p><blockquote><em>If you’re starting a career in a field as complex as software engineering, it’s easy to never feel ready to apply for a job and start working.</em></blockquote><blockquote><em>But in fact, you’ll make progress fast on the job with input from very experienced engineers around you. So don’t hesitate to apply because of the fear of not yet being good enough.</em></blockquote><p>On the Product Optimization team, we run many <a href="https://inside.getyourguide.com/blog/2019/9/30/how-we-scaled-up-ab-testing-at-getyourguid">A/B experiments</a> on layout improvements, for example, to find better ways of communicating the value we bring to our customers. As we work by the SCRUM methodology, all projects are split into small, quickly achievable tasks, which we plan on a bi-weekly basis together as a team and then work on according to their priority.</p><p>On a typical day, we have a short standup to update the team on our current tasks and start working autonomously on whatever needs to be done. 
Besides the work within the team, we attend cross-team meetings to share learnings and collaborate on broader initiatives.</p><p><strong>What kind of learning and development opportunities are available to you as an associate engineer?</strong></p><p>We receive a learning budget to attend conferences or purchase resources on topics we want to deepen our knowledge. What’s more important, though, is the overall culture of learning and passing knowledge. All colleagues are open to taking time and explaining things to me, be it theoretical concepts or practical implementation.</p><p>Pairing sessions, where you work together on a problem, are highly encouraged and especially useful in remote working times. The expectations on personal development are very clearly communicated. There is frequent in-depth feedback from your manager and peers, making it easy to identify areas to develop further.</p><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/2020/12/7/what-are-productivity-pairing-days"><em>What are productivity pairing days?</em></a></p><p><strong>What do you enjoy about work?</strong></p><p>What I love about engineering is the “riddle-solving” nature of it. Even though all tasks are described precisely, and it’s clear what to do, there is room to decide how the implementation will look, what approach to take, and how to achieve it.</p><p>It can be tricky and frustrating, but in the end, you’ll figure it out, either by yourself or with some support, and you will have learned something new in the process. Besides the technical aspect, what makes work enjoyable is the people around. Even though I haven’t met many of my colleagues in real life, they are friendly, very diverse, and open, which makes collaboration and meetings a lot of fun.</p><p><strong>How was your experience </strong><a href="https://inside.getyourguide.com/blog/2021/6/21/joining-a-team-fully-remote-survival-guide"><strong>joining the team from remote</strong></a><strong>?</strong></p><p>The first few weeks were tough, as it was my first time working remotely and the first time working as an engineer, so I was overwhelmed by every aspect of work being different from what I knew before. Luckily I received a very clear <a href="https://inside.getyourguide.com/blog/2020/3/18/onboarding-new-employees-during-covid-19">onboarding</a> strategy and was assigned a buddy who I could go to with any questions.</p><blockquote>What I love about engineering is the “riddle-solving” nature of it. Even though all tasks are described precisely, and it’s clear what to do, there is room to decide how the implementation will look, what approach to take, and how to achieve it.</blockquote><p>This was especially important as requesting a Zoom meeting for clarifying a problem felt a lot more annoying than just catching somebody between meetings in the office. But like everybody else, I got used to it quite fast and lost the anxiety of contacting people via Slack or Zoom. 
However, I’m still looking forward to seeing some friendly faces outside of my laptop screen in the future.</p><p><strong>How do you connect with your team remotely?</strong></p><p>As weather and restrictions ease up, I’ve finally managed to meet colleagues in the office or for a walk.</p><p><strong>Do you have any advice for an aspiring associate engineer who would like to join our company?</strong></p><p>If you’re starting a career in a field as complex as software engineering, it’s easy to never feel ready to apply for a job and start working. But in fact, you will make progress fast on the job with input from very experienced engineers around you. So don’t hesitate to apply because of the fear of not yet being good enough.</p><p>A lot of it is about a learning and growth mindset, which is best demonstrated by showcasing your past achievements and being aware of the areas you still need to improve. Practice telling your story, how you got here, what you learned along the way, what you want to achieve in the future and how you plan to get there. And when in doubt, contact somebody who is already working in the same position via LinkedIn. Most people will be happy to help and answer any questions you might have.</p><p><em>For updates on our open positions check out our </em><a href="https://careers.getyourguide.com/?gh_src=3e762a501&amp;utm_source=InsideGYG&amp;utm_medium=Blog&amp;utm_campaign=General"><strong><em>Career page</em></strong></a><strong>.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8c5733434b84" width="1" height="1" alt=""><hr><p><a href="https://medium.com/tech-getyourguide/meet-mira-what-its-like-to-be-an-associate-engineer-8c5733434b84">Meet Mira: What it’s like to be an associate engineer</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What’s a good F1 score?]]></title>
            <link>https://medium.com/tech-getyourguide/whats-a-good-f1-score-b460caf27c90?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/b460caf27c90</guid>
            <category><![CDATA[tech]]></category>
            <category><![CDATA[f1-score]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[binary-classification]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Tue, 15 Jun 2021 13:12:09 GMT</pubDate>
            <atom:updated>2021-06-15T13:12:09.032Z</atom:updated>
            <content:encoded><![CDATA[<h4>Dr. Ansgar Grüne is a senior data scientist for the Relevance and Recommendations team. He explains how to interpret the most popular quality metric for binary classifications.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ck2J3c8ARzDLyEIpfZMnbw.png" /></figure><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2020/9/30/what-makes-a-good-f1-score"><em>here</em></a><em>.</em></p><p><em>Dr. Ansgar Grüne holds a Ph.D. in Computer Science and is a senior data scientist for the </em><a href="https://inside.getyourguide.com/blog/2020/7/8/recommender-systems-pipeline-monitoring-getyourguide"><em>Relevance and Recommendations team</em></a><em>. He works on improving the ranking order of activities on our pages and has previously worked with GetYourGuide’s Catalog team on categorizing our activities. Both topics include binary classification tasks, for example, to decide whether an activity belongs to a category like “family friendly” or not. In this blog post, he explains how to interpret the most popular quality metric for such tasks.</em></p><h4><strong>Why does a good F1 score matter?</strong></h4><p>Last year, I worked on a machine learning model that suggests whether our activities belong to a category like “family friendly” or “not family friendly”. Following <a href="https://inside.getyourguide.com/blog/2020/3/10/15-data-science-principles-we-live-by">our Data Science principles</a>, I came up with a simple first version optimizing for its F1 score , the most-recommended quality measure for such a binary classification problem.</p><p>You can see this for instance by looking at some of the top Google results for “F1 score” like “<a href="https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9">Accuracy, Precision, Recall or F1?</a>” by Koo Ping Shun. When I presented the results to my product team, they wondered, “How good is this achieved F1 score of 0.56?” I explained how the metric was defined which made the value more understandable. In addition, I had done the task on a small sample set myself and showed that the model also performed well compared to the <em>human</em> F1 score. However, I wondered if I could give an even more intuitive meaning to the F1 score.</p><p><em>See appendix below.</em></p><figure><img alt="A binary classification task." src="https://cdn-images-1.medium.com/proxy/0*QQc3BkmaM-YbjoRd" /><figcaption>A binary classification task.</figcaption></figure><p>Clearly, the higher the F1 score the better, with 0 being the worst possible and 1 being the best. Beyond this, most online sources don’t give you any idea of how to interpret a specific F1 score. <strong>Was my F1 score of 0.56 good or bad?</strong> It turns out that the answer depends on the specific prediction problem itself.</p><p>Today, I’ll explain how you can interpret a specific F1 score by comparing it to what is achievable without any knowledge. This interpretation uncovers some unfavorable aspects of the F1 score. I will therefore also mention an alternative metric in the end. But first, let’s quickly recap. <br><br><strong>Recap: confusion matrix and classification quality</strong></p><h4>The confusion matrix divides up the results of a certain binary classification problem.</h4><figure><img alt="Confusion matrix." 
src="https://cdn-images-1.medium.com/proxy/0*vLRjvmfSFyi1Vl5D" /><figcaption>Confusion matrix.</figcaption></figure><p>Accuracy tells us what proportion of the data points we predicted correctly, i.e. <em>accuracy </em>:= (TP+TN) / (TP+FN+FP+TN). The biggest and most well known problem with accuracy is when you have imbalanced datasets. Say, 75% of GetYourGuide’s activities were family-friendly. Then, a model that predicts family-friendly for all activities will get an accuracy of 75%. Our bad quality classifier gets a seemingly very good quality score.</p><p>The standard answer to this problem is that you consider instead recall and precision. Recall is the share of the actual positive cases which we predict correctly, i.e. <em>recall := TP / (TP+FN)</em>. In our toy example, let’s say that family-friendly is the positive class. Then, always predicting family-friendly results in an optimal recall of 100%. Always predicting not family-friendly results in 0% recall.</p><p>The classical counterpart to recall is precision. It is the share of the predicted positive cases which are correct, e.g. <em>precision := TP / (TP+FP)</em>. In this case, always predicting the positive class (family-friendly) will result in the worst possible outcome. Predicting almost all cases as negative will let you reach a precision of 100% on the other hand. All you need to get a perfect precision is to correctly predict the positive class for one case where you are absolutely sure.</p><p>Hence, using a kind of mixture of precision and recall is a natural idea. The F1 score does this by calculating their harmonic mean, i.e. <em>F1 := 2 / (1/precision + 1/recall)</em>. It reaches its optimum 1 only if precision and recall are both at 100%. And if one of them equals 0, then also F1 score has its worst value 0. If false positives and false negatives are not equally bad for the use case, Fᵦ is suggested, which is a <a href="https://en.wikipedia.org/wiki/F1_score">generalization of F1 score</a>.</p><p>If you want a detailed introduction to these metrics, check out this great post on Medium, <a href="https://medium.com/analytics-vidhya/confusion-matrix-accuracy-precision-recall-f1-score-ade299cf63cd">Confusion Matrix, Accuracy, Precision, Recall, F1 Score — Binary Classification Metric</a><em>.</em></p><h4><strong>What is a good F1 score?</strong></h4><p>In summary, the F1 score is a good choice for comparing different models predicting the same thing. Yet, how good is a given F1 score overall, say my model’s 0.56?</p><p>To answer this we look at the best score we can reach without any knowledge, say by flipping a coin. This coin can be unfair. Let p be the probability of the coin predicting a positive outcome, i.e. a perfectly fair coin would have p=<em>0.5</em>. Let q be the share of actual positive cases. In this scenario it is not difficult to derive from the definitions that <em>precision = q</em> and <em>recall = p,</em> see <a href="#appendix">Appendix 1<em>.</em></a> Hence, precision is not influenced by the set up of our coin.</p><p>And recall is best if the coin always predicts positive (<em>p=1</em>). Surprisingly, always predicting positive is the best we can do in terms of F1 score if we don’t have any information. It is due to F1 score being unsymmetric between positive and negative cases. It pays more attention to the positive cases. 
Our observations result in the maximum coin F1 score (see Appendix 1 for the derivation):</p><pre><em>F1_coin = 2q / (q+1)</em>.</pre><figure><img alt="f1_coin depending on q.png" src="https://cdn-images-1.medium.com/proxy/0*jnzD9T-ZxMkUGJk0" /></figure><figure><img alt="image2.png" src="https://cdn-images-1.medium.com/proxy/0*ZTe5z0Il-Wn3rJim" /></figure><p><strong>This shows that the F1 score depends heavily on how imbalanced our training dataset is.</strong> We can obtain a score that is independent of this imbalance by normalizing the F1 score. We could say that the coin should always result in a normalized F1 score of 0 and that the optimal score remains 1. This is achieved by the formula:</p><pre><em>F1_norm := (F1-F1_coin)/(1-F1_coin)</em></pre><p>My activity-category classification problem had only 1% actual positive cases, q=0.01. This results in F1_coin ≈ 0.02 and F1_norm ≈ 0.55. The prediction quality is roughly in the middle between the best guess without any knowledge and a perfect prediction.</p><p>Florian Wetschorek and his colleagues have recently used the same normalization approach in their interesting post <a href="https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598">RIP correlation. Introducing the Predictive Power Score</a>. They use a slightly different baseline, always predicting the majority class, which however does not maximize the F1 score, as we have discussed above.</p><h4><strong>Beyond F1 Score</strong></h4><p>We have seen that the F1 score has two undesired characteristics: not being normalized and not being symmetric (when swapping positive and negative cases). There are other metrics, such as the Matthews Correlation Coefficient (MCC), that do not have this problem. It is explained and promoted nicely in Boaz Shmueli’s “<a href="https://towardsdatascience.com/the-best-classification-metric-youve-never-heard-of-the-matthews-correlation-coefficient-3bf50a2f3e9a">Matthews Correlation Coefficient is The Best Classification Metric You’ve Never Heard Of</a>”. A complete list and a more thorough investigation are outside the scope of this blog post.</p><p><em>If you are interested in joining our engineering team,</em><strong><em> check out our</em></strong><a href="https://careers.getyourguide.com/departments/engineering/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><strong><em> open positions</em></strong></a><strong><em>.</em></strong></p><h4><strong>Appendix 1: Deriving the Formula for F1_coin</strong></h4><p>Remember that q is the share of actual positive cases and p is the probability that the coin predicts a positive outcome. Assume we draw n cases at random; then the expected values are:</p><pre>True Positives = TP = n*q*p</pre><pre>True Negatives = TN = n*(1-q)*(1-p)</pre><pre>False Positives = FP = n*(1-q)*p</pre><pre>False Negatives = FN = n*q*(1-p)</pre><pre>Hence:</pre><pre>Precision = TP / (TP+FP) = q*p / (q*p + (1-q)*p) = q</pre><pre>Recall = TP / (TP+FN) = q*p / (q*p + q*(1-p)) = p</pre><pre>F1 = 2 / (1/q + 1/p) = 2q*p / (q+p)</pre><p>Looking at the first part of the equation above, the F1 score is clearly monotonically increasing in p.
Hence, the maximum is reached for p=1 and it equals</p><pre>F1_coin = 2 / (1+1/q) = 2q / (1+q)</pre><h4><strong>Appendix 2: Unsymmetric Behavior of F1 and F1_norm</strong></h4><pre>Let us look at this concrete example: TP = 40, FN = 40, TN = 16, FP = 4.</pre><pre>This means that q = (TP+FN)/(TP+TN+FP+FN) = 0.8 which implies</pre><pre>F1_coin = 2q/(q+1) = 1.6/1.8 ≈ 0.9.</pre><pre>From precision = TP/(TP+FP) = 40/44 ≈ 0.91 and recall = TP/(TP+FN) = 40/(40+40) = 0.5, we conclude <strong>F1</strong> = 2 / (1/0.91 + 1/0.5) <strong>≈ 0.65</strong> and <strong>F1_norm</strong> ≈ (0.65-0.9)/(1-0.9) <strong>= -2.5</strong>.</pre><pre>If we swap the meanings of positive and negative, we get</pre><pre>TN = 40, FP = 40, TP = 16, FN = 4.</pre><pre>This means that q = (TP+FN)/(TP+TN+FP+FN) = 0.2 which implies</pre><pre>F1_coin = 2q/(q+1) = 0.4/1.2 ≈ 0.33.</pre><pre>From precision = TP/(TP+FP) = 16/56 ≈ 0.29 and recall = TP/(TP+FN) = 16/(16+4) = 0.8, we conclude <strong>F1</strong> ≈ 2 / (1/0.8 + 1/0.29) <strong>≈ 0.43</strong> and <strong>F1_norm</strong> ≈ (0.43-0.33)/(1-0.33) <strong>= 0.15</strong>.</pre><p>Both the F1 value and the F1_norm value change a lot. They are very unsymmetric: in one direction, we predict better than the coin, while in the other direction we don’t. This could be a topic for a follow-up in the future.</p><hr><p><a href="https://medium.com/tech-getyourguide/whats-a-good-f1-score-b460caf27c90">What’s a good F1 score?</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Run updates across multiple GitHub repositories with our new auto-pr tool]]></title>
            <link>https://medium.com/tech-getyourguide/run-updates-across-multiple-github-repositories-with-our-new-auto-pr-tool-26cf18551d32?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/26cf18551d32</guid>
            <category><![CDATA[open-source-software]]></category>
            <category><![CDATA[github]]></category>
            <category><![CDATA[open-source-tools]]></category>
            <category><![CDATA[developer-enablement]]></category>
            <category><![CDATA[tech]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Tue, 15 Jun 2021 13:03:44 GMT</pubDate>
            <atom:updated>2021-06-15T13:03:45.352Z</atom:updated>
            <content:encoded><![CDATA[<h4>Lidija Coha explains how her team created an open-source tool that enables engineers to perform bulk updates across multiple GitHub repositories.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ikCg_73_jV0C1AY0dausrw.jpeg" /></figure><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2021/6/14/run-updates-across-multiple-github-repositories-with-our-new-auto-pr-tool"><em>here</em></a><em>.</em></p><p>The Developer Enablement (DEN) team is always building new tools to help our engineers work more efficiently. Lidija Coha, senior engineer on the Developer Enablement (DEN) team, explains how her team created an open-source tool that enables engineers to perform bulk updates across multiple GitHub repositories.</p><p>Since the DEN team often needs to update configuration and CI/CD pipelines as code across multiple repositories, we wanted to create a tool that allows us to implement the changes and automatically pull requests from repository owners.</p><p>We took this as an opportunity to give something back to the open source community and build a tool that would serve this purpose, engineering it in a way that has no GetYourGuide specific logic or configuration. The end result is the auto-pr tool. It allows running code across multiple repositories without worrying about setting it up every time, thus focusing on the change itself. You can see all the details on <a href="https://github.com/getyourguide/auto-pr/">GitHub</a>. It’s been proven to be very effective for what we need.</p><h4><strong>Getting started is easy</strong></h4><p><strong>Step 1. Initialize auto-pr project</strong></p><p>All you need is an api-key which is a GitHub <a href="https://github.com/settings/tokens">personal access token</a> with repo and user:user:email scope, and a privileged (robot) account ssh key.</p><pre>auto-pr init --api-key=&lt;github_token&gt; --ssh-key-file=/path/to/ssh/key/to/push/.ssh/id_rsa</pre><p>This command will create the scaffolding:</p><pre>ls /home/username/work/auto-pr-project<br><br>config.yaml  db.json  repos</pre><p><strong>Step 2. Configure to your needs</strong></p><p>When communicating changes, be clear on why you’re making the change, what the change is, and when the deadline to review and merge is.</p><pre>credentials:<br><br>  api_key: &lt;github_token&gt;<br><br>  ssh_key_file: /path/to/ssh/key/to/push/.ssh/id_rsa<br><br>pr:<br><br>  body: &gt;<br><br>    Body of the PR that will be generated<br><br>    Can be multi-line :)<br><br>  branch: auto-pr # The branch name to use when making changes<br><br>  message: Replace default pipelines with modules # Commit message<br><br>  title: &#39;My awesome change&#39; # Title of the PR<br><br>repositories: # Rules that define what repos to update<br><br>  - mode: add<br><br>    match_owner: &lt;org/user&gt;<br><br>update_command:<br><br>  - touch<br><br>  - my-file</pre><p><strong>Step 3. Write, test, and update script cycle</strong></p><p>This is the most important step. Write the code that will be run against the directory of each checked out repository. You can use the auto-pr test functionality to test it without actually committing changes. Do this as many times as needed until you’re happy with it.</p><p><strong>Step 4. Run it 🤖</strong></p><p>This will execute the script and open pull requests. 
If you save the auto-pr database file, you can revisit some of the pull requests and do things like close and reopen them.</p><h4><strong>How we used the tool</strong></h4><p><strong>Updating outdated configuration or format</strong></p><p>We used auto-pr to sunset outdated pieces of configuration and unblock future work. Below is an example of an automatically created PR in which we consolidated the build version format across all our services.</p><figure><img alt="image1.png" src="https://cdn-images-1.medium.com/proxy/0*F8GHYwF72IbywJb_" /></figure><h4><strong>Updating CD pipelines</strong></h4><p>auto-pr was an excellent way to deliver fixes and optimizations to our pipelines as code, <em>provided by the DEN team to all engineers</em>, who wouldn’t have the time to work on them themselves.</p><h4><strong>Docker Hub URL replacement</strong></h4><p>Due to rate limiting introduced by Docker Hub, we created a mirror of all images but still needed to replace all existing references to Docker Hub images in a timely manner. This is where the idea of auto-pr was born, as it saved the mission teams a lot of time and avoided tedious, repetitive work.</p><h4><strong>Contributing</strong></h4><p>While auto-pr has served many of our use cases, it’s still in its early days of development. There may be functionality missing or gaps in the developer experience when using the commands.</p><p>We would be happy to accept contributions to improve this tool and provide an even better experience. Make sure to check the <a href="https://github.com/getyourguide/auto-pr/blob/master/CONTRIBUTING.md">contribution guidelines</a> first.</p><p><strong><em>If you’re interested in joining our engineering team,</em></strong><a href="https://careers.getyourguide.com/departments/engineering/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><strong><em> check out our open roles.</em></strong></a></p><hr><p><a href="https://medium.com/tech-getyourguide/run-updates-across-multiple-github-repositories-with-our-new-auto-pr-tool-26cf18551d32">Run updates across multiple GitHub repositories with our new auto-pr tool</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How we use Typescript and Apache Thrift to ensure type safety]]></title>
            <link>https://medium.com/tech-getyourguide/how-we-use-typescript-and-apache-thrift-to-ensure-type-safety-5f4b1ad8db46?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/5f4b1ad8db46</guid>
            <category><![CDATA[apache-thrift]]></category>
            <category><![CDATA[type-safety]]></category>
            <category><![CDATA[typescript]]></category>
            <category><![CDATA[tooling]]></category>
            <category><![CDATA[tech]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Tue, 15 Jun 2021 13:01:42 GMT</pubDate>
            <atom:updated>2021-06-15T13:01:42.040Z</atom:updated>
            <content:encoded><![CDATA[<h4>Michael Bell shares how his team uses Typescript and Apache Thrift to ensure type safety when building their latest analytics library.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uXVYe0Tj1VGGL1u31rx9rw.png" /></figure><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2020/7/9/how-we-use-typescript-and-apache-thrift-to-ensure-type-safety"><em>here</em></a><em>.</em></p><p><em>The </em><a href="https://inside.getyourguide.com/blog/2019/7/18/meet-the-team-partner-tech"><em>Partner Tech</em></a><em> </em><a href="https://inside.getyourguide.com/blog/2019/12/16/how-we-integrate-design-into-mission-teams"><em>mission team </em></a><em>works closely with our </em><a href="https://inside.getyourguide.com/blog/2019/4/30/making-an-impact-a-senior-partner-success-manager-reflects"><em>business development </em></a><em>and partner support organizations to design and implement services and tools which support publishers, airlines, and travel agencies that grow their business with us. Michael Bell, senior full-stack engineer, shares how his team uses </em><a href="https://www.typescriptlang.org/"><em>Typescript</em></a><em> and </em><a href="https://thrift.apache.org/"><em>Apache Thrift </em></a><em>to ensure </em><a href="https://en.wikipedia.org/wiki/Type_safety"><em>type safety </em></a><em>when building their latest analytics library. Their work gives our publishing partners full visibility on the performance of our inventory on their platform.</em></p><h4><strong>Understanding the Partner Tech team</strong></h4><p>In Partner Tech, we display our inventory — you may know them as tours and activities — on publishing partner websites. By making our bookings readily available via <a href="https://partner.getyourguide.com/en-us/travel-writers">travel blogs</a>, <a href="https://partner.getyourguide.com/en-us/airlines-and-booking-partners">airlines</a>, <a href="https://partner.getyourguide.com/en-us/travel-agents">travel agencies</a>, and publishers, we’re connecting with the greater travel community. It’s a win-win for both us and our partners. They can earn commissions from bookings, and we connect our products to people who are excited about traveling.</p><p>To be successful, our partners need insights regarding which integrations on their website are performing well. This has a direct impact on the click-through rate and ultimately revenue for them. Our Partner Tech team supports them by building tools to show the performance of our inventory on their platform.</p><p>Publishers can see each integration on our partner portal. Each integration has the first date inventory was seen, visits, clicks and the conversion rate along with its status. 
The team also creates interactive data visualizations that show our partner’s performance over time.</p><p>Furthermore, we aggregate anonymized data and create data models to help our partners understand optimizations related to the number of integrations, their position on the page, and geographic specific insights around their content.</p><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/2019/10/16/replace-type-methods-in-swift-to-improve-testability-1"><em>How to replace type methods in Swift to improve testability</em></a></p><h4><strong>Developing an analytics package and ensuring type safety</strong></h4><p>As part of an effort to provide this granular, insightful, and meaningful knowledge to our partners, we developed an <a href="https://tagmanager.google.com/gallery/#/owners/getyourguide/templates/gtm-advanced-analytics">analytics package</a> that surfaces information about our partners’ website and GetYourGuide integrations. When we started this project, we had a few essential requirements that we considered mandatory:</p><ol><li>Derive Typescript types directly from <a href="https://inside.getyourguide.com/blog/2019/12/11/how-we-built-our-new-modern-etl-pipeline-part-1">analytics pipeline</a> definitions, which are currently written in Apache Thrift.</li><li>The library had to be small and have no negative impact on our partners’ website. (<em>At the moment it is 4kb gzipped</em>).</li><li>Served via a content delivery network (CDN)</li><li>Quick iterative development</li></ol><p>We bootstrapped the typescript library with a great open-source tool, called <a href="https://github.com/jaredpalmer/tsdx">TSDX</a>. If you’re new to it, GitHub describes it as, “a zero-config CLI that helps you develop, test, and publish modern TypeScript packages with ease.”</p><p>TSDX provided us with basic scaffolding, live reloading, unit testing with <a href="https://jestjs.io/">Jest</a>, and <a href="https://rollupjs.org/guide/en/">Rollup</a> for building the minified files.</p><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/2019/12/11/how-we-built-our-new-modern-etl-pipeline-part-1"><em>How we built our new modern ETL pipeline</em></a></p><h4><strong>Apache Thrift Definitions</strong></h4><p>Thrift is a powerful open-source software framework, but for this library, we are only really interested in the code generation that comes bundled with Thrift.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2d3229c0314a3e417651866dcfaa409c/href">https://medium.com/media/2d3229c0314a3e417651866dcfaa409c/href</a></iframe><p><em>Above is an example of a thrift definition we defined to collect performance information gathered from the browser.</em></p><p>If you want to do this too, the first thing you are going to need is to install <a href="https://thrift.apache.org/">Thrift</a>. You should then have the <strong>thrift</strong> command available in your terminal. 
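</p><p><em>(A rough sketch of that generation step: the definition file name here is made up, the generator options can vary between Thrift releases, and the exact command the team used is in the embedded snippet that follows.)</em></p><pre># analytics.thrift is a placeholder name for the definition file shown above
thrift --gen js:ts analytics.thrift   # emits JavaScript plus TypeScript output, typically into gen-js/</pre><p>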
Now you can simply run:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/affd972cbd2d46c07aebd147faecfb80/href">https://medium.com/media/affd972cbd2d46c07aebd147faecfb80/href</a></iframe><p>This command will generate a javascript and typescript file based on our thrift definition file.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/affd972cbd2d46c07aebd147faecfb80/href">https://medium.com/media/affd972cbd2d46c07aebd147faecfb80/href</a></iframe><p>Wunderbar. You have just completed the first requirement.</p><p>At GetYourGuide, we have a rule that all Thrift definitions should be backward compatible, meaning you can only add fields, never remove. This has served us well as it coerces developers into being practical and cautious when creating/updating definitions.</p><p>Now that our Typescript definitions are generated, we can be confident during development that the types conform to our thrift definitions, and our library’s data flow is type safe. In the example below, the <strong>as</strong> keyword is a Type Assertion in TypeScript, which tells the compiler to consider the object as another type than the type the compiler infers the object to be.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d8c7bd35efbe8ccd6cffcf96bef876ec/href">https://medium.com/media/d8c7bd35efbe8ccd6cffcf96bef876ec/href</a></iframe><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/customer-obsessed-approach-to-engineering"><em>3 mindset shifts to customer-obsessed engineering in Inventory</em></a></p><h4><strong>Performance and impact</strong></h4><p>Since TSDX uses <a href="https://rollupjs.org">RollupJS</a> for module bundling, we use <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import">ES Modules</a> to structure our application and provide optimized tree shaking when bundling the library. We didn’t include any external dependencies, both for security and size performance constraints.</p><p>We ask partners to embed the script into their website manually or automatically via Google Tag Manager. Using the <strong>defer</strong> attribute, the file gets downloaded asynchronously but executed only when the document parsing is completed. This ensures the partner’s website isn’t negatively impacted or slowed down in any way.</p><p>When sending the analytics request, we didn’t want to cause a negative user experience by executing a blocking XHR request. <a href="https://www.chromestatus.com/feature/4664843055398912">Google Chrome 80</a> also disallows synchronous XHR during page dismissal when the page is being navigated away from or closed by the user.</p><p>The first step is setting up an event listener that listens to the appropriate event based on desktop / mobile events. We then use <strong>fetch</strong> with the <strong>keepAlive</strong> flag to ensure the request can be sent while the page is unloading.</p><p>We experimented using the modern <strong>navigator.sendBeacon()</strong> but found issues across various browsers, so opted for using fetch with an XHR fallback for older browsers. 
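</p><p><em>The team’s own snippet is embedded further below; as a simplified sketch (the endpoint URL, payload shape, and choice of the pagehide event are assumptions, not taken from the real library), the keepalive request with an XHR fallback could look roughly like this:</em></p><pre>// Hypothetical endpoint; the real one is internal to GetYourGuide.
const ANALYTICS_URL = 'https://analytics.example.com/collect';

function sendEvent(payload: object): void {
  const body = JSON.stringify(payload);
  if (typeof fetch === 'function') {
    // keepalive lets the request complete even while the page is unloading.
    fetch(ANALYTICS_URL, {
      method: 'POST',
      body,
      keepalive: true,
      headers: { 'Content-Type': 'application/json' },
    });
  } else {
    // Fallback for older browsers that do not support fetch.
    const xhr = new XMLHttpRequest();
    xhr.open('POST', ANALYTICS_URL, true);
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.send(body);
  }
}

// Send the event when the page is being hidden or navigated away from.
window.addEventListener('pagehide', function () {
  sendEvent({ event: 'page_performance' });
});</pre><p>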
When the API matures in the future, I envision using the <strong>sendBeacon</strong> method.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/56a3d8caf818201f9e0fc052b190f28c/href">https://medium.com/media/56a3d8caf818201f9e0fc052b190f28c/href</a></iframe><h4><strong>Content delivery and iterative development</strong></h4><p>The final step was deploying the assets to a CDN, this ensures the file is replicated to edge locations around the world, meaning people even from Australia (my country of birth) download the file quickly and effortlessly.</p><p>During our Continuous Integration (CI) pipeline, we execute all our unit tests and automatically deploy the new assets to our CDN with a new unique hash, this means all user requests will automatically receive the new assets, invalidating the older assets.</p><p>This entire process allows developers to be confident their types are safe, new features do not negatively impact the library or partners, and quick iterations can be added if and when required.</p><p>Types provide a safety net for developers. When you have developers writing code all day, five days a week, your code base can expand to hundreds of thousands of lines of code. Since a code base is a constantly evolving digital organism it becomes harder and harder to create a visual map of how everything is interconnected.</p><p>When a new feature is being worked on, it may inadvertently require the developer to change functions written by others. Types provide developers a way to be explicit about the structure of their data or arguments being passed to a method. This directly translates to less bugs, less stress and confidence that your code is future proofed.</p><p>For publishing partners who have embedded the library, they can see detailed analytics around how their GetYourGuide integrations are performing along with suggestions and improvements in real-time.</p><p>We are also working on allowing partners to dynamically adjust the integrations from the partner portal. We can then use aggregated anonymous data to provide partners with integration changes from our recommendation engine. We are excited to see what our partners can do with these new tools and which insights will be unlocked.</p><p><em>For updates on our open positions</em><strong><em>, check out our</em></strong><em> </em><a href="https://careers.getyourguide.com/departments/engineering/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><em>Career page</em></a><strong><em>.</em></strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5f4b1ad8db46" width="1" height="1" alt=""><hr><p><a href="https://medium.com/tech-getyourguide/how-we-use-typescript-and-apache-thrift-to-ensure-type-safety-5f4b1ad8db46">How we use Typescript and Apache Thrift to ensure type safety</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Implementing a reliable library for currency conversion]]></title>
            <link>https://medium.com/tech-getyourguide/implementing-a-reliable-library-for-currency-conversion-53259960124a?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/53259960124a</guid>
            <category><![CDATA[currency-conversion]]></category>
            <category><![CDATA[tech]]></category>
            <category><![CDATA[fintech]]></category>
            <category><![CDATA[java-library]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Tue, 15 Jun 2021 12:55:57 GMT</pubDate>
            <atom:updated>2021-06-15T12:55:57.077Z</atom:updated>
            <content:encoded><![CDATA[<h4>Roberta Huang, associate backend engineer, and her team built a Java library to provide exact currency conversion with reliability and short response times.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RoQC8_eq2SIHvRERukEDXQ.jpeg" /></figure><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2021/1/29/implementing-a-reliable-library-for-currency-conversion"><em>here</em></a><em>.</em></p><p><em>Roberta Huang is an associate backend engineer in the </em><a href="https://inside.getyourguide.com/blog/2019/12/19/how-we-scaled-booking-reference-numbers-at-getyourguide"><em>Fintech team</em></a><em>. The team focuses on both customers and supporting the internal teams. For customers, they create seamless checkout experiences. To help other teams shine, they build and scale financial services. Roberta shares how her team built a reliable Java library for currency conversions and the challenges they faced while developing it.</em></p><p>GetYourGuide offers a marketplace with thousands of activities provided by partners around the world. When partners create their activities, they set prices in their local currency. To make customers feel at home while they browse, we display activity prices in the customers’ preferred currency. Our platform currently supports 40 different currencies.</p><figure><img alt="Activity cards displaying prices converted into customer’s preferred currency." src="https://cdn-images-1.medium.com/proxy/0*R2pC3Ejth106q4tW" /><figcaption>Activity cards displaying prices converted into customer’s preferred currency.</figcaption></figure><p>As we migrate from our <a href="https://inside.getyourguide.com/blog/2020/2/24/how-to-split-a-monolith-into-services">monolithic architecture to a microservice architecture</a>, the new services that support different currencies need a library that exposes the currency conversion logic. In the FinTech team, we have recently implemented such a library in Java — our FX SDK. The key requirements were for it to be fast, numerically consistent, and reliable.</p><p>Currency conversion is a simple multiplication (initial amount × exchange rate) from a mathematical perspective. Yet, it can raise different challenges from an implementation point of view. Let’s now dive into our development process of the FX SDK and the problems we faced while implementing it.</p><h4><strong>How are currency conversions computed?</strong></h4><p>The currency conversion logic is currently implemented in our <a href="https://inside.getyourguide.com/blog/2020/8/15/how-we-migrated-a-microservice-out-of-our-monolith">monolith application</a>, where we also store currencies and exchange rates data. We periodically fetch new data from our exchange rates providers and update our database with the latest exchange rates, stored with a certain precision. Whenever we convert an amount from a source currency to a target currency (for example, from 150.09€ to Japanese Yen, ¥), we perform the following steps:</p><ol><li>Pull the currently applicable exchange rate from the database (e.g. 1€ = ¥123.590492)</li><li>Compute the raw converted amount into target currency (in our example, 150.09€ × 123.590492 = ¥18549.69694428)</li><li>Round the converted amount based on the target currency. 
For example, in the case of Japanese Yen, there are no fractions of a Yen, so the final converted amount is rounded to ¥18550.</li></ol><h4><strong>What is the library’s interface?</strong></h4><p>The FX SDK’s interface is simple. To convert prices between different currencies, the clients need to work with monetary amounts and currencies. They need to have a function that performs the currency conversion. Therefore, we provide two client-facing domain classes:</p><p><em>The following code snippets do not fully reflect our actual implementation and are simplified for understandability.</em></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ac154fb03df3fd89bc7e7b1351e47f67/href">https://medium.com/media/ac154fb03df3fd89bc7e7b1351e47f67/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b8322e88524973b503dcdcc6d9f3d584/href">https://medium.com/media/b8322e88524973b503dcdcc6d9f3d584/href</a></iframe><p>And finally, we also provide a Converter class with the currency conversion method:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/289b5fea5d6959fb181a529dbd37125b/href">https://medium.com/media/289b5fea5d6959fb181a529dbd37125b/href</a></iframe><h4><strong>What were the challenges and how did we solve them?</strong></h4><p>As we have seen previously, currency conversion is a critical feature for many areas of our product, so its availability and accuracy are crucial. Therefore, we need to guarantee both numerical precision and system reliability to assure positive customer experience and financial correctness.</p><h4><strong>🔢 Numerical representation with BigDecimal In Java</strong></h4><p>The first challenge for currency conversion is related to limitations on the numerical representation in computers. Real numbers are often stored using the <a href="https://www.tutorialspoint.com/fixed-point-and-floating-point-number-representations">floating-point representation</a>. Because computers are finite-state machines, we cannot store all real numbers with arbitrary precision; some are rounded to the nearest representable floating-point number.</p><p>This means that we cannot compute currency conversions precisely with the floating-point representation, and the converted amounts would have a calculation error. Even though the error is relatively small and in most cases not noticeable in the converted amount after rounding, it becomes significant in magnitude when we deal with large values, and we still want to prevent it. Moreover, the errors can accumulate over mathematical operations, such as additions and multiplications.</p><p><em>An example of how a number stored with floating-points representation can differ from its real value.</em></p><p>Therefore, floating-point representation is not ideal for calculations with money, such as addition, subtraction, multiplication, VAT calculations or percentage discounts.</p><p>Fortunately, Java provides a way to represent decimal numbers more accurately with <a href="https://docs.oracle.com/javase/10/docs/api/java/math/BigDecimal.html">BigDecimal</a>. BigDecimal receives a number as a string, and stores the value as an arbitrary precision integer unscaled value and a 32-bit integer scale (unscaledValue × 10^(-scale). Using BigDecimal, not only can we represent monetary amounts and exchange rates exactly, but we can also perform safe and accurate computations. 
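</p><p><em>As a minimal, self-contained sketch (this is not the FX SDK’s actual API, and the rounding mode is an assumption), the three conversion steps described above look like this with BigDecimal, reusing the EUR to JPY numbers from the example:</em></p><pre>import java.math.BigDecimal;
import java.math.RoundingMode;

public class ConversionSketch {
    public static void main(String[] args) {
        // Build amounts and rates from strings so no precision is lost on the way in.
        BigDecimal amountInEur = new BigDecimal("150.09");
        BigDecimal eurToJpyRate = new BigDecimal("123.590492"); // step 1: rate from the database

        BigDecimal rawJpy = amountInEur.multiply(eurToJpyRate); // step 2: 18549.69694428

        // Step 3: round to the target currency's scale; JPY has no minor unit, so scale 0.
        BigDecimal roundedJpy = rawJpy.setScale(0, RoundingMode.HALF_UP);

        System.out.println(rawJpy + " rounds to " + roundedJpy); // 18549.69694428 rounds to 18550
    }
}</pre><p>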
With BigDecimal we have a precision of up to 10^(-(2³²)), which covers all needs that arise when working with money.</p><p>The smallest unit of a commonly used currency would probably occur for Bitcoin at 10^-8. The common operations of adding, subtracting and multiplying by an integer cannot increase the number of relevant decimals. Performing VAT calculations or applying percentage discounts can actually result in an increase in decimals, but there is always a clear bound — no one issues a 7.379842331467899% discount, and no government has such a definition of VAT. As the results of calculations need to be represented as meaningful amounts in the respective currency, subsequent rounding ensures that the decimals are reduced appropriately again.</p><figure><img alt="conversion library Java FX SDK2.jpg" src="https://cdn-images-1.medium.com/proxy/0*SIFv8VFNibIBAjpz" /></figure><p>For example, if a customer buys a ticket with a normal price of €10.35 that is 10% off, the final price could be calculated as 10.35 * 0.9 = 9.315, which is not a valid Euro amount as it contains half a cent. In this situation, rounding needs to occur and this might create or destroy cents.</p><figure><img alt="conversion library Java FX SDK.jpg" src="https://cdn-images-1.medium.com/proxy/0*PZ9QvVo-Wi0J8BxK" /></figure><p>It is important to understand that in this scenario, the provided discount should not be computed as 10.35 * 0.1, but rather the original price minus the discounted price, which requires no rounding and therefore enforces consistency without the possibility of creating or destroying cents.</p><h4><strong>💪 Resilience: How we prevented network delays and data inconsistencies</strong></h4><p>As mentioned above, currencies and exchange rates data are stored in our monolith. A simple approach to fetch the data needed for currency conversion is via <a href="https://networkencyclopedia.com/remote-procedure-call-rpc/">Remote Procedure Call</a>, by calling an endpoint for each conversion. However, in this way, the currency conversions strongly depend on the availability of the monolith application.</p><p>Another downside is that the network delays would not meet our requirements, as they would make conversions too time-consuming. Instead, to synchronize the latest exchange rates, we use <a href="https://en.wikipedia.org/wiki/Apache_Kafka">Apache Kafka</a> as a messaging system to deliver the most recent data to all clients. Kafka is built for resilience: it provides replication, fault-tolerance guarantees, and zero downtime, which are essential for providing correct and up-to-date exchange rates for our use case.</p><p>Whenever we update the exchange rates in the monolith, we publish them to a specific Kafka topic. During the process, we make use of the <a href="https://microservices.io/patterns/data/transactional-outbox.html">outbox pattern</a> with <a href="https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/#the_outbox_table">Debezium</a>, such that we record the new Kafka message as part of the database transaction that updates the exchange rates, guaranteeing consistency between database updates and Kafka messages being produced.</p><figure><img alt="conversion library Java FX SDK .jpg" src="https://cdn-images-1.medium.com/proxy/0*oMqd9XYFiwQIgGhs" /></figure><h4><strong>💲Supporting new currencies</strong></h4><p>In case we have to support a new currency, we want to make it available immediately to all clients using the FX SDK. 
If we hard code currencies data (e.g. their symbol and rounding logic) in the library, and if we were to introduce a new currency, then we would have to update the SDK and make all clients upgrade the library to the newest version. This is not ideal. We choose to provide all currency data (ISO code, symbol and rounding logic) in the Kafka messages, along with the latest exchange rates. In this way, the clients can access the new currency right away and an update of the SDK is not required.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/89f91920dc1ba744345748b880591657/href">https://medium.com/media/89f91920dc1ba744345748b880591657/href</a></iframe><p><em>Kafka message produced by the monolith with up-to-date currencies and exchange rates data.</em></p><h4><strong>✏️ Topic catch up</strong></h4><p>For our Kafka topic, we set up <a href="https://kafka.apache.org/documentation/#compaction">log compaction</a> to automatically delete old exchange rates messages. While it guarantees that we have at least the latest message in the queue, it could still contain some old exchange rates data. To prevent the FX SDK from using outdated exchange rates for currency conversions, we implemented a monitoring system that emits an event when it reaches the latest offset in Kafka. The client can then use the event to assess if the FX SDK has consumed the latest exchange rates and is ready to compute currency conversions.</p><h4><strong>🔒 Transaction-safe operations</strong></h4><p>Sometimes we have to compute multiple currency conversions in a single job. An example is the search results page, where we display a list of relevant activities with their prices in the customer’s currency. Since the FX SDK listens to Kafka and continuously updates itself with the newest exchange rates, it might happen that while rendering the search result page, the exchange rates are updated. Therefore, we need to make sure that exchange rates remain consistent in this scenario. We implement a transaction-safe converter that uses the same exchange rates data (available at the time of its creation) for its entire lifetime.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a49574fe811492ed1987af067e65365b/href">https://medium.com/media/a49574fe811492ed1987af067e65365b/href</a></iframe><p><em>An example of transaction safe conversion, where the converter stores the same exchange rates data.</em></p><p>This way we make sure that the prices are converted using the same exchange rates data during the process. At the same time, we set a lifetime limit on the transaction safe converter, to assure that the converter object is used for time-limited jobs and prevent it from using outdated exchange rates.</p><figure><img alt="Search results page where we compute multiple currency conversions during its rendering." src="https://cdn-images-1.medium.com/proxy/0*o2PTKGMw6ZPFO7nK" /></figure><p>Search results page where we compute multiple currency conversions during its rendering.</p><p><strong>What is next for the library?</strong></p><p>The FX SDK Java library we built is supported by a reliable system based on Kafka, that provides exact currency conversion with short response times. 
As we are moving away from our monolith, the library will be used by different services, providing the foundation for all money-related processing.</p><h4><strong>Acknowledgment</strong></h4><p>This project was carried out together with our <a href="https://inside.getyourguide.com/blog/find-and-fix-oom-and-memory-leaks-in-java-services">Senior Backend Engineer Constantin Șerban-Rădoi</a>, who brought his valuable expertise to the project. I would like to also thank our Engineering Manager Daniel Huguenin, who provided great insights during the ideation phase, and everyone who gave valuable feedback during the entire project. Thank you all!</p><p><strong><em>If you’re interested in joining our engineering team,</em></strong><a href="https://careers.getyourguide.com/departments/engineering/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><strong><em> check out our open roles.</em></strong></a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=53259960124a" width="1" height="1" alt=""><hr><p><a href="https://medium.com/tech-getyourguide/implementing-a-reliable-library-for-currency-conversion-53259960124a">Implementing a reliable library for currency conversion</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Deploying and pressure testing a Markov Chains model]]></title>
            <link>https://medium.com/tech-getyourguide/deploying-and-pressure-testing-a-markov-chains-model-7e18916f70f8?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/7e18916f70f8</guid>
            <category><![CDATA[markov-decision-process]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[markov-models]]></category>
            <category><![CDATA[tech]]></category>
            <category><![CDATA[markov-chains]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Tue, 15 Jun 2021 12:51:07 GMT</pubDate>
            <atom:updated>2021-06-15T12:51:07.739Z</atom:updated>
            <content:encoded><![CDATA[<h4>Baptiste Amar, senior data scientist, explains how he deploys and pressure tests a fractional attribution model.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*US966pIk0r1k1ddsQexQdg.png" /></figure><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2020/10/15/deploying-and-pressure-testing-markov-chains-model"><em>here</em></a><em>.</em></p><p><em>In part 3 of this 3-part series on fractional attribution models, Baptiste Amar, senior data scientist, explains how he deploys and pressure tests the Markov Chains model. Deployment implies replicating the global findings from the design phase at the channels’ sequence level: every time a transaction fires, we need to distribute it across the marketing channels that triggered it.</em></p><p><em>Then, to ensure that the deployed model is valid and versatile, we designed and applied a testing methodology. This results in closing the narrative and institutionalising our data-driven attribution model as the single source of truth within the company.</em></p><p><em>Read part 2 to learn how Baptiste </em><a href="https://inside.getyourguide.com/blog/2020/10/8/selecting-and-designing-a-fractional-attribution-model"><em>chose and designed the Markov chains model. </em></a><em>Read part 1, </em><a href="https://inside.getyourguide.com/blog/2020/9/22/understanding-data-driven-attribution-models"><em>Understanding data-driven attribution models</em></a><em>, for an introduction to the topic and its business case.</em></p><h4><strong>Here’s a recap of the year-long project from choosing a model to pressure testing it:</strong></h4><p>Q1: Selecting and designing the model. In the first phase:</p><ol><li>We built a solid dataset with paths to conversion for every customer</li><li>Tweaked Markov Chains model to estimate overall channel importance</li></ol><p><strong>Q2: Deploying chosen model</strong></p><p><strong>Q3: Validation / pressure testing the model on real marketing campaigns</strong></p><p>After Q1, the project’s outcome was super insightful, but wasn’t yet the backbone of a daily marketing strategy that we were trying to provide. To go one step further and extract the highest value possible from the project, we dedicated the second quarter to <em>deploying</em> the model. <strong>Every time a transaction fires, the goal is to credit all involved channels for the right portion of revenue, depending on their impact.</strong></p><p>One constraint for the deployment is that we did not want to retrain the model every day because:</p><ol><li>it would have been computationally expensive</li><li>we wanted the channel’s revenue evolution to be model neutral so it’s only impacted by marketing interventions.</li></ol><h4>Hitting the rock: from global importances to individual estimates</h4><p><strong>This is where we faced an issue we did not foresee:</strong> How could we translate overall channel importances into individual-level estimations? In other words: for a given sequence of channels, how much should we credit each of them?</p><p>Theoretically, it was an unsolvable problem: getting channels credits at the transaction-level implies that paths are independent of one another. However, Markov Chains’ logic is built upon dependency across all the sequences in the graph. 
To translate the global outcome from the model into individual-level estimates, we needed to tweak (once again) Markov Chains logic and adapt it to our needs.</p><h4><strong>Asking for help: tapping into the community</strong></h4><p>At GetYourGuide, when we run into a problem, we’re always encouraged to ask our peers and share knowledge. In this case, we didn’t have an expert in this domain. So to solve the problem, we reached out to external sources: We contacted the author of the <a href="http://www.channelattribution.net/">ChannelAttribution</a><strong> </strong>package, whose contact we found on <a href="https://cran.r-project.org/">CRAN</a>.</p><p>We explained to him the operational problem we were facing. He agreed to help as it was also an opportunity for him to enhance the package’s outcomes from the deployment perspective. This was a fantastic learning experience, as we were both super committed to finding a solution in a timely manner. This required strong collaboration, with each of us leveraging the best of our different skill sets.</p><blockquote><strong>This is where we faced an issue we did not foresee:</strong> How could we translate overall channel importances into individual-level estimations? In other words: for a given sequence of channels, how much should we credit each of them?</blockquote><p>Together, we tested several different approaches, and after an incredible amount of trial and error — <em>and a few tears </em>— we came up with an original method. From the global credit allocated to every channel, we built an algorithm based on the transition matrix that estimates the credit to allocate to each channel in every path, so that the aggregation replicates the global results.</p><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/2020/3/10/15-data-science-principles-we-live-by"><em>15 data science principles we live by</em></a></p><p>The underlying idea is that the global results (channel importances) make sense: they come from Markov Chains logic. When applying weights to channels at the path level — i.e. depending on the other channels in the path and its position within it — and aggregating every path, we need to approximate the initial results to the best extent possible.</p><p>To do so, we designed an algorithmic approach that takes the actual sequences of events throughout the entire graph and applies weights to them. After normalizing the weights and aggregating all conversions, we compare the results to the channel importances and apply a corrective factor. Then we do the process again, apply another corrective factor, and so on and so forth. Once we reach the desired convergence level, the process stops and outputs path-specific channel weights for every sequence that we observed.</p><p>The last piece of the algorithm we needed to build relates to the fact that there is an infinite number of combinations that can lead a customer to conversion. Hence, many have never been observed.
Then, we built an algorithm that splits every new path into triples of channels and applies the path-level channel weight estimates that we calculated before.</p><p><strong>The new approach is now embedded within the </strong><a href="http://www.channelattribution.net/">ChannelAttribution package.</a></p><p><em>You may also be interested in: </em><a href="https://inside.getyourguide.com/blog/2019/11/1/how-a-display-marketer-and-his-small-team-make-a-big-impact"><em>How this display marketer and his small team make a big impact</em></a></p><h4><strong>Building the pipeline</strong></h4><p>By applying the above methodology, the outcome is that for any sequence of events towards conversion, we can credit the channels depending on their impact. To deploy the model and have every new transaction’s revenue distributed, we built the following pipeline:</p><p>1. We stored the results of the model at the path level: it is basically a three-column CSV containing the sequence of channels, a specific channel, and its weight for that specific sequence.</p><figure><img alt="x% + y% + z% = 100%" src="https://cdn-images-1.medium.com/proxy/0*iPGpLibihd6JP9mw" /><figcaption>x% + y% + z% = 100%</figcaption></figure><p>2. Every day, we gather transactions and their paths and apply the weights accordingly. If the path has never been observed before, we impute it.</p><h4><strong>Phase 3: pressure testing</strong></h4><p>At the end of phase 2, we refined a Markov Chains model distributing every transaction’s revenue across involved channels. A lot of sanity checks and backtesting ensured the model’s validity on the past interactions between our customers and our marketing interventions.</p><p>To ensure our model’s versatility, e.g. its ability to adapt to new situations, we dedicated a quarter to aggressively pressure testing the model on actual marketing campaigns. The hypothesis we want to verify is that, unlike heuristic approaches (e.g. our U-shape model), Markov Chains attribution captures incrementality signals.</p><h4><strong>What is pressure testing?</strong></h4><p>Pressure testing consists of designing marketing campaigns and measuring two elements:</p><ol><li>Incremental revenue generated by the campaign</li><li>Channel weights’ evolution in the exposed group against an unexposed group</li></ol><p>If a specific campaign is proven to be very incremental, the Markov Chains model should credit the involved channel with more weight than where/when there is no campaign.</p><h4><strong>Example of pressure test</strong></h4><p>We wanted to perform as many pressure tests as possible to ensure model sanity, but also respect a relevant timeframe so that the project is closed at some point — avoiding analysis paralysis.
To do so, we planned all the tests to happen over a quarter, from campaign design and launch to outcome analysis.</p><p>Examples of pressure tests in the pipeline are:</p><ul><li>blacking out a specific segment from paid search</li><li>navigational searches uplift from Video campaigns</li><li>repeat purchase incentives CRM campaigns</li></ul><p>In all those cases, we measure both the incremental revenue generated (or lost) and compare it against the model’s weights to ensure that the incrementality signals were captured correctly.</p><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/2019/12/11/how-we-built-our-new-modern-etl-pipeline-part-1"><em>How we built our modern ETL Pipeline part 1</em></a><em> and </em><a href="https://inside.getyourguide.com/blog/2020/1/17/how-we-built-our-new-modern-etl-pipeline-part-2"><em>part 2</em></a></p><h4><strong>Conclusion</strong></h4><p>Taking these three steps, we designed, deployed, and tested our Markov Chains-based attribution model. The last remaining step is then to socialize it widely throughout the company to make sure it is instituted as the single source of truth regarding revenue attribution. To that end, thoroughly understanding every stakeholder’s needs and expectations is key, and so is providing the model’s outcome in the right format (datasets, dashboards, presentations, specific analyses, and so on).</p><h4><strong>Next steps towards measurement</strong></h4><p>Deploying a data-driven attribution model is a significant additional step towards making marketing decision processes more data-driven and eventually more profitable. To go further into marketing measurement, though, it can’t answer every question on its own.</p><blockquote>To ensure our model’s versatility, e.g. its ability to adapt to new situations, we dedicated a quarter to aggressively pressure testing the model on actual marketing campaigns. The hypothesis we want to verify is that, unlike heuristic approaches (e.g. our U-shape model), Markov Chains attribution captures incrementality signals.</blockquote><p>Firstly, it is blind to some specific marketing interventions (offline) and interactions (impressions). This means that some channels, like display, will not get credited for the right value they are bringing to the company. To understand the value of those channels that are not trackable or those whose value does not lie only in clicks, we should refer to other marketing measurement standards such as Media Mix Models or lift studies.</p><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/2019/9/30/how-we-scaled-up-ab-testing-at-getyourguid"><em>How we scaled up A/B testing at GetYourGuide</em></a></p><p>Secondly, the attribution model provides a short-term vision of channel performance. Our model is based on the last 90 days before the conversion. To set the correct ROI targets for the channels, however, one needs to understand that the quality of captured traffic is important: some channels can participate in building longer-term relationships with clients, which results in a higher customer lifetime value. In this case, attribution needs to be connected with a solid CLV framework and adjusted with long-term ROI multipliers.</p><p>In general, each implementation of an advanced analytics tool makes the organization smarter and marketing interventions more efficient.
But at the end of the day, blending all the tools to create a unified framework helps optimize budget allocation, steer channels efficiently, set the right targets, and forecast outcomes. The more you build and blend them together, the higher the value your marketing analytics will generate — way beyond the sum of what each model delivers on its own.</p><p><em>If you are interested in business intelligence, </em><strong><em>data analysis or data science, check out our</em></strong><a href="https://careers.getyourguide.com/departments/engineering/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><strong><em> open positions in engineering</em></strong></a><strong><em>.</em></strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7e18916f70f8" width="1" height="1" alt=""><hr><p><a href="https://medium.com/tech-getyourguide/deploying-and-pressure-testing-a-markov-chains-model-7e18916f70f8">Deploying and pressure testing a Markov Chains model</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Understanding data-driven attribution models]]></title>
            <link>https://medium.com/tech-getyourguide/understanding-data-driven-attribution-models-5c51258c6e9?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/5c51258c6e9</guid>
            <category><![CDATA[markov-models]]></category>
            <category><![CDATA[tech]]></category>
            <category><![CDATA[attribution-modeling]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[markov-chains]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Tue, 15 Jun 2021 12:34:15 GMT</pubDate>
            <atom:updated>2021-06-15T12:34:15.286Z</atom:updated>
            <content:encoded><![CDATA[<h4>Baptiste Amar, senior data analyst, designed a fractional attribution model. In this intro, he explains what it is and why we need to go beyond rule based modeling.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*As7JhLT4mB0gzzFeNA_CQw.png" /></figure><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2020/9/22/understanding-data-driven-attribution-models"><em>here</em></a><em>.</em></p><p><em>Baptiste Amar, senior data analyst, designed a fractional attribution model to more accurately credit marketing channels for their impact on revenue generation. This article is part 1 of a 3-part series on how he designed and executed the model. In this first installment, Baptiste sets the foundation by introducing the fundamentals and challenges of attribution models.<br></em><br><em>In part 2 Baptiste follows up with the </em><a href="https://inside.getyourguide.com/blog/2020/10/8/selecting-and-designing-a-fractional-attribution-model"><em>model design process</em></a><em>, gathering and formatting the data, and modifying the Markov Chains model. In part 3 he deep dives into the challenge of </em><a href="https://inside.getyourguide.com/blog/2020/10/15/deploying-and-pressure-testing-markov-chains-model"><em>deploying the data-driven model</em></a><em> in the systems, and of pressure testing it on real marketing campaigns to ensure its relevance. If you haven’t already, follow us on</em><a href="https://www.linkedin.com/company/getyourguide-ag"><em> LinkedIn</em></a><em> to stay up to date for part 2 and 3.</em></p><h4><strong>Why do we need data-driven attribution models?</strong></h4><p>The days of faith- or expertise-based budget allocations in marketing are long over. With the increased penetration of data and analytics into business strategies, marketing managers are faced with even more challenges: They now need to constantly <em>prove</em> <em>the value</em> of their actions.</p><p>But marketers aren’t the only ones being confronted by this newfound challenge.<em> </em>Marketing-specialized data analysts like myself are responsible for providing valuable and actionable content to marketers, whether it’s quick insights or heavy-duty modeling. Ultimately, this helps operational marketing teams make better decisions such as building an optimal media mix, launching more performant campaigns, or creating more engaging content.</p><blockquote>Without this fundamental understanding of the business problem, it is virtually impossible to design and deploy a relevant fractional attribution model.</blockquote><p>In mature marketing organizations like GetYourGuide, analytics is essential when it comes to allocating resources: Media managers need material to get buy-in from financial stakeholders and eventually unlock the operating budget.</p><p>One of the biggest challenges in this context is measuring the return on media investments: How much revenue did the investment spent on specific channels or campaigns generate? This structural question can be answered within several approaches, which all require reliable data and sophisticated modeling.</p><p>One of the standard ways to tackle it is to divide revenue across the marketing channels depending on the impact they had on generating it. 
This is what attribution modeling is all about.</p><p><em>You might also be interested in:</em><a href="https://inside.getyourguide.com/blog/2019/11/1/how-a-display-marketer-and-his-small-team-make-a-big-impact"><em> How a display marketer and his small team make a big impact</em></a></p><h4><strong>Marketing and conversion</strong></h4><p>Before purchasing a product online, customers can be exposed to a wide variety of marketing assets. An example of a path to conversion could be:</p><p>1. A customer sees a banner on a website that links to booking a Tour Eiffel ticket on GetYourGuide (<em>display ad)</em>, and clicks on it. They browse our inventory without converting.</p><p>2. A few days later, they query Google search engine for Tour Eiffel tickets, and click on the GetYouGuide ad (<em>paid search</em>) to access our platform once again and refresh their memory on the activities we offer. While browsing, they opt-in to our newsletter.</p><p>3. A week after the customer’s last visit, they receive an action-based email reminding them of the Tour Eiffel ticket, click on the email, search on our website for the tour they had their eye on, and book the attraction.</p><p>In this journey towards conversion, three marketing channels participated: display, paid search and email.</p><p>If we want to credit those three channels to the right portion of revenue — depending on the impact they had on the conversion — which channel would we attribute to the most?</p><p>a. The <strong>display </strong>ad<strong> </strong>because it drove our client to the website for the first time and got them considering our brand?</p><p>b. The <strong>paid search </strong>click because it likely pushed the client much further into the purchasing intention?</p><p>c. The <strong>email</strong> touchpoint because it made the client convert?</p><figure><img alt="An example of a path towards conversion" src="https://cdn-images-1.medium.com/proxy/0*o4Ugdg2Q_NhkCZmO" /><figcaption><em>An example of a path towards conversion</em></figcaption></figure><p>In all likelihood, all three of those visits had a substantial impact on the customer’s action.</p><p>Based solely on this simplified example, we already understand that there is no straightforward solution. Then there is another layer of complexity. With around 20 media in our mix, the number of channel combinations towards transactions is almost infinite.</p><h4><strong>Rule-based models: the easy way</strong></h4><p>Across industries, simple rule-based models are the most wide-spread method of crediting channels’ revenues because of two main advantages:</p><p>1. They are easy to understand for channel managers</p><p>2. They are easy to implement in the systems</p><p>Most classic models only credit the revenue to first-channel (<em>first-click</em>), to only last-channel (<em>last-click</em>) or equally across all channels involved (<em>linear</em>).</p><p>At GetYourGuide, we mainly based our attribution logic on a “U-shape” model where the first and last channels each get 40% of revenue, and the remaining 20% are divided across the intermediary ones.</p><figure><img alt="Attributing revenue with a U-shape model" src="https://cdn-images-1.medium.com/proxy/0*m-dgMDTgZseshzOF" /><figcaption><em>Attributing revenue with a U-shape model</em></figcaption></figure><p>This logic, which is slightly more sophisticated, allows for better steering of both the initiating and closing channels. 
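</p><p><em>To make this concrete with made-up numbers: for the display → paid search → email journey described earlier, a €100 booking under this U-shape logic would be credited €40 to display (the first touchpoint), €40 to email (the last touchpoint), and €20 to paid search in the middle.</em></p><p>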
However, the channels that are often positioned in the middle of the path — <em>those that reactivate customers’ intent, and influence other channels</em> — are left understated.</p><p><strong>Data-driven and fractional approaches: estimating impact</strong></p><p>Rule-based models, even sophisticated ones, often fail at outlining the true impact of the channels. They don’t capture the complexity of the media mix and channels.</p><p>A data-driven model, on the other hand, aims at understanding the links between our marketing interventions and the customer’s response. Among many other factors, they have the potential to consider:</p><ul><li>the <strong>sequence </strong>of events that led a client to purchase</li><li>the <strong>interactions </strong>between the involved channels</li><li>their <strong>position</strong> in the paths</li><li><strong>proximity</strong> to conversion</li></ul><p>By leveraging all this information, data-driven models allow for crediting channels’ revenue with more accuracy for the impact that they have on our transactional relationship with the customer, which carries over strong signals of incrementality.</p><h4><strong>Introducing data-driven fractional attribution modelling</strong></h4><p>Attribution modeling is the most essential tool for calculating channel performance (even rule-based models). Indeed, the key metric used at GetYourGuide is Return On Ad Spend (ROAS), which is the revenue generated by the channel divided by the spendings — and channel revenue necessarily involves some attribution logic.</p><p>Now, having an attribution model that credits channels for their impact on revenue will help:</p><ul><li>Better understanding channels’ profitability and support budget allocation decisions</li><li>Setting up the right targets for channels over a defined period</li><li>Channel managers to adjust their media acquisitions quickly so they can keep track of the revenue they intend to generate</li><li>Design channels’ strategy by analyzing the outcomes of campaigns or specific interventions.</li></ul><p>For all those reasons, a robust data-driven attribution model makes marketing more efficient, which leads to increased global revenue.</p><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/2020/3/10/15-data-science-principles-we-live-by"><em>15 data science principles we live by</em></a></p><h4><strong>A three-step approach for a best-in-class fractional attribution model:</strong></h4><p>We organized this project in three parts, each achieved during a quarter. We’ll go through them in the next blog posts to be published in the upcoming weeks.</p><p>The first quarter was dedicated to building the data and testing several models.</p><p><em>Outcome: model selection.</em></p><p>In the second quarter, we fine-tuned and deployed the chosen model in the systems. In this part of the project, we hit a wall and had to come up with original material to be able to build the pipeline.</p><p><em>Outcome: model in production.</em></p><p>Lastly, in the third quarter of the project, we aggressively pressure tested the model on actual marketing campaigns to ensure its relevance. Pressure testing consists of comparing the campaigns’ incrementally against the model’s weights on specific campaigns. 
This is to validate that we capture the right incrementality signals.</p><p><em>Outcome: model becomes the single source of truth.</em></p><p>Without this fundamental understanding of the business problem, it is virtually impossible to design and deploy a relevant fractional attribution model. The introduction helps us better understand the issue we are trying to solve: Estimating each channel’s revenue to measure their performance and make more efficient decisions in the future. In the forthcoming posts, we will share the multi-step plan where every part will generate valuable outcomes.</p><p>In part 2 of this series, we will focus on how we designed the model leveraging Markov Chains technique. Stay tuned.</p><p><strong><em>If you are interested in business intelligence, data analysis or data science, check out our</em></strong><a href="https://careers.getyourguide.com/departments/engineering/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><strong><em> open positions in engineering</em></strong></a><strong><em> or </em></strong><a href="https://careers.getyourguide.com/departments/marketing/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><strong><em>marketing</em></strong></a><strong><em>.</em></strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5c51258c6e9" width="1" height="1" alt=""><hr><p><a href="https://medium.com/tech-getyourguide/understanding-data-driven-attribution-models-5c51258c6e9">Understanding data-driven attribution models</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Preventing traffic routing regressions for Istio Virtual Services]]></title>
            <link>https://medium.com/tech-getyourguide/preventing-traffic-routing-regressions-for-istio-virtual-services-97430b7e4310?source=rss----592d711c7926---4</link>
            <guid isPermaLink="false">https://medium.com/p/97430b7e4310</guid>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[site-reliability]]></category>
            <category><![CDATA[istio]]></category>
            <category><![CDATA[site-reliability-engineer]]></category>
            <category><![CDATA[tech]]></category>
            <dc:creator><![CDATA[GetYourGuide Tech Blog]]></dc:creator>
            <pubDate>Thu, 10 Jun 2021 13:50:22 GMT</pubDate>
            <atom:updated>2021-06-10T13:50:22.606Z</atom:updated>
            <content:encoded><![CDATA[<h4>Fernando Cainelli, senior site reliability engineer on the Reliability team, shares how we prevent routing regressions with a validation tool for Istio virtual services.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vysWjV7BeEkWAPEdxIOtvQ.png" /></figure><p><em>This story was originally posted in our main blog. To see it follow </em><a href="https://inside.getyourguide.com/blog/2020/10/30/preventing-traffic-routing-regressions-for-istio-virtual-services"><em>here</em></a><em>.</em></p><p><em>Fernando Cainelli is a senior site reliability engineer on the Reliability team. His team’s mission is to provide scalable and highly available infrastructure for our engineering teams. He shows how his team made networking configuration changes safer using an open-source tool his team built to validate </em><a href="https://istio.io/latest/docs/reference/config/networking/virtual-service/"><em>Istio VirtualServices.</em></a></p><h4><strong>Understanding our traffic management</strong></h4><p>One of the Reliability team’s responsibilities is to manage how traffic reaches our services, either from the internet, commonly known as <em>north/south</em>, or service to service, <em>east/west</em>.</p><p>We run all our services on <a href="https://kubernetes.io/">Kubernetes</a>, and we have been using Istio since the beginning of 2019 for both network use cases. This brings the benefit of having the same networking configuration for ingress and service to service traffic. All the external routing configurations live in a single repository with more than 60 <a href="https://istio.io/latest/docs/concepts/traffic-management/#virtual-services">VirtualServices</a>.</p><p>A typical VirtualService configuration looks like:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cdc96822457556909cb2f62ec2a7956c/href">https://medium.com/media/cdc96822457556909cb2f62ec2a7956c/href</a></iframe><p>The above VirtualService is bound to an Istio Ingress Gateway and the <em>“mesh”,</em> a special keyword that assigns this configuration to all sidecar proxies in the cluster. The nice thing about it is that routing works seamlessly, independent of where the requests originate.</p><pre>curl <a href="https://awesome-api.getyourguide.com/customerResetPassword/">https://awesome-api.getyourguide.com/customerResetPassword/</a></pre><p><em>Request from Internet</em></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0f21753b456edbe0fc528cc3954d9d8a/href">https://medium.com/media/0f21753b456edbe0fc528cc3954d9d8a/href</a></iframe><p><em>Request from an internal service running in Kubernetes</em></p><p>You might have read about our <a href="https://inside.getyourguide.com/blog/2020/2/24/how-to-split-a-monolith-into-services">journey towards a distributed, service oriented architecture</a>. It’s a long and gradual process: We’re extracting business logic from the monolith piece by piece. 
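</p><p><em>Before walking through an example of such an extraction, here is a minimal, illustrative sketch of the kind of VirtualService described above, with all traffic still going to the monolith. The gateway name and the catch-all destination are assumptions for this sketch, not our real configuration:</em></p><pre>apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: awesome-api
spec:
  hosts:
    - awesome-api.getyourguide.com
  gateways:
    - istio-ingressgateway   # bound to the Istio Ingress Gateway for north/south traffic
    - mesh                   # special keyword: applies to every sidecar proxy for east/west traffic
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            host: monolith.monolith.svc.cluster.local   # hypothetical catch-all route to the monolith
</pre>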
<p>Let’s take a look at a hypothetical example of functionality being extracted and roughly the process behind it:</p><ol><li>The Travelers Platform team, responsible for core services in the monolith, wants to extract the user authentication functionality out of it as it’s too complex to maintain in the old stack.</li><li>After a round of design sessions and feedback, the decision is to go with a trusted external identity provider to store users’ credentials.</li><li>Existing helper functionality in the monolith, such as the custom password reset workflow or personal information clean-up, would be moved to a separate service.</li></ol><p>Once the team has implemented the reset password endpoints following the existing API contract, it then sets up a route in the <a href="https://istio.io/latest/docs/concepts/traffic-management/#virtual-services">VirtualService</a>:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ed062e9692b139a7474e9a9f6ee04f29/href">https://medium.com/media/ed062e9692b139a7474e9a9f6ee04f29/href</a></iframe><p>From that moment on, all reset password requests are handled by the new service instead of the monolith, transparently for both internal and external clients.</p><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/2019/10/22/whats-it-like-moving-fromc-backend-engineering-to-devops"><em>What it’s like to move from backend to DevOps</em></a></p><h4>Some of the challenges with Istio</h4><p>Having Istio in our toolbox helps teams split the monolith up, gain visibility, and improve the resilience of our services. However, in reality, it also brings complexity and other issues that need to be addressed. For the scope of this blog post, we will limit the discussion to the following topics we have experienced:</p><ul><li>VirtualServices can be big and often contain advanced regular expressions. People are afraid to extend or optimize them.</li><li>Engineers setting up the rules are not familiar with Istio configuration, and regressions can easily be introduced.</li><li>Time spent on pull request reviews does not scale linearly with the number of changes in them. VirtualServices need to be reviewed as a whole, as rules might conflict with each other.</li><li>Troubleshooting is hard. We need a deep understanding of Istio architecture and APIs, Envoy, the HTTP protocol, TCP, Kubernetes networking, etc. When something goes wrong, the MTTR (mean time to recovery) can be hours.</li></ul><p>As our engineering teams move fast, the changes to VirtualServices are quite frequent, increasing the chances of bad configuration being deployed.</p><p>Let’s get back to our hypothetical scenario. Now, a different team is extracting the customer data into its own dedicated service called <strong>users</strong>. When the service is ready to be deployed, the engineer working on this finds the <strong>auth</strong> service route and copies and pastes it, changing the path and destination to the new <strong>users</strong> endpoints.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c099165566e40fbf252f1aba79df149d/href">https://medium.com/media/c099165566e40fbf252f1aba79df149d/href</a></iframe><p>The VirtualService looks legit until we read the whole file in detail.</p>
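<p><em>A simplified sketch of the relevant part of the file at this point might look like the following; the match type on the existing auth route is an assumption, but the destination hosts correspond to the services in this example:</em></p><pre>  http:
    - match:
        - uri:
            prefix: /customer          # new users route, pasted above the existing auth route
      route:
        - destination:
            host: users.users.svc.cluster.local
    - match:
        - uri:
            prefix: /customerResetPassword
      route:
        - destination:
            host: auth.auth.svc.cluster.local
</pre>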
<p>We can see that the route to the <strong>users</strong> service is based on a <strong>prefix</strong> match on <strong>/customer</strong>, and as the routes are evaluated top down, it also matches requests to <strong>/customerResetPassword</strong>. When this configuration is applied, password reset requests go to the <strong>users</strong> service, which doesn’t have this endpoint, returning a 404 error to the client. Such a mistake is easier to spot in a 30-line file but much harder in a 1,000-line one with complex rules in it.</p><p><em>You might also be interested in: </em><a href="https://inside.getyourguide.com/blog/2018/11/8/relevant/position-spotlight-senior-devops-engineer-developer-enablement"><em>Staff DevOps Engineer, Developer Enablement</em></a></p><h4>Introducing a validation tool</h4><p>One way of minimizing the risks is to introduce safety nets; the earlier we spot issues, the more confidently we can roll out changes. Using unit tests to prevent regressions is nothing new in software engineering. Unfortunately, there’s no such tool for Istio VirtualService configuration, so we decided to come up with a validator that:</p><ol><li>asserts that your VirtualService change is doing what you want, while you rest assured that existing routing behavior will not break.</li><li>runs locally and as part of CI (continuous integration) without the need for a Kubernetes + Istio setup.</li><li>offers a simple test case API: test cases should be short to write but provide extensive coverage.</li></ol><p>The idea was pitched in a Hack Day* project. Two other engineers, Alexander Mack and Andreas Jaggi, who are passionate about testing, joined the project. After a couple of days of work, we came up with <a href="https://github.com/getyourguide/istio-config-validator">istio-config-validator</a>. This tool reads all your local VirtualService configuration files and asserts them against the test cases you create. Here is an example of how it works, continuing with the VirtualService we have created:</p><p>We will start by writing our test cases for the routes we know of. That is typically what a developer will do before submitting a pull request:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/97418447a72638d7f8262ab447614eb0/href">https://medium.com/media/97418447a72638d7f8262ab447614eb0/href</a></iframe><p>Three test cases describing our intended routing behavior were added. The full API specification for test cases is in the <a href="https://github.com/getyourguide/istio-config-validator/blob/master/docs/test-cases.md#testcases">repository documentation</a>, but let’s break down and understand what we’re defining here (a sketch of one such test case follows the breakdown).</p><p><strong><em>description:</em></strong> A short description of the test case. It is useful when a test case fails, and we want to identify which one.</p><p><strong><em>request:</em> </strong>It will be used as input for the VirtualService rule matching. A combination of all possible parameters such as authority, path, and headers will be used to mock requests.</p><p><strong><em>route:</em> </strong>What’s the exact destination expected? This is what will be used to assert against the result.</p><p><strong><em>wantMatch:</em> </strong>Whether you expect the assertion to be true or false.</p>
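<p><em>Putting these fields together, the test case for the authentication endpoints might look roughly like the following sketch. The exact schema lives in the repository documentation linked above, so field names and nesting here are assumptions for illustration:</em></p><pre>testCases:
  - description: authentication service endpoints
    request:
      authority:
        - awesome-api.getyourguide.com
      method:
        - GET
      uri:
        - /customerResetPassword
    route:
      - destination:
          host: auth.auth.svc.cluster.local   # we expect reset password requests to reach the auth service
    wantMatch: true
</pre>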
<p>Let’s run <em>istio-config-validator</em>, passing our test cases file and the directory where the VirtualServices are located.</p><pre>$ istio-config-validator -t tests/ manifests/virtualservices/<br><br>running test: customers endpoints pointing to users service<br>PASS input:[{awesome-api.getyourguide.com GET /customers/1 map[]}]<br>PASS input:[{awesome-api.getyourguide.com GET /customers map[]}]<br>PASS input:[{awesome-api.getyourguide.com OPTIONS /customers/1 map[]}]<br>PASS input:[{awesome-api.getyourguide.com OPTIONS /customers map[]}]<br>PASS input:[{awesome-api.getyourguide.com PUT /customers/1 map[]}]<br>PASS input:[{awesome-api.getyourguide.com PUT /customers map[]}]<br>===========================<br>running test: authentication service endpoints<br>FAIL input:[{awesome-api.getyourguide.com GET /customerResetPassword map[]}]<br>2020-10-15T13:24:35.675470Z     fatal   destination mismatch=[destination:&lt;host:&quot;users.users.svc.cluster.local&quot; &gt; ], want [destination:&lt;host:&quot;auth.auth.svc.cluster.local&quot; &gt; ]<br>exit status 1</pre><p>The validator failed on the <em>authentication service endpoints </em>test as a <strong>GET</strong> request to <strong><em>awesome-api.getyourguide.com/customerResetPassword</em></strong> went to the <strong><em>users</em></strong> service instead of the <strong><em>auth</em></strong> service. That’s enough information to fix the configuration.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/51ac6dd54e4c508dbf0ad12f8215a51f/href">https://medium.com/media/51ac6dd54e4c508dbf0ad12f8215a51f/href</a></iframe><p>There are different ways of solving it: here I chose to use <strong>regex </strong>matching over <strong>prefix</strong>. Running the command again, we see that all tests now pass.</p><pre>$ istio-config-validator -t tests/ manifests/virtualservices/<br><br>Test summary:<br> - 3 testfiles, 3 configfiles<br> - 3 testcases with 24 inputs passed</pre><p>The validator mocks <a href="https://istio.io/latest/docs/concepts/traffic-management/#routing-rule-precedence">Istio’s rule precedence logic</a> for finding the right destination to assert. This is a naive approach, and we have already received good feedback on how to improve it.</p><p>In general, we are quite happy with the outcome. Since we introduced it, developers have created more than 1,000 test inputs. They can run it locally, get immediate feedback, and deploy with confidence, putting us in a much better position than six months ago.</p><p>We have decided to open-source the project as we think others might be struggling with the same type of problems we were, and they can benefit from it. If you want to start using the validator or contribute, please file an issue or <a href="https://github.com/getyourguide/istio-config-validator">PR on GitHub</a>.</p><p>I want to thank Alexander Mack and Andreas Jaggi for their valuable contributions and for joining the project. It was a pleasure to work with them while delivering something impactful for the engineering teams.</p><p><em>*HackDays are 2-day hackathons where engineers are allowed to work on a project of their choice, with colleagues in varying domains across the organization.
It provides an opportunity to try something new, build something impactful, and/or meet new people.</em></p><p><strong><em>For updates on our open positions, check out our</em></strong><a href="https://careers.getyourguide.com/departments/engineering/?gh_src=3e762a501&amp;utm_campaign=Tech&amp;utm_medium=Blog&amp;utm_source=InsideGYG"><strong><em> </em></strong><em>Career page</em></a><strong><em>.</em></strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=97430b7e4310" width="1" height="1" alt=""><hr><p><a href="https://medium.com/tech-getyourguide/preventing-traffic-routing-regressions-for-istio-virtual-services-97430b7e4310">Preventing traffic routing regressions for Istio Virtual Services</a> was originally published in <a href="https://medium.com/tech-getyourguide">GetYourGuide Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>