Product & Tech Ecosystem for Holiday Marketplace

Engineering & Data Science @ TravelTriangle — Building a Complex and Scalable Holiday Marketplace (Part I)

Prabhat Gupta
Published in TravelTriangle
11 min read · Apr 18, 2020


This is Part I of a three-part series discussing the generic and scalable frameworks being developed by our engineering and data science teams, and it also gives you a glimpse of the engineering guidelines and culture @ TravelTriangle.

TravelTriangle (TT) is an online holiday marketplace that connects travelers to multiple local travel agents to help them create a customized and memorable holiday.

A holiday is a complicated entity with many moving parts: it involves finalizing the destination, cities, flights, hotels, activities, sightseeing, cabs, dates, budget, company, and agents. This finalization happens in different sequences for different people; some finalize the budget first, others the activities, and at each step multiple factors come into play. A further level of complexity is added by the multiple actors involved (travelers, agents, travel advisors) and by the multiple products and telephonic interactions.

Due to this manifold complexity, holiday sales have for the last couple of decades happened over calls and emails, as it has been a very subjective sales process for travel agents.

At TravelTriangle, we are constantly striving to create the best holiday planning and booking experience for travelers.

Tech Vision and Generic Frameworks

You can read about our holistic product vision for solving this complex travel ecosystem here. Multiple B2C and B2B product lines (current and future) pushed us to keep our technology architecture loosely coupled, highly configurable, and as reusable as possible.

Further, my take as CTO has always been that the tech team should spend as much time as possible working on newer things, rather than just incrementally enhancing already-built features one by one. This also helps us build things frugally and efficiently without compromising on quality, and in turn empowers non-engineering teams to move fast with minimal tech bandwidth (since tech bandwidth is quite costly :)).

With that in mind, we always grouped related features together to first understand the bigger framework behind them, one that should seamlessly enable further features on top instead of requiring each feature to be built as and when it comes. Ideally, the framework should also extend easily to other, similar product lines.

… with proper design, the features come cheaply. This approach is arduous, but continues to succeed. — Dennis Ritchie

Below are a few of the generic frameworks that our engineering and analytics teams have built so far to realize our product & tech vision smartly and efficiently:

Configurable Experimentation and Rule Engine (CERE)

The first and foremost step toward realizing our product vision was enabling our product team to roll out enough experiments/variations, analyze data, and keep churning experiments to fail fast, without getting stuck on tech bandwidth and/or release cycles. Experiments could be around optimizing current pages as well as testing out new pages and/or product flows for both travelers and agents. There was also a dire need to understand the psychology of the user: clustering users into segments through detailed data analysis using ML algorithms, and, on the basis of this explicit or derived intent, triggering and testing multiple actions to find what makes the planning cycle between traveler and agents quicker and more effective.

We devised an early experimentation approach (blog here) to enable teams to release and test new variations for high-frequency use cases. However, that approach was not extendible to other use cases, whose frequency kept increasing, and it stopped working once we moved our pages to a React+Redux stack: any page loaded with experiments enabled showed a flickering effect. To avoid the flicker, we evaluated many solutions and finally decided to build our own on top of Nginx, Varnish, and our in-house dynamic UI framework, which lets any page pick up its components, as well as the data within those components, to render the full page dynamically. The framework is common to our mobile web and desktop platforms; for the app, we had to tweak it a bit due to nuances around in-app page rendering (complete blog here).
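
To make the flicker-free approach concrete, here is a minimal sketch (in Go, purely illustrative and not our production code) of deterministic, server-side variant assignment; the function and field names are assumptions. Because the bucket is computed before the HTML is rendered, the first response already contains the right variant and nothing is swapped on the client:

```go
package experiment

import (
	"crypto/md5"
	"encoding/binary"
)

// Variant describes one arm of an experiment along with the share of
// traffic (in percent) that should see it.
type Variant struct {
	Name    string
	Percent uint32
}

// Assign deterministically buckets a visitor into a variant by hashing the
// visitor ID together with the experiment key. The same visitor always lands
// in the same bucket, and the decision is made before rendering.
func Assign(experimentKey, visitorID string, variants []Variant) string {
	sum := md5.Sum([]byte(experimentKey + ":" + visitorID))
	bucket := binary.BigEndian.Uint32(sum[:4]) % 100

	var cumulative uint32
	for _, v := range variants {
		cumulative += v.Percent
		if bucket < cumulative {
			return v.Name
		}
	}
	return "control" // fallback if percentages do not add up to 100
}
```

The chosen variant name can then travel with the request (for example as a header) so that Varnish caches each variant separately and the dynamic UI framework picks the matching set of components.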

With GTM and VWO for basic HTML changes, and the in-house dynamic UI framework integrated with Varnish and Nginx, we were able to let product teams test different varieties of dynamically generated frontend pages as well as run A/B tests to reach results faster and more objectively.

However, we were still not quite there yet: the same approach and tools cannot work for backend workflows. To solve this, we first moved to an event-driven architecture so that we could easily configure and control triggers and associate the desired actions with them, building our own event stream on the publish/subscribe pattern using Kafka. We then took it one step further, making it configurable and A/B-test friendly, by developing CERE (configurable experimentation and rule engine) on top of the rule engine and event-driven architecture, so that the product team can tweak and test different backend workflows on the go without needing tech bandwidth.
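
As a rough illustration of the kind of configuration this enables (the structs and field names below are assumptions for this sketch, not CERE's actual schema), a backend rule can be expressed as data and evaluated against every event consumed from the Kafka stream:

```go
package cere

// Event is a message consumed from the Kafka stream, e.g. "lead.created".
type Event struct {
	Name    string
	Payload map[string]string
}

// Rule is a configurable trigger: when an event with a matching name and
// matching payload fields arrives, the named action is fired. Rules like
// this can live in a config store and be edited without a release.
type Rule struct {
	EventName string
	Match     map[string]string // all listed fields must match
	Action    string            // e.g. "allocate_lead", "send_notification"
	Variant   string            // optional experiment arm this rule belongs to
}

// Evaluate returns the actions to trigger for a single event.
func Evaluate(rules []Rule, e Event) []string {
	var actions []string
	for _, r := range rules {
		if r.EventName != e.Name {
			continue
		}
		matched := true
		for k, want := range r.Match {
			if e.Payload[k] != want {
				matched = false
				break
			}
		}
		if matched {
			actions = append(actions, r.Action)
		}
	}
	return actions
}
```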

From this point on, our framework has been well set up for quick experimentation and dynamic flows on both the frontend and the backend. Using it, our product team has been able to change product pages and workflows, the lead allocation flow, the communication channels used, and more, far more quickly and without needing tech bandwidth and/or a release cycle.

Microservices Design for Loose Coupling and Cost-Effective Scalability

Each of our product lines (B2B or B2C) has different technical nuances and growth in scale. For example, traveler-facing browsing pages need rich data and content shown at blazing speed, while transactional pages need to be more reliable and consistent in terms of information. Internal services are more reporting-heavy and need fast big data processing to give teams real-time data to analyze and act on. Systems integrating third-party inventory or payment providers have yet other nuances and scale characteristics.

A microservices approach came as a great savior for us here. I wish we had taken this approach right from the start, but better late than never. Each service scales up differently in its own dimension, needs different infra and a different type of database, and follows different development guidelines. It also gave us the leverage to rewrite some of our services from RoR to Golang wherever we needed higher concurrency in the near or long term, and to freely choose different SQL/NoSQL databases, spanning MySQL, MongoDB, and Redshift.

When we started moving to a microservices architecture around 2–2.5 years ago, most of our team members were new to developing microservices, and while the approach looks quite lucrative, it has its own downsides to take care of.

  • We had to be cognizant of the fact that we were not decomposing services just for the sake of breaking things apart, but only when necessary.
  • Workflows become more complex: a single transaction may now span many services and requires much more granularity in states and state transitions.
  • Rollbacks and data inconsistencies also need to be taken care of, since data is now spread across multiple stores.
  • On the testing and deployment front, the QA and DevOps teams needed to evolve their testing and CI/CD approaches respectively.
  • Further, we had to make sure that service discovery and inter-service communication happen smoothly, while services remain resilient in themselves and circuit breakers are used effectively (see the sketch after this list).
  • On top of that, a dire need arose for general services like authentication/authorization, an API gateway, a service registry, etc. (we are still building these, and have worked around it for now with a façade layer over the original monolith itself).
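
For the circuit-breaker point above, a minimal sketch of the pattern looks like the following (the thresholds and names are illustrative, not what runs in our services):

```go
package resilience

import (
	"errors"
	"sync"
	"time"
)

var ErrCircuitOpen = errors.New("circuit open: downstream call skipped")

// Breaker trips to "open" after maxFailures consecutive failures and lets a
// trial request through again once cooldown has elapsed (half-open).
type Breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn through the breaker, failing fast while the circuit is open.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrCircuitOpen
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now() // (re)open the circuit
		}
		return err
	}
	b.failures = 0 // a success closes the circuit again
	return nil
}
```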

That being said, moving from a monolith to microservices looks like quite an overhead at the start (overkill to some.. :) ), not to mention the sharp initial increase in server costs. However, you start seeing the results over the near-to-long term: far more efficiency in development, scalability, failure handling, better segregation of team accountability, and even server cost savings in the long run.

Data Warehouse, Data Lake and Data Modelling

What doesn't get measured doesn't get improved. Travelers and agents interact with the product/system across multiple touchpoints and platforms, so we needed a central data repository. Further, we wanted to use millions of these data points (traveler browsing, booking, profile, call, email, etc.) to suggest the next best action to the traveler, and to see what meaningful insights could be derived from past data.

At TravelTriangle, one of the core cultural values is being data-centric. Below are the aspects to take care of whenever anyone thinks of a data lake, aligned with the 4 Vs of Big Data (Volume, Variety, Veracity, and Velocity):

  • Collecting data from all sources into one place
  • Validating and triangulating data for correctness
  • Structuring data to remove noise and processing it
  • Analyzing or modeling data to draw business insights (the 5th V of Big Data: Value)
  • Integrating insights into systems and/or day-to-day decision making
  • Continuously training or updating insights on new varieties as well as new volumes of data

While a separate blog on this is coming soon, I'll briefly mention the approach, tools, and techniques we have used at TravelTriangle to build our own data warehouse and data lake, and to train and deploy various data models that draw insights from data and trigger actions in real time for travelers and agents.

We used the Segment tool to pass events from various product lines to our central data warehouse (Redshift) as well as to external tools used by the product and marketing teams. However, Segment became too costly at our volume, so we are replacing it with our in-house real-time messaging stream ("RTMS") to relay events from any source to any destination. Integrating a new destination in RTMS is not as easy as in Segment, but that is a trade-off we can live with for now given the cost savings, and it does not require much tech effort whenever the need arises (blog coming soon..).

Further, we have developed an in-house data pipeline to sync data between our MySQL databases and Redshift. We tried AWS Data Pipeline, AWS Glue, and AWS DMS, but none met our requirement of near-real-time sync due to frequent issues and/or the customization needed, so after using AWS DMS for a while we chose to build our own pipeline in-house. In the interim we used external services like FlyData and Stitch, but they were getting quite expensive as our data volume grew multifold each month. (blog coming soon..)
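
As a simplified sketch of how such a sync can work (incremental extraction by an updated-at watermark, staging to S3, then a bulk COPY into Redshift), assuming a hypothetical table, columns, and staging path, and not our pipeline's actual code:

```go
package pipeline

import (
	"database/sql"
	"fmt"
	"time"
)

// SyncLeads illustrates one incremental-sync cycle: rows changed in MySQL
// since the last watermark are exported, staged to S3 (not shown), and then
// loaded into Redshift with a COPY, which ingests in bulk far faster than
// row-by-row inserts.
func SyncLeads(mysqlDB, redshiftDB *sql.DB, lastSync time.Time, s3Path, iamRole string) (time.Time, error) {
	rows, err := mysqlDB.Query(
		`SELECT id, status, updated_at FROM leads WHERE updated_at > ?`, lastSync)
	if err != nil {
		return lastSync, err
	}
	defer rows.Close()

	newWatermark := lastSync
	for rows.Next() {
		var id int64
		var status string
		var updatedAt time.Time
		if err := rows.Scan(&id, &status, &updatedAt); err != nil {
			return lastSync, err
		}
		// ... append the row to a CSV batch and upload the batch to s3Path ...
		if updatedAt.After(newWatermark) {
			newWatermark = updatedAt
		}
	}
	if err := rows.Err(); err != nil {
		return lastSync, err
	}

	// Redshift speaks the Postgres protocol, so the COPY is issued as plain SQL.
	copySQL := fmt.Sprintf(
		`COPY leads_staging FROM '%s' IAM_ROLE '%s' FORMAT AS CSV`, s3Path, iamRole)
	if _, err := redshiftDB.Exec(copySQL); err != nil {
		return lastSync, err
	}
	return newWatermark, nil
}
```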

For day-to-day stakeholder reports, we have chosen Superset and Redash. Of all the tools we explored, these met most of our needs and, moreover, are open source. The only thing missing in Superset is funnel visualization; once that is available, it can serve our needs exhaustively. Our engineers have also optimized Superset and Redash, using various techniques, to support 5x the requests and to bring response times down from around 10 seconds to under 1 second for most reports (3–4 seconds for a few heavy ones). We are now testing Druid, PySpark, and similar big data tools to decide on one and integrate it with our main applications to process big data faster and on the go.

Having centralized data across TravelTriangle and made real-time reports available for our teams' daily analysis, we moved on to developing and training data models for our business use cases. One such case was dynamic lead scoring, which had to work in real time to help agents prioritize their leads effectively. We currently use EMR to process the model and have integrated our application with Kafka.
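
Purely to illustrate the shape of a real-time lead score (the actual model is trained on EMR, and its features and weights are learned rather than hard-coded), a logistic-style scorer over a few hypothetical features could look like:

```go
package scoring

import "math"

// LeadFeatures is a hypothetical, simplified feature vector for a lead.
type LeadFeatures struct {
	PagesViewed     float64 // browsing depth before requesting a quote
	DaysToTravel    float64 // how soon the trip is
	RepliedToAgent  float64 // 1 if the traveler replied to the first agent message
	BudgetSpecified float64 // 1 if the traveler gave an explicit budget
}

// Score maps features to a 0..1 priority using a logistic function; the
// weights here are made up for illustration, not learned coefficients.
func Score(f LeadFeatures) float64 {
	z := -1.0 +
		0.08*f.PagesViewed +
		-0.02*f.DaysToTravel +
		1.2*f.RepliedToAgent +
		0.7*f.BudgetSpecified
	return 1.0 / (1.0 + math.Exp(-z))
}
```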

Communication/Notification Layer (Sandesh)

Notifications, whether promotional or transactional, are an integral part of any system. We knew ours had to be omnichannel and hence able to integrate any new communication channel/API quickly. Further, the flow needed to be highly configurable, reusable, and seamlessly integrable with any of the other microservices (through events or APIs).

"Sandesh", as we call our notification service, takes care of the different mediums of interaction, be it SMS, IVR, email, WhatsApp, or push notifications. It is flexible enough that adding another medium of communication only requires integrating a mapping layer with the provider's APIs.
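
A tiny sketch of what such a mapping layer could look like (the interface and names are hypothetical, not Sandesh's actual code):

```go
package sandesh

import (
	"context"
	"fmt"
)

// Notification is a channel-agnostic message produced by other services.
type Notification struct {
	Channel   string // "sms", "email", "whatsapp", "push", "ivr"
	Recipient string
	Template  string
	Params    map[string]string
}

// Provider is the mapping layer a new medium has to implement: translate the
// generic Notification into the vendor's API call.
type Provider interface {
	Send(ctx context.Context, n Notification) error
}

// Dispatcher routes each notification to the provider registered for its channel.
type Dispatcher struct {
	providers map[string]Provider
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{providers: make(map[string]Provider)}
}

// Register plugs in a new medium without touching the rest of the service.
func (d *Dispatcher) Register(channel string, p Provider) {
	d.providers[channel] = p
}

func (d *Dispatcher) Dispatch(ctx context.Context, n Notification) error {
	p, ok := d.providers[n.Channel]
	if !ok {
		return fmt.Errorf("no provider registered for channel %q", n.Channel)
	}
	return p.Send(ctx, n)
}
```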

Since this service needs high concurrency and I/O, we also had to implement generic rate limiting, burst handling, and a queue management system. The notification templatization engine, built on top of the experimentation framework mentioned earlier, gives the product team very high flexibility to A/B test their messages as well as the performance of different mediums by notification type and time, again without needing tech bandwidth and/or a fresh release.
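
For the rate-limiting and burst-handling piece, a per-provider token bucket conveys the idea; here is a sketch using Go's golang.org/x/time/rate package, with made-up limits and provider names:

```go
package sandesh

import (
	"context"

	"golang.org/x/time/rate"
)

// providerLimiters holds one token-bucket limiter per downstream provider so
// a burst of notifications never exceeds what the vendor API allows.
var providerLimiters = map[string]*rate.Limiter{
	// 100 messages/second sustained, bursts of up to 200 (illustrative numbers).
	"sms_vendor":   rate.NewLimiter(rate.Limit(100), 200),
	"email_vendor": rate.NewLimiter(rate.Limit(500), 1000),
}

// sendWithLimit blocks until the provider's bucket has a token, then sends.
// Messages waiting here are the in-memory head of the queue; anything beyond
// that stays in the persistent queue managed by the queue management system.
func sendWithLimit(ctx context.Context, provider string, send func() error) error {
	limiter, ok := providerLimiters[provider]
	if !ok {
		return send() // no limit configured for this provider
	}
	if err := limiter.Wait(ctx); err != nil {
		return err // context cancelled while waiting for a token
	}
	return send()
}
```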

Going further, we have data events to monitor the delivered, open (except for SMS), and click rates of each message across the different mediums, all stored in our central data warehouse so we can analyze them and draw meaningful insights.

Our promotional campaigns are currently managed through WebEngage, an external tool; however, we are in the process of bringing the campaign manager in-house as well, integrated with our notification service and the experimentation framework.

While this article covered a few of the frameworks aligned with our tech vision and roadmap, Part II will cover more frameworks, including the ones our QA and DevOps teams have built to improve release quality and turnaround time.

A lot of interesting things are happening @ TravelTriangle. Let us know if you'd like more details, or email us at lead_on@traveltriangle.com to join us and see it first-hand while solving challenging problems and creating a world-class holiday B2C and B2B ecosystem.

If you know anyone in your network who might be interested in solving these problems, do share this article.

If you like the article, do like and/or clap so that others might stumble upon it.

Originally published at https://www.linkedin.com.
