Ad-Hoc Data Requests: An Infinite Backlog ♾️😰

Osman Ghandour
8 min read · Feb 7, 2023

After deciding to pivot away from the credit card benefits space (explained here), we chose a different problem area to focus our efforts on. Our team has experience in data engineering and analysis, so we decided to investigate some of the day-to-day problems we have encountered in these areas. We were curious how common our pain points were across different data teams. And so began the needfinding process again.

Over the past few months, we chatted with 70+ data professionals (engineers, analysts, PMs, etc.) about the problems they face every day. These people span several industries and cover all company sizes, from startups to established corporations. It was a long process, but in the end it paid off: our findings finally converged.

Data teams are faced with a barrage of ad-hoc requests each and every day. Data analysts are brought in as domain experts with an impressive mix of technical and business competencies. Unfortunately, they end up wasting their time and effort answering repetitive and non-needle-moving questions. This is costing companies an arm and a leg in terms of lost productivity.

Below is a description of how we arrived at this problem and why it matters.

PG-13 Disclaimer: The following might get a little technical in comparison to our previous posts! While everyone is a cardholder and payments is a somewhat familiar industry, data analytics is a more niche field. If you want to steer clear of the data-heavy terms, feel free to skip to the last paragraph for a summary in plain English.

“Hey, quick question — I know you’re busy but can you re-run this analysis when you have a sec? And if you could also help us dig up queries for these old decisions as well that’d be great.”

Where we started

We started off with the hypothesis that data teams spend an exorbitant amount of time dealing with pipeline failures, specifically ones caused by constant shifts in upstream dependencies, i.e. changes in application database schemas.

For example, consider the case where a source table is migrated or altered. The data transformation models (built by data engineers) may break or become stale. The effects of this change are felt downstream: e.g. broken dashboards, inaccurate metrics, etc. Because pipeline failures like this start at the application level, we took a closer look at the interface between data producers and data engineers.
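To make the failure mode concrete, here is a toy sketch of that scenario. The table and column names are made up, and SQLite stands in for a real warehouse; the point is just how an upstream rename silently breaks a downstream model:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The application team's original table.
conn.execute("CREATE TABLE app_users (id INTEGER, signup_date TEXT)")

# A downstream transformation model that depends on signup_date.
DAILY_SIGNUPS = """
    SELECT signup_date, COUNT(*) AS new_users
    FROM app_users
    GROUP BY signup_date
"""
conn.execute(DAILY_SIGNUPS)  # runs fine today

# The application team renames the column without telling anyone...
conn.execute("ALTER TABLE app_users RENAME COLUMN signup_date TO created_at")

# ...and the model fails on its next scheduled run.
try:
    conn.execute(DAILY_SIGNUPS)
except sqlite3.OperationalError as err:
    print(f"pipeline failure: {err}")  # no such column: signup_date
```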

In order to prevent these failures, data engineers would ideally be made aware of schema changes in advance. This would allow them to update the transformation models to account for the new data being produced. Every change should be communicated efficiently to ensure minimum downtime for data products. We quickly learned that most companies do not have the processes in place for this.

We then asked ourselves how painful this problem actually is for data teams. For most companies, it is not the end of the world if there is some downtime on internal data products.

The real pain point was the discovery of upstream schema changes, and that problem is already well served by the current market of data observability tools. Tools like Monte Carlo, Metaplane, and Bigeye do a great job of detecting upstream schema changes and sending alerts for pipeline failures. They establish a starting point for the investigation of what went wrong. In addition, the oft-discussed concept of “data contracts” offers a paradigm for keeping business-critical data products robust.
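The core idea behind this class of detection is simple: snapshot the schema periodically and diff it. Here is a deliberately simplified illustration of that idea (our own sketch, not how any of these tools actually work internally):

```python
import sqlite3

def table_columns(conn: sqlite3.Connection, table: str) -> set:
    """Return the current set of column names for a table."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_users (id INTEGER, signup_date TEXT)")

baseline = table_columns(conn, "app_users")  # yesterday's snapshot
conn.execute("ALTER TABLE app_users RENAME COLUMN signup_date TO created_at")
current = table_columns(conn, "app_users")   # today's snapshot

dropped, added = baseline - current, current - baseline
if dropped or added:
    # A real observability tool would alert the owners of affected models.
    print(f"schema drift on app_users: dropped={dropped} added={added}")
```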

In short, we realized that there is not a sufficient gap (read: opportunity) in tooling at the interface between data teams and data producers.

The Real Problem

This is when we shifted our focus to the interface between data teams and data consumers.

We began to notice an emerging pattern in our conversations. Many data practitioners (specifically analysts) mentioned the burden of dealing with ad-hoc requests. Answering these questions could easily eat up half their day, if not more. Think one-off data pulls and analyses, bug-fix requests, questions about a data product, etc. Basically, any request that is unplanned and not part of the data roadmap.

These data teams are essentially being treated like customer service desks by others in the company. We found this to be a very expensive problem. Here's why:

1. Wasted time and effort on unnecessary work for irrelevant requests

Data teams are not oracles. They can't magically summon the data needed to prove a certain point or reveal a specific insight. We commonly see requests for answers that the data was never structured to provide. Answering such a request would require many convoluted joins, several key assumptions, and a large investigative effort.

In other cases, data teams are asked to “find something interesting” about a certain subject. The person asking the question has no idea what they are looking for. Data teams don’t have the time to conduct directionless data exploration. Requests should be clear, well-documented and tied to a specific business decision.

It also turns out that data teams get many similar requests over time. This only gets worse as the organization grows: members of larger teams are naturally less aware of what their colleagues have worked on, which leads to more repeated work and less efficient use of time. There's absolutely no reason why an engineer or analyst should be recreating queries from scratch every time.
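One lightweight mitigation we can imagine (purely illustrative, not something we saw deployed anywhere) is a shared log of past queries, fingerprinted so that near-duplicates surface before anyone rewrites them:

```python
import hashlib
import sqlite3

log = sqlite3.connect(":memory:")  # a real team would persist this
log.execute("""CREATE TABLE IF NOT EXISTS past_queries
               (fingerprint TEXT PRIMARY KEY, sql TEXT, requester TEXT)""")

def remember(sql: str, requester: str) -> None:
    """Log a query, flagging near-duplicates via a normalized fingerprint."""
    normalized = " ".join(sql.lower().split())  # ignore case and whitespace
    fp = hashlib.sha256(normalized.encode()).hexdigest()
    prior = log.execute(
        "SELECT requester FROM past_queries WHERE fingerprint = ?", (fp,)
    ).fetchone()
    if prior:
        print(f"already answered for {prior[0]} -- reuse that work")
    else:
        log.execute("INSERT INTO past_queries VALUES (?, ?, ?)",
                    (fp, sql, requester))

remember("SELECT COUNT(*) FROM signups WHERE region = 'EU'", "growth PM")
remember("select count(*) from  signups where region = 'EU'", "finance")
```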

2. Missed objectives and key results (OKRs) and low data team return on investment (ROI)

Ad-hoc requests are distracting and can completely sidetrack the data team from their core responsibilities. If they are being pulled in multiple directions at the same time, how can they be expected to deliver on their longer term goals?

We are all familiar with this problem in our personal lives. We receive endless emails, texts, and notifications throughout the day. We can’t possibly be expected to drop everything and act on each incoming ping to our phone or computer. Context switching is a costly problem. Bending over backwards for every ad-hoc request will come at the expense of better data products and infrastructure in the long term.

3. Stressed internal dynamics

There’s often friction between data teams and their internal stakeholders over this issue of ad-hoc requests. Data analysts and engineers are annoyed at having to spend 25% to 100% of their day working on them (not an exaggeration; many people told us this). On the other side, internal stakeholders feel frustrated with long response times. Nobody is happy.

What are people doing about this?

Dealing with ad-hoc requests feels like drinking from a fire hose. Three solutions came up again and again in our discussions. We think all of them leave a lot to be desired.

Solution 1: Tell every internal stakeholder to write a Jira ticket if they have a request

Whenever someone sends a request through Slack, email, or another channel, just tell them to write a ticket in the ticketing system (e.g. Jira, Linear). The data team will then go over the tickets at some regularly scheduled interval (say, once a week), grooming and assigning them to team members.

This solution leads to poorly documented requests, which require extensive follow-up and more wasted time. That's assuming the ticket gets written in the first place. Ticketing systems for technical teams are not built for non-technical stakeholders. The added barrier reduces the number of tickets that end up being written. This may sound great to data teams, but it is a terrible experience for stakeholders. As a result, they will incorporate data less and less into their work, which will lead to worse outcomes across the board.

Solution 2: Hire someone to manage the requests

Many companies try to hire their way out of the problems that they face. This is an expensive way of doing things that is simply not sustainable for most companies, especially considering the current business climate.

The solution goes as follows: if the data team is overburdened with ad-hoc requests, just hire someone to whom all the requests can be sent. They can triage them, do follow-ups, handle assignment, decide priority, etc. Effectively, this person acts as a buffer between internal stakeholders and the data analysts/engineers. The data team will have more time on their hands and the problem will be solved. But how does this look at scale? Most companies have understaffed data teams as it stands. Is it financially responsible to go out and hire people just to handle ad-hoc requests? We don't think so.

Solution 3: “Self-Serve Analytics”

It’s trendy nowadays to talk about self-serve data solutions. Why does the stakeholder need to ask the question in the first place? After all, BI vendors have been building a lot of great features to make exploration easier and more intuitive. Whether it’s through drill-down capabilities or today’s hot topic of “natural language interfaces,” stakeholders should be able to find the answers themselves.

While these solutions claim to reduce the load on data teams, they often cause extra confusion. Drill-down features require the end user to fully understand the underlying dataset; forgetting a simple filter can lead to wrong metrics. As for natural language interfaces, they frequently deliver un-nuanced and misleading answers to complex data questions. Poor-quality answers = poor-quality decision making.
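Here is a toy example of the forgotten-filter trap, with made-up data: a naive drill-down sums every row, while the data team knows that refunds and QA test orders must be excluded first.

```python
import pandas as pd

# Made-up orders data; the status column is the "simple filter" a
# self-serve user might not know about.
orders = pd.DataFrame({
    "amount": [100, 250, 40, 900],
    "status": ["complete", "complete", "refunded", "test"],
})

naive_revenue = orders["amount"].sum()  # 1290: counts refunds and QA test rows
real_revenue = orders.loc[orders["status"] == "complete", "amount"].sum()  # 350

print(naive_revenue, real_revenue)
```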

The Takeaway

We intend to build a business rooted in a painful and valuable problem. The needfinding process has taken us a long way toward this goal, guiding us to the issue of ad-hoc requests. As we dug deeper, it became clear that this problem plagues data teams across industries and at companies of all sizes. It leads to wasted resources, poor data team ROI, and degraded relations with internal stakeholders. We are unconvinced by the current solutions to this costly problem. Solution 1 reduces the use of data in the organization (which has many second- and third-order effects) and hurts the relationship between the data team and internal stakeholders. Solution 2 is financially irresponsible in most cases. And Solution 3 can actually hurt decision making by providing misleading answers.

This is where we are today, and we have some interesting ideas on how to solve this problem. We have finished the backend of our MVP (minimum viable product) and are working on the frontend. We are also in discussions with potential partners for our upcoming pilot. We are really excited about it and look forward to updating you as we make progress! As always, I'd love to hear your questions/comments/thoughts.
