
Big Data Checklist

The Joel Test for big data.

I started with Big Data before the term was coined.

Today it is trending. More and more projects understand that “Data is King” and are moving towards it.

Big data requires a different framework for your gut feeling and intuition to keep finding the right routes. The good thing is, it’s a skill, and it can be mastered.

Below is my “Joel Test” for big data projects.

It does not aim to make everyone a big data expert. But it should help great engineers become great big data engineers. And make it more fun, too.

If you are just starting to look into big data, you may think of the questions below as a guide to follow. If you already are into big data, you may find a few areas to sharpen your focus in.

  1. Do you have big data separated from the rest of your product?
  2. Do you have established metrics?
  3. Do you have a live dashboard?
  4. Do you have the fastest possible prototype-evaluate-ship-repeat cycles?
  5. Do you log everything?
  6. Do your designs enforce repeatability of results?
  7. Do you run live experiments?
  8. Do you run regression tests?
  9. Do you have infrastructure ready for big data?
  10. Do you understand the importance of organic data?
  11. Do you know how much headroom you have?
  12. Do you do research outside your main task?

Deep Dive

Having identified and prioritized the twelve points above, I am going to elaborate on them in more detail.

Throughout the text, “big data” is used in a broad sense that includes machine learning, data mining, information retrieval and other scientific disciplines, along with infrastructure, evaluation pipelines and other implementation details.

On Metrics

  • The metrics are to help you track your progress, but their value drops below zero once improving the metric no longer improves the end user experience.
  • The metrics should allow comparing apples to apples.
    If the metric you like does not allow comparing its value today to yesterday, you may want to introduce another metric to track day to day progress.
  • The metrics should have a rationale behind them.
  • The metrics should be easy to explain to people who are not into engineering.
  • Imposing numerical goals on metrics can be a curse.
    If you do impose them, base your expectations on headroom analysis and on proven big ideas about to go live, not on past improvement rates or on how well your competitors are doing.
  • Do reality checks on your metrics often.
    Your localized precision or relevance or engagement should correlate well with higher-order metrics like growth rate, retention rate, how often your users comment on or share what you show them, etc.

On Evaluation

  • Along with data infrastructure, the evaluation pipeline is what enables you to run a dozen iterations per week instead of barely one.
    With big data, 10x more iterations is what makes the difference between good and great.
  • The evaluation should be as automated as possible.
  • The evaluation should be as fast as possible.
  • Offline evaluation should ideally run in a few minutes.
    If it’s all about going through under a gigabyte of labeled validation data, why on Earth should it be taking longer?
  • Online evaluation, on the other hand, often does take longer.
    You may have to settle for something on the scale of 24-hour timeframes. It’s OK — but make sure you can run multiple experiments in parallel within one 24-hour window.
  • Sometimes it helps to have a model created manually, without involving machine learning techniques, to have the baseline to compare against.
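
To make the “few minutes” bar concrete, here is a minimal offline-evaluation sketch with a hand-built baseline to compare against. All names and the toy validation set are hypothetical; the real thing would read your labeled data and call your actual models.

```python
# Minimal offline evaluation sketch (all names hypothetical): score a
# hand-built baseline model on a labeled validation set and report a
# simple accuracy metric. A candidate ML model would plug in the same way.

def baseline_model(example):
    # A manually crafted rule, no ML involved: the bar to beat.
    return 1 if example["query_terms_in_title"] > 0 else 0

def evaluate(model, validation_set):
    """Fraction of labeled examples the model gets right."""
    correct = sum(1 for ex in validation_set if model(ex) == ex["label"])
    return correct / len(validation_set)

# Toy labeled validation data for illustration.
validation_set = [
    {"query_terms_in_title": 2, "label": 1},
    {"query_terms_in_title": 0, "label": 0},
    {"query_terms_in_title": 1, "label": 0},
]

print(evaluate(baseline_model, validation_set))
```

On under a gigabyte of validation data, a loop like this finishes in seconds, which is the whole point.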

On Dashboard

  • Put a big screen in the office with the sole purpose of showing off how well you are doing numbers-wise.
    Be honest with yourself: don’t hide it if the numbers don’t look good.
  • The dashboard is far more useful if it runs on post-cooked big data.
    This way it serves as the first-order customer of your logs processing logic and pipelines.
  • Key big data metrics should be on the dashboard, along with the basic usage numbers.
    The basic usage numbers may come from outside of the big data infrastructure, but most should go through it.

On Organic Data

  • Organic data is the data that captures the behavior of your users best, without pruning or filtering.
  • Be well aware of user behavioral patterns and the 80%/20% rule.
  • If some type of action accounts for the majority of user actions, it would account for the majority of user actions in organic data.
  • If some content accounts for the majority of content your users go through, organic data will have the same skew.
  • Evaluations using the organic data are the most valuable.
    Whenever possible, top-level metrics should be based on organic data sets.
  • Having said that, you will need to sub-sample the data for more accurate metrics on deeper levels. But make sure the top-level metric improves as well. It will take more work and more time for the improvement to become noticeable, but one should be able to see it.
  • Have a good idea of your headroom.
    It is not the absolute best value for the metric you have crafted: it is where you can get in the short, mid and long term in a decent, yet realistic, scenario.
  • Understanding headroom requires manual work. The people doing the job of data scientists should get their hands dirty from time to time.
  • It also requires creativity, so make it as much fun and as little routine as possible.
  • In smaller teams, a useful habit is to dedicate a day or half a day per month to pretending you are “the real users”, to get a feel of how their lives differ from what you thought.
  • At times, you may want to involve more people, unaware of your current direction, whose only job would be to tell you how and where you can do better.

On Labeled Data

  • Labeled data is orders of magnitude more expensive than organic data.
  • Don’t hesitate to label some data yourself.
    Ten minutes of looking through side-by-side results of old and new models is a good way to start the day.
  • Reuse your labeled data as much as you can. Don’t invalidate it until you absolutely have to — or until it becomes obsolete on its own.
  • In particular, keep at least part of your labeled data excluded from training for validation purposes.
  • Looking at the value of your high-level metric on labeled data is OK.
    Manually examining concrete cases in a labeled dataset, however, instantly makes that set ineligible for further clean experiments.
    Don’t do it, and don’t let your teammates do it either.
  • Rotating labeled data is a good habit.
    If you are willing to pay for 1,000 labeled examples per week, keep the ones from the most recent weeks “clean” and use the older ones for deep dives.
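
The rotation habit above can be sketched in a few lines. The batch layout and the four-week “clean” window are assumptions for illustration:

```python
from datetime import date, timedelta

# Sketch of the rotation rule: labeled batches from the most recent
# CLEAN_WEEKS stay "clean" (evaluation only); older batches are
# released for manual deep dives. Field names are hypothetical.

CLEAN_WEEKS = 4

def partition_batches(batches, today):
    """Split labeled batches into (clean, deep_dive) by labeling date."""
    cutoff = today - timedelta(weeks=CLEAN_WEEKS)
    clean = [b for b in batches if b["labeled_on"] >= cutoff]
    deep_dive = [b for b in batches if b["labeled_on"] < cutoff]
    return clean, deep_dive

batches = [
    {"id": "w1", "labeled_on": date(2017, 1, 2)},
    {"id": "w9", "labeled_on": date(2017, 2, 27)},
]
clean, deep_dive = partition_batches(batches, today=date(2017, 3, 6))
```

The evaluation pipeline would only ever read from `clean`, while anything in `deep_dive` is fair game for manual inspection.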

On Live Experiments

  • Live experiments help a lot. Unless you have a good reason not to, don’t hesitate to route 1% or 0.1% of traffic to experimental models.
  • In fact, in a lively product with a big data team at work, multiple live experiments running non-stop is a sign of a healthy environment.
  • Sharding your traffic may be harder than you think.
  • For a stateless search engine you can afford to shard by query hash. But the world seems to be pretty much done with building stateless search engines.
  • Sharding should be designed in a way where splitting off 0.1% of traffic keeps both 0.1% and 99.9% parts organic.
  • For example, if you are building an app store and some app has a significant share of traffic, sharding by app ID does not work, since it denies you the opportunity to fan out 0.1% of that app’s traffic.
    Shard by user sessions or come up with something smarter.
  • For most applications it is perfectly fine for the same query from the same user to end up in different shards from time to time. The users will not hate you if the results they see sometimes get altered a bit, while in return you get valuable apples-to-apples comparison results to explore.
  • Once you have established live experiments infrastructure, back tests are a great way to confirm you are going in the right direction.
  • Tee-ing some traffic to test/canary machines is a good thing too, assuming your data stays immutable.
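
A sketch of session-based sharding that keeps any 0.1% slice organic. The bucket count and the use of an MD5 hash are illustrative choices, not requirements:

```python
import hashlib

# Deterministic traffic sharding by session id (a sketch). Hashing the
# session id -- rather than, say, an app id with a skewed traffic share --
# keeps both the 0.1% slice and the remaining 99.9% organic.

BUCKETS = 1000  # one bucket == 0.1% of traffic

def bucket_of(session_id):
    """Stable bucket in [0, BUCKETS) for a session id."""
    digest = hashlib.md5(session_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS

def assign(session_id, experiment_buckets=frozenset({0})):
    """Route one bucket (0.1%) to the experiment, the rest to control."""
    return "experiment" if bucket_of(session_id) in experiment_buckets else "control"
```

Because the assignment is a pure function of the session id, a back test can replay the exact same split later on.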

On Serving Infrastructure

  • Big data logic should run on dedicated machines.
  • At the very least this covers logs cooking jobs, modeling processes, evaluation pipelines and serving infrastructure.
  • Of all components, serving infrastructure is the first one you want to have dedicated environment for. Now.
  • REST APIs are your friend.
  • Make your results repeatable. Take it seriously. More seriously.
  • Two decent ways to ensure repeatability are: 1) put everything into source control (usually git) and make the configuration parametric on commit hash, 2) keep server code and models as VM images.
  • Spawning a new serving job, production-ready on a fresh machine or locally for testing, should be a matter of running one script.
  • Top-level logs cooking jobs should also be possible to spawn via one script.
  • Regression tests are a shame not to have.
  • It only takes gathering some data, anonymizing it if necessary, running it through your service, saving the results into a golden results file and diff-ing against that file later on.
  • A good regression test can also test live experiments and sharding logic.
  • A good regression test is also a load test.
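
The golden-file recipe above fits in a short script. `serve` here is a stand-in for the real service call, and the file layout is an assumption:

```python
import json
import os
import tempfile

# Golden-file regression test sketch: replay a fixed set of (anonymized)
# requests through the serving logic and diff the responses against a
# stored "golden" file. `serve` stands in for the real service call.

def serve(request):
    # Placeholder for the real serving logic.
    return {"query": request["query"], "results": sorted(request["candidates"])}

def record_golden(requests, golden_path):
    """First run: record the current responses as the golden results."""
    with open(golden_path, "w") as f:
        json.dump([serve(r) for r in requests], f)

def run_regression(requests, golden_path):
    """Later runs: return indices of requests whose responses changed."""
    actual = [serve(r) for r in requests]
    with open(golden_path) as f:
        golden = json.load(f)
    return [i for i, (a, g) in enumerate(zip(actual, golden)) if a != g]

requests = [{"query": "tiles", "candidates": ["b", "a"]}]
golden_path = os.path.join(tempfile.gettempdir(), "golden.json")
record_golden(requests, golden_path)
diffs = run_regression(requests, golden_path)  # empty list == test passed
```

Feeding the same recorded requests through the service at high rate turns the same harness into a load test.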

On Data Infrastructure

  • Along with the evaluation pipeline, data infrastructure is what enables you to run a dozen iterations per week instead of barely one.
    At the risk of repeating myself: with big data, 10x more iterations is what makes the difference.
  • “It’s not big data yet if it fits into Excel!”
  • Early on you may well live with CSV/JSON/BIN files and Python/Ruby/Clojure scripts.
    There is no need to set up or maintain a Hive/Hadoop cluster until you hit terabyte scale.
  • Make sure you log all the data you need.
  • It goes beyond the user clicks on your website. Consider the viewport, where the user landed on your service from, IP / proxy information, browser / OS / screen resolution, mouse hovering, actions involving external websites, co-installed applications on mobile, and much more.
  • Mobile is especially important: lots and lots of data is available once you have an actively used mobile app.
  • You never know where the next insight would come from — but, chances are, it will come from the data.
  • Log your own work along with user actions.
    Which live experiments were running and when, which user requests got routed to which experiments, labels you have obtained, by what means, which portions of data did you send out for labeling and for what reason — all these in-house things count as the data you must log and keep.
  • Log data cooking is usually harder than serving.
    And it is one of the few pieces that falls in between the big data and the other parts of the product.
    KISS is your friend. I’d totally bless something like “the server stores logs in a certain directory in the cloud, and the big data infrastructure parses those logs as they arrive”.
  • Normally, most features would be computed on data infrastructure side.
  • If this is your case, a plugin structure works best.
  • Often it is more time-efficient to first implement the logic that adds a new feature and keep it running for a few days. Once the new feature is stamped along with the existing ones, it is much easier to experiment with.
    Therefore, make sure new featurizers are easy to hook up, perhaps automatically, when the code is checked in.
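
One possible shape for such a plugin structure is a decorator-based registry, so a newly checked-in featurizer hooks itself up on import. The feature names and record fields here are made up for illustration:

```python
# Plugin-style featurizer registry (a sketch). A new featurizer hooks
# itself up via the decorator, so checking in the code is all it takes
# to start stamping the feature onto cooked log records.

FEATURIZERS = {}

def featurizer(name):
    """Decorator that registers a feature-computing function by name."""
    def register(fn):
        FEATURIZERS[name] = fn
        return fn
    return register

@featurizer("query_length")
def query_length(record):
    return len(record["query"].split())

@featurizer("is_mobile")
def is_mobile(record):
    return 1 if record.get("platform") == "mobile" else 0

def featurize(record):
    """Stamp every registered feature onto a cooked log record."""
    return {name: fn(record) for name, fn in FEATURIZERS.items()}
```

Adding a feature is then one decorated function; the cooking job picks it up with no pipeline changes.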

On Modeling

  • Make it enjoyable and comfortable to dig into your data; the world of modeling is where most of the creative time is spent.
  • The efficiency of modeling depends largely on how fast the iterations can be.
    Multiple full-stack iterations per day should be your goal.
  • If viable, make it possible to run modeling on a single machine.
    Running stuff locally is way faster and has fewer or no external dependencies.

On Prototyping

  • Do whatever you want and have fun, as long as you are moving forward.
  • Try any idea you feel is worth trying, but aim at getting a headroom estimate soon, and don’t hesitate to drop the idea as soon as you believe there may be lower-hanging fruit.
  • Use any tools you feel like using.
  • Don’t hesitate to invest into building reusable tools for the future.
    At the same time, don’t hesitate to live on dirty hacks if it lets you run a reality check of some idea quickly.
  • Don’t bother if the implementation looks ugly. It’s one of the very few places where that’s allowed.
  • However, once you have something you can demonstrate business value with, switch from prototyping to productization and clean up the mess before it hits the fan.

On Research

  • No matter how strong of a team you have, make sure to communicate to the outside world.
  • Sending data scientists to conferences and asking them for trip reports in exchange is a practice that works well.
  • Dedicate some time to explorations that do not have immediate value.
  • For example, if your job is to do supervised learning and categorize your users into paying and non-paying customers, find time to train a few unsupervised models and look at the clusters you get.
    A few insights coming from this may well be worth it very soon.
  • Give talks, open-source stuff that does not carry immediate business value, write blog posts about how amazing your data challenges are; make sure you establish a presence in the community.
  • Interns are a great way to accomplish all or most of the above.

Bottom Line

The field of big data is different from other software engineering disciplines. The intuition and gut feeling you used to rely on may play a joke on you. And with data-driven projects it often takes more time to realize the wrong route was taken — and sometimes it may be too late.

Getting big data done right should become easier with the twelve high-level concepts outlined above.

I have done plenty of machine learning and can say with confidence that changing a “no” into a “yes” for the questions above has been the right thing to do consistently — and would probably keep being the right thing for quite some time.


Thanks Bridestory.

I never imagined that someday I would work for the biggest wedding marketplace in Indonesia, Bridestory. What I had in mind since my first job (Icehouse Corp) was the will to stay there until I started my journey as a graduate student. Then things happened and life changed very quickly.

Approximately 10 months ago, Bridestory’s CTO approached me and offered an opportunity to learn something I had been looking for, so-called Data Analytics. At that time, I was still working on mobile app development, so no wonder the offer came with a strict requirement: I had to help build the Bridestory User App first, and we would work on data analytics afterwards. Oh, this man knows how to set a trap :p Okay. Deal!

In the first 4 months, I had to fulfill my promise to help the company build its iOS app. I can say there were no significant troubles in developing this app. Every day was just like a normal day. We talked about stories, requirements, code review, repository management, and then we iterated again. Let me give credit to the team: Edi, Reta, Ibnu, Bobi, Amang, Viskaya.

Enough about iOS development.

Moving on to the big data analytics work, I was trusted to set up the data infrastructure that best suited the company’s needs. I was also assigned a search engine problem that I had to solve at the same time. I bet on Spark, Hadoop, Kafka, Elasticsearch and their allies at the beginning to start creating the data pipeline at Bridestory and solve this issue. What was on my mind at the time was that it was going to be god d*mn scalable and would help a lot in improving data analytics in the future, when the Bridestory system becomes bigger.

The best part of working in data analytics for me was the review I got for the data pipeline I tried to build. We can say that Spark, Hadoop and friends are becoming the trend in big data analytics. However, at our scale, we decided we were not ready for this technology. Those techs were overkill, and we still didn’t know how to control and maintain them well. We needed something simpler that could still help us achieve the goal and harvest more value from our data. At that moment, I thought I had been dumb not to see the company’s needs from the beginning and provide a better solution for them, and I guessed I would become the person to be blamed for the resources I had wasted. Fortunately, my prediction was wrong. I’m really grateful to have a manager who let me fail, learn and see another approach that might work for all of us. After all of this, we are currently trying to create our data pipeline with simpler steps, and I hope it’s going to work properly :)

I’m writing this post in my last week at Bridestory, where the only work left is initiating the recommendation engine based on syalalalaa requirements. I’ve left the data infrastructure design to the person who will be in charge. I hope he can take the lessons and make it real (this is a must :p).

The biggest thanks goes to the Mr. CTO, mas Doni. Thanks for being a manager who always criticized my proposed ideas and made me think about the blind spots I couldn’t see. Thanks for the advice on my future career and for being a good brother! So sorry for all the mistakes and everything that didn’t work out as planned :p. Then, thanks to Aldi (the ex-Microsoft, our former CIO) who introduced the concept of a semantic layer to all of us and taught me how data analytics is supposed to work. Thanks to all of the engineering team, whose names I can’t mention one by one. You guys rock! You’ve made my day!

Last but not least, thanks for giving me that bunch of excellent experiences, dear Bridestory. Finally, I can resign peacefully (if you know what I mean :p), be 99% focused on my graduate school and spend a lot of time exploring Silicon Valley. See ya!

God will lead you to an unexpected journey. So just be grateful and do the best thing!

The Romantics of the Third Wave

The painting above was painted by Eugene Delacroix, the leading member of the French Romantic school, to commemorate the Revolution of 1830 against King Charles X of France. The representation of the half-naked woman carrying the flag stands for the victory of the people and for liberty.

While this symbol is clearly an allegory, there are other subtle elements that distort reality in a typical Romantic fashion. Did you notice that there are three corpses lying on the ground and yet there is little blood depicted? Overall, it is an idealistic representation of war, over-emphasising the victory and downplaying the costs.

War was only one source of idealism throughout history. Romance was another. Love stories from movies, literature and music are far from representative of typical, real-world relationships. The Dalai Lama considered this idealisation a fantasy, hence unattainable and a source of frustration.

The ideals of war and romance might seem outdated today. No boy wants to be a Romeo, as no girl wants to be a Juliet. However, in recent decades, there is a new topic subject to a lot of idealisation: computers.

We are witnessing a new wave of romantics. Ever heard someone in tech talk about changing the world? A common trait of a Romantic is disappointment with the current state of affairs and hope for a better imagined alternative.

Making the world a better place through technology? Why never, or very rarely, a better place in some respects and worse in others? Whole-hearted endorsement or total rejection is another characteristic of Romanticism (partners should love everything about each other, philanthropists should draw no personal benefit from acts of charity, etc).

Both Romantic stories and successful computer companies share the leitmotif of the hero. If you don’t know who is leading Mercedes or Ikea, but you do know who’s the key figure behind Facebook and Apple, that only makes the point stronger. Add starting in a garage or in a dorm room to build a company, and you’ve got the whole story, but please don’t confuse an entrepreneur with a Gavroche.

There is a sense of romanticism even lower down the ranks. The code programmers write is rarely labeled robust or resilient — more often it’s… beautiful. Badly written code is code smell. Automation and scaling, the cool stuff, often precede real usage. Engineers complain that something is not worth doing not because it is not potentially valuable, but because it is mundane and … not sexy. Paradoxically, pragmatism is undervalued.

Computer technology is especially romanticised because it is a high-reward business. At the time of writing, the top 4 companies by market cap are Apple, Alphabet (the parent company of Google), Microsoft and Amazon, all computer-related businesses.

Before entering this market however, consider these three realities:

  1. Computer technology is a “winner takes all” field, with the first place getting all the reward and the second getting almost nothing. What is the second social network you use? What is the market share of the second search engine compared to Google’s? As an online company, you need to have millions of users (at least) to matter.
  2. As opposed to a restaurant or a barbershop, which serve only a local audience in a contained geographical area, technology companies serve a global market. The competition a company faces is global. If you want to go to a theatre, you go to the one in your local city. This doesn’t happen in technology, where you pick the best option available worldwide. It’s like everyone going to the Globe in London instead of the local theatre; if that were to happen, the local theatres would go out of business (or would work for the Globe).
  3. The low barrier to entry allows almost anyone to compete. You don’t need billions to start an oil company or millions to start a retail store. A laptop and some servers suffice.

Technology startups taking over big corporations is an appealing story, similar to David winning over Goliath. But due to “winner takes all”, a global market and the low barrier to entry, the chances of this happening are incredibly slim.

Writing a letter to his brother, Delacroix admitted he did not take part in the war: “I may not have fought for my country — at least I shall have painted for her”. Delacroix did not have first-hand experience in war, so we should judge him only artistically. It is without doubt that he was an exceptional painter, as it is that technology has a positive net effect on society. The world of technology is pragmatic and dictated by rules — maybe the Romanticism of computer scientists and engineers is just our human response to that.


The Seven Signs of Dysfunctional Engineering Teams

This is an updated version of a post I wrote back in 2013.

This post was inspired by a re-read of Heart of Darkness, which I think of not as a deep allegory but rather as the narrative retelling of a really bad job choice. There’s a passage in which Marlow requires rivets to repair a ship, but finds that none are available in spite of the fact that the camp he just left further upriver is drowning in them. That felt familiar. There’s also a passage that describes a French warship blindly firing its cannons into the jungles of Africa in hopes of hitting a native camp situated within. I’ve had that job as well.

There are several really good lists of common traits seen in well-functioning engineering organizations. For example, there’s Pamela Fox’s list of What to look for in a software engineering culture. More famous, but somewhat dated at this point, is Joel Spolsky’s Joel Test. I want to talk about signs of teams that you should avoid.

This list is also partially inspired by Ralph Peters’ Spotting the Losers: Seven Signs of Non-Competitive States. Of course, such a list is useless if you can’t apply it at the crucial point, when you’re interviewing. I’ve tried to include questions to ask and clues to look for that reveal dysfunction that is deeply baked into an engineering culture.

Preference for process over tools. As engineering teams grow, there are many approaches to coordinating people’s work. Most of them are some combination of process and tools. Git is a tool that enables multiple people to work on the same code base efficiently (most of the time). A team may also design a process around Git — avoiding the use of remote branches, only pushing code that’s ready to deploy to the master branch, or requiring people to use local branches for all of their development. Healthy teams generally try to address their scaling problems with tools, not additional process. Processes are hard to turn into habits, hard to teach to new team members, and often evolve too slowly to keep pace with changing circumstances. Ask your interviewers what their release cycle is like. Ask them how many standing meetings they attend. Look at the company’s job listings: are they hiring a scrum master?

Excessive deference to the leader or worse, founder. Does the group rely on one person to make all of the decisions? Are people afraid to change code the founder wrote? Has the company seen a lot of turnover among the engineering leader’s direct reports? Ask your interviewers how often the company’s coding conventions change. Ask them how much code in the code base has never been rewritten. Ask them what the process is for proposing a change to the technology stack. I have a friend who worked at a growing company where nobody was allowed to introduce coding conventions or libraries that the founding VP of Engineering didn’t understand, even though he hardly wrote any code any more.

Unwillingness to confront technical debt. Do you want to walk into a situation where the team struggles to make progress because they’re coding around all of the hacks they haven’t had time to address? Worse, does the team see you as the person who’s going to clean up all of the messes they’ve been leaving behind? You need to find out whether the team cares about building a sustainable code base. Ask the team how they manage their backlog of bugs. Ask them to tell you about something they’d love to automate if they had time. Is it something that any sensible person would have automated years ago? That’s a bad sign.

Not invented this week syndrome. We talk a lot about “not invented here” syndrome and how it affects the competitiveness of companies. I also worry about companies that lurch from one new technology to the next. Teams should make deliberate decisions about their stack, with an eye on the long term. More importantly, any such decisions should be made in a collaborative fashion, with both developer productivity and operability in mind. Finding out about this is easy. Everybody loves to talk about the latest thing they’re working with.

Disinterest in sustaining a Just Culture. What’s Just Culture? This post by my colleague John Allspaw on blameless post mortems describes it pretty well. Maybe you want to work at a company where people get fired on the spot for screwing up, or yelled at when things go wrong, but I don’t. How do you find out whether a company is like that? Ask about recent outages and gauge whether the person you ask is willing to talk about them openly. Do the people you talk to seem ashamed of their mistakes?

Monoculture. Diversity counts. Gender diversity is really important, but it’s not the only kind of diversity that matters. There’s ethnic diversity, there’s age diversity, and there’s simply the matter of people acting differently, or dressing differently. How homogenous is the group you’ve met? Do they all remind you of you? That’s almost certainly a serious danger sign. You may think it sounds like fun to work with a group of people who you’d happily have as roommates, but monocultures do a great job of masking other types of dysfunction.

Lack of a service-oriented mindset. The biggest professional mistakes I ever made were the result of failing to see that my job was ultimately to serve other people. I was obsessed with building what I thought was great software, and failed to see that what I should have been doing was paying attention to what other people needed from me in order to succeed in their jobs. You can almost never fail when you look for opportunities to be of service and avail yourself of them. Be on the lookout for companies where people get ahead by looking out for themselves. Don’t take those jobs.

Of course there are plenty of other ways that teams fail, but these seven are a good start. Often when I interview people, I’m surprised when they don’t have any good questions to ask me. Feel free to use some from the list above.


A Data-Driven Approach to Improving our Customer-Professional Matching

By: Ben Anderson & Xin Liu

When a customer posts a request on Thumbtack, we want to match them with the right professional for the job. When the marketplace was small, this was easy — just blast the request out to all of the pros in the request’s category and location. Today, with millions of requests a year and hundreds of thousands of active pros, we can’t rely on that simple algorithm anymore. The definition of “right” is no longer obvious — the pro and the customer each have their own preferences, and we need to balance how we benefit customers and pros to grow a healthy marketplace in the long run.

In this post, we’ll explore some of the early work we did to improve our marketplace by building systems to leverage our growing historical data. In particular, we’ll discuss our effort to model pro interests in order to send pros more compelling requests.

How we used to think about the model

In the old days of Thumbtack, we only matched on logistics: is this pro in the right category (wedding photography, house cleaning, etc.) and geographical area to serve this customer’s request? As we grew, we introduced a simple binary “limiting” system on the pro side — if a pro didn’t engage with our platform for a certain amount of time, we would limit them to a low fixed number of requests per week. This was much better than nothing, but had a long list of issues. Pros who went on too long of a vacation would suddenly find they weren’t getting requests. Pros in very active markets would get more requests than they could handle. Finally, when we limited requests, we weren’t sending the requests that the Pro was most interested in.

Last year, we started working on the first step to a more efficient marketplace, by improving our understanding of Pros’ interests. The goal of our new system was to encourage Pros who were not engaged with Thumbtack to come back, by showing them more relevant requests. We did this by building a model that used our historical data on Pro engagement to optimize for a Pro’s interest in a request. Specifically, we predicted the probability that a given Pro would quote on a given request if we sent it to them, P(quote|pro,request), and used that information to determine who to send the request to.


There are a variety of potential approaches we can use to model the attractiveness of customer requests to Professionals. We mainly considered collaborative filtering and class probability estimation. We chose logistic regression for its good interpretability, ease of implementation, and extensibility to future improvement. We aimed to predict whether or not a professional would quote on a given request by looking at historical Pro engagement data.
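
As a rough illustration of the approach (not Thumbtack’s actual model), here is a tiny logistic regression trained by gradient descent on toy engagement features; the feature choice and training data are made up:

```python
import math

# Minimal logistic-regression sketch for P(quote | pro, request).
# The features and training data are toy stand-ins; the real model
# would be trained on historical notification/quote logs.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, labels, epochs=2000, lr=0.5):
    """Per-example gradient descent on log loss; returns (weights, bias)."""
    n = len(examples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Toy features: [category quote rate, location quote rate] per (pro, request).
X = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1]]
y = [1, 0, 1, 0]
w, b = train(X, y)

def p_quote(features):
    """Predicted probability that the pro quotes on the request."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)
```

The interpretability mentioned above comes for free: each learned weight directly says how much a feature pushes the quote probability up or down.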


One way to measure how likely a Professional would be interested in a request is to check if the professional has engaged with similar requests before. For example, if a professional had a much higher quote rate (quotes/notifications) in House Cleaning around SoMa area (in San Francisco) than in other categories (e.g., Carpet Cleaning) and locations the Professional provides services for, we may consider this Professional to be interested in any future requests in House Cleaning and SoMa. In addition to category and location, we also considered the Professional’s past engagement in other dimensions such as request time, job size, etc.


Counting is fundamental in computing the engagement-based features above. Intuitively, a quote from six months ago should not be counted the same as a quote from three days ago. One straightforward approach is to have several versions of a feature based on different tracking time windows. However, there are several downsides to this approach: the number of features increases quickly, we can only have a limited number of time windows (1 day, 1 week, or 1 month), and we cannot track a time window in between (e.g., 2 weeks). To address these issues, we use a value that decays over time to represent engagement counters. For example, if we set the half life of a quote to be one month, then a quote submitted last month is counted as 0.5 quotes in today’s feature computation. Different half lives can be used for different types of features to capture various decay speeds. For example, customer review ratings should have a longer half life (e.g., a few months) than quotes. This approach has been effective and generic in counting a variety of metrics in our system.
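
The decayed counter itself is nearly a one-liner. With a 30-day half life, quotes that are 0, 30 and 60 days old count as 1 + 0.5 + 0.25:

```python
# Time-decayed counter sketch: each event contributes
# 0.5 ** (age / half_life), so with a one-month half life a quote
# submitted a month ago contributes 0.5 to today's feature value.

def decayed_count(event_ages_days, half_life_days):
    return sum(0.5 ** (age / half_life_days) for age in event_ages_days)

# Quotes 0, 30 and 60 days old with a 30-day half life: 1 + 0.5 + 0.25
quotes = decayed_count([0, 30, 60], half_life_days=30)
```

A slower-moving signal like review ratings would simply pass a longer `half_life_days`.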

Online Tuning

Now that we have a model for predicting P(quote|pro,request), we need to decide how to use this number to determine whether to notify a Pro of a request. The easy thing to do might be to simply rank by this score, and pick the pros who are most interested to notify. However, doing this would result in some bad outcomes for the marketplace. We ultimately want to notify Pros probabilistically so that they have some chance of being notified on every request. This allows for changing preferences and corrects for overfitting. We calculate the probability we notify a given pro of a request, P_{notify} using a combination of factors. Consider the following groups of pros:

  • New Pros: we don’t have historical engagement data for new Pros, but we want to notify them of requests to give them a chance to engage. We override the regular P_{notify} for these pros until we understand what they’re interested in.
  • Disengaged Pros: we know that Pros disengage with our platform for various reasons, and will often re-engage in the future. We want to always send a minimum number of weekly requests to a Pro to give them the option of re-engaging when they’re ready.
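
A sketch of how these overrides might combine with the model score into P_{notify}; the constants and field names are illustrative, not Thumbtack’s actual values:

```python
import random

# Sketch of turning the model score into a notification decision.
# Notification is probabilistic, with overrides for new and disengaged
# pros; the constants here are illustrative assumptions.

NEW_PRO_P = 0.5          # explore: give new pros a chance to engage
MIN_WEEKLY_REQUESTS = 3  # floor for disengaged pros

def p_notify(p_quote, pro):
    if pro["is_new"]:
        return NEW_PRO_P           # no history yet: override the model
    if pro["requests_this_week"] < MIN_WEEKLY_REQUESTS:
        return 1.0                 # guarantee the weekly minimum
    return p_quote                 # otherwise, proportional to interest

def should_notify(p_quote, pro, rng=random.random):
    """Flip a coin with probability P_notify."""
    return rng() < p_notify(p_quote, pro)
```

Notifying probabilistically rather than by a hard rank cutoff is what gives every pro some chance on every request, as described above.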

A/B Testing and Next Steps

We ran this new model against the old one as an A/B test and saw a significant increase in quotes/pro with a drop in notifications sent, resulting in a huge improvement in quotes/notification.
Of course, we’re not done here. Ultimately, our goal is to optimize the match for both the customer and the pro. We’re only beginning to build out our data-driven systems. Join Thumbtack and help us build it.
