How We Ran a Successful Failed Experiment

Jim Shields
YipitData Engineering
8 min read · Aug 16, 2019

Experiments At YipitData

At YipitData, we take our cultural values seriously: we talk about them (a lot), we put them into action, and we praise each other for exemplifying them (often in company-wide emails).

One of those values is experimentation. Across all of our teams, we often run experiments to test new ways of doing things and to find areas where we should invest more time and people. We have various examples of successful experiments, where we validated a hypothesis and doubled down on the outcome. I’m going to talk about one that failed and what we learned from it.

Problems We Wanted To Solve

YipitData’s core products rely on public web data collected by web scraping systems. The biggest challenge with web scraping is that we don’t control the websites, and they change often. Our engineering team has focused on making those web scrapers as simple, efficient, scalable, and easy to fix as possible in order to keep the reliability of the scrapers, and ultimately the data, as high as possible.

An important part of web scraping is extracting the desired data from a page, usually from HTML or JSON, a process we call parsing. Typically, the system owner writes custom Python code to parse data using XPaths or other selector code, combine the data into a Python data structure, and store the data in a database. This parsing code is usually the code that changes most frequently because the structure of the page and its data change often.
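For a sense of what that hand-written code looked like (a generic sketch, not our actual production code; the page structure, XPaths, and field names are made up), a typical parser might apply a few XPaths and collect the results into a dictionary:

    from lxml import html

    def parse_product_page(page_source):
        # Hypothetical hand-written parser: the XPaths and field names are
        # illustrative, and error handling is omitted for brevity
        tree = html.fromstring(page_source)
        return {
            "title": tree.xpath("//h1[@class='product-title']/text()")[0].strip(),
            "price": tree.xpath("//span[@class='price']/text()")[0].strip(),
            "sku": tree.xpath("//div[@id='details']/@data-sku")[0],
        }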

For some time, the engineering team was responsible for building and maintaining these scrapers. At the same time, our product team was responsible for designing system methodologies and for writing a spec for each page — specifically, which data points to collect.

This led to a disconnect between the teams that is likely typical in other product organizations: the people with the most context about the product (the product team) weren’t empowered to own or change the product. Often, this manifested itself in a back-and-forth between the product owner and the engineer about what data to collect, what to name the fields and tables, and how to optimize our scraping.

Ultimately, we wanted to solve a few problems:

  • Reduce the back-and-forth between product owner and engineer
  • Get rid of repetitive code
  • Decrease time to fix broken parsing

Reduce back-and-forth

Other than the upfront work of building the system, parsing was the primary source of back-and-forth between the product owner and the engineer, and it often required several iterations to get right.

Get rid of repetitive code

Additionally, parsing required lots of repetitive code: code to extract data from HTML, code to collect the data in a dictionary, and code to define the destination table (usually duplicating the exact same field names as the dictionary). For engineers, this duplication was tedious and led to frustrating mistakes.
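To make the duplication concrete (a sketch only; we’re assuming a SQLAlchemy-style table definition here, and the table and column names are made up), the same field names produced by the parsing code had to be typed out again when defining the destination table:

    import sqlalchemy as sa

    # The same field names from the parser's dictionary, repeated by hand
    # in the table definition (illustrative; not our actual schema)
    products_table = sa.Table(
        "products",
        sa.MetaData(),
        sa.Column("title", sa.String),
        sa.Column("price", sa.String),
        sa.Column("sku", sa.String),
    )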

Decrease time to fix broken parsing

Finally, we wanted to decrease the time to fix broken parsing. Monitoring was lacking, we didn’t have great tools for testing parsing logic, and once parsing logic was fixed, it typically took at least 20 minutes to get the code working in production. This is valuable time for web scrapers — every second missed is a potentially stale or missing data point, which could affect the accuracy of our product and the happiness of our users. We hoped that by standardizing the parsing of our systems, we could build robust monitoring on top of it to reduce the time to fix broken parsing.
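As a rough sketch of the kind of monitoring we had in mind (the threshold, field names, and alerting hook below are hypothetical), standardized parsing output makes it easy to run one generic health check across every parser’s rows:

    def parsed_rows_look_healthy(rows, required_fields, max_missing_ratio=0.05):
        """Return False if too many rows are missing required fields (illustrative check)."""
        if not rows:
            return False
        missing = sum(
            1 for row in rows
            if any(row.get(field) in (None, "") for field in required_fields)
        )
        return missing / len(rows) <= max_missing_ratio

    # Usage sketch: notify_on_call is a stand-in for whatever alerting you use
    # if not parsed_rows_look_healthy(rows, ["title", "price"]):
    #     notify_on_call("Parser may be broken: too many rows missing required fields")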

Our Solution

Given those problems, we started by building a small Python library, yparser, to do the parsing. Along with yparser, we created a standard format for our parsers: a JSON with specific fields. It looked something like the simplified, hypothetical spec below (the field names and XPaths are made up to show the shape, and it’s written as a Python dict rather than raw JSON for readability):
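    # Hypothetical parser spec: maps output field names to XPath selectors.
    # The real specs were JSON; a Python dict is shown here for readability.
    parser_spec = {
        "name": "product_page",
        "fields": [
            {"name": "title", "xpath": "//h1[@class='product-title']/text()"},
            {"name": "price", "xpath": "//span[@class='price']/text()"},
            {"name": "sku", "xpath": "//div[@id='details']/@data-sku"},
        ],
    }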

yparser would take this JSON specification and do all of the parsing for you.
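To make that concrete, here’s a rough sketch of how a spec-driven parser can work. This isn’t yparser’s actual code, just a minimal illustration of the pattern, using lxml and the hypothetical spec above:

    from lxml import html

    def parse_with_spec(page_source, spec):
        """Apply each field's XPath from the spec and collect the results into a dict."""
        tree = html.fromstring(page_source)
        row = {}
        for field in spec["fields"]:
            matches = tree.xpath(field["xpath"])
            # Keep the first match (or None); a real implementation needs richer handling
            row[field["name"]] = matches[0] if matches else None
        return row

The appeal of this pattern is that once the parsing logic is just data, the surrounding code can be shared, tested, and monitored in one place.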

We quickly realized it would be annoying for engineers, and eventually other users, to write these on their own. Additionally, the workflow for testing these against real web pages was painful and slow. At this point, we decided to work on a web app, Parse Central, to create and test these parsers.

Parse Central evolved to have lots of options, like “text mode”, “custom” fields, and, the coolest, autofilling the whole parser from a JSON. The parsed results (the output of the parser) updated interactively as you added parsing “rules”.

Parse Central was designed to:

  • Create and save parsers
  • Enable interactive testing of parsers against web pages
  • Allow the deployment of parsers without touching any code (via yparser)
  • Share parsers between analysts (who typically decided the field names and/or parsing logic) and engineers (who integrated the parsers into their web scraping systems)

With a lot of iteration over months, it eventually solved those problems effectively, and had many users on the engineering team and a few users on the product team.

At this point, it might sound like this project was successful! But around that time, external changes forced us to think about ending the experiment.

Ending The Experiment

Two external changes forced us to rethink this solution.

First, we started to invest in an internal scraping platform, Readypipe, to address some of the above problems, and many others. A few of the learnings from this experiment informed our work on Readypipe:

  • The longer it takes to get scraping code from development to production, the harder it is to iterate
  • Writing code to define tables is tedious and unnecessary
  • The back-and-forth between engineer and product owner does indeed slow down product development

In Readypipe, we decided to enable coding in JupyterLab (a great open source project that allows web-based code editing, Jupyter notebooks, terminals, and more), and to reduce the deploy process to the click of a button. Immediately, this solved the “time to fix a broken parser” problem: the system owner could go to the notebook, change the broken code, and click “Deploy” to have their new code running within a minute or two, far quicker than before.

In addition, we eliminated the need to define table schemas. Those definitions were a major source of the code duplication and frustration our parsing tools had aimed to fix; Readypipe solved that problem instead.

With these features, Readypipe also reduced the back-and-forth between the product team and engineers. Another major decision reduced it even more: we decided to give the product team full ownership over their scraping systems. This decision made the Readypipe platform a much easier and more powerful tool for changing parsing logic — everything could be done in one place by the person who owns the project.

Because of those decisions, we ultimately decided to deprecate yparser and Parse Central. But it was a successful experiment, in that we learned a lot and ended up using what we learned to solve the same problems in a simpler way.

What We Learned

We learned a few things we could do in the future to make experiments faster and easier:

  • Set a timeframe for measuring success
  • Make decisions reversible
  • Validate assumptions without building too much
  • Name tools clearly

Set a timeframe for measuring success

We ran this experiment over the course of about a year, with engineers, like myself, working on it part-time. For some experiments, that could be a good timeframe for measuring success. However, I think we could have set checkpoints for ourselves, maybe every month or quarter, to decide whether it was successful and we should continue, or whether to stop working on it.

Make decisions reversible

When we decided to end the parsing experiment, we had a problem: we’d changed the code of lots of our projects to use the new tools. I hadn’t considered the idea that we’d have to roll back.

Luckily, a few other team members (shout out to Angely Philip and James Farner!) came up with and implemented a simple solution: write a script to convert the code from the new tools (which required an extra library) to Python-native code (which didn’t) that’s readable by anyone.

In our case, we didn’t waste too much time unwinding the experiment, but I think we could’ve saved some time and stress by spending an hour or two at the beginning thinking about how we might unwind it. I think it would’ve also made adoption quicker — anecdotally, I’m much more likely to make a change that’s easy to reverse.

Validate without building too much

We started this experiment by building yparser, and quickly followed by building Parse Central. In retrospect, we may have found success more quickly by listing the hypotheses we wanted to test, and testing them without building too much.

For example: we could have worked with one pair of product owner and engineer to refactor their parsing code for one scraper to fit the same interface we planned to build with yparser and Parse Central, without actually building either. From there, we could have let the product owner write the simpler parsing code, and interviewed the pair to check whether this reduced the overall friction and time spent.

In our experience, building something has a larger cost than building nothing, and we likely could’ve validated and refined our approach without building anything (to start).

Name tools clearly

The names of the tools, yparser and Parse Central, were unhelpfully generic (and I take full responsibility!). A major lesson I learned from this experiment is that clear, memorable naming can significantly reduce friction in adoption. I’d often have to re-explain what each tool did, a good indication that the names were unclear.

Conclusion

While we diverged from our original approach, I think this experiment was very successful. Aside from the techniques above, the project team learned a lot about library development, product management, and backend and frontend web development (including some modern frameworks like React and Redux).

Ultimately, we’ve taken those skills, and the lessons learned, to other projects (and other companies, which is super exciting!). Even “failed” experiments can be impactful and drive innovation and personal growth.

Acknowledgements

Thanks to Hugo Lopes Tavares (https://twitter.com/hltbra) for his thorough and thoughtful review and suggestions to make this much better.
