A Personal Retrospective on Prophet

Sean J. Taylor
7 min read · Jun 4, 2023

February 2017 was a pretty exciting time for me. Ben Letham and I were on the cusp of open sourcing Prophet, now a popular forecasting library but at the time just a library for internal use at Facebook. We had developed the method for a specific forecasting use case, and then gotten a few additional wins helping internal teams improve the accuracy of their growth forecasts.

The Prophet open source launch was so successful it took me by surprise, but with hindsight we had a lot of things going for us:

  • We were coming from a big and prestigious company.
  • We had a beautiful and convincing website.
  • Our tool was available in R and Python, and easy to install and use.
  • We used the Stan library, which had a devoted following.
  • We invested in strong documentation and examples.
  • Perhaps most importantly, there were few alternatives available at the time.

We were invited to give many talks about Prophet, and a lot of people now associate me quite closely with the project. Being a co-creator of a popular open source project has had a number of unforeseen consequences, and I think it's instructive to share this more personal part of the journey and some lessons I've learned.

Unearned credibility

Prophet is an approach to forecasting born out of pragmatism: at the time we had found few off-the-shelf methods that were easy to use and accounted for the complexities we faced. It just wasn't a rigorous research project of the kind a forecasting researcher might complete. We validated the method on a few use cases at Facebook and then deemed it worth sharing. Some people now include me in a group of expert time-series forecasters, when a more accurate portrayal would be that I spent about a year working on a specific forecasting problem, had a great collaborator in Ben, and was pretty good at writing usable software. I think anyone who's an actual forecasting expert could justifiably feel we got a little more credit than we deserved for the project. I regret not engaging with the forecasting community earlier in the project and trying to understand how the method fit within existing approaches. A more thorough research project might have helped us create a more broadly useful technique, rather than something tailored to the use cases we were tackling.

Contribution guilt

After open sourcing Prophet, I couldn't keep up with the community and help maintain the project. Part of it was prioritization: I am usually working on a variety of things, and I viewed the project as "finished" in the sense that I had already made the improvements I wanted for the teams at Facebook. Ben took on a lot of maintenance work, and other folks stepped in and did the dirty work to make sure people could continue to painlessly install and use the software. I'll never be able to fully transfer the credit I get over to them, but I wish I could. I'm always going to feel like I abandoned our users by not participating more fully in the ongoing improvements and maintenance.

The tool creator’s dilemma

I did not face a difficult choice about whether to share Prophet; it felt obviously useful and relatively low cost to open it up. But I now feel some misgivings about it on a recurring basis, and it's hard for me to conclude it was unambiguously positive to share it publicly. Many folks would have been worse off if Prophet were not open sourced (I've heard many success stories!), and the competition that it helped foster in the forecasting software space has been very beneficial for practitioners. But there are many plausible negative effects as well, as people misapply Prophet to problems and place too much trust in the resulting forecasts. The central problem is that the method isn't as great or general as some people believe it to be. I sum this up here:

I do actually feel a bit of shame about the project at this point: of course I wish it worked better! Many people go out and find obvious problems in Prophet forecasts, or they build better methods and demonstrate that Prophet performs worse. A large number of the citations of the "Forecasting at Scale" paper show it underperforming other approaches. I see no flaws in these studies; Prophet is not a reasonable model in many settings. I'm glad people are motivated to create better approaches and to do the evaluations needed to show they are superior. I wish I could definitively communicate that I never intended for Prophet to be the best and that I believe there are better options in many cases.

One reason I haven't continued to work on forecasting is that there are better researchers to work on it and better approaches available to build on. But I struggle with how to "unspread" an already widely proliferated tool. I guess this post may help a bit, but ultimately it's challenging to communicate the nuanced view that people should consider a variety of methods besides Prophet and carefully choose among them based on their requirements.

Sharp edges

A running joke on Twitter is that Prophet was the cause of Zillow’s 2021 stock price collapse (some folks didn’t really get the joke). This bit is very funny to me because it takes the “overly trusting your tools” pathology to the logical extreme: what if you bet billions of dollars on the accuracy of models you didn’t understand and didn’t evaluate properly? Yeah that would be bad!

Every tool can be used in ways that result in mistakes, but with models the mistakes can feel more surprising. To paraphrase some wisdom I've heard attributed to Andrej Karpathy:

You can usually anticipate and enumerate most of the ways your model will fail to work in advance. Yet the problems you’ll encounter in practice are usually exactly one of those things you knew to watch out for, but failed to.

For forecasting, that common failure is not having conducted a thorough evaluation, which comes with a variety of challenges (scaling the computation, specifying a scalar objective you care about, extrapolation problems, etc.).
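To make that concrete, here is a minimal sketch of the kind of rolling-origin evaluation I have in mind, using Prophet's built-in diagnostics. The data file and the window sizes below are placeholders, not recommendations; the only real requirement is a dataframe with the 'ds' and 'y' columns Prophet expects.

```python
import pandas as pd
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

# Hypothetical daily series with 'ds' (date) and 'y' (value) columns.
df = pd.read_csv("example_series.csv")

m = Prophet()
m.fit(df)

# Simulate historical forecasts: refit on expanding windows and predict a
# fixed horizon past each cutoff (the window sizes here are illustrative).
df_cv = cross_validation(m, initial="730 days", period="180 days", horizon="90 days")

# Summarize error metrics (MAE, MAPE, coverage, ...) as a function of horizon.
print(performance_metrics(df_cv).head())
```

Even this much effort already surfaces the uncomfortable questions: which metric you actually care about, and how performance degrades as the horizon grows.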

In my experience, fitting models and making predictions is fun and you can easily find some “good news.” But evaluating models you’ve fit is very much not fun, and more often leads to bad news (your new model is not better).

So the mistake I will own up to is throwing one more tool into the ring (giving people a good news machine) without giving them the information or encouragement they need to evaluate it properly (a bad news machine). If I could launch Prophet again, the homepage would be an explorable dashboard of model performance across a variety of tasks, including a number of common baselines, just like Erik Bernhardsson provides for Annoy. People who build comparison tools like this are unsung heroes, dramatically improving the signal-to-noise ratio for practitioners choosing what to do.
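As a rough illustration of the kind of comparison I mean, here is a sketch that pits Prophet against a seasonal-naive baseline on a simple holdout split. It assumes a gap-free daily series with yearly seasonality; the file name and the 90-day horizon are made up for the example.

```python
import numpy as np
import pandas as pd
from prophet import Prophet

# Hypothetical gap-free daily series with 'ds' and 'y' columns.
df = pd.read_csv("example_series.csv")
df["ds"] = pd.to_datetime(df["ds"])

horizon = 90
train, test = df.iloc[:-horizon], df.iloc[-horizon:]

# Prophet forecast for the holdout window.
m = Prophet().fit(train)
future = m.make_future_dataframe(periods=horizon)
prophet_pred = m.predict(future)["yhat"].tail(horizon).to_numpy()

# Seasonal-naive baseline: repeat the values observed exactly one year earlier.
naive_pred = train["y"].iloc[-365:-365 + horizon].to_numpy()

def mae(pred):
    return np.mean(np.abs(test["y"].to_numpy() - pred))

print(f"Prophet MAE: {mae(prophet_pred):.2f}, seasonal-naive MAE: {mae(naive_pred):.2f}")
```

If a dashboard of results like this had shipped with the launch, users would have had a much easier time judging when Prophet is and isn't the right choice.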

A spectrum of tools

Tools lie on a spectrum from being so simple they can't reasonably be blamed for mistakes (if you smash your thumb with a hammer, that's on you) to being so complex they can cause mistakes in a variety of ways (an LLM practicing law). So how much paternalism should we apply to managing the risk of complex tools? Was part of the Prophet problem that it made forecasting too accessible?

Tool builders usually want to offer a better product than existing ones, so the tool tends to be more powerful and complex. They also want it to be intuitive and usable, so that it attracts a variety of users and use cases. Together these goals manufacture a situation where people potentially less equipped to detect mistakes are given the ability to make a broader range of them than they could before.

My current perspective is that the underlying problem isn't that tools are too inherently mistake-prone or that they attract less discerning users. I think we introduce mistakes because it is too challenging, annoying, or cumbersome to validate our tools and compare them to alternatives. So the right question isn't whether specific tools should exist, but how we build trust in them and know what to use in a given situation. Tool paternalism is misguided, and most of that energy should go toward creating the culture and incentives for checking our work.

Wrapping up

Now that I'm working on an analytics product at Motif, I'm wondering what I can apply from my Prophet experience to make it more successful. I'd summarize it as:

  1. Consult with existing experts: I underrated how much there was to learn from existing forecasting approaches, and I think Prophet would have been a more durable contribution if I had asked for more help and advice in the early phases.
  2. Be in it for the long haul: Commit the requisite time to responding to feedback and engaging with the community. Successful tools will have long lives, and you need to be able to support all your users, make them feel heard, and ensure they have the resources they need.
  3. Invite and encourage comparisons: There are usually existing ways people are approaching the tasks your tool addresses, and it will help them to understand how your approach differs and how it may lead to different results in specific situations. It behooves you to be upfront about where your tool may be worse than alternatives.
  4. Ease of installation and quality documentation work: It's clear to me that lowering the friction of installing and understanding your tool is one of the most effective ways to get people to try your work.
