Open Sourcing Duckling, our probabilistic (date) parser

The Team
Oct 1, 2014 · 2 min read

We’ve previously discussed ambiguity in natural language. What’s really fascinating is that even the simplest, seemingly most structured parts of natural language, like the way we humans describe dates and times, are surprisingly difficult to turn into structured data.

The wild world of temporal expressions in human language

All the following expressions describe the same point in time (at least in some contexts):

  • “December 30th, at 3 in the afternoon”
  • “The day before New Year’s Eve at 3pm”
  • “At 1500 three weeks from now”
  • “The last Tuesday of December at 3pm”
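
To make the "in some contexts" part concrete, here is a minimal Python sketch that resolves the four phrases by hand. The reference time (Tuesday, December 9, 2014, 10:00) and the helper names `at_3pm` and `interpretations` are our own assumptions for illustration; this is not how Duckling works internally.

```python
import calendar
from datetime import datetime, timedelta


def at_3pm(moment):
    """Keep the date of `moment` and set the time to 15:00."""
    return moment.replace(hour=15, minute=0, second=0, microsecond=0)


def interpretations(now):
    """Resolve each of the four phrases against the reference time `now`."""
    year = now.year

    # monthcalendar() yields Monday-first weeks, with 0 for days that
    # belong to another month, so max() picks the last Tuesday.
    tuesdays = [week[calendar.TUESDAY] for week in calendar.monthcalendar(year, 12)]
    last_tuesday = max(tuesdays)

    return {
        "December 30th, at 3 in the afternoon": datetime(year, 12, 30, 15, 0),
        "The day before New Year's Eve at 3pm": at_3pm(datetime(year, 12, 31) - timedelta(days=1)),
        "At 1500 three weeks from now": at_3pm(now + timedelta(weeks=3)),
        "The last Tuesday of December at 3pm": datetime(year, 12, last_tuesday, 15, 0),
    }


# Assumed context: the speaker is talking on December 9, 2014.
for phrase, value in interpretations(datetime(2014, 12, 9, 10, 0)).items():
    print(f"{value}  <-  {phrase}")
```

With that reference time, all four phrases land on December 30, 2014 at 15:00. Move the reference to December 9, 2015 and the last one drifts to December 29, which is why the context matters.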

But wait… is it really equivalent to say 3pm and 1500? In the latter case, it seems that the speaker meant to be more precise. Is it OK to drop this information?

And what about “next Tuesday”? If today is Monday, is that tomorrow or in 8 days? When I say “last month”, is it the last full month or the last 30 days?
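
Both readings of each phrase are easy to write down, which is exactly the problem. The sketch below spells them out in Python; the function names and the sample date (Monday, September 29, 2014) are assumptions made for illustration, and neither reading is presented as the "right" one.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() counts from Monday = 0


def upcoming_weekday(today, weekday):
    """Reading 1: the first `weekday` strictly after `today`."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def weekday_of_next_week(today, weekday):
    """Reading 2: the `weekday` of the calendar week (Mon-Sun) after this one."""
    next_monday = today + timedelta(days=7 - today.weekday())
    return next_monday + timedelta(days=weekday)


def last_full_month(today):
    """Reading 1 of "last month": the previous calendar month, first to last day."""
    last_day = today.replace(day=1) - timedelta(days=1)
    return last_day.replace(day=1), last_day


def last_30_days(today):
    """Reading 2 of "last month": the 30 days ending today."""
    return today - timedelta(days=30), today


monday = date(2014, 9, 29)
print(upcoming_weekday(monday, TUESDAY))      # tomorrow
print(weekday_of_next_week(monday, TUESDAY))  # in 8 days
print(last_full_month(monday))
print(last_30_days(monday))
```

On that Monday, the two "next Tuesday" readings are a full week apart, and the two "last month" ranges overlap by only two days.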

A last example: “one month” looks like a well-defined duration. That is, until you try to normalize durations in seconds, and you realize different months have anywhere between 28 and 31 days! Even “one day” is difficult. Yes, a day can last between 23 and 25 hours, because of daylight saving time. Oh, and did I mention that at midnight at the end of 1927 in Shanghai, the clocks went back 5 minutes and 52 seconds? So “1927-12-31 23:54:08” actually happened twice there.
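
You can see the month and day problems with nothing but the Python standard library. This is a small sketch under a few assumptions: the helper names are ours, the example time zone (`America/New_York`) and the 2014 dates are picked for illustration, and `zoneinfo` needs Python 3.9+ with a time zone database available on the system.

```python
import calendar
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, reads the system tz database


def seconds_in_month(year, month):
    """Length of a calendar month, counting every day as 24 hours."""
    days = calendar.monthrange(year, month)[1]
    return days * 24 * 60 * 60


def local_day_length(day, tz_name):
    """Elapsed time between local midnight on `day` and the next local midnight."""
    tz = ZoneInfo(tz_name)
    start = datetime.combine(day, time.min, tzinfo=tz)
    end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    # Convert to UTC first: subtracting two datetimes that share the same
    # tzinfo object compares wall-clock times and ignores offset changes.
    return end.astimezone(timezone.utc) - start.astimezone(timezone.utc)


print(seconds_in_month(2014, 2), seconds_in_month(2014, 12))
print(local_day_length(date(2014, 3, 9), "America/New_York"))   # clocks spring forward
print(local_day_length(date(2014, 11, 2), "America/New_York"))  # clocks fall back
```

In this time zone, March 9, 2014 is a 23-hour day and November 2, 2014 is a 25-hour day, so "one day" cannot be a fixed number of seconds any more than "one month" can.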

There are hundreds of hard things like these, and the more you dig into this, believe me, the more you’ll encounter. But that’s out of the scope of this post.


At Wit, the built-in entity most used by our community of developers is wit/datetime. So we had to work on this problem. From our past experiences with NLP, we knew that a fully rule-based approach was a recipe for disaster. Unfortunately (or not), humans are very bad at following strict (syntactic) rules. On the other hand, temporal expressions are quite regular and hierarchical compared to other aspects of language. A fully machine-learned approach like we have in other parts of Wit seemed difficult. So we started to design a hybrid system, based on both rules and examples: Duckling.

Open sourcing Duckling

Today, we are both happy and eager to share our approach with the community by open sourcing Duckling. Duckling is far from perfect, but we think it may help a few developers with similar problems. Meanwhile, as we wrote earlier, natural language is such a hard problem — we need to join forces!

We’ve been using Duckling in production for one year now, and while it’s still a very early-stage library, it parses hundreds of thousands of weird temporal expressions in five languages with a lot of success.

You can try out Duckling and read the documentation here. The code is available on GitHub.


Moving forward, we’ll continue to open source more and more of Wit. Please give us lots of feedback about Duckling, and of course, contribute!

Team Wit
