Open Sourcing our new Duckling
Wit relies on both machine learning and rule-based systems. One of these rule-based systems is Duckling, our open-source probabilistic parser that detects entities such as dates and times, numbers, and durations. As Wit grew rapidly, Duckling was not scaling as fast as we were. After considering several options, we decided to rewrite it in Haskell.
We were looking for a performant language with strong support within Facebook. Two natural candidates emerged: C++ and Haskell.
A strength of Duckling is the simple domain-specific language (DSL) we provide for writing rules. It allows us to get amazing open-source contributions in languages that we don’t speak. Haskell is famously one of the best choices out there for building a type-safe DSL, whereas C++ would have required ninja coding skills (which we don’t have) to do the same without segfaults!
Haskell is highly scalable (it powers Facebook’s Sigma service, serving millions of requests per second) and offers type safety. Haskell is a purely functional and very expressive language. This makes it the perfect choice for Duckling!
Our main purpose in rewriting Duckling was scalability. We have been working a lot on improving the core engine algorithms as well as the time module internals, which is by far our most complex component. There is still plenty of room for improvement, but Duckling is more efficient than ever. As an example, Duckling is now used at scale internally by Facebook.
Due to Haskell’s strong type system, we had to get rid of our free-form Clojure maps. The end result is a more structured and readable codebase.
We have started housing language-agnostic rules under a common umbrella. As a result, each new language gets basic support for amounts of money, emails, phone numbers, and URLs for free.
To get up and running quickly on Haskell, we are deprecating the Clojure library today, and we will commit those resources to making the Haskell version world class. The new Duckling is up to date with the latest Clojure code.
Because we want to make sure the transition is smooth, we are providing a standalone HTTP server to get Duckling results as JSON.
Since we open-sourced Duckling two years ago, our Wit community has contributed in many languages and driven real-world usage, enhancing Duckling’s capabilities. Thank you to all our contributors who help make Duckling more robust and polyglot for everyone!
Duckling is now available on GitHub.
You can start diving into the new codebase today. To give you a head start:
# Clone the repository
$ git clone https://github.com/facebook/duckling.git && cd duckling
# Install Stack
$ curl -sSL https://get.haskellstack.org/ | sh
# Compile and run the example
$ stack build && stack exec duckling-example-exe
# Run the tests
$ stack test
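Once the example executable from the steps above is running, you can query the standalone HTTP server and get results back as JSON. The snippet below is a minimal sketch, not an official client. It assumes the example server listens on port 8000 and exposes a POST /parse endpoint that takes form-encoded text and locale fields, as the project README describes; check the README for the version you build. The helper name duckling_parse and the DUCKLING_URL variable are made up for illustration.

```shell
#!/bin/sh
# Assumption: the example server started by `stack exec duckling-example-exe`
# listens on port 8000 and serves POST /parse. Override DUCKLING_URL if not.
DUCKLING_URL="${DUCKLING_URL:-http://localhost:8000/parse}"

# duckling_parse is a hypothetical helper, not part of Duckling itself.
# $1 = text to parse, $2 = locale (defaults to en_US here).
# It sends a form-encoded request and prints whatever JSON the server returns.
duckling_parse() {
  curl -sS -X POST "$DUCKLING_URL" \
    --data-urlencode "text=$1" \
    --data-urlencode "locale=${2:-en_US}"
}

# Example call (requires the server to be running):
# duckling_parse "tomorrow at eight" en_GB
```

Wrapping the request in a small function keeps the endpoint in one place, so pointing it at a different host or port only requires setting DUCKLING_URL.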
We are looking forward to seeing how you will use it, help extend the rulesets to make Duckling more robust, and teach it new languages.