Introducing Dactyl (preview)

A linguistic fingerprinting API to empower writing resources

Andrew Brown
Indent Labs
4 min read · Jan 3, 2016


At the heart of innovation, one driving force remains constant: data.

Meet Dactyl. He’s pretty darn smart.

Data tells us what works. Data tells us what doesn’t work. Every metric, every measurement, every validation of every thought is backed by some datum somewhere, and collecting that data grows more important as technology advances and the tools to put it to use finally begin to appear.

Writing, however, is a very subjective thing. It’s notoriously hard to programmatically look at a passage and say, “yeah, that’s some good writing!” Instead, writing has always been one of those things where you know it when you see it. It’s easy to collect data on whether a stock did well or not, because you know what “doing well” means financially; it’s not so easy to collect data on whether a reader gets lost in a story, relates to the characters, or even comprehends the vocabulary. Yet.

But there are things you can measure in writing, and they’re not always easy to measure by hand. Of course, you can measure simple things like your word count or your average words per sentence, but you can also build on that simple stuff to measure a lot of other cool things: the grade level at which a reader can first be expected to understand a passage, the emotional sentiment behind some text, or even just how clear something is to read.

Depending on your preferred readability index, scores will differ slightly. Most indices, however, report a suggested grade level for Dr. Seuss’s Green Eggs and Ham between first and fourth grade.
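To make the “simple stuff” concrete, here is a minimal Python sketch that computes word count and average words per sentence, then feeds them into two published readability formulas: the Flesch-Kincaid grade level and the Automated Readability Index (ARI). The syllable counter is a crude vowel-group heuristic and the sentence splitter only looks at `.`, `!`, and `?`; both are simplifying assumptions for illustration, not how Dactyl computes its scores. Because the two formulas weigh different inputs (syllables per word versus letters per word), they rarely agree exactly, which is one reason grade levels differ between indices.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, minus a silent trailing 'e'."""
    word = word.lower()
    count = len(VOWEL_GROUPS.findall(word))
    # "like" -> 1, but keep "table" at 2 because of the "-le" ending
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    """Basic counts plus two standard grade-level formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        raise ValueError("text needs at least one word and one sentence")

    n_words, n_sentences = len(words), len(sentences)
    letters = sum(ch.isalpha() for word in words for ch in word)
    syllables = sum(count_syllables(word) for word in words)
    words_per_sentence = n_words / n_sentences

    return {
        "word_count": n_words,
        "sentence_count": n_sentences,
        "words_per_sentence": words_per_sentence,
        # Flesch-Kincaid grade level: sentence length plus syllables per word
        "flesch_kincaid_grade": (
            0.39 * words_per_sentence + 11.8 * (syllables / n_words) - 15.59
        ),
        # Automated Readability Index: sentence length plus letters per word
        "ari": 4.71 * (letters / n_words) + 0.5 * words_per_sentence - 21.43,
    }
```

For example, `readability("I do not like green eggs and ham. I do not like them, Sam-I-am.")` reports 16 words across 2 sentences, or 8 words per sentence. On a sample this short, both formulas can return grades below zero; readability indices are far more stable on longer passages.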

In computational linguistics, these metrics commonly feed into what is called a linguistic fingerprint. Rather than describing the content of a passage, these fingerprints (or dactylograms, if you will) tell a story of how that passage was written, at a structural and quantitative level that can be measured, compared, and related to.
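As a rough illustration of the idea, a fingerprint can be as simple as a dictionary of structural measurements that can then be compared numerically. The four features and the Euclidean distance below are arbitrary choices for this sketch, and `dactylogram` is a hypothetical helper name, not Dactyl’s actual feature set or API.

```python
import math
import re


def dactylogram(text: str) -> dict[str, float]:
    """Toy structural fingerprint: how a passage is written, not what it says."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text needs at least one word and one sentence")
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        # Share of distinct words: a simple measure of vocabulary richness
        "type_token_ratio": len(set(words)) / len(words),
        # Commas per sentence: a rough proxy for clause complexity
        "comma_rate": text.count(",") / len(sentences),
    }


def distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Euclidean distance between two fingerprints with the same metric names."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))
```

Because the features sit on different scales (a comma rate can easily swamp a type-token ratio), a real comparison would typically normalize each feature before measuring distance.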

Dactyl creates these dactylograms for you.

Linguistic fingerprints are commonly employed in natural language processing tasks like authorship detection, plagiarism detection, and spam detection. While useful, these fingerprints are typically nontrivial to calculate algorithmically, and designing one requires a fair amount of domain experience spread across writing, statistics, and programming.
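One long-standing stylometric technique for authorship detection compares how often each author uses common function words such as “the”, “of”, and “and”, which writers tend to use habitually regardless of topic. The sketch below builds a relative-frequency profile over a short, illustrative word list and picks the known author whose profile has the highest cosine similarity to an unknown passage. It is a toy under those assumptions, not Dactyl’s method; practical attribution uses much longer word lists, larger samples, and more careful statistics.

```python
import math
import re
from collections import Counter

# Small, illustrative list of English function words (an assumption for this
# sketch); real stylometry typically uses far longer lists.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "is", "was",
                  "i", "he", "she", "but", "with", "as", "for", "not", "on", "at"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_likely_author(unknown: str, samples: dict[str, str]) -> str:
    """Name of the sample author whose profile best matches the unknown text."""
    target = profile(unknown)
    return max(samples, key=lambda name: cosine(target, profile(samples[name])))
```

Here `samples` maps each candidate author to a passage known to be theirs; the function simply returns the closest match, with no notion of confidence.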

A little data on Edgar Allan Poe’s The Tell-Tale Heart

To support the writing community and empower the next generation of writing companion software, a preview of Dactyl’s dactylogram API is now available to provide linguistic fingerprinting data for any text, completely free of charge, on demand.

Simply visit www.dactyl.in, paste some text in, and check it out for yourself.

Statistics on this Medium post — in this Medium post!

The API returns JSON, is open source, and is getting faster every day.

Of course, more metrics will become available in Dactyl as they are developed. On the horizon are measurements for dialogue analysis, jargon detection, passive/active voice, and overused words and phrases. What Dactyl is capable of now is just the start of something greater.
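Overused words and phrases are listed above as a future metric, so the following is only a guess at one simple approach: count content words after dropping a short stopword list, count adjacent word pairs (bigrams), and report anything that repeats. The stopword list, thresholds, and `overused` function are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list so that "overused" reflects
# the words and phrases a writer actually chose.
STOPWORDS = {"the", "a", "an", "and", "or", "but", "of", "to", "in", "on",
             "at", "it", "is", "was", "i", "he", "she", "they", "that", "with"}


def overused(text: str, top: int = 5, min_count: int = 2
             ) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Return (repeated content words, repeated two-word phrases)."""
    words = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(w for w in words if w not in STOPWORDS)
    # Adjacent word pairs, skipping pairs made only of stopwords. This simple
    # version ignores sentence boundaries, so pairs can span two sentences.
    bigram_counts = Counter(
        (a, b) for a, b in zip(words, words[1:])
        if not (a in STOPWORDS and b in STOPWORDS)
    )
    repeated_words = [(w, c) for w, c in word_counts.most_common(top)
                      if c >= min_count]
    repeated_phrases = [(" ".join(pair), c)
                        for pair, c in bigram_counts.most_common(top)
                        if c >= min_count]
    return repeated_words, repeated_phrases
```

For the text “Suddenly the door opened. Suddenly the lights went out. Suddenly, silence.”, `overused(...)` returns `([('suddenly', 3)], [('suddenly the', 2)])`.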

For example, right now he’s entirely capable of confirming Half-Life 3’s existence; who knows what else he’ll be able to do tomorrow!

As Dactyl is in preview, there are a few things to keep in mind. Most importantly, it’s still in active development; if something is unclear or could be improved, I want to hear about it! Beyond that, several areas still need speed optimizations, so large pieces of prose aren’t quite ready for prime time, and things may run a little slowly as more people use the site. Right now Dactyl works great with short stories and essays, but novellas and novels will have to wait until a full release is ready. Soon!

Your nouns, adjectives, verbs, conjunctions, prepositions, and other parts of speech are all listed in Dactyl.

With this open API available, I’m excited to see what resources begin to pop up in the writing space. If you build something awesome, feel free to shoot an email over to apps@indentlabs.com and I’ll add you to our “Powered by Indent Labs” list; or if you have any questions, comments, or concerns, feel free to reach out any time to support@indentlabs.com or me personally, at andrew@indentlabs.com.

Thanks for reading — now get writing!

What is Indent Labs?

Indent Labs is, at its core, an ambitious open-source natural language processing project aimed squarely at moonshots in the field of writing. Wouldn’t it be awesome if you could generate quality stories from outlines, or automatically outline a story? What about generating a story as you make decisions on behalf of a character?

The first word processor showed up in the 60s and revolutionized writing through technology. Isn’t it time for another shift forward?
