Scription Text Format

Daniel W. Hieber
Digital Linguistics
Mar 23, 2019

The canonical way that linguists represent linguistic data in their publications is with an interlinear gloss. This is typically a 3- or 4-line format that shows a phrase in the language of interest, the words and morphemes inside the phrase, what each of those morphemes means, and the overall translation of the phrase. Here is a short example of an interlinear gloss for a phrase in a language called Chitimacha:

Wetkx hus naancaakamankx weyt hi hokmiqi.          (Transcription)
wetkx hus naancaaka-mank-x weyt hi hok-mi-qi (Morpheme Breakdown)
then his brother-PL-TOP he there leave-PL-3sg (Glosses)
‘Then he left his brothers there.’ (Translation)

Unfortunately, these interlinear glossed examples are difficult to produce! Linguists using Microsoft Word spend inordinate amounts of time formatting examples and aligning words and glosses within them. The software program FieldWorks Language Explorer (FLEx) makes the glossing process fairly quick, but it is built for Windows only, and it is difficult to export its data to other usable formats or to copy-paste examples into a text editor.

Wouldn’t it be nice if we could write our interlinear examples in a plain and simple text format like above, and rely on software to parse those lines and convert them to different data formats for us? In other words, can we write software that knows how to “read” an interlinear gloss?

Enter the Scription text format. The Scription project is a set of guidelines for how linguists can write plain-text interlinear glosses in a way that software can parse. Though they seem complex, interlinear glosses follow a fairly standardized set of conventions. The Scription text format simply makes these conventions explicit and enforces consistency. Programmers can then write software that reads any correctly formatted Scription text.

Read the complete Scription guidelines here.

Here is an example Scription text file for the first 3 sentences of a story in Chitimacha (isolate; Louisiana):

---
title: How the Indian came
abbreviation: A1
---
wetkx hus naancaaka-mank-x wetk hi hok-m-iqi
then he brother-PL-TOP DEM DIST leave-PLACT-3SG
Then he left his brothers.
kun cuu-g-x cuu-g-x xeeni-nk hup hi nicw-iqi
some go-PTCP-TOP go-PTCP-TOP pond-LOC to DIST go.to.water-3SG
He went and went till he came to the edge of a pond.
wetkx we xeeni-nk hi nicw-i-nki-x wey-k hi kixut-iqi
then DET pond-LOC DIST go.to.water-3SG-TEMP-COND DEM-NOM DIST swim-3SG
When he got to the edge of the pond, he swam it.

The format is straightforward: a Scription text can start with a header containing metadata about the text (between the triple dashes ---), followed by a series of utterances. By default, if an utterance has three lines, the format assumes that those lines are the morpheme breakdown, the glosses, and the translation, in that order.
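To make the idea of software "reading" a gloss concrete, here is a minimal Python sketch of a reader for the default three-line layout. It is not the official Scription parser: the function names `parse_scription` and `pair_morphemes` are made up for illustration, and the sketch assumes that every utterance has exactly three lines, that morphemes and their glosses are separated by hyphens, and that header entries are simple `key: value` pairs. The complete guidelines may handle more cases than this.

```python
def pair_morphemes(morpheme_line, gloss_line):
    """Pair each morpheme with its gloss, word by word (hypothetical helper)."""
    words = []
    for word, gloss in zip(morpheme_line.split(), gloss_line.split()):
        # Morphemes within a word are assumed to be separated by hyphens.
        words.append(list(zip(word.split("-"), gloss.split("-"))))
    return words


def parse_scription(text):
    """Read an optional header and 3-line utterances (a sketch only)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Optional header: "key: value" lines between two triple-dash lines.
    metadata = {}
    if lines and lines[0] == "---":
        end = lines.index("---", 1)
        for entry in lines[1:end]:
            key, _, value = entry.partition(":")
            metadata[key.strip()] = value.strip()
        lines = lines[end + 1:]

    if len(lines) % 3 != 0:
        raise ValueError("expected utterances of exactly 3 lines")

    # Default reading: morpheme breakdown, glosses, translation.
    utterances = []
    for i in range(0, len(lines), 3):
        morphemes, glosses, translation = lines[i:i + 3]
        utterances.append({
            "words": pair_morphemes(morphemes, glosses),
            "translation": translation,
        })
    return metadata, utterances


sample = """\
---
title: How the Indian came
abbreviation: A1
---
kun cuu-g-x cuu-g-x xeeni-nk hup hi nicw-iqi
some go-PTCP-TOP go-PTCP-TOP pond-LOC to DIST go.to.water-3SG
He went and went till he came to the edge of a pond.
"""

metadata, utterances = parse_scription(sample)
print(metadata["title"])           # How the Indian came
print(utterances[0]["words"][3])   # [('xeeni', 'pond'), ('nk', 'LOC')]
```

Once the text is in a structure like this, converting it to JSON, HTML, or an aligned print layout becomes an ordinary programming task.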

Here is another version of that same text, also in a valid Scription format:

---
title: How the Indian came
abbreviation: A1
---
\schema
\txn
\tln
wetkš hus na·nča·kamankš wetk hi hokmiʔi
He left his brothers.
kun ču·gš ču·gš še·nink hup hi ničwiʔi
He went and went till he came to the edge of a pond.
wetkš we še·nink hi ničwinkiš weyk hi kišutiʔi
When he got to the edge of the pond, he swam it.

In this version, each utterance has only two lines: a transcription and a translation. To tell software what each line in an utterance represents, we include a \schema at the beginning of the text. The above schema says that the first line of an utterance should be read as a transcription (\txn), and the second line should be read as the translation (\tln). Using a schema, you can structure your interlinear examples in any format you’d like, with as many or as few lines as you’d like.
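A schema-aware reader can be sketched in a few lines of Python. As before, this is an illustration rather than the project's own implementation: `read_schema` and `parse_utterances` are hypothetical names, the fallback labels `morphemes`, `glosses`, and `translation` are placeholders rather than official Scription codes, and the sketch assumes the schema is a `\schema` line followed by one backslash code per line, as in the example above. The header is left out here to keep the focus on the schema.

```python
def read_schema(lines):
    """Split lines into (schema codes, utterance lines); codes is None if absent."""
    if not lines or lines[0] != r"\schema":
        return None, lines
    codes = []
    position = 1
    # Every backslash-initial line directly after \schema names one line type.
    while position < len(lines) and lines[position].startswith("\\"):
        codes.append(lines[position][1:])
        position += 1
    return codes, lines[position:]


def parse_utterances(raw_lines,
                     default_codes=("morphemes", "glosses", "translation")):
    """Group utterance lines according to the schema (a sketch only)."""
    lines = [line.strip() for line in raw_lines if line.strip()]
    codes, data = read_schema(lines)
    codes = codes or list(default_codes)  # placeholder labels, not official codes

    if len(data) % len(codes) != 0:
        raise ValueError("utterance lines do not match the schema length")

    size = len(codes)
    return [dict(zip(codes, data[i:i + size]))
            for i in range(0, len(data), size)]


# A raw string keeps the backslashes in \schema, \txn, and \tln intact.
sample = r"""
\schema
\txn
\tln
wetkš hus na·nča·kamankš wetk hi hokmiʔi
He left his brothers.
kun ču·gš ču·gš še·nink hup hi ničwiʔi
He went and went till he came to the edge of a pond.
"""

utterances = parse_utterances(sample.splitlines())
print(utterances[0]["tln"])   # He left his brothers.
print(len(utterances))        # 2
```

One limitation of this sketch: a data line that begins with a backslash immediately after the schema would be misread as another schema code.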

As you can see, both of these formats are easy to type by hand, and don’t require any special software other than a text editor. So you might want to consider starting to use the Scription format in your own data workflow!


The Scription format is an open-source project, and contributions or suggestions for changes to the format are welcome! If you’d like to contribute code or suggest a change, simply create a new issue in the project repository.

Acknowledgments: This project, and the term “Scription”, were inspired by Patrick J. Hall (University of California, Santa Barbara). Pat prototyped various versions of a simple linguistic text editor using a Scription-like format. This project has attempted to standardize that early vision.
