Speaking of Us: Documenting the World’s Languages with Dr. Anthony Woodbury

This transcribed interview is part of Speaking of Us: A podcast and blog series by Wikitongues exploring the world’s linguistic diversity and how languages expand humanity’s notion of Us.

Raanan J Robertson, published in Wikitongues, May 25, 2020

This interview was conducted by Raanan Robertson and transcribed by Nicole Bennett-Fite.

In 2018, Wikitongues spoke with Dr. Anthony Woodbury from the University of Texas at Austin about his personal background in documentary linguistics. We wanted to ask big-picture questions: what are current methodologies in transcription, translation, and documentation? How have these changed over the years? How do linguists grapple with the fact that many languages are dying or going dormant faster than documentation can keep up? How has technology shaped this effort?

Dr. Woodbury is uniquely positioned to answer these questions. He began teaching at UT Austin in the early 1980s, having received his Ph.D. from the University of California at Berkeley in Linguistics. His work over the decades has focused on Indigenous languages of the Americas such as Chatino and Yupik-Inuit-Aleut, yielding a broader understanding of linguistic diversity. His writings range from syntax and morphology to the delightfully fascinating field of ethnopoetics, whereby transcribers record oral narratives not in prose, but using poetic tools such as lines and stanzas to capture the subtleties and cadences of utterances. An attention to the aesthetic quality of language is one of Woodbury’s hallmarks, and a topic we had the chance to ask him about.

Perhaps of greatest importance, Dr. Woodbury shared with us a profound shift he has witnessed in the locus of linguistics, from its rather narrow mission in the early decades of discovering linguistic universals to a more nuanced view of what it means to value the diversity of the world’s languages.

You can learn more about Dr. Woodbury’s personal biography at this link on the international online language community Language List.

Raanan: You were elected Vice President and then President of the Society for the Study of the Indigenous Languages of the Americas (SSILA) in 2004 and 2005. Could you tell us how you got that position, and a little bit about the organization?

Anthony: I have no idea how I got it [laughs] — I was basically told “you’ve been elected.” I have now been a member for a long time. SSILA is a professional organization for people who study the Indigenous languages of the Americas. It was established in the early ’80s and continues to this day.

Raanan: What are some methodologies that you’ve used to document languages in your work?

Anthony: If you want to be really minimalistic about documentation, what you want to do is create preservable records of how a language is spoken that will be interpretable 500 years from now, even if that language is no longer around. So as a first step, if you think about things like the Rosetta Stone, it’s good to have translations into another language that may be better known 500 years from now. Deciding on a major language like English, and then translating everything into English, is a pretty good step. If you make a videotape, let’s say, you can put English subtitles on it, and you would have the original language and you’d also have a translation.

Usually linguists go further than that. They also want to have a transcription of the original language form. That entails figuring out what the actual sound system of the language is. Furthermore, it’s often very hard to really pick up on what’s being said if you’re not a native speaker. So there’s quite a lot of work that goes into coming up with a system of writing that adequately represents the sounds of the language, and then into implementing it so that you’re accurately transcribing what actually was said in the course of the videotape or other medium. And that gives you a second layer of what you can call annotation. So the first layer of annotation is translation, the second is transcription.

Translation is never a one-and-done process. Translation is really a process of interpretation and explanation, and not just a process of rendering an entity, let’s say a paragraph, in one language into a comparable paragraph in another language. That second view is what you might call the Google Translate theory of translation.

Translation is really much more about interpretation. So having both types of annotation on whatever you record is really the best way to go in order to preserve it for the future. For whoever might have whatever use for it. Then you can get into translation at lots of different levels. You can translate the gist of everything, sentence by sentence. Or you can go to each word and translate what it means, or even go to its internal morphological structure. With each different layer you are getting deeper and deeper into the material and showing more clearly what it is.
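[For readers who think in data structures, the layers described here can be pictured as fields on a single record. What follows is only a minimal sketch, not a real archive format: the class name, the field names, and the timestamps are hypothetical, and the one example form, ntyku ‘I eat’, is borrowed from the Chatino example later in this interview with its tone left unmarked.]

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedUtterance:
    """One stretch of recorded speech plus its layers of annotation.

    Hypothetical structure for illustration only.
    """
    start_sec: float        # offset into the recording (made-up value below)
    end_sec: float
    transcription: str      # the utterance in the language's own writing system
    free_translation: str   # the gist, in a language of wider communication
    # Word-by-word layer: (word as transcribed, its meaning)
    word_glosses: list[tuple[str, str]] = field(default_factory=list)

    def layers(self) -> list[str]:
        """Return the names of the annotation layers that are filled in."""
        candidates = {
            "transcription": self.transcription,
            "free_translation": self.free_translation,
            "word_glosses": self.word_glosses,
        }
        return [name for name, value in candidates.items() if value]


utterance = AnnotatedUtterance(
    start_sec=0.0,
    end_sec=1.2,
    transcription="ntyku",      # tone not marked in this sketch
    free_translation="I eat",
    word_glosses=[("ntyku", "I eat")],
)
print(utterance.layers())
```

[A record with only a free translation corresponds to the subtitled video described earlier; each further layer that gets filled in goes deeper into the material.]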

And so, if you have a language documentation project that has ambition to that level, you are going to be needing to do a lot more analysis of the grammar in addition to the sound system. You need an analysis of the parts of speech, the different ways a noun would be declined or a verb conjugated.

In one sense you could look at the grammar of a language, which for us includes the analysis of the sound system, as being a sort of helper to the process of documentation, or a kind of general go-to guide.

But linguists are actually very fixated on those products (the grammar, the dictionary) as products that are in a sense a representation of the essential facts about a language. For our work we are interested in comparing the grammars and dictionaries of different languages for various purposes. There is a difference in perspectives between linguists who are oriented towards documentation and description and those who are oriented towards general principles of language. Those of us working on documentation are usually working from the actual uses of the language.

Whereas linguists who look at general principles might say, I’m interested in sound systems and how sound systems are different from one language to another. A historical linguist might say, I’m interested in looking at the vocabularies of a bunch of languages that are related, in order to determine exactly how they are related to each other. What historical events and processes must have led to their current state of differentiation. Those are different perspectives that people might take.

Going back to methods, actually what I described leaves out one very key set of methods and those methods are based on what you could call introspection, so based not just on what people actually do and how you might transcribe that, but also what people know about the language they speak. In general we’d say that there’s a level of methods that gets at people’s intuition about language.

So if you’re a native speaker of a language you might be able to speak not only about what’s grammatical, but also about what’s not grammatical, and give negative examples as well as positive examples, in the process of coming up with a general linguistic description. In order to know if a theory is correct you have to follow it out to the end and you have to say, if the theory about how the sound system works predicts that you would pronounce a word in a certain way, is that true? As the native speaker you might say no, and immediately catch yourself and say, ‘I can’t say that.’ So a part of the method for you, if you are a native-speaker linguist, is testing things against what you know. And if you’re not a native speaker then you keep coming up with ideas, bouncing them off native speakers, and seeing how they respond.

Raanan: It seems the process of providing the negative is an important component that a lot of people don’t really think about. Dictionaries and word lists tend to be more positive-oriented, explaining what a word means semantically, what it’s related to. But you don’t often get the tug and pull of where the boundaries are with what ways it can’t be used.

Anthony: Right. Another general set of methods involves trying to push things. So if you are just sort of passive and say well what type of talk is going on in the community and from that, what kinds of talk do people want to record, that’s going to narrow the exploration of the language. There may be areas that you just don’t happen to run into. So there may be areas of vocabulary and you are just never around people or situations where they are using that vocabulary.

Let’s say there’s a lot of vocabulary for different botanical taxa, and you are never out in nature, or around the stuff that would be referred to, you are never going to get that. So one of the things you want to do, is be as comprehensive as you can in pushing people into using the language in as many facets as you can. That leads to an approach that you could call experimental in the sense that you say alright let’s figure out how we refer to different kinds of breaking and cutting, let’s say.

So you get a bunch of videotapes showing all kinds of different acts of breaking and cutting and you ask people to describe what the person is doing in the video. And you end up eliciting these things with a kind of control, because you might use those same videos for languages around the world, so you get to compare how different scenarios of breaking and cutting get subdivided in the vocabularies of different languages.

Most languages are full of shortcuts. They don’t take everything that is logically distinct from everything else and come up with a different word for it. It would be impossible. But not every language has the same set of shortcuts. They might differ a lot from one language to another. You have to figure out what’s going on.

To take another example, your body from your shoulder joint on down is divided up in different ways in different languages. In English we basically take the hand, and the forearm and the upper arm. Other languages might take the forearm and hand together as a single entity but separate out the fingers. Another might look at the upper arm as being a part of the torso. So there’s a lot of different variants on that.

Raanan: That’s really fascinating — your point about going in with some ideas in mind, going in with a flashlight versus a floodlight: you are searching for specific things but if you don’t take the chance to step back and listen to conversations as they happen naturally or broaden the scope, you don’t get to see some of the subtleties and distinctions that you might have seen otherwise.

Anthony: Right. And if you listen to the conversation and you compare it to a person telling a story, there’s some things that are going to be in evidence in one but not the other.

For example, there are a lot of stories that don’t have any direct quotation of characters. It is unlikely you are going to find out how to make imperative sentences in that language if that’s what you’re working with, because typically imperative sentences come up in interaction, or when it is presented in direct quotation in a story.

Raanan: That’s a great point and leads me to another question. There’s a lot of time and work that linguists need to spend to capture all of these elements of language. Do you see that as an obstacle, because you could say there’s a race against the clock, in terms of language diversity and dying languages around the world? Are there new methodologies that are trying to approach that obstacle which is that some languages are dying? How do you capture them in a faster process than could be done in say a Ph.D. spent on one particular area?

Anthony: As a general thing it’s a very, very, very long process. Sometimes that’s because certain steps are really, really hard. So figuring out how the sound system works, if you have a language that has a very complex sound system or tones, there’s time in trying to come to grips with really what’s going on in terms of the architecture of the language.

There’s also a question of the scale of vocabularies, how varied they are, how they might not be all in the hands of the same person. Take the Oxford English Dictionary: you would never think one person would know all of those words, despite there being a common core that everyone knows. So there is a huge territory to cover.

There’s a lot of hidden structure in language. So much so that a grammar can actually be a work of over a thousand pages even on the first bat. So that’s a lot to cover and then you have these dictionaries that are huge, so that’s a big thing.

And the question you might ask, does the ability to computationally manipulate the data that you collect, does that really help? And it might not help as much as you think.

One of the trade-offs is that to the extent that you are automatically manipulating data it’s not going through your fingers and becoming committed to your memory, so you lose some of the agility that people have working within their own knowledge and memory, when you make things automatic. But on the other hand when you make things automatic you have some advantages too.

Let me give you a very particular example of technology affecting the workflow. Sometimes people talk about what they call the transcription bottleneck. Which is the immense amount of time that’s required for doing transcription as I described. Typically for many languages, there’s about a 1 minute to 1 hour ratio. So for one minute of recorded speech it takes you an hour to transcribe it. Even if you think about it, you have a solid hour of recorded English and you’re supposed to sit down with a headset and you have the best automatic transcription equipment — let’s just say everything is automated with respect to that — it’s still going to take you a long time. Maybe you could get 10 minutes in an hour.
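[To put rough numbers on that bottleneck, here is a small back-of-the-envelope sketch. The two rates are the ones mentioned above, one minute and ten minutes of speech transcribed per hour of work; the ten-hour recording size is a made-up figure chosen only for illustration.]

```python
def transcription_hours(recorded_minutes: float,
                        minutes_transcribed_per_hour: float) -> float:
    """Hours of work needed to transcribe a recording at a given rate."""
    if minutes_transcribed_per_hour <= 0:
        raise ValueError("rate must be positive")
    return recorded_minutes / minutes_transcribed_per_hour


recording = 10 * 60  # hypothetical: ten hours of recorded speech, in minutes

print(transcription_hours(recording, 1))   # 600.0 hours at 1 minute per hour
print(transcription_hours(recording, 10))  # 60.0 hours at 10 minutes per hour
```

[Even at the faster rate, ten hours of recordings would take sixty hours of transcription work, before any translation or deeper analysis.]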

As you know one of the things we can do, not always super well, but you can dictate into your phone and it will transcribe it. You might be embarrassed if it does it terribly, but it can be done. But to what extent can we actually create automatic transcription for random languages? And how much do we have to actually know the language before we can do that? Can we rely on a universal phonetic capacity of a transcription system, in lieu of actually knowing the language?

These are things that people are experimenting with. Another question is, alright let’s say you still kind of know the language, how can you build in what you do know about the language to narrow down the possibilities of the transcription?

If the process of transcribing involves identifying not just sound-sound-sound but word-word-word, and then identifying what each particular word means from a dictionary, then you can overcome the transcription bottleneck and the immense amount of time and cost in doing transcription. An amazing amount of headway has been made even in the last five years. So that’s something that addresses the question that you raised.

Raanan: Right, it’s not an insurmountable task, but certainly a long process. I wanted to step back towards the human component and see how you approach the problem of building trust in the community when you begin to document? This speaks to the length of time involved in documentation that has nothing to do with technology, but more to do with human relationships.

Anthony: This is such a wide question. Part of the question goes to, where is the initial motivation for documentation?

You have to look at communities and individuals in communities and their ideas about the situation of their language, how they feel about it. Whether they want to continue speaking the language, whether they want to get other people to speak the language, whether they want to write down the language.

There are all these questions that a person might have or not have in regards to their language, and in regards to their observation of very sudden or gradual shifts going on in their community, and the language that they associate traditionally with their community falling out of use.

So that’s one component, then there is this: from a narrow scientific perspective of linguists, we say okay there’s 7,000 languages out there, we can find information about half of them, and even in that case the information isn’t at all that detailed most of the time. So how are we going to find out about more languages? That puts you in a position of saying alright, here’s a language, you know about it and nobody has really described it, so let’s describe it.

Then the linguist, who is presumably not a speaker of that language, says alright, how are we going to develop some sort of cooperative relationship in order to describe it? So the person might simply proselytize what they want to do, or they might try to find out where people are in the community. What kinds of ideas do they have about doing language description? Or maybe they have no idea about the sorts of things that linguists typically do. Or whether they would be at all interested in incorporating that in their own agendas for their languages. So that’s kind of one sort of scientific outsider scenario that you could look at.

Linguists have been slow, although there are quite a few exceptions to this, to realize that in order to accomplish even their most narrow goals, it would be a great idea to train as linguists people with the most linguistically diverse backgrounds.

If everybody who’s trained as a linguist is a native speaker of English, Spanish, Russian, or Japanese, you aren’t going to get very far, particularly as we recognize the special advantages of describing your own language.

Now that itself is a big discussion, and I think most people think about it carefully enough to realize that there are advantages both for the outsider and the insider in describing a language. Most of us trained as linguists actually have experience of both. Sometimes I’ve worked on English, sometimes I’ve worked on languages I’ve learned, and sometimes I’ve worked on languages I haven’t learned. Sometimes I come to insight more readily through my knowledge of a language and sometimes I come to insight by taking a distant approach to it. So I think the goals of linguistics are best met if people, both speakers and non-speakers, become involved in the process of documenting and describing languages. So the question is, how can linguists from various different language backgrounds be brought into the process of developing ideas about language documentation, so that eventually the project can go forward from there?

Raanan: Nice, the simple recognition of the goal from the beginning and what different people hope to get out of it, and having that kind of self-awareness, especially if you are a speaker of a particular language, especially these days if it’s Indo-European, that you may have some kind of bias going into how you might document.

Anthony: Right, and you know every agenda is just that, it’s an agenda, a set of goals and priorities and values about working with a language and to a linguist it seems absolutely bizarre that you would not want to write down a language. You’d feel that there’s no analysis going on.

But at a really basic level, I go back to my claim that the most elementary documentation is one that involves, let’s say, a video with subtitles in some language of wider communication. That’s the essential core, and from there you build a lot of different things. In one direction you build ideas about what’s the best way to speak and what’s not so good. On the other hand you might work on representing the way people do speak, so there are lots of things you could be looking at.

And then even linguists differ quite a bit. There’s an old school type of historical linguist who’s going to be really interested in getting a dictionary with words and maybe some analysis of the words, but they really aren’t going to care about the syntax so much. Because there’s not really a very definitive kind of reconstruction of syntax that establishes the relationship between languages. It’s much more productive to do that based on words, morphologies, and sounds. So there’s going to be differences among linguists, let alone people who don’t have linguistic goals.

Raanan: That brings us to my next question, you are about to teach your freshman course on dying languages, yes?

Anthony: Right, it’s called, “Dying Languages: What World Linguistic Diversity Means For Us.” And I know that’s sort of a narcissistic view. The idea is to really explore the meaning of linguistic diversity not just for us but for anybody. Whether you’re a speaker of a given language, whether you’re not, and then looking at its value scientifically as well as humanistically, so basically what does it do for our knowledge of the world, our appreciation for what people do? That’s in a nutshell how I defined the science versus the humanities side of it.

Raanan: I would have loved to take that as a freshman.

Anthony: It wasn’t in existence I don’t think. I believe I first taught it in 2011.

Raanan: So you are teaching students, some of whom may go on to be linguists, some of whom might not really think much about it anymore, but nevertheless might end up affecting communities that speak endangered languages either directly or indirectly. What are some of the takeaways that you hope people will come away with at the end of the course?

Anthony: Every once in a while somebody comes along who goes on to other things. There was a guy who was a freshman in the course, then went with Patty Epps to the Amazon that summer, and ended up spending a year abroad in Australia with some very well-known language documenters. So he definitely got a good start from that. On the other hand, I think the purpose is actually to create a general literacy about language. And nowadays so many people coming into UT have interesting linguistic backgrounds: they are very often multilingual, and often were not born in the US and came to English at some point during their childhood. So they have interests and questions about language in general that this course, like other linguistics courses, is likely to open up.

Raanan: What are some of the things you’ve found most interesting in the languages you’ve studied or documented? Thematically, syntactically, culturally, or all of the above?

Anthony: There’s lots of stuff that’s very interesting about any language.

We [linguists] would be the first to admit you can write a grammar of a language and then you can figure out what goes on the back cover of the book by saying what’s so absolutely cool in a nutshell about that language. There’s usually something or another that’s kind of unusual and interesting about a language.

Now you could say that ‘unusual’ is defined by it being different from the most commonly used languages, so that it’s kind of a biased idea, and again we’d be the first to admit that. But if you can find something that is just rare among languages then that has a very general and unbiased way of being something interesting.

Just to give you one thing in Yupik Eskimo, a language I worked on a lot. It has gigantic, huge, long words, but not only that, the gigantic, huge, long words are built in a very uniform way.

If you look at English words, we build the words with a number of different techniques. For example, I can say ‘speak’ and ‘spoke’ and those words differ simply by changing the vowel, I can make it past tense. I can say, from the past tense to the participle, ‘spoke’ to ‘spoken’, I’m putting a suffix at the end. If I say ‘I’m speaking’ I can say ‘that bespeaks the answer’, and I’m getting a totally different meaning from putting a prefix on it. So you can do all these different things to a word in English and get new words.

Whereas in Yupik, and in other Yupik-Inuit languages, Eskimo-Aleut languages, you can only put a suffix on, and after you’ve done that you can put another suffix on it, and another one, and so on, to build up rather elaborate meanings by just adding suffix upon suffix. The question then is, how does this work?

Are words in languages like this the same as they are in languages like English? Pretty much for English we probably memorize each word that we have, because we can’t rely on the parts of the words being completely moving parts, let’s say. Taking an adjective and making it negative in English, you put on a prefix un-: friendly, unfriendly; useful goes to unuseful. But the more you think about it, you realize that doesn’t work all the time. Eligible becomes ineligible, and it’s not uncorrect, it’s incorrect. And so in English you could plausibly say that learning the language and grammar involves memorizing a whole bunch of words.

In Yupik that just can’t be the case because the suffixes are so completely productive. For example the only way you can say the word ‘to be’ is by putting a suffix that means ‘be’ onto the end of the noun. So ‘be a car’, ‘be a book’, ‘be a bicycle’, all of those take the noun and put the ‘be’ on it. So you’d no longer be able to say that you memorize every word in this language because you’d be in the millions before you knew it. Because the suffixes are so productive in that sense.

This means that the fundamental build of the grammar is just different, and it builds words in the way we build sentences. Because nobody would say that you memorize all of the sentences in English, that would be nuts. That’s an example of the kind of thing that’s different and intriguing.
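[A rough count shows how quickly "the millions" arrive. This is only a sketch under a deliberately crude assumption that any suffix may follow any stem or any other suffix, which no real language allows; the stem and suffix counts are invented round numbers, not figures for Yupik.]

```python
def possible_words(stems: int, suffixes: int, max_suffixes: int) -> int:
    """Count the forms buildable from a stem plus 0..max_suffixes suffixes.

    Assumes, unrealistically, that every suffix can attach anywhere.
    """
    return stems * sum(suffixes ** depth for depth in range(max_suffixes + 1))


# Hypothetical inventory: 1,000 stems and 100 fully productive suffixes.
print(possible_words(1000, 100, 0))  # 1000 bare stems
print(possible_words(1000, 100, 2))  # 10101000 forms with up to two suffixes
```

[Real restrictions on which suffixes combine would shrink these totals, but the growth with each added suffix is the point: a list this size cannot plausibly be memorized word by word.]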

I’ll give you another example, in the language group in Mexico that I’m working with called Chatino [one variety], the words are little short mono-syllables just like in Chinese [Mandarin]. And — not in every variety of Chatino, but in this one — every word has a tone, just like in Chinese. In Chinese there are only four tones, but in this variety of Chatino, there are 15 tones. And not only that but the tones are more specified in their function in Chatino than they are in Chinese.

In Chinese what do the tones really do? They might make a distinction between ma with a high pitched level tone, meaning ‘mother’; ma with a low-to-high pitched rising tone meaning ‘hemp’, ma with a low pitched level tone meaning ‘horse’, and ma with a high-to-low pitched falling tone meaning ‘scold’ — and all of those might be different words, just randomly different tones with different meaning. In Chatino there is that, but also differences in the tone can give you differences in the grammar. So for example, if I say ntyku with a low pitched level tone, it means ‘I eat’ or ‘every day I eat.’ But if I said, ntyku with a mid-to-high pitched rising tone, it would mean ‘right now I’m eating.’ You can see there are ways the tone system is actually creating different new words through the grammar.

Raanan: That’s incredibly fascinating. It’s amazing we have so many ways of speaking, and hence so many ways of thinking about the world around us.

Anthony: Yes, and you know it’s possible to think about it even just aesthetically. There’s a different feel when things are being done via tone than when they are being done piling up suffixes. So the poetic possibilities, just in terms of sound texture if nothing else, are going to be different. And then you start to look at differences like categories that are obligatorily specified versus not specified.

Consider Spanish vs. English. In Spanish, whenever you refer to a singular second person you have to decide whether you’re on a formal or an informal basis with them in order to choose between ‘usted’ and ‘tú’. In English we don’t have to bother with that. These kinds of juxtapositions are really common, where one language is worrying about something and another language is not.

It can be pretty complicated. The crude takeaway that you might think of is never really the case. “Oh English speakers have no concept of deference or informality.” That’s of course nonsense. But on the other hand you do have to say that Spanish puts you on the spot in ways that English does not. You can go a long way being noncommittal in how you refer to someone as long as you don’t have to refer to them as Mr. or Ms. in English.

There’s so many different category choices that come up in one language and not another, which suggests that the organization of your thoughts is going to be built around certain requirements of the language and that might have an effect on your thinking. Again it doesn’t have an effect at the crude level of, oh we just can’t think certain things in certain languages. You can always think pretty much anything. But it does affect the way that the process of ‘thinking for speaking’ takes place, to use a term that is often invoked for that.

Raanan: Is there anything else you’d like to add?

Anthony: I have been a linguist since I started doing linguistics in college in 1972. I’ve seen a really strong growth of awareness about the question of linguistic diversity. What does linguistic diversity mean, not just what kinds of principles of universality can we find, but also what about the value of diversity in itself? And this shift from a narrow focus on universality to a more holistic focus on the ways in which languages vary as well as ways in which they don’t vary. This has been an interesting trajectory and also the inclusion of the different perspectives that come up in different communities where languages are living, thriving, maybe not thriving, is a huge development in this field and something I’ve been able to witness.

Raanan: That adds a positive spin to the subject. When you look at language disappearance rates and the trajectory we’ve been on, it’s very encouraging to hear that the ways at least that we’re thinking about it and approaching it have evolved and hopefully will continue to evolve.

Anthony: Yes, and entities like Wikitongues are a new thing and that’s amazing and great.

Raanan: Thanks so much for taking the time!

Anthony: Absolutely, thanks for talking with me!

Follow Speaking of Us on Spotify, Apple Podcasts, Google Play, or Stitcher. If you would like to donate to support the work of Wikitongues or if you would like to get to know our work, please visit wikitongues.org. To watch our oral histories, subscribe to our YouTube channel or visit wikitongues.org to submit a video.

Written by Raanan J Robertson