Say Hello to ADIOS

William L. Weaver
TL;DR Innovation
Published in
4 min readFeb 24, 2018

Unsupervised Learning of Linguistic Structures

“I brang it to school yesterday” is the most recent conversation starter with our fourth-grader. Like many parents, I would like to have the phrase “Use brought; ‘brang’ isn’t a word” emblazoned on a shirt so I can simply point to it several times a day. My correction is always followed by the retort “But it’s ‘ring-rang-rung’ and bring sounds like ring.” Explaining ‘there are just about as many exceptions as there are rules’ can make teaching a language almost as difficult as learning it.

Image by strecosa on Pixabay

Programming computers is often equated with teaching children. Once a specific rule or task is learned or programmed, it is performed exactly as it was taught, but frequently applied as the solution to the wrong problem. With more education and experience, children learn when to use the appropriate solution by following the examples of their teachers or peers. As they continue to mature and develop, they also synthesize novel solutions using knowledge of the foundational rules. Traditional computer applications typically do not evolve with experience or learn new rules; however, that is beginning to change.

Researchers at Cornell and Tel Aviv Universities have been tackling the task of infusing the rules of grammar into a computer. Rather than creating a massive database of rules and exceptions for the computer to follow, Cornell professor Shimon Edelman and his colleagues have developed an unsupervised algorithm that scans text containing properly-phrased sentences and infers the underlying rules of grammar. The algorithm, named Automatic Distillation of Structure (ADIOS) finds complex patterns and phrases by repeatedly aligning sentences and discovering portions that overlap.

Professor Edelman describes the following example: The sentences, “I would like to book a first-class flight to Chicago,” “I want to book a first-class flight to Boston” and “Book a first-class flight for me, please” may give rise to the pattern “book a first-class flight” — if this candidate pattern passes the novel statistical significance test that is the core of the algorithm. If the system also finds the sentences, “I need to book a direct flight from New York to Tel Aviv” and “I would like to book an economy flight,” it may infer that the phrases “first-class,” “direct” and “economy” are equivalent in the context of the new pattern.

ADIOS is based on two components: (1) a Representational Data Structure (RDS) graph, and (2) a Pattern Acquisition (PA) algorithm that learns the RDS graph by detecting patterns in the text in an unsupervised manner that does not require the classification of phrases into a predetermined taxonomy. In the initial phase, the text is segmented into the smallest possible components (modifiers such as -ed for past tense and -ing for present tense are removed through a statistical analysis of the text to discover root words that appear in multiple tenses). Since these modifiers are not pre-programmed, the algorithm works on any written language, as diverse as English and Chinese, individual characters, and phonemes if applied to speech. The initial set of unique components forms the vertices of the preliminary RDS graph. When adjacent components are found in the text, a directed edge is inserted between the vertices to indicate the transition from one word to the next.

In the second phase, the PA algorithm recursively scans the body of text for “significant patterns” that consist of a sequence of graph edges (words that follow each other to form common phrases, such as “book a” in the example), an equivalence class of vertices (first-class, direct and economy), and a terminating sequence of graph edges (flight). At each level of recursion, the PA updates its collection of equivalence classes to form more and more complex patterns that ultimately represent a hierarchy for the text. The discovered relationships between the distilled patterns can be viewed as a classification tree. For English text, the leaves of the tree represent parts of speech such as nouns, verbs, adjectives, and adverbs, while the order of the branches indicate patterns of their use, as in Subject — Conjunction — Subject — Verb — Article — Adjective — Predicate. The algorithm can ultimately synthesize new, grammatically-correct sentences by selecting leaves in branch order to generate a sentence such as “George and Pam have a fast car,” following the example pattern described.

ADIOS was applied to a collection of phrases selected from the Child Language Exchange System (CHILDES) consisting of 9665 transcribed sentences (containing 74,500 words) produced by parents addressing their pre-school children. The algorithm discovered 317 patterns and 404 equivalence classes and is currently being used to investigate how toddlers learn their native languages. ADIOS has been tested on the full text of the Bible in several languages and on artificial context-free languages with thousands of rules. In addition to text analysis, it is being extended to other pattern-rich data including musical notation, computer vision and biological data. When used in protein analysis, ADIOS distilled common amino acid sequence patterns that were correlated with the functional properties of the proteins. Perhaps the ADIOS algorithm will help to uncover all kinds of human-made and natural patterns; but for now, “brang” isn’t a word because Dad says so.

This material originally appeared as a Contributed Editorial in Scientific Computing 22:12 November 2005, pg. 12.

William L. Weaver is an Associate Professor in the Department of Integrated Science, Business, and Technology at La Salle University in Philadelphia, PA USA. He holds a B.S. Degree with Double Majors in Chemistry and Physics and earned his Ph.D. in Analytical Chemistry with expertise in Ultrafast LASER Spectroscopy. He teaches, writes, and speaks on the application of Systems Thinking to the development of New Products and Innovation.

--

--

William L. Weaver
TL;DR Innovation

Explorer. Scouting the Adjacent Possible. Associate Professor of Integrated Science, Business, and Technology La Salle University, Philadelphia, PA, USA