A Brief History of Natural Language Processing — Part 1

Antoine Louis
4 min read · Jul 7, 2020


Natural language processing (NLP) is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis (Liddy, 2001). The purpose of these techniques is to achieve human-like language processing for a range of tasks or applications. Although it has gained enormous interest in recent years, research in NLP has been going on for several decades, dating back to the late 1940s. This review divides its history into two main periods: NLP before the deep learning era (part 1) and during it (part 2).

Part 1 — NLP before the Deep Learning Era

The big stages of NLP before the deep learning era.

It is generally agreed that Weaver’s memorandum (Shannon and Weaver, 1949) brought the idea of the first computer-based application related to natural language: machine translation (MT). It subsequently inspired many projects, notably the Georgetown experiment (Dostert, 1955), a joint project between IBM and Georgetown University that successfully demonstrated the machine translation of more than 60 Russian sentences into English. The researchers accomplished this feat using hand-coded language rules, but the system failed to scale up to general translation. In fact, early work in MT was very simple: most systems used dictionary lookup of appropriate words for translation and reordered the words afterwards to fit the word-order rules of the target language. This produced very poor results, as the lexical ambiguity inherent in natural language was not taken into account. Researchers progressively realized that the task was much harder than anticipated and that they needed a more adequate theory of language. It took until 1957 for the idea of generative grammar (Chomsky, 1957) to be introduced, a rule-based system of syntactic structures that brought insight into how mainstream linguistics could help machine translation.
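
To make the limitation concrete, here is a purely illustrative Python sketch of the dictionary-lookup-and-reorder approach described above. The toy lexicon and the single reordering rule are invented for this example, and the one-sense-per-word dictionary is precisely the kind of simplification that made such systems blind to lexical ambiguity.

```python
# Toy illustration of early dictionary-lookup machine translation.
# The lexicon and reordering rule below are invented for this example;
# real systems of the era relied on large hand-built dictionaries and
# many hand-coded word-order rules.

# Hypothetical French-to-English dictionary with one sense per word,
# which is exactly why lexical ambiguity broke these systems.
LEXICON = {
    "le": "the",
    "chat": "cat",
    "noir": "black",
    "dort": "sleeps",
}

def translate(sentence: str) -> str:
    """Look each word up in the dictionary, then apply a crude
    reordering rule (French noun-adjective -> English adjective-noun)."""
    words = sentence.lower().split()
    translated = [LEXICON.get(w, w) for w in words]  # unknown words pass through

    # Naive reordering: swap a "noun adjective" pair when the adjective
    # appears in a small hand-coded list.
    ADJECTIVES = {"black"}
    i = 0
    while i < len(translated) - 1:
        if translated[i + 1] in ADJECTIVES:
            translated[i], translated[i + 1] = translated[i + 1], translated[i]
            i += 2
        else:
            i += 1
    return " ".join(translated)

print(translate("le chat noir dort"))  # -> "the black cat sleeps"
```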

Thanks to developments in syntactic theory and parsing algorithms, the 1950s were flooded with over-enthusiasm. People believed that fully automatic, high-quality translation systems would soon produce results indistinguishable from those of human translators, and that such systems would be in operation within a few years. Given the linguistic knowledge and computer systems available at the time, this was completely unrealistic. In 1966, after more than a decade of research and millions of dollars spent, machine translations were still more expensive than manual human translations, and no computer came anywhere near being able to carry on a basic conversation. That year, the ALPAC released a report (Pierce et al., 1966) concluding that MT was not immediately achievable and recommending that funding for it be stopped. This substantially slowed down not only MT research, but also most work in other applications of NLP.

Despite this significant slowdown, some interesting developments emerged in the years following the ALPAC report, both in theoretical issues and in the construction of prototype systems. Theoretical work in the late 1960s and early 1970s mainly focused on how to represent meaning. Researchers developed new theories of grammar that were computationally tractable for the first time, largely in reaction to transformational generative grammar (Chomsky, 1965), which was criticized for being too syntactically oriented and for not lending itself easily to computational implementation. Many new theories thus appeared to explain syntactic anomalies and provide semantic representations, such as case grammar (Fillmore, 1968), semantic networks (Collins et al., 1969), augmented transition networks (Woods, 1970), and conceptual dependency theory (Schank, 1972). Alongside theoretical developments, this period also saw the birth of many interesting prototype systems. ELIZA (Weizenbaum, 1966) was built to replicate the conversation between a psychologist and a patient, simply by permuting or echoing the user input. SHRDLU (Winograd, 1971) was a simulated robot that used natural language to query and manipulate objects inside a very simple virtual micro-world consisting of a number of colored blocks and pyramids. LUNAR (Woods et al., 1972) was developed as a natural language interface to a database of information about lunar rock samples, using an augmented transition network. Lastly, PARRY (Colby, 1974) attempted to simulate a person with paranoid schizophrenia based on concepts, conceptualizations, and beliefs.
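
As a flavor of how simple ELIZA’s mechanism really was, here is a minimal Python sketch of the pattern-matching-and-echoing idea. The patterns and pronoun reflections below are invented for this illustration and are far cruder than Weizenbaum’s original script.

```python
import re

# A tiny ELIZA-style responder. The rules and reflections are invented
# for illustration; the original script was much larger and richer.

REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

RULES = [
    (re.compile(r"i need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"(.*) mother(.*)", re.I), "Tell me more about your family."),
]

def reflect(fragment: str) -> str:
    """Swap first/second person so the echo reads as a question."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(user_input: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(user_input)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."  # default reply when no rule fires

print(respond("I am feeling sad about my job"))
# -> "How long have you been feeling sad about your job?"
```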

The 1970s brought new ideas into NLP, such as building conceptual ontologies, which structured real-world information into computer-understandable data. Examples include MARGIE (Schank and Abelson, 1975), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), SAM (Cullingford, 1978), PAM (Schank and Wilensky, 1978), and Politics (Carbonell, 1979).

In the 1980s, many significant problems in NLP were addressed using symbolic approaches (Charniak, 1983; Dyer, 1983; Riesbeck and Martin, 1986; Grosz et al., 1987; Hirst, 1987), i.e., complex hand-coded rules and grammars used to parse language. In practice, text was first segmented into meaningless tokens (words and punctuation). Representations were then manually created by assigning meanings to these tokens and their mutual relationships through well-understood knowledge representation schemes and associated algorithms. These representations were eventually used to perform deep analysis of linguistic phenomena.
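
To illustrate that pipeline, here is a toy Python sketch in the symbolic spirit: segment the text into tokens, assign each token a hand-coded meaning, and combine the results into a simple frame-style representation. The lexicon, categories, and frame slots are invented for this example and do not correspond to any of the cited systems.

```python
# Toy sketch of a symbolic pipeline: tokenize, look up hand-assigned
# meanings, and fill a small frame. Everything here is invented for
# illustration.

# Hand-coded lexicon mapping tokens to semantic categories.
LEXICON = {
    "john": ("AGENT", "John"),
    "opened": ("ACTION", "open"),
    "the": None,                # function word, carries no frame slot
    "door": ("OBJECT", "door"),
}

def tokenize(text: str):
    """Segment the input into bare tokens, stripping punctuation."""
    return [t.strip(".,!?").lower() for t in text.split()]

def build_frame(text: str) -> dict:
    """Fill a hand-designed AGENT/ACTION/OBJECT frame from the tokens."""
    frame = {}
    for token in tokenize(text):
        entry = LEXICON.get(token)
        if entry is not None:
            slot, value = entry
            frame[slot] = value
    return frame

print(build_frame("John opened the door."))
# -> {'AGENT': 'John', 'ACTION': 'open', 'OBJECT': 'door'}
```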

It wasn’t until the late 1980s and early 1990s that statistical models revolutionized NLP (Bahl et al., 1989; Brill et al., 1990; Chitrao and Grishman, 1990; Brown et al., 1991), replacing most natural language processing systems based on complex sets of hand-written rules. This progress was the result of both the steady increase in computational power and the shift to machine learning algorithms. While some of the earliest machine learning algorithms, such as decision trees (Tanaka, 1994; Allmuallim et al., 1994), produced systems similar in performance to the old-school hand-written rules, statistical models broke through the complexity barrier of hand-coded rules by learning them automatically from data, which led research to focus increasingly on these models. Unlike hard-coded rules, these statistical models could make soft, probabilistic decisions.
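
As a minimal sketch of the statistical idea, the toy Python example below estimates bigram probabilities from a tiny, invented corpus simply by counting, and then makes the kind of soft, probabilistic decision a hand-written rule cannot: ranking possible next words by their estimated likelihood.

```python
from collections import Counter

# Minimal sketch of the statistical turn: instead of hand-written rules,
# parameters are estimated from data by counting. The tiny "corpus" below
# is invented for illustration; real systems trained on large corpora.

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
]

# Count unigrams and bigrams across the corpus.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_prob(prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev) from the counts."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# A soft, probabilistic decision: which word is more likely after "the"?
print(bigram_prob("the", "cat"))  # 2 of 6 occurrences of "the" -> ~0.33
print(bigram_prob("the", "dog"))  # 2 of 6 -> ~0.33
print(bigram_prob("the", "mat"))  # 1 of 6 -> ~0.17
```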
