Natural Language Processing (NLP): Foundations of Linguistics. Layers of Language

eHealth First
4 min read · Dec 27, 2017

By Eray Ozkural, Director of Artificial Intelligence, Machine Learning and High-performance computing, eHealth First Project.

The core technologies and areas of the eHealth First Project include Natural Language Processing, Artificial Intelligence and Machine Learning.

In particular, all of them will be applied to analyze biomedical publications, reveal their context, build and test useful models of biomedical information, and apply those models to the individual health indicators of a user of the EHF Personal Health Application. Moreover, these technologies will be core components of the second application, EHF Biomed, which is intended for health professionals and researchers.

In this and upcoming articles on Medium, we will show which methods and ideas our team will use to build the EHF Platform.

The problem of processing and translating language was one of the first hard problems that motivated initial research into Artificial Intelligence. The subject of Natural Language Processing lies at the intersection of Computer Science and Linguistics, and it borrows heavily from methods in both fields. We present a brief summary of the main problems and approaches in NLP in this chapter. For further study, we recommend a standard reference in computer science (Jurafsky and Martin 2009).

The logical, symbolic inquiry into the mechanisms underlying language was initiated by analytic philosophers, beginning with Frege, who invented predicate calculus, and later Russell, who identified the fundamental logical and conceptual problems in language (Griffin 2003). These early studies form the foundation of the propositional (or denotational) theory of meaning, wherein logical propositions are derived from linguistic expressions. Hence, when we say "The king of France is bald", we assume that there is a (reference to a) person who happens to be the king of France and that this person is bald, a reading which may be cast into first-order logic.
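One common way to write this reading down, following Russell's analysis of definite descriptions, is sketched below. The predicate names K ("is king of France") and B ("is bald") are labels chosen here for illustration, and the middle clause, which says there is at most one king, is part of Russell's analysis and goes slightly beyond the informal paraphrase above.

```latex
\exists x \, \bigl( K(x) \land \forall y \, ( K(y) \rightarrow y = x ) \land B(x) \bigr)
```

Read aloud: there is some x who is king of France, anyone who is king of France is that same x, and x is bald.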

The later philosophical works of Saul Kripke and David Lewis developed the powerful theory of possible world semantics, which allows a statement to be analyzed within any possible context (world), resolving references adequately. The method is so versatile that it has even been used to analyze the meaning of programming languages.
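The idea of evaluating one sentence against different worlds can be illustrated with a toy model. The following is a minimal sketch, not a full possible world semantics: the `World` class, the world names and the individuals "a" and "b" are all hypothetical, and the sentence "The king of France is bald" is given a Russell-style reading (exactly one king exists, and he is bald).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    """A toy possible world: who is king of France and who is bald there."""
    kings_of_france: frozenset
    bald: frozenset


def king_of_france_is_bald(world: World) -> bool:
    # Russell-style reading: there is exactly one king of France,
    # and that individual is bald.
    return len(world.kings_of_france) == 1 and world.kings_of_france <= world.bald


# Three hypothetical worlds; the individuals "a" and "b" are placeholders.
worlds = {
    "monarchy_bald_king": World(frozenset({"a"}), frozenset({"a", "b"})),
    "monarchy_hairy_king": World(frozenset({"a"}), frozenset({"b"})),
    "republic_no_king": World(frozenset(), frozenset({"b"})),
}

# The same sentence receives a truth value relative to each world.
truth_by_world = {name: king_of_france_is_bald(w) for name, w in worlds.items()}
print(truth_by_world)
```

Under this reading the sentence comes out true only in the first world. In the world without a king it is simply false, because no individual satisfies the description.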

Other philosophers of language introduced significant deviations from the standard logical treatment. Davidson pursued a logical theory of meaning that substitutes truth for meaning, and later based his philosophical program on Quine's idea of radical translation, a behavioral and holistic account of utterances that situates language firmly in context.

Chomsky, on the other hand, posited an innate linguistic capability, known in Linguistics as the Universal Grammar hypothesis, and an internal language (often called I-language) to account for the human linguistic capability. Chomsky also introduced a systematic, mathematical treatment of syntax, formalizing first generative grammars (Chomsky 1957) and later transformational generative grammars (Chomsky 1965), a feat that established Linguistics as a rigorous mathematical discipline and improved upon previous theories of syntax.

Wittgenstein, for his part, extended the Frege/Russell approach in his own way and introduced two theories. The first is the picture theory of meaning, which suggests that language is a logical mirror of the physical world (Wittgenstein 1922). In his later work, he suggested instead that meaning is use, a view which puts pragmatics first and is similar to Davidson's theory (Wittgenstein 1953). These philosophical inquiries helped form a theoretical basis for language processing, as they moved language from the domain of literature into logical formalisms that are amenable to computational methods.

Layers of study in Linguistics (not exhaustive)

Linguistics is a vast scientific field with many objects of study: every human utterance and inscription constitutes evidence, and every discourse and every component of language concerns the linguist. The linguist is concerned with the natural means of human communication, and the field is usually decomposed into layers of study, namely phonology, morphology, lexicology, syntax, semantics and pragmatics. Phonology studies the acoustic regularities in our speech. Morphology studies the forms of our words. Syntax studies the way words are arranged in sequence. Semantics is the study of meaning, and pragmatics analyzes the use of language and discourse.

Although languages may be analyzed separately, the overarching goal of Linguistics is to unify the theoretical treatment of all human languages. The commonalities among languages drove the initial interest in the matter, and field studies and taxonomies of language were the principal subjects of early linguists. The curious diversity of human languages, together with the elegance of their unifying principles and the fundamental role of language in cognition, makes Linguistics at once the most mathematical social science and the most social of mathematical sciences. We will take a closer look at this bedazzling variety of levels and the taxonomy of human language in an upcoming article.

Visit www.ehfirst.io for the details of EHF Project!
