Lexicon: gain insights into the history, culture, and evolution of a language.

Mohamad Mahmood
Published in Lexiconia
Jul 7, 2024 · 9 min read

Understanding the lexicon of a language is important for a variety of purposes, such as language learning, effective communication, literary analysis, and the development of natural language processing technologies. By studying the lexicon, we can gain insights into the history, culture, and evolution of a language.

  • The mental lexicon is a complex cognitive system representing information about the words/concepts that one knows. Over decades, psychological experiments have shown that conceptual associations across multiple, interactive cognitive levels can greatly influence word acquisition, storage, and processing. How can semantic, phonological, syntactic, and other types of conceptual associations be mapped within a coherent mathematical framework to study how the mental lexicon works? Here we review cognitive multilayer networks as a promising quantitative and interpretative framework for investigating the mental lexicon. Cognitive multilayer networks can map multiple types of information at once, thus capturing how different layers of associations might co-exist within the mental lexicon and influence cognitive processing. This review starts with a gentle introduction to the structure and formalism of multilayer networks. We then discuss quantitative mechanisms of psychological phenomena that could not be observed in single-layer networks and were only unveiled by combining multiple layers of the lexicon: (i) multiplex viability highlights language kernels and facilitative effects of knowledge processing in healthy and clinical populations; (ii) multilayer community detection enables contextual meaning reconstruction depending on psycholinguistic features; (iii) layer analysis can reveal latent interactions of mediation, suppression, and facilitation for lexical access. By outlining novel quantitative perspectives where multilayer networks can shed light on cognitive knowledge representations, including in next-generation brain/mind models, we discuss key limitations and promising directions for cutting-edge future research. [2024] (A toy multiplex-viability sketch appears after this list.)
  • Morphological processing in visual word recognition has been extensively studied in a few languages, but other languages with interesting morphological systems have received little attention. Here, we examined Malay, an Austronesian language that is agglutinative. Agglutinative languages typically have a large number of morphemes per word. Our primary aim was to facilitate research on morphological processing in Malay by augmenting the Malay Lexicon Project (a database containing lexical information for almost 10,000 words) to include a breakdown of the words into morphemes as well as morphological properties for those morphemes. A secondary goal was to determine which morphological variables influence Malay word recognition. We collected lexical decision data for Malay words that had one prefix and one suffix, and first examined the predictive power of 15 morphological and four lexical variables on response times (RT). Of these variables, two lexical and three morphological variables emerged as strong predictors of RT. In GAMM models, we found a facilitatory effect of root family size, and inhibitory effects of prefix length and prefix percentage of more frequent words (PFMF) on RT. Next, we explored the interactions between overall word frequency and several of these predictors. Of particular interest, there was a significant word frequency by root family size interaction in which the effect of root family size is stronger for low-frequency words. We hope that this initial work on morphological processing in Malay inspires further research in this and other understudied languages, with the goal of developing a universal theory of morphological processing. [2022] (A toy item-level regression sketch appears after this list.)
  • Using a megastudy approach, we developed a database of lexical variables and lexical decision reaction times and accuracy rates for more than 25,000 traditional Chinese two-character compound words. Each word was responded to by about 33 native Cantonese speakers in Hong Kong. This resource provides a valuable adjunct to influential mega-databases, such as the Chinese single-character, English, French, and Dutch Lexicon Projects. Three analyses were conducted to illustrate the potential uses of the database. First, we compared the proportion of variance in lexical decision performance accounted for by six word frequency measures and established that the best predictor was Cai and Brysbaert’s (PLoS One, 5, e10729, 2010) contextual diversity subtitle frequency. Second, we ran virtual replications of three previously published lexical decision experiments and found convergence between the original experiments and the present megastudy. Finally, we conducted item-level regression analyses to examine the effects of theoretically important lexical variables in our normative data. This is the first publicly available large-scale repository of behavioral responses pertaining to Chinese two-character compound word processing, which should be of substantial interest to psychologists, linguists, and other researchers. [2016] (A toy sketch comparing frequency measures by explained variance appears after this list.)
  • Semantic networks are often used to represent the meaning of a word in the mental lexicon. To construct a large-scale network for this lexicon, text corpora provide a convenient and rich resource. In this chapter the network properties of a text-based approach are evaluated and compared with a more direct way of assessing the mental content of the lexicon through word associations. This comparison indicates that both approaches highlight different properties specific to linguistic and mental representations. Both types of network are qualitatively different in terms of their global network structure and the content of the network communities. Moreover, behavioral data from relatedness judgments show that language networks do not capture these judgments as well as mental networks. [2016] (A toy text-versus-association network sketch appears after this list.)
  • One of the traditional access routes to the emotions consists in studying the lexicon of emotional terms, that is, words with which the different forms of emotive and affective experiences are identified, isolated, and distinguished in the various human languages. By analyzing the way people consider the meanings of these words, and particularly how people organize the relationships among these words, we can discover the semantic structure of the emotional lexicon: a sort of guideline by which our knowledge about emotional experience is structured and organized in our mind. [2014]
  • This collection of papers takes linguists to the leading edge of techniques in generative lexicon theory, the linguistic composition methodology that arose from the imperative to provide a compositional semantics for the contextual modifications in meaning that emerge in real linguistic usage. Today’s growing shift towards distributed compositional analyses evinces the applicability of GL theory, and the contributions to this volume, presented at three international workshops (GL-2003, GL-2005 and GL-2007) address the relationship between compositionality in language and the mechanisms of selection in grammar that are necessary to maintain this property. The core unresolved issues in compositionality, relating to the interpretation of context and the mechanisms of selection, are treated from varying perspectives within GL theory, including its basic theoretical mechanisms and its analytical viewpoint on linguistic phenomena. [2013]
  • By now, it should be quite clear that words and phrases that convey positive or negative sentiments are instrumental for sentiment analysis. This chapter discusses how to compile such word lists. In the research literature, sentiment words are also called opinion words, polar words, or opinion-bearing words. Positive sentiment words are used to express some desired states or qualities while negative sentiment words are used to express some undesired states or qualities. Examples of positive sentiment words are beautiful, wonderful, and amazing. Examples of negative sentiment words are bad, awful, and poor. Apart from individual words, there are also sentiment phrases and idioms, e.g., cost someone an arm and a leg. Collectively, they are called the sentiment lexicon (or opinion lexicon). For easy presentation, from now on when we say sentiment words, we mean both individual words and phrases. [2012] (A toy lexicon-based scoring sketch appears after this list.)
  • A lexicon is a linguistic object and hence is not the same thing as an ontology, which is non-linguistic. Nonetheless, word senses are in many ways similar to ontological concepts and the relationships found between word senses resemble the relationships found between concepts. Although the arbitrary and semi-arbitrary distinctions made by natural languages limit the degree to which these similarities can be exploited, a lexicon can nonetheless serve in the development of an ontology, especially in a technical domain. [2009]
  • The English Lexicon Project is a multiuniversity effort to provide a standardized behavioral and descriptive data set for 40,481 words and 40,481 nonwords. It is available via the Internet at elexicon.wustl.edu. Data from 816 participants across six universities were collected in a lexical decision task (approximately 3400 responses per participant), and data from 444 participants were collected in a speeded naming task (approximately 2500 responses per participant). The present paper describes the motivation for this project, the methods used to collect the data, and the search engine that affords access to the behavioral measures and descriptive lexical statistics for these stimuli. [2007]
  • How does a shared lexicon arise in a population of agents with differing lexicons, and how can this shared lexicon be maintained over multiple generations? In order to get some insight into these questions, we present an ALife model in which the lexicon dynamics of populations that possess and lack metacommunicative interaction (MCI) capabilities are compared. We suggest that MCI serves as a key component in the maintenance of a linguistic interaction system. We ran a series of experiments on mono-generational and multi-generational populations whose initial state involved agents possessing distinct lexicons. These experiments reveal some clear differences in the lexicon dynamics of populations that acquire words solely by introspection contrasted with populations that learn using MCI or using a mixed strategy of introspection and MCI. Over a single generation, the performance of the populations with and without MCI is comparable, in that the lexicon converges and is shared by the whole population. In multi-generational populations, the lexicon diverges at a faster rate for an introspective population, eventually consisting of one word being associated with every meaning, compared with MCI-capable populations in which the lexicon is maintained, where every meaning is associated with a unique word. [2006] (A toy naming-game-style convergence sketch appears after this list.)
  • What aspects of an utterance must be stored in long-term memory, and what aspects can be constructed on-line in working memory? This question has not played a significant role in linguistic theory, and indeed it would seem to be a question of performance rather than competence. However, if taken seriously, it leads to some radical conclusions about the organization of the grammar. In particular, the lexicon — the store of memorized elements — contains not only words but regular affixes and stems, plus phrasal units such as idioms and constructions. One consequence is a much less rigid divide than usual between lexical items and rules of grammar. The resulting architecture in part resembles the approaches of HPSG, Construction Grammar, and other non-Chomskyan versions of generative grammar. It offers the possibility of a better rapprochement between linguistic theory and psycholinguistic studies on language processing than has been possible in more traditional Chomskyan architectures. [2001]
  • Context-free phrase structure grammar (henceforth PSG) is capable of describing infinite languages consisting of finite strings drawn from a finite vocabulary and associating with each string of the target language a division into immediate constituents that can be represented as a labeled tree. This much, it would seem, is the least that we can expect of any grammar that would pretend to adequacy as a scheme for describing languages of the kind spoken by human beings. PSG has several advantages over competing systems of grammatical description. [1996] (A toy CFG parsing sketch appears after this list.)
  • Taking for granted the importance of a lexical component in any linguistic application within the so-called ‘language industry’, this paper aims at highlighting some of the issues which have to be carefully evaluated with regard to any future actions for building large-size computational lexicons. The following aspects, considered of crucial relevance, particularly if one aims at a set of interconnected lexicons for different languages, are presented: design of lexicon architecture, role of user needs, need for standardization, multilinguality, role of textual corpora, organizational aspects, intellectual property rights, and the role of national and international bodies. [1994]
  • The three lexicons used by KBMT-89 are described: A concept lexicon constitutes the sublanguage domain model for specifying semantic information; it is maintained by ONTOS, a knowledge-acquisition and maintenance system. An analysis lexicon is a dictionary containing syntactic information and mapping rules required for semantic parsing. And a generation lexicon, similar to the analysis lexicon, is employed in the generation phase. [1989]
  • Two major principles govern the lexicon. The first, that words have conventional meanings, I will call the principle of conventionality, and the second, that words differ in meaning, I will call the principle of contrast. To illustrate the workings of these two principles, imagine constructing a dictionary for some new language: After collecting notes on words and their meanings, one would organize them by putting the first word on the left of the page, say, with its conventional meanings on the right, the second word and its conventional meanings below the first, and so on until the page was filled. The principle of conventionality captures the fact that each word listed has one or more conventional meanings. Notice also that the second meaning differs from the first, and that the third differs from both the first and the second, and so on down the page. The principle of contrast captures the fact that every meaning of every word listed differs from the other meanings. In this contribution, I will argue that the acquisition and growth of the lexicon is much like the construction of a dictionary: what is continually being added are conventional meanings that contrast with those meanings already available. [1983]
  • Lexicon — The term lexicon has been used by Bloomfield in the sense of an appendix to the grammar of a language which lists the total stock of the morphemes of that language, and deals especially with the irregularities of its linguistic forms (Language 158–169; 264ff.). On the other hand, Karl Brugmann uses the term in a less rigorous and restricted way: for him lexicon is synonymous with vocabulary (Wortschatz), or the list of the words of the language. It is in this sense that I use the term. [1969]
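
The sketches below are rough Python illustrations of some of the ideas above; the data, names, and numbers in them are invented and are not taken from the cited papers.

First, the multilayer-network item: a minimal sketch of a two-layer toy lexicon and a simplified stand-in for multiplex viability, here taken to mean iteratively keeping only words that retain at least one neighbour in every layer.

```python
# Toy two-layer lexicon: each layer maps a word to its neighbours in that layer.
# Words and links are invented purely for illustration.
layers = {
    "semantic": {
        "cat": {"rat", "bat", "dog"}, "rat": {"cat", "bat"}, "bat": {"cat", "rat"},
        "dog": {"cat"}, "bog": set(),
    },
    "phonological": {
        "cat": {"rat", "bat"}, "rat": {"cat", "bat"}, "bat": {"cat", "rat"},
        "dog": {"bog"}, "bog": {"dog"},
    },
}

def viable_core(layers):
    """Iteratively drop words that lose all neighbours in any layer; what
    survives is a crude stand-in for a multiplex 'viable' kernel."""
    words = set().union(*(set(layer) for layer in layers.values()))
    changed = True
    while changed:
        changed = False
        for w in list(words):
            # A word survives only if it still has a neighbour in every layer.
            if any(not (layer.get(w, set()) & words) for layer in layers.values()):
                words.discard(w)
                changed = True
    return words

print(viable_core(layers))  # -> {'cat', 'rat', 'bat'}: linked in both layers
```

Here "dog" drops out because its only phonological neighbour ("bog") has no semantic links; this is exactly the kind of cross-layer effect a single-layer network cannot express.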
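
For the Malay Lexicon Project item: the paper fits GAMMs over many variables; the sketch below instead uses a plain linear model from statsmodels on a handful of made-up items, with invented column names, just to show how a frequency-by-root-family-size interaction and a prefix-length effect could be examined at the item level.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical item-level data; values and column names are invented.
items = pd.DataFrame({
    "rt":               [612, 655, 701, 689, 640, 720, 598, 667],
    "log_freq":         [3.1, 2.4, 1.2, 1.5, 2.8, 0.9, 3.5, 2.0],
    "root_family_size": [14,  9,   3,   5,   11,  2,   18,  7],
    "prefix_length":    [3,   4,   3,   4,   3,   4,   3,   4],
})

# A plain linear model standing in for the paper's GAMMs:
# main effects plus the frequency-by-root-family-size interaction.
model = smf.ols("rt ~ log_freq * root_family_size + prefix_length", data=items).fit()
print(model.summary())
```

With real data one would move to GAMMs or mixed-effects models, but the logic of testing an interaction term on item-level RTs is the same.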
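
For the Chinese Lexicon Project item: comparing frequency measures by the proportion of RT variance they explain boils down to fitting one single-predictor model per measure and comparing R². The data and measure names below are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
# Simulated item-level data: two correlated, competing frequency measures.
freq_a = rng.normal(size=n)                              # e.g. a plain log corpus count
freq_b = 0.8 * freq_a + rng.normal(scale=0.6, size=n)    # e.g. a contextual-diversity measure
rt = 650 - 30 * freq_b + rng.normal(scale=20, size=n)    # RTs driven more by freq_b here

items = pd.DataFrame({"rt": rt, "freq_a": freq_a, "freq_b": freq_b})

# How much RT variance does each measure explain on its own?
for measure in ["freq_a", "freq_b"]:
    r2 = smf.ols(f"rt ~ {measure}", data=items).fit().rsquared
    print(f"{measure}: R^2 = {r2:.3f}")
```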
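
For the text-corpus versus word-association item: a toy contrast between a network built from sentence co-occurrence and one built from association pairs. Real comparisons use large corpora and association norms; the corpus and associations here are invented.

```python
from itertools import combinations
from collections import Counter

# Tiny invented corpus and association data, purely for illustration.
corpus = [
    "the cat chased the dog",
    "the dog buried a bone",
    "the sun warmed the cat",
]
associations = [("cat", "dog"), ("cat", "purr"), ("dog", "bone"), ("sun", "moon")]
stopwords = {"the", "a"}

# Text-based network: link words that co-occur within a sentence.
text_edges = Counter()
for sentence in corpus:
    words = sorted({w for w in sentence.split() if w not in stopwords})
    for pair in combinations(words, 2):
        text_edges[pair] += 1

# Association-based network: link cue and response directly.
assoc_edges = {tuple(sorted(p)) for p in associations}

print("text-based edges:  ", set(text_edges))
print("association edges: ", assoc_edges)
print("shared edges:      ", set(text_edges) & assoc_edges)
```

Even at this scale the two networks only partly overlap, which is the qualitative point the chapter makes at scale.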
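
For the sentiment-lexicon item: a minimal lexicon-based scorer with a handful of entries, including one idiom, to show how such a word-and-phrase list is applied. The matching is naive substring matching (real systems tokenize and handle negation), but the idea is the same.

```python
# A tiny, invented sentiment lexicon; real lexicons contain thousands of entries.
sentiment_lexicon = {
    "beautiful": 1, "wonderful": 1, "amazing": 1,
    "bad": -1, "awful": -1, "poor": -1,
    "cost an arm and a leg": -1,   # phrases and idioms belong in the lexicon too
}

def score(text):
    """Sum the polarities of lexicon entries found in the text (longest entries first)."""
    text = text.lower()
    total = 0
    for entry, polarity in sorted(sentiment_lexicon.items(), key=lambda kv: -len(kv[0])):
        if entry in text:
            total += polarity
            text = text.replace(entry, " ")   # avoid double-counting overlapping entries
    return total

print(score("The screen is beautiful, but the battery is awful and it cost an arm and a leg."))
# -> -1 (one positive word, one negative word, one negative idiom)
```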
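
For the ALife item: the sketch below is not the paper's MCI model, just a bare-bones naming-game-style simulation in which hearers adopt the speaker's word after each exchange, to show how repeated corrective interactions can push a population of agents with differing lexicons toward a shared one.

```python
import random

random.seed(1)
MEANINGS = ["water", "fire", "tree"]
WORDS = ["ba", "ku", "ti", "mo"]

# Each agent starts with its own random word for every meaning.
agents = [{m: random.choice(WORDS) for m in MEANINGS} for _ in range(20)]

def interact(speaker, hearer):
    """One corrective exchange: the hearer adopts the speaker's word for a meaning
    (a crude stand-in for feedback-style learning)."""
    meaning = random.choice(MEANINGS)
    hearer[meaning] = speaker[meaning]

for _ in range(2000):
    speaker, hearer = random.sample(agents, 2)
    interact(speaker, hearer)

# Check how far the population has converged on a word for each meaning.
for m in MEANINGS:
    variants = {a[m] for a in agents}
    print(m, "->", variants)
```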
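
For the phrase structure grammar item: a tiny context-free PSG written with NLTK that assigns a labeled constituent tree to a short string; the grammar and sentence are invented.

```python
import nltk

# A toy context-free phrase structure grammar.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N
    VP  -> V NP
    Det -> 'the' | 'a'
    N   -> 'cat' | 'dog'
    V   -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cat chased a dog".split()):
    tree.pretty_print()   # the labeled constituent tree PSG associates with the string
```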

🤓


Mohamad Mahmood
Lexiconia

Programming (Mobile, Web, Database and Machine Learning). Studies at the Center For Artificial Intelligence Technology (CAIT), FTSM, UKM, Malaysia.