Tree of Keywords: graph analysis of semantic attractors

Application domain. According to the WHO, depressive disorder is the leading cause of disability worldwide. Its generating mechanism is believed to lie in rumination: a long-term stressful attachment to a problem and to an underlying Super Value perceived to be in danger. Efforts at predictive pattern recognition are being made in several fields, and Machine Learning is among them.

Previous research. According to Al-Mosaiwi, single absolutist words may be used as predictors of the cognitive errors that follow rumination. However, it is recognized that analysis of single-word frequencies is not enough. The reason is simple: random shuffling of a word collection does not affect the relative frequencies, yet it removes information (the bag-of-words effect). This means that the semantic source of rumination, the value, stays outside the scope. Besides, according to Kahneman and Tversky, any human may unconsciously switch to individual irrational schemes in order to make faster decisions: speed is gained at the expense of rationality. However, they noted that this effect may be smoothed out in a group due to the decorrelation of individual biases. Researchers from MIT have developed and trained a neural network which predicts the depression rate in clinical interviews with a precision of 71% and a recall of 83%. However, they admit that their model detects cognitive impairments as well, while ignoring the structure of cognition.

Assumptions. According to Relational Frame Theory (RFT), developed by psychologist Steven C. Hayes, bidirectional links between entities are the building blocks of cognition. This psychological theory of human language states that reality is represented as a multidimensional graph built in the process of associative learning. Language is a mental projection into the space of entities, while bigrams of entities express mental schemes. Studies have validated that associative schemes are specific and stable for each individual. Pathological states like depression correspond to destructive associative schemes: specific graph clusters which trigger ruminative loops. We assume that these clusters have high betweenness centrality (BC), a clustering feature. Entities with extremely high BC may be considered information hubs, speech attractors, which influence the semantics to a large extent. A high number of ruminative clusters may raise the integral betweenness centrality of the graph. Recognition of ruminative clusters, their centers and their underlying values is among the goals of this research. The decorrelation of biases noted by Kahneman and Tversky suggests that differences in BC may be associated with communication intensity and group values. These hypotheses have been tested through an instance-based model.
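As a minimal illustration of the hub hypothesis, the toy sketch below builds a small association graph with NetworkX (the entities are invented for the example, not taken from the study vocabulary) and shows how a node that connects otherwise separate topics accumulates betweenness centrality and behaves as a speech attractor.

```python
# Minimal sketch: betweenness centrality as a hub measure on a toy
# association graph (illustrative entities, not the study data).
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("life", "man"), ("life", "job"), ("life", "family"),
    ("man", "loneliness"), ("job", "money"), ("family", "children"),
])

bc = nx.betweenness_centrality(G, normalized=True)
for node, score in sorted(bc.items(), key=lambda kv: -kv[1]):
    print(f"{node:12s} {score:.3f}")  # "life" dominates: a speech attractor
```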

Data preparation. According to the WHO, Russia is number three in the age-standardized list of suicide rates. We explored the most popular Russian-speaking Wall of Help: more than 150,000 visits per day. Traffic is distributed among Russia, Germany and the UK. The top 3 search engine requests are: #depression, #meaning of life and #suicide. Responses are provided by psychotherapists, charity volunteers and common people.

Age standardization

Response/request collections have been parsed: 25,000 records from 2018. Text cleaning includes age, sex and text length standardization and abstract mining (the first 100 words). Metric analysis and text cleaning were implemented with the Python ecosystem, including the NetworkX and NLTK libraries. Sex standardization was achieved with name-to-sex recognition. Morphological cleaning and tokenization allowed extracting nouns in standard form. Stemming was applied as well to reduce dimensionality. A vocabulary of bigrams with corresponding frequencies was mined. Bigram sets are ordered by frequency and normalized to equal volume by a cutoff criterion. Each group, Request/Response, is characterized by a unique bigram matrix. An increase of information, measured as the inverse of Shannon entropy, is observed after the transition from single words to bigrams: a 30% increment. Further increase of the n-gram length gave no significant increment: I(3)-I(2)=6% for 3-grams, I(4)-I(3)=2% and less than 1% for n>4. It seems that bidirectional associations, proposed by RFT as the blocks of semantics, fit well.
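A compact sketch of this step, under the assumption that the corpus has already been cleaned down to nouns in standard form (parsing, morphology and the name-to-sex recognition are omitted), could look as follows; the placeholder token list and the 10,000-bigram cutoff mirror the figures mentioned in the text, while the entropy helper reflects one possible reading of the reported n-gram information increments.

```python
# Sketch of the bigram vocabulary and n-gram entropy check,
# assuming `tokens` is an already cleaned list of standardized nouns.
import math
from collections import Counter
from nltk import ngrams

def ngram_entropy(tokens, n):
    """Shannon entropy (bits) of the n-gram frequency distribution."""
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def bigram_vocabulary(tokens, cutoff=10_000):
    """Frequency-ordered bigram vocabulary truncated to a fixed volume."""
    return Counter(ngrams(tokens, 2)).most_common(cutoff)

tokens = ["life", "meaning", "job", "family", "life", "man"]  # placeholder tokens
print(bigram_vocabulary(tokens)[:3])
print({n: ngram_entropy(tokens, n) for n in (1, 2, 3)})
```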

Data compression. Conversion was implemented with the Open Ord force-directed layout algorithm; Gephi software was integrated with the Python core. The bigram matrix was used as a generator of a weighted undirected graph, allowing psychologists to interpret the Big Data at a glance. Open Ord transforms the 2D matrix into a tree-topology graph. The weight of each node in the matrix corresponds to the single-word frequency, while the edge length is an inverse function of the bigram frequency. Nodes are ranked according to betweenness centrality and marked on the transformed graph. Closest neighbours are based on co-occurrence frequency analysis. A high-BC node and its neighbours form a cluster.
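A minimal sketch of the graph construction and cluster extraction, assuming `bigram_counts` is a mapping from word pairs to frequencies; the Open Ord layout itself was produced in Gephi and is not reproduced here.

```python
# Sketch: weighted undirected graph from a bigram matrix, BC ranking,
# and high-BC clusters; bigram_counts = {(word_a, word_b): frequency}.
import networkx as nx

def bigram_graph(bigram_counts):
    """Edge weight = bigram frequency; edge 'length' = inverse frequency,
    used as the distance for layouts and centrality."""
    G = nx.Graph()
    for (a, b), freq in bigram_counts.items():
        G.add_edge(a, b, weight=freq, length=1.0 / freq)
    return G

def top_clusters(G, k=5):
    """Top-k high-BC nodes with their co-occurrence neighbours."""
    bc = nx.betweenness_centrality(G, weight="length", normalized=True)
    hubs = sorted(bc, key=bc.get, reverse=True)[:k]
    return {hub: sorted(G[hub], key=lambda n: -G[hub][n]["weight"])
            for hub in hubs}

bigram_counts = {("life", "man"): 120, ("life", "job"): 90,
                 ("man", "loneliness"): 40, ("job", "money"): 30}  # placeholder
print(top_clusters(bigram_graph(bigram_counts), k=2))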

Open Ord Transformation

Results. The deviation of BC across the graph, D=|BC(max)-BC(mean)|, was considered as the integral clustering metric. 43% of the ordered bigrams occur in both groups. Each graph is based on the 10,000 most frequent bigrams. Comparison of the Request/Response graphs showed a significant difference in centralization: D(Request)/D(Response)=1.7. The relatively high BC clustering in the test group is validated by the Zipf's law factor: 47 in the Request group against 25 in the Response group, 1.9 times higher. Zipf's law was applied to unordered bigrams. The integral under the Zipf curve is 1.5 times higher in the test group. It seems that ruminative clusters and bigrams in the Request sample give a significant input, as expected.
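The centralization metric and a Zipf-style fit could be computed along the following lines. This is a sketch only: `G_request` and `G_response` are assumed to be graphs built as in the previous snippet, and the exact Zipf statistic used in the study is not specified, so the log-log rank-frequency slope below is just one plausible reading.

```python
# Sketch of the centralization metric D = |BC(max) - BC(mean)| and a
# simple Zipf fit; G_request / G_response are assumed bigram graphs.
import numpy as np
import networkx as nx

def centralization(G):
    """Integral clustering metric D = |BC(max) - BC(mean)|."""
    bc = np.array(list(nx.betweenness_centrality(G, weight="length").values()))
    return abs(bc.max() - bc.mean())

def zipf_slope(frequencies):
    """Log-log slope of the descending rank-frequency curve
    (one possible reading of the reported Zipf factor)."""
    ranks = np.arange(1, len(frequencies) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(frequencies), 1)
    return slope

# ratio = centralization(G_request) / centralization(G_response)  # reported as 1.7
```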

The standard deviation of sentence length is 1.7 times higher, which marks the higher emotional instability of the test messages. The top 5 entities, ranked by BC, are #Year, #Life, #Man, #Job, #Family/Procreation. Tags in figures are translated into English. The #Year tag confirms a long-term stressful state but is not interpreted as a Value. #Man, #Job and #Family/Procreation are considered in relation to the Root Value: #Life. A nearest-neighbours technique is applied: keyword extraction in relation to the given topic.

The #Man value is almost fused with the #Life root attractor in the Request group. This suggests probable subjective/objective isolation and a high need for qualitative social contacts. According to Kahneman and Tversky, the isolation factor reinforces cognitive errors and reduces flexibility in communication. Harmful effects of isolation have already been observed by psychotherapists. However, it was not clear whether isolation plays the key role in suicidal tendencies. The method of irrational vocabulary was compared with the graph technique. Two lists of single-word absolutisms and 'shoulds' were generated on the basis of Al-Mosaiwi's research. However, it turned out that the irrationality frequency is only 1.1 times higher in the Request group than in the Response group. The fraction of messages containing at least one word from the irrational vocabulary is 84% against 78%. It seems that the D(Request)/D(Response) and Zipf metrics are more sensitive and are closer to the trigger: rumination.
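The irrational-vocabulary baseline reduces to a simple membership check; a sketch is given below, where `messages` is assumed to be a list of tokenized messages and the lexicon entries are illustrative English stand-ins for the Al-Mosaiwi-style absolutist and 'should' words, not the actual Russian lists used in the study.

```python
# Sketch of the irrational-vocabulary baseline: share of messages that
# contain at least one word from the absolutist / 'should' lexicon.
irrational_lexicon = {"always", "never", "must", "should", "completely"}  # illustrative

def irrational_share(messages, lexicon=irrational_lexicon):
    """Fraction of messages containing at least one lexicon word."""
    hits = sum(1 for tokens in messages if lexicon & set(tokens))
    return hits / len(messages)

messages = [["i", "always", "fail"], ["today", "was", "fine"]]  # placeholder
print(irrational_share(messages))  # 0.5
```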

Findings and interpretation. Bidirectional language associations, bigrams, are optimal semantic blocks, which validates the RFT assumptions. They provide a 30% information increment compared with single-word analysis. Ruminative attractors provide a sufficient increase of the centralization metric: betweenness centrality. It is more sensitive than the irrationality frequency despite the topic bias in the Response group. Instance analysis of text/speech and normalized BC may give insights into suicidal risks without applying more complicated techniques like neural networks (NNs). Value analysis may be validated by fast visual interpretation, in contrast to the "black box" rules of NNs. Ruminative clusters may be discovered directly: they are formed on the basis of Group Values (#Man) more than Individual Values (#Job). The value of communication may be underestimated in the standard protocols of depression treatment. Its therapeutic effect may be explained in terms of bias smoothing. It should be noted that betweenness centrality is a measure of 'diversification' as well. It means that over-focus may be as risky for the psyche as a 'single-equity' investment portfolio for your pension plan.

Application rescaling. The given algorithm may be used for mining core entities in the frame of relative ranking. It has shown rescaling stability under the conditions of a noisy reference text. The instrument may have applications in HR evaluation and speech keyword extraction: AI problems. The authors conduct relevant research and are looking for collaboration. The full version of the research is pending in a peer-reviewed journal; however, you may ask for the draft upon personal request.

I would like to thank Doctor Ann Butkovskaya for the psychological interpretation of the results and for co-authorship of the full-version article.

Sergey Kamenshchikov, Skolkovo Foundation

Head of AI at Skolkovo Foundation. Interests: NLP, innovations, investing.