Mahad Afzal
Nov 2, 2019

Opinion: The Pseudo-Psychology Behind the Art of Reading People (For the better) — Predicting Behavioural Trends

Note: I don’t have subject matter expertise in behavioural psychology. However, after working as a therapeutic assistant, with years of interaction with people suffering from different mental illnesses and coming from different socio-economic backgrounds, coupled with research in graph theory and neural networks, I wrote this as a topic of interest, among many others.

If you aren’t used to long reads with tangents that violate the basic conventions of writing, the structure of this piece might irritate you. The article includes some basic math and a discussion of decision trees, and I have listed some resources at the bottom. Because the topic runs deep, I give only a myopic, basic overview of quantifying human behaviour, which might raise some eyebrows.

Can you truly read someone? This is the paradox that incites superfluous debates at cafes that sell overpriced coffee to insomniac nomads exhibiting a facade of nihilism. One cannot truly answer that question, for the question itself is flawed. To truly read someone, you must be absolutely certain about who they are, which is unattainable. Such absolute certainty might exist in a utopian model, but it is far from commonsensical; if it were attainable, therapists might be out of a job. The question shouldn’t be whether you can read someone with absolute certainty. It should be whether you can read someone and predict their behavioural patterns with moderate probability. But before you delve into the realm of probabilistic behaviour, you must first understand accelerated thinking, which is the prerequisite of probabilistic behavioural analysis.

Accelerated Thinking

Any problem that might seem convoluted can be boiled down. Feynman, Euclid, Musk and Edison all used something known as accelerated thinking, or accelerated learning, to solve problems. Let’s look at a basic problem:

2 + 2 = 4

Now, we know that 2 and 2 make 4. But that is because we are likely to think analogously, comparing the sum to objects like 2 apples in 2 baskets. The conflict lies in the thinking process that assumes 2 and 2 make 4, when that might not be the case. 2 apples in 2 baskets might be 4 apples, but only if they aren’t continuous variables. No assumptions are made about the wholeness of the elements, the apples in this case. Why do we consider an apple one? Do separate identities of the object constitute its oneness? If so, why are we bound by that assumption in the first place? If the apples are different sizes, why do we still count them in whole numbers? Relative to an arbitrary benchmark, the first apple could be 0.83, while the second apple could be 0.82.

The answer to the above question is that apples can’t really be considered continuous variables, for purposes of simplifying the basic count function. When a series of randomly picked stochastic analogies is applied to the complex problems around us, we deem those problems too complex to solve and are paralysed by pre-defined assumptions. The trick is always to try to invalidate the pre-defined assumptions, or in other words to have an alternative hypothesis. Understanding the movement of stochastic analogies, though, is beyond the scope of this article. You can learn more about advanced cognitive mechanisms and techniques in accelerated learning for adults in work done by Dr. Brookfield here.

Assumption Deduction from Events

One momentous step in setting the paradigm for reading behavioural patterns is making well-defined assumptions. This requires accelerated thinking, and invalidating any prior hypothesis that would be an impediment to the behavioural probability density function.

A common notion is that people don’t like being figured out. You never want another person to read you absolutely. That is why we get irritated when someone passes a verdict on us that might be true, yet we will go to any length to discredit it. The irony is that some people spend their entire lives trying to be understood, yet never want to succumb to full disclosure.

“You can never truly know the events of someone’s life. But you can always deduce resulting patterns from a mix-and-match combination of those events. You might not know why someone suffers from pseudobulbar affect, but you can deduce a lot from its symptoms.”

For example, Amy is your childhood best friend who has recently grown distant from you. You are trying to figure out why. Let’s define some known events for Amy that only you know about:

Event A: Amy suffered from child abuse at the hands of her step-mother at the age of 8. Her dad suffers from a severe case of degenerative disc disease, and is too busy taking care of himself to pay attention to her.

Event B: Amy is a linguist. She studied colloquialism one summer in Prague. She dropped out after her brother lost some of the family money in a pyramid scheme.

Event C: Amy consistently goes to bars and gets hit on quite often. She redirects the attention given to her to her friends, but only when her friends from high school are with her, not when her close friends are with her.

Event D: Unknown

Event E: Unknown

Note: Events titled “Event A1” are sub-events.

Now, you can use the events to invalidate certain misconceptions and answer a very specific question like:

“Why didn’t Amy compliment me on my wedding dress when all of my best friends did?”

To answer this question, you first need to come up with all possible permutations of the events and sub-events:

P(n, k) = n! / (n − k)!

where n is the total number of events and k is the number of slots in a sequence. You assign events to those slots to generate candidate features, or predictor variables, for answering the question. In reality n could be 100, but if even 5–6 different events show multi-collinearity, your result could be lopsided. It is always advisable to assign different weights w to main events and sub-events. There could be events much farther down if you branch. Weighting isn’t very important when doing the permutations themselves, but you can use it if you believe the assumptions are weaker on some ends, or if there is asymmetric information about the events. In some scenarios, your perception of an event could be quite different from how your friends perceived it, so you need to adjust with an error function. A minimal sketch of the enumeration follows.
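To make this concrete, here is a minimal Python sketch using itertools.permutations; the event labels and weights are illustrative placeholders carried over from the example above, not real data.

```python
# A minimal sketch: enumerating candidate event sequences as permutations.
# Event names and weights below are illustrative assumptions, not data.
from itertools import permutations

# Hypothetical events and sub-events with assumed weights in [0, 1]
events = {
    "A": 0.21,   # childhood abuse, neglectful father
    "B": 0.50,   # linguistics, dropped out
    "C": 0.40,   # bar behaviour differs by friend group
    "A3": 1.0,   # sub-event of A (details unknown)
    "B2": 0.34,  # sub-event of B
    "D": 1.0,    # unknown event; weight left at 1 as a placeholder
}

k = 4  # number of slots in a candidate sequence

# P(n, k) = n! / (n - k)! ordered sequences of k distinct events
candidates = list(permutations(events, k))
print(len(candidates))  # 6! / (6 - 4)! = 360 candidate sequences

# One candidate sequence from Hypothesis 1 below
seq = ("A", "D", "A3", "B2")
print(seq in candidates)  # True
```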

Hypothesis 1: Adult fixation on male attention might be a result of a lack of attention from a father figure in life.

Seq: Event A — Event D — Event A3 — Event B2, with k = 4

Note: you can draw up more sequences where possible. But for ease of understanding, only one sequence is given.

Assumption: Some studies suggest linguists are more likely to associate themselves with an elite post-modern culture, defiant of their middle-class upbringing if they had one. This causes a distancing of oneself from one’s own roots, out of a misguided notion of entitled separation. Amy’s brother was the reason she was stripped of this very entitlement. This led to an unknown Event D. As a symptom of Event D, Amy started to distance herself from her friend Lara.

Note how, in the above assumption, you can draw up the symptoms of the unknown Event D and treat it as its own separate question. But to answer one question, it is very important not to branch too deep into the tree.

Hypothesis 2: Adult fixation on male attention might be a result of low self-esteem, since Event C shows a behavioural difference between her high school friends and her close friends.

Seq: Event C — Event D — Event A3 — Event B2, with k = 4

Assumption: If Event C happened right before Event D, Amy could have felt insecure about something that happened in Event D. The problem is we don’t know what. Just like with Hypothesis 1, we have to carry on and look at the symptoms.

Note: In reality, the sequence of these events can vary by a large degree, with many sub-events and varying actual and perceived weights. An error function on your perception of an event helps adjust the sequence.

A weighted, continuous graph of the sequence of these events could look like the one pictured, for better visualisation, where G could be Hypothesis 1 and T Hypothesis 2. The crossed-out nodes represent unknown events. For every unknown event, you can draw up another spanned weight graph, but that can get very cumbersome very quickly.
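As a rough sketch of such a graph, assuming the nodes and weights from the example, the two hypothesis paths can be stored as a weighted adjacency map; the helper path_weight here is hypothetical.

```python
# A rough sketch of the two hypothesis paths as a weighted directed graph.
# Node names and edge weights are assumptions for illustration only.
graph = {
    "A":  [("D", 1.0)],    # Hypothesis 1 (G): A -> D -> A3 -> B2
    "C":  [("D", 1.0)],    # Hypothesis 2 (T): C -> D -> A3 -> B2
    "D":  [("A3", 1.0)],
    "A3": [("B2", 0.34)],
    "B2": [],
}

unknown = {"D"}  # the "crossed-out" nodes: unknown events

def path_weight(path):
    """Multiply edge weights along a path; returns None if an edge is missing."""
    total = 1.0
    for u, v in zip(path, path[1:]):
        w = dict(graph.get(u, [])).get(v)
        if w is None:
            return None
        total *= w
    return total

print(path_weight(["A", "D", "A3", "B2"]))  # 0.34
```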

Probability

Remember that our sequences are:

Event A — Event D — Event A3 — Event B2, with k = 4

Event C — Event D — Event A3 — Event B2, with k = 4

Now, one might wonder what childhood abuse could have to do with Amy not complimenting your wedding dress. Probably nothing, or probably something. But there is a “probably” in there somewhere. The idea is to quantify that probability.

Before we get into assigning cumulative probabilities to the sequence of events, it is important to smooth out older events. For example, Sub-Event C2 is Amy having a migraine when she went out to the bar with her friends. The question is how to smooth it out. Suppose the migraine started before she went out and improved with time, but she only mentioned it to her friend at the bar; by the time her friend told you, it sounded as if the migraine happened at the bar rather than before it. This asymmetric-information problem can be adjusted for by smoothing out the weights over time using exponential smoothing.

s_t = α · x_t + (1 − α) · s_{t−1}, with s_0 = x_0

where s_t is the smoothed value at time t, x_t is the raw, non-smoothed observation, and α is the smoothing constant, determining how heavily you want to discount old data. For sub-events, one usually shouldn’t choose a strong alpha.
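A minimal sketch of that update rule, assuming made-up perceived weights for Sub-Event C2 over time:

```python
# A minimal exponential-smoothing sketch: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
# The observation values below are fabricated perceived weights, not real data.
def exponential_smoothing(xs, alpha):
    """Return the smoothed series for raw observations xs, with 0 < alpha <= 1."""
    smoothed = [xs[0]]  # common convention: s_0 = x_0
    for x in xs[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Perceived weight of sub-event C2 (the migraine) as it was relayed over time
raw = [0.9, 0.7, 0.4, 0.3]
print(exponential_smoothing(raw, alpha=0.3))
```

A small alpha like 0.3 keeps old perceptions sticky; an alpha near 1 would let the latest account dominate, which is why a strong alpha is risky for sub-events.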

Once you have smoothed out the weights, you start computing probabilities of the sequences for every event by using Bayes’ rule:

P(A|B) = P(B|A) · P(A) / P(B)

Bayes’ rule helps you determine the probability of every event and sub-event using the weights you assigned based on your own assumptions of how likely each event is. For example, the above conditional probability simply refers to the probability of all events under the umbrella of Event A happening right after all the events under the umbrella of Event B.

So, for Event A — Event D — Event A3 — Event B2 with k = 4, it would become

P(B2|A3|D|A), where the weights are (0.34, 1, 1, 0.21).

So, let’s say P(B2|A3|D|A) = 32%.
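One naive way to turn such weights into a sequence probability, sketched below, is to treat each assumed weight as a step-wise conditional probability and multiply them with the chain rule. Under the weights above this gives about 7%, not 32%, which only underlines that these figures are illustrative assumptions rather than estimates.

```python
# One naive reading of P(B2|A3|D|A): treat each assumed weight as a step-wise
# conditional probability and multiply them (chain rule). All numbers are
# assumptions carried over from the example, not estimates.
step_probs = {
    "A":  0.21,  # P(A)
    "D":  1.00,  # P(D | A), unknown event, placeholder weight
    "A3": 1.00,  # P(A3 | D, A)
    "B2": 0.34,  # P(B2 | A3, D, A)
}

sequence = ["A", "D", "A3", "B2"]

p = 1.0
for event in sequence:
    p *= step_probs[event]

print(f"P(sequence) = {p:.2%}")  # 7.14% under these assumed weights
```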

This would mean that Amy’s estrangement from her roots, as she became more self-aware and conscious through her multicultural linguistics program, derived from her frustration with her middle-class, blue-collar father, who, suffering from degenerative disc disease, let her step-mother take control of domestic issues. These domestic issues could have fed into Event D (whose symptom was distancing herself from Lara). A prior assumption could be that Lara and Amy fought over something right after their night at the bar. Perhaps Lara was a bit too assertive and controlling, which reminded Amy of her step-mother’s treatment of her, which she resisted. The probability of this weighted sequence could be 32%.

Again, this probability is just a made-up number and doesn’t mean anything; it is just an idea. As mentioned above, if one has trouble quantifying 2 + 2, then human behaviour is a notch up, but not impassable. A more real-life example, with a hundred events, would be an extended version of this. For conceptualisation purposes, the odds a player weighs when deciding which card to play in a poker game are quite similar.

There is no fixed rule on which methods to use to quantify human behaviour. You could generate a training set of behavioural trends exhibited by people who suffer from Attention Deficit Hyperactivity Disorder (ADHD) and predict whether someone has it. With ongoing research in neural networks, and in unsupervised learning that clusters similar behavioural habits, there is a lot one can do to predict the probability of someone’s behaviour.
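As a toy sketch of that supervised idea, assuming entirely fabricated behavioural features and labels (this is in no way a diagnostic tool):

```python
# A toy sketch: fit a decision tree on made-up behavioural features and
# made-up "exhibits the trend / does not" labels. Features, data, and labels
# are all fabricated for illustration; this is not a diagnostic tool.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [task-switching rate, avg focus minutes, fidgeting score]
X = [
    [0.9, 5, 0.8],
    [0.8, 8, 0.7],
    [0.2, 45, 0.1],
    [0.3, 50, 0.2],
    [0.7, 10, 0.9],
    [0.1, 60, 0.1],
]
y = [1, 1, 0, 0, 1, 0]  # 1 = exhibits the behavioural trend, 0 = does not

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

print(clf.predict([[0.85, 7, 0.75]]))  # likely [1] on this toy data
```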

Resources:

Decision Trees

Neuroscience in Criminology

Bayesian Inference

DT in Blackjack

Weighted Node Search