Detour: Graph Analysis [5]

Simon Tse
Learn about Cancer with Code
4 min read · May 24, 2023

Background

In the last post, I examined how a correlation graph behaves.

In this post, I focus on whether I can recover directed relationships from the graph structure.

Approach

My earlier script converts a sentence into a graph like the one below.

Created by author

My question is whether, given this graph, I can infer the direction of action. For example, can I discern the direction of the causal relationship between ‘p14ARF’ and ‘p53’? With this question in mind, I am going to formulate it as a Bayesian probability estimation. That is

Created by author

While

Created by author

This is not a mathematically rigorous proof that I can extract direction from the data. If you are interested in the proper way to do causal inference, you may refer to Judea Pearl’s work. I am only using this notation as a guide for developing a function that extracts information from the data.

I have written the following helper functions for this task.

Basically, I follow those steps to create a directed graph with node and edge weights assigned according to the parameters passed to the function. I have covered this in a previous post, so I am not going to repeat it here; you can refer to the following post for further details.

Here I am going to do something different. Instead of relying on the built-in NetworkX function to retrieve simple paths, I am using my own function “generateRandomWalkwithTarget” to generate a walk. For example, when I run this function with the entities ‘CDKN2A beta transcript’ and ‘MDM2’, I get the following

Created by author

You can see that the function returns a walk that is only 34 tokens long, because the walk was truncated when it reached the target node ‘MDM2’.
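To make the truncation behaviour concrete, here is a minimal sketch of a target-truncated random walk. It is not the actual “generateRandomWalkwithTarget”: the name `random_walk_with_target`, the dict-of-dicts graph layout, the `max_steps` parameter and the toy graph (including its artificial back edges) are all assumptions made for illustration.

```python
import random


def random_walk_with_target(graph, source, target, max_steps=50, rng=None):
    """Walk from `source`, picking successors in proportion to edge weight.

    `graph` maps node -> {successor: weight} (an assumed layout).
    The walk stops early at `target` or at a node with no outgoing edges.
    """
    rng = rng or random.Random()
    walk = [source]
    current = source
    for _ in range(max_steps):
        successors = graph.get(current, {})
        if not successors:
            break  # dead end: nothing to walk to
        nodes = list(successors)
        weights = [successors[n] for n in nodes]
        current = rng.choices(nodes, weights=weights, k=1)[0]
        walk.append(current)
        if current == target:
            break  # truncate the walk once the target is reached
    return walk


# Toy graph for illustration only; the back edges to 'p14ARF' are artificial
# so that the walk can keep going after missing the target.
toy_graph = {
    "CDKN2A beta transcript": {"p14ARF": 1.0},
    "p14ARF": {"activates": 1.0},
    "activates": {"p53 response": 1.0},
    "p53 response": {"MDM2": 1.0, "p21CIP1": 1.0},
    "MDM2": {"p14ARF": 1.0},
    "p21CIP1": {"p14ARF": 1.0},
}

walk = random_walk_with_target(
    toy_graph, "CDKN2A beta transcript", "MDM2", max_steps=50, rng=random.Random(0)
)
print(len(walk), walk)
```

With uniform weights every successor is equally likely, so the length of the walk depends only on how soon the target happens to be hit.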

Then I run this over the entire graph to collect paths that connect different tokens. Please note that I allow the source and target nodes to be the same when running the function “generateSimpleWeightedPaths”.
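The collection step can be sketched roughly as below. This is not the actual “generateSimpleWeightedPaths”: the names `walk_until` and `collect_walks`, the parameters `walks_per_pair` and `max_steps`, and the toy graph are hypothetical. The point is only to show every (source, target) pair being visited, including pairs where the two are the same node.

```python
import random
from itertools import product


def walk_until(graph, source, target, max_steps, rng):
    """Weighted random walk from `source`, truncated on reaching `target`."""
    walk, current = [source], source
    for _ in range(max_steps):
        successors = graph.get(current)
        if not successors:
            break
        current = rng.choices(
            list(successors), weights=list(successors.values()), k=1
        )[0]
        walk.append(current)
        if current == target:
            break
    return walk


def collect_walks(graph, walks_per_pair=5, max_steps=20, seed=0):
    """Collect walks that actually reach the target, for every node pair."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    collected = {}
    # repeat=2 yields every ordered pair, including source == target
    for source, target in product(nodes, repeat=2):
        hits = []
        for _ in range(walks_per_pair):
            walk = walk_until(graph, source, target, max_steps, rng)
            if len(walk) > 1 and walk[-1] == target:
                hits.append(walk)
        collected[(source, target)] = hits
    return collected


# Toy graph for illustration only; the back edges are artificial.
toy_graph = {
    "p14ARF": {"activates": 1.0},
    "activates": {"p53 response": 1.0},
    "p53 response": {"MDM2": 1.0, "p21CIP1": 1.0},
    "MDM2": {"p14ARF": 1.0},
    "p21CIP1": {"p14ARF": 1.0},
}

collected = collect_walks(toy_graph)
print(collected[("p14ARF", "p14ARF")][0])
```

Allowing the source and target to coincide means cycles are recorded too: in this toy graph a walk from ‘p14ARF’ back to itself passes through either ‘MDM2’ or ‘p21CIP1’.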

With the graph of the sentence

In contrast, the product of the human CDKN2A beta transcript, p14ARF, activates a p53 response manifest in elevated levels of MDM2 and p21CIP1 and cell cycle arrest in both G1 and G2/M.

I obtained the following result

Created by author

From the sentence, we want to recover the relationship between ‘MDM2’ and the other entities. Here I use the function “bayesProb” to count the number of occurrences and see how that goes.

Created by author

From the result, it seems that random walks generated from a uniformly weighted graph do recover some underlying pattern that corresponds to how a human reads and understands the sentence. For example, ‘p53 response’, ‘activates’ and ‘CDKN2A beta transcript’/‘p14ARF’ appear to be the combination that gives a higher than 50% chance that ‘elevated MDM2’ will be found in the event space. Interestingly, removing ‘p53 response’ from the event space makes the probability drop well below 50%. That suggests ‘p53 response’ acts as a mediator between ‘CDKN2A’/‘p14ARF’ and ‘MDM2’.
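The counting idea behind this comparison can be sketched as follows. This is not the actual “bayesProb”: the function name `conditional_frequency` and the hand-written `toy_walks` are assumptions, and the numbers it prints come from the toy data only, not from the results above. It estimates the chance of seeing a target token in a walk, given that all evidence tokens appear in the same walk.

```python
def conditional_frequency(walks, target, evidence):
    """Estimate P(target in walk | all evidence tokens in walk) by counting."""
    evidence = set(evidence)
    # Keep only the walks that contain every evidence token
    matching = [w for w in walks if evidence.issubset(w)]
    if not matching:
        return None  # the evidence never occurs, so the estimate is undefined
    hits = sum(1 for w in matching if target in w)
    return hits / len(matching)


# Hand-written toy walks, for illustration only
toy_walks = [
    ["p14ARF", "activates", "p53 response", "MDM2"],
    ["p14ARF", "activates", "p53 response", "MDM2"],
    ["p14ARF", "activates", "p53 response", "p21CIP1"],
    ["p14ARF", "activates", "cell cycle arrest"],
    ["p14ARF", "activates", "G1"],
    ["p14ARF", "activates", "G2/M"],
]

with_mediator = conditional_frequency(
    toy_walks, "MDM2", ["p14ARF", "activates", "p53 response"]
)
without_mediator = conditional_frequency(toy_walks, "MDM2", ["p14ARF", "activates"])
print(with_mediator, without_mediator)
```

In this toy data the estimate is 2/3 when ‘p53 response’ is part of the evidence and 1/3 when it is left out, which is the kind of drop one would expect if that token sits on most of the walks leading to the target.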

Intermission

I have now demonstrated that a graph representation of a sentence can capture some structure that corresponds to how humans interpret the sentence. In the next post, I am going to see if I can construct a probability function of the key entities.

Stay tuned!
