Association Rules and their counterintuitive reciprocals: a proof requiring investigation

Laurae: This post is about association rules and what to do with one when you have it (including a discussion of its validity). It assumes you compute beforehand the five possibilities of a rule when you have a supervised association rule (two-sided: one side with the label, one without). It also discusses two different types of (valid) theoretical interpretations (depending on your own taste). The post formatting was slightly modified to comply with Medium’s text editor. Otherwise, the post was not modified; it was originally posted at Kaggle.

ToonRoge wrote:
Laurae wrote:
There are some extremely interesting categorical rules in the dataset…
For instance, for 63692 cases, with A = {v31 == A and target == 1} and B = {v3 == C}, you have:
A -> B @100% rate
(-A) -> B @92.62% rate
B -> A @57.60% rate
(-B) -> A @0% rate
A <-> B @58.98% rate
An algorithm telling you it is 100% confident on this is an interesting rule. There are also many 99%-confident rules on categorical variables that imply the label we are looking for. I wonder whether hardcoding all the possible rules from the data set could beat a single xgboost alone (and I think that’s potentially true when I look at the confidence level of all the rules on categorical variables only, but getting them in order requires a lot of manual entries -_-).
I don’t understand this post, but it sounds interesting. What does A -> B mean? And (-A) -> B? … When I look at the combinations of v31 and v3 I see nothing exceptional. If v31 = A, then v3 = C, but this holds for both target = 0 and target = 1.
There is some signal in v31 for sure, but nothing spectacular I would say.
What am I misinterpreting?

I am going to explain, line by line, the whole process of analyzing just the rules I provided.

If you want to follow the inference process strictly, read the following parts (separated by a line) in this order: 1 => 3 => 4 => 2 => 5 => Conclusion.


A -> B @100% rate

When A is true (v31 == A and target == 1), B is necessarily true. From there you know you are in a selective node: v3 == C.

Inference #1: (v31 == A & target == 1) => (v3 == C)

Note: it does not tell you whether (v31 == A & target == 0) => (v3 == C) also holds. Check inference #3.
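
To make the notation concrete, here is a minimal sketch of how these five rates can be computed, assuming the data sits in a pandas DataFrame. The toy df below is only a stand-in for the real competition data, so its numbers will not match the rates above; also, the exact formula behind the two-sided A <-> B rate is not spelled out in this post, so the last line is only one plausible reading.

```python
import pandas as pd

# Toy stand-in for the real data set (the rates quoted in this post come from
# the actual competition data, not from this sample).
df = pd.DataFrame({
    "v31":    ["A", "A", "B", "C", "A", "B"],
    "v3":     ["C", "C", "C", "A", "C", "B"],
    "target": [1, 0, 1, 0, 1, 1],
})

A = (df["v31"] == "A") & (df["target"] == 1)  # A = {v31 == A and target == 1}
B = (df["v3"] == "C")                         # B = {v3 == C}

def confidence(antecedent, consequent):
    """Share of rows satisfying the consequent among those satisfying the antecedent."""
    return (antecedent & consequent).sum() / antecedent.sum()

print("A  -> B :", confidence(A, B))
print("-A -> B :", confidence(~A, B))
print("B  -> A :", confidence(B, A))
print("-B -> A :", confidence(~B, A))
# One plausible reading of the two-sided rate: overlap over union (Jaccard).
print("A <-> B :", (A & B).sum() / (A | B).sum())
```

The later sketches in this post reuse this df and these A and B masks.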


(-A) -> B @92.62% rate

When A is not true (either v31 != A or target != 1), B remains true at a 92.62% confidence rate. It means that (v3 == C) is also spread across the rest of the data when splitting on v31 and target.

Using inference #3, you can derive inference #4: if v31 is not A, you should be able to predict (target == 1) at a potential 92.62% confidence rate.


B -> A @57.60% rate

When B is true (v3 == C), A is true at a 57.60% rate.

It means that when v3 is C, you have a higher-than-average confidence of having (v31 == A) and (target == 1).

Inference #2: there is a higher proportion of (v3 == C and target == 1) in the data set than of the contrary.


(-B) -> A @0% rate

There cannot be a case where (v3 != C) leads to (v31 == A and target == 1). This was the first inference found.

Inference #3: using inference #1, we know that if (v31 == A), target remains a mobile (free) variable. However, you cannot have (v31 == A and v3 == A), nor (v31 == A and v3 == B). This confirms inference #2 (if inference #2 were not true, inference #3 would be rejected).
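
Inference #3 is also easy to check empirically. A tiny sanity check, reusing the toy df from the first sketch (on the real data you would run it on the competition’s training file instead):

```python
# Inference #3 says (v31 == A) never co-occurs with (v3 == A) or (v3 == B).
violations = df[(df["v31"] == "A") & (df["v3"].isin(["A", "B"]))]
assert violations.empty, "inference #3 does not hold on this data"
```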


A <-> B @58.98% rate

When A and B are each a condition for the other, the rule holds at a confidence rate of 58.98%.

It means all three of these conditions together:

  • You need (v31 == A and target == 1) to have (v3 == C)
  • You need (v3 == C) to have (v31 == A and target == 1)
  • You have a confidence of 58.98% of it happening.

Conclusion: v3 == C has a good ability to segregate target. You also know that if v31 == A, you necessarily have v3 == C. The baseline 58.98% is a starting point if you are assuming the relation between (v31 == A and target == ???) and (v3 == C). More exactly, the complete inference tree becomes:

  • If (target == 0 and (v31 == A and v3 == C)) then ~41% confidence of being true (target = 0 at 41%)
  • If (target == 1 and (v31 == A and v3 == C)) then ~59% confidence of being true (target = 1 at 59%)
  • If (target == 0 and (v31 != A or v3 != C)) then ~7% confidence of being true (target = 0 at 7%)
  • If (target == 1 and (v31 != A or v3 != C)) then ~93% confidence of being true (target = 1 at 93%)

i.e., when (v31 == A and v3 == C) does not hold, you have an extremely high chance of having target == 1, and when it does hold you can only slightly differentiate whether target is 0 or 1 (small separation).
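
In code, the two conditional rates the tree is built on could be estimated as below, again reusing the toy df (so the printed values will not be the ~59% and ~93% quoted above):

```python
# Estimate P(target == 1) inside and outside the (v31 == A and v3 == C) region.
in_region = (df["v31"] == "A") & (df["v3"] == "C")

p_inside  = df.loc[in_region, "target"].mean()    # ~59% on the real data
p_outside = df.loc[~in_region, "target"].mean()   # ~93% on the real data

print(f"P(target == 1 |  (v31 == A and v3 == C)) = {p_inside:.2%}")
print(f"P(target == 1 | !(v31 == A and v3 == C)) = {p_outside:.2%}")
```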

If you are using loose probabilities, you would use the B -> A confidence rate instead of the A <-> B confidence rate. How you convert (rules and their confidence rates) into probabilities is mostly a matter of human bias.

A more manual analysis would be a leaf report: a breakdown of counts and target rates for each combination of v31 and v3 (the original report image is not reproduced here).
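
A sketch of how such a leaf report could be rebuilt, once more reusing the toy df (groupby counts are one way to do it; the original report’s exact layout may differ):

```python
# Rebuild a leaf report: count every (v31, v3, target) combination, then
# derive P(target == 1) per (v31, v3) leaf.
leaf_counts = (
    df.groupby(["v31", "v3", "target"])
      .size()
      .rename("count")
      .reset_index()
)
leaf_probs = (
    df.groupby(["v31", "v3"])["target"]
      .agg(n="size", p_target_1="mean")
      .reset_index()
)
print(leaf_counts)
print(leaf_probs)
```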

And if you are a probabilistic statistician, you would use the values in that leaf report as probabilities, compounding them with each other depending on the situation you are in, to create the final probability of target given v31 and v3.

The major issue is that a human cannot check all these variables one by one due to time constraints. For a three-way relation analysis against target, you are looking at an absurd 17K+ possibilities (edit: 51K+ possibilities for a three-way relation where all variables are mobile on each side). Imagine a four-way interaction ^^ Good thing that computers can find them for us :)
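
That brute-force search is straightforward to automate. A minimal sketch of a two-way scan against target, reusing the toy df (on the real data, cat_cols would list every categorical column, and swapping combinations(cat_cols, 2) for combinations(cat_cols, 3) is where the 17K+/51K+ blow-up comes from):

```python
from itertools import combinations

# Brute-force rule search: for every pair of categorical columns and every
# pair of levels, measure how confidently the pattern implies target == 1.
cat_cols = ["v31", "v3"]  # much longer on the real data
rules = []
for col_a, col_b in combinations(cat_cols, 2):
    for lvl_a in df[col_a].unique():
        for lvl_b in df[col_b].unique():
            antecedent = (df[col_a] == lvl_a) & (df[col_b] == lvl_b)
            support = antecedent.sum()
            if support == 0:
                continue
            conf = (antecedent & (df["target"] == 1)).sum() / support
            rules.append((col_a, lvl_a, col_b, lvl_b, support, conf))

# Most confident rules first; keep an eye on support too, since a 100% rule
# backed by two rows is noise.
rules.sort(key=lambda r: (r[-1], r[-2]), reverse=True)
for rule in rules[:5]:
    print(rule)
```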
