Visualization of Information Theory on Features (Part 1)

Published in Data Science & Design, Aug 22, 2016

Laurae: This post is about visualizing continuous and discrete features using information theory. It was originally written after Part 2, but it is meant to be read as Part 1. The post originally appeared on Kaggle; only the initial off-topic quotation was removed.

Large post

Interpretation tutorial: HERE

I’m sharing some charts about the relations between variables. I used multifactor dimensionality reduction with interaction information (multi-way interactions, not only two-way interactions) on continuous variables that I could discretize in a supervised way, which matters a lot in this case. To find the best potential buckets per variable (and whether any exist at all), I used 5000+ different conditional inference forests. The visualization settings are:

Edge visibility threshold: -4.6e-4 (9%) where applicable; using 100% would be pointless, because every edge would be drawn (see picture below)
Node visibility threshold: 4.0e-4 (100%), to show all nodes

big
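The buckets above come from conditional inference forests, which use permutation tests to decide where (and whether) to split. As a much simpler stand-in, here is a minimal Python sketch of supervised discretization that picks a single cut point maximizing information gain about a discrete label. This is a swapped-in technique (entropy-based splitting), not the method used in the post, and `best_cut` and `discretize` are hypothetical helper names.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, gain): the threshold on `values` with the highest
    information gain about `labels`, or (None, 0.0) if no split helps."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    base = entropy(ys)
    n = len(ys)
    best = (None, 0.0)
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # a cut can only fall between two distinct values
        left, right = ys[:i], ys[i:]
        cond = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = base - cond
        if gain > best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, gain)
    return best


def discretize(values, cut):
    """Map each continuous value to bucket 0 (<= cut) or 1 (> cut)."""
    return [int(v > cut) for v in values]
```

Returning `(None, 0.0)` mirrors the "if they do exist" question: when no split reduces label entropy, there is no bucket worth keeping. A real implementation would also test the gain for significance, which is what the conditional inference forests do.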

See Wikipedia about Interaction Information if you need to read about it. tl;dr: it is a generalization of mutual information to more than two variables, expressed as a real value that can be positive (association enhancement) or negative (association inhibition).
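To make the definition concrete, here is a minimal Python sketch that estimates mutual information and three-way interaction information from discrete samples using plug-in (empirical frequency) estimates. Sign conventions differ between sources; this sketch assumes II(X;Y;Z) = I(X;Y|Z) - I(X;Y), under which a positive value means Z enhances the X-Y association and a negative value means Z inhibits it, matching the description above. The function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more aligned discrete columns."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum((c / n) * log2(c / n) for c in joint.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    """Three-way interaction information, convention I(X;Y|Z) - I(X;Y):
    positive = Z enhances the X-Y association (synergy),
    negative = Z inhibits it (redundancy)."""
    return conditional_mutual_information(x, y, z) - mutual_information(x, y)
```

Two textbook cases show both signs: with Z = X XOR Y, X and Y are independent on their own but fully dependent once Z is known, giving +1 bit; with X = Y = Z, knowing Z removes all of the X-Y association, giving -1 bit.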

All values are expressed as a fraction of the total. For 114 variables, there are 114 × 113 = 12882 potential pairwise interactions (counting each pair in both directions). Hence, each edge should carry about 1/12882 ≈ 7.8e-5 of the total on average.

Fruchterman-Reingold layout of the variables: big1, big2
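Fruchterman-Reingold is a force-directed layout: every pair of nodes repels, linked nodes attract, and a cooling "temperature" caps how far a node may move at each step. Below is a minimal pure-Python sketch under simplifying assumptions (unit square frame, unweighted edges, geometric cooling); the post does not give its tool's exact settings, and `fruchterman_reingold` is a hypothetical helper.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=200, width=1.0, seed=0):
    """Minimal Fruchterman-Reingold layout. nodes: list of ids;
    edges: list of (u, v) pairs. Returns {node: (x, y)} in a width x width box."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in nodes}
    k = width * math.sqrt(1.0 / len(nodes))  # ideal edge length
    temp = width / 10.0  # maximum step size, cooled every iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: f_r(d) = k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along edges: f_a(d) = d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node by at most `temp` and keep it inside the frame
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(width, max(0.0, pos[v][1] + dy / d * step))
        temp = max(temp * 0.95, 1e-4)  # simple geometric cooling schedule
    return {v: (x, y) for v, (x, y) in pos.items()}
```

The all-pairs repulsion loop is O(n²) per iteration, which is fine for about 100 variables. For real use, networkx's `spring_layout` is a Fruchterman-Reingold implementation.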

Relations as a circle chart (a more “human” way to perceive intertwining links): big. WARNING: please start reading at the right (v113)!

Relations using the Kamada-Kawai layout: big
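Kamada-Kawai treats the graph as a system of springs whose rest lengths equal the shortest-path (hop) distances between nodes, then searches for the positions with minimum total spring energy. The original algorithm moves one node at a time using Newton-Raphson steps; this minimal sketch instead runs plain gradient descent on the same energy. It assumes a connected, unweighted graph and unit edge length, and all helper names are hypothetical.

```python
import math
import random
from collections import deque


def shortest_path_lengths(nodes, edges):
    """Hop distances between all node pairs via BFS (unweighted graph)."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {}
    for s in nodes:
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[s] = seen
    return dist


def kk_energy(pos, dist, nodes):
    """Sum over pairs of k_ij * (|p_i - p_j| - d_ij)^2 / 2, with k_ij = 1 / d_ij^2."""
    e = 0.0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            d = dist[u][v]
            gap = math.dist(pos[u], pos[v]) - d
            e += 0.5 * gap * gap / (d * d)
    return e


def kamada_kawai(nodes, edges, steps=500, lr=0.05, seed=0):
    """Minimize the Kamada-Kawai energy by gradient descent (connected graph assumed)."""
    dist = shortest_path_lengths(nodes, edges)
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    for _ in range(steps):
        grad = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                d = dist[u][v]
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                r = max(math.hypot(dx, dy), 1e-9)
                # dE/dp_u = k_ij * (r - d_ij) * (p_u - p_v) / r
                c = (r - d) / (d * d * r)
                grad[u][0] += c * dx
                grad[u][1] += c * dy
                grad[v][0] -= c * dx
                grad[v][1] -= c * dy
        for v in nodes:
            pos[v][0] -= lr * grad[v][0]
            pos[v][1] -= lr * grad[v][1]
    return {v: tuple(p) for v, p in pos.items()}
```

Because rest lengths come from graph distances, nodes far apart in the graph also end up far apart in the picture, which is why this layout reads differently from the force-directed one. networkx ships a ready-made `kamada_kawai_layout`.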

Relations using self-organizing maps: big

Dendrogram… because why not: big
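A dendrogram draws the merge history of a hierarchical clustering. The post does not say which distance or linkage was used, so this minimal sketch assumes a precomputed distance between variables (for instance, smaller for more strongly interacting pairs) and uses naive single linkage. `single_linkage` is a hypothetical helper; scipy's `scipy.cluster.hierarchy.linkage` and `dendrogram` are the usual production tools.

```python
def single_linkage(labels, dist):
    """Naive agglomerative clustering with single linkage.

    labels: list of item names.
    dist: dict mapping frozenset({a, b}) -> distance for every pair.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is exactly what a dendrogram draws, bottom-up.
    """
    clusters = [frozenset([x]) for x in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members
                d = min(dist[frozenset((a, b))]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        merges.append((tuple(sorted(a)), tuple(sorted(b)), d))
        # Replace the two merged clusters by their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a | b]
    return merges
```

This is O(n³) or worse, which is acceptable for a hundred or so variables; the merge heights are what give the dendrogram its vertical scale.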

Variables I could not discretize (because I could not find any two-way interaction with the label variable):

v18
v22 (an arbitrary choice of my own for that one)
v35
v42
v49
v54
v56
v67
v70
v77
v89
v105
v118
v120
v122
v124
v126

Output of a connected component analysis (using the thresholds above):

Vertices: 91.0 -> some variables could not be plotted due to their uniqueness (can anyone help me find v50? I tried hard but cannot find it, even though it is in the inputs; it should be linked with v79 and v10, as they are interlinked when I look at the raw values)
Edges: 483.0 -> 483 links are plotted
Diameter: 7.0 -> not important here
Average number of neighbors: 10.6154 -> each variable has 10.6 neighbors on average
Density: 0.1179 -> relations are not dense but very sparse (about 12% of all possible edges)
Centralization: 0.7487 -> one big cluster
Heterogeneity: 1.2258 -> variable relations are, obviously, not homogeneous
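The post does not name the tool that produced these statistics. Assuming the standard degree-based definitions (average neighbors 2E/N, density 2E/(N(N-1)), centralization N/(N-2) · (max degree/(N-1) - density), heterogeneity as the standard deviation of degree divided by its mean), here is a minimal Python sketch; `network_stats` is a hypothetical helper.

```python
import math
from collections import deque


def network_stats(nodes, edges):
    """Degree-based statistics for an undirected, unweighted simple graph.
    Assumes at least three nodes, no self-loops and no duplicate edges."""
    n, m = len(nodes), len(edges)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degrees = [len(adj[v]) for v in nodes]
    mean_k = 2 * m / n  # average number of neighbors
    density = 2 * m / (n * (n - 1))  # share of possible edges present
    centralization = n / (n - 2) * (max(degrees) / (n - 1) - density)
    var_k = sum((k - mean_k) ** 2 for k in degrees) / n  # population variance
    heterogeneity = math.sqrt(var_k) / mean_k
    # Diameter: longest shortest path, via BFS from every node
    # (for a disconnected graph, this is the largest within any component)
    diameter = 0
    for s in nodes:
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        diameter = max(diameter, max(seen.values()))
    return {"avg_neighbors": mean_k, "density": density,
            "centralization": centralization,
            "heterogeneity": heterogeneity, "diameter": diameter}
```

The first two formulas reproduce the reported values from the counts alone: 2 × 483 / 91 ≈ 10.6154 and 2 × 483 / (91 × 90) ≈ 0.1179. A star graph is a useful sanity check, since it has centralization exactly 1.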

Cartesian product network using the three-way interactions graph: Fruchterman-Reingold, Circle graph, Kamada-Kawai, Self-Organizing Map, Dendrogram

Settings:

Edge visibility threshold: 0.00154 (20%)
Node visibility threshold: 4.0e-4 (100%)

Connected components analysis:

Vertices: 86.0 -> missing a lot of nodes
Edges: 115.0 -> 115 lines
Diameter: 4.0 -> to ignore
Average number of neighbors: 2.6744 -> each variable has 2.6744 neighbors on average
Density: 0.0315 -> extremely sparse
Centralization: 0.9332 -> extremely centralized
Heterogeneity: 3.309 -> not homogeneous at all

Fruchterman-Reingold:

Circle graph:

Kamada-Kawai graph:

Self Organizing Map:

Dendrogram:

N.B.: I personally thought v56 (123 outcomes) would be the center of the world of the Cartesian product network, but this is not the case: v125 (91 outcomes) took its spot.
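The post does not spell out how the Cartesian product network is built. Assuming it pairs two variables into one "Cartesian product" variable whose levels are the observed value pairs (so its number of outcomes is at most the product of the two variables' counts), a natural three-way score is I(A×B; label) - I(A; label) - I(B; label). By the chain rule this equals the interaction information of A, B and the label, in the convention where positive means synergy. A minimal Python sketch with hypothetical helper names:

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (bits) of aligned discrete columns."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum((c / n) * log2(c / n) for c in joint.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cartesian_product(a, b):
    """Combine two discrete columns into one whose levels are (a, b) pairs."""
    return list(zip(a, b))


def three_way_gain(a, b, label):
    """I(A x B; label) - I(A; label) - I(B; label): what the pair says about
    the label beyond what each variable says on its own."""
    ab = cartesian_product(a, b)
    return (mutual_information(ab, label)
            - mutual_information(a, label)
            - mutual_information(b, label))
```

With an XOR-style label, each variable alone carries no information but the pair determines the label, so the score is +1 bit; pairing a variable with a copy of itself is purely redundant and scores -1 bit. Note that high-cardinality products (123 × 91 possible outcomes, for example) make plug-in estimates optimistic on small samples.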

Written by Laurae