Degrees of separation on a tree algorithm

Image for post
Image for post
Image for post
Image for post
Image for post
Image for post

Imagine that you are like me and looking for an algorithm to compute degrees of separation along a hierarchical tree like the ones pictured below. Your tree could represent any data — let’s pretend it’s a company orgchart with node A as the CEO, nodes B and I are execs, and so on. The distance, or degrees of separation, between any two nodes is the number of links along the shortest path that separates them.

Image for post
Image for post
Image for post
Image for post

Concept

Image for post
Image for post
X = m’s distance from A, Y = n’s distance from A, Z = number of nodes that m and n share

Say we have selected two nodes m and n, where m = node B and n = node F. Notice that B is F’s eventual manager (via node D). In this case, B’s distance from A, or X, = 1; F’s distance from A, or Y, = 3. The only node B and F have in common is A, so their intersection Z = 1. B and F’s degrees of separation = 2 because 1+3–2(1) = 2.

Image for post
Image for post
Image for post
Image for post
Case 1: One node is the (eventual) manager of the other

Now let’s select two nodes which are not eventual managers of each other — nodes F and H. Both node F’s and H’s distance from A, or X and Y, = 3. Their intersection Z = 3 because they have nodes A, B, and D in common. F and H’s degrees of separation = 2 because 3+3–2(3–1) = 2.

Wait a second, where did that minus 1 come from?! Well, technically it’s minus 2 once you distribute the 2, and it comes from eliminating the double-counting that happens when links are shared by both m and n on their paths to A. I urge you to trace some paths of your own and compute the math to make evident this concept.

Image for post
Image for post
Image for post
Image for post
Case 2: Neither node falls in the management chain of the other

Data Structure

Image for post
Image for post

Code

degrees <- function(m,n){     ## compute m & n's distance from A
## subtract 1 to remove "person" column
mFromA <- sum(!is.na(data[m, ])) — 1
nFromA <- sum(!is.na(data[n, ])) — 1
## compute how many nodes m and n have in common
## in many cases, m and n only have A in common
overlap <- intersect(data[m, which(!is.na(data[m, ]))], data[n, which(!is.na(data[n, ]))])
## (optional) print above values
print(paste(“Distance”, data[m,1] ,”to A: “, mFromA))
print(paste(“Distance”, data[n,1] ,”to A: “, nFromA))
print(paste(“length of overlap: “, length(overlap)))
print(overlap)
## ok, algorithm time: ## but first, let's check if the user entered the same person
## i.e. does m == n? if so, degrees of separation = 0
if(m == n){
degreesOfSep <- 0
}

## check if m or n is (eventual) manager of n or m
## i.e. check if higher node manages lower node
## if so, compute distance as X+Y-2*Z
else if(mFromA != nFromA){
if(mFromA > nFromA){ lower <- m; higher <- n }
if(nFromA > mFromA){ lower <- n; higher <- m }
for (i in 2:length(data[lower, ])) {
if(grepl(data[higher,1], data[lower,i])){
manager <- TRUE
print(paste(data[higher,1], “is an (eventual) manager of”, data[lower,1]))
break
}
else{
manager <- FALSE
}
}
if(manager){
degreesOfSep <- mFromA + nFromA — 2*(length(overlap))
}
## if one is not (eventual) manager of other,
## compute as X+Y-2*(Z-1)
else {
degreesOfSep <- mFromA + nFromA — 2*(length(overlap) — 1)
}
}
else {
degreesOfSep <- mFromA + nFromA — 2*(length(overlap) — 1)
}
print(paste(“Degrees of Separation: “, degreesOfSep))
}

Cool! Now, what?

Perhaps a degrees of separation metric could help you predict attrition—how does a manager’s resignation ripple-effect through his or her network? Are a manager’s direct reports (+1 degree of separation) more likely to leave than his or her skip-level reports (2+ degrees of separation)?

Degrees of separation make all sorts of team analyses possible—do we perform better as teams or as individuals?

In the future, I’d like weight degrees of separation based on external data, like email communications and LinkedIn connections. In the case of the orgchart, I believe the future lies in networks, not trees. Tree orgcharts are simple which is why we use them, but in reality teamwork and collaboration can happen even amongst those with the highest degrees of separation. Fortunately, there’s a lot more literature online about network algorithms than single-parent tree algorithms.

Written by

data scientist • statistics, philosophy, design, humor, technology, data • www.siena.io

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store