Comments on an implementation of Prim’s algorithm in Ruby

Matt Hinea
Jul 17, 2016 · 3 min read

Prim’s alrogithm is used to return a minimum spanning tree. For those not yet in the know, an MST is a tree made from a graph of points connected by undirected weighted edges. The MST must include every node in the graph and it must do so as efficiently as possible; that is, the sum of weights must be as low as possible. Conceptually this is not the most complicated data structure, and its potential real-world applications should be immediately apparent. Prim’s algorithm works as follows: pick an arbitrary node, add that node to ‘visited’ list, choose the smallest edge that connects to an unvisited node, add that node to ‘visited’ list. Continue in this manner, each time choosing the smallest edge that leads to an unvisited node from any visited node, being careful not to create any loops.

In pseudocode, the algorithm looks like this (from Mark Needham’s blog):

  • Let X = nodes covered so far, T = edges covered so far, V = all the nodes in the graph
  • Pick an initial arbitrary node s — it doesn’t matter which one it is
  • while X ≠ V:
  • let e = (u,v) be the cheapest edge of the graph where u ∈ X and v ∉ X
    i.e. u is a node that has been covered and v a node that has not yet been covered
  • Add e to T
  • Add v to X

Following the pseudocode as closely as possible,we end up with this Ruby code (source: Mark Needham).

# Prim's Minimum Spanning Tree Algorithm - Naive version

def file
@file ||= File.readlines("edges.txt")
end

def header
@header ||= file.take(1)[0]
end

def number_of_nodes
@number_of_nodes ||= header.split(" ")[0].to_i
end

def create_adjacency_matrix
adjacency_matrix = [].tap { |m| number_of_nodes.times { m << Array.new(number_of_nodes) } }
file.drop(1).map { |x| x.gsub(/\n/, "").split(" ").map(&:to_i) }.each do |(node1, node2, weight)|
adjacency_matrix[node1 - 1][node2 - 1] = weight
adjacency_matrix[node2 - 1][node1 - 1] = weight
end
adjacency_matrix
end

def find_cheapest_edge(adjacency_matrix, nodes_spanned_so_far, number_of_nodes)
available_nodes = (0..number_of_nodes-1).to_a.reject { |node_index| nodes_spanned_so_far.include?(node_index + 1) }

cheapest_edges = available_nodes.inject([]) do |acc, node_index|
get_edges(adjacency_matrix, node_index).select { |_, other_node_index| nodes_spanned_so_far.include?(other_node_index + 1) }.each do |weight, other_node_index|
acc << { :start => node_index + 1, :end => other_node_index + 1, :weight => weight }
end
acc
end

cheapest_edges.sort { |x,y| x[:weight] <=> y[:weight] }.first
end

def get_edges(adjacency_matrix, node_index)
adjacency_matrix[node_index].each_with_index.reject { |edge, index| edge.nil? }
end

def select_first_edge(adjacency_matrix)
starting_node = 1
cheapest_edges = get_edges(adjacency_matrix, 0).inject([]) do |all_edges, (edge, index)|
all_edges << { :start => starting_node, :end => index + 1, :weight => edge }
all_edges
end
cheapest_edges.sort { |x,y| x[:weight] <=> y[:weight] }.first
end

def nodes_left_to_cover
(1..number_of_nodes).to_a - @nodes_spanned_so_far
end

# Prim's algorithm

adjacency_matrix = create_adjacency_matrix
first_edge = select_first_edge(adjacency_matrix)
@nodes_spanned_so_far, @edges = [first_edge[:start], first_edge[:end]], [first_edge]

while !nodes_left_to_cover.empty?
cheapest_edge = find_cheapest_edge(adjacency_matrix, @nodes_spanned_so_far, number_of_nodes)
@edges << cheapest_edge
@nodes_spanned_so_far << cheapest_edge[:start]
end

puts "edges: #{@edges}, total spanning tree cost #{@edges.inject(0) {|acc, edge| acc + edge[:weight]}}"

The adjacency_matrix method simply prints a matrix with edge weights at certain (x,y) positions, indicated by integers in the adjacency matrix. An arbitrary first node is selected. Interestingly, our method to create an adjacency matrix from an edge list is used primarily by select_first_edge, which is called by the get_edges helper method. I’m left wondering if there’s a more efficient way to get edges from the text file, but it probably depends on the formatting of the document. Regardless, the code looks quite nice, and is pretty human-readable.

It’s difficult to test Mark’s code without knowing the format his data was stored in. Trying to modify the code to work on a list of comma-separated values didn’t work out very well. Significantly, the code relies on vertices ID’s as indexes when creating the adjacency matrix, so tables with long strings of digits as ID’s (as opposed to integers 1–10, for example) throw errors. Nonetheless the implementation is simple enough to read along with and follow, and it’s a good example of the way that an algorithm can be quickly turned into pseudocode which can then be transformed into executable Ruby.

Matt Hinea

Written by

Web Developer

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade