How I got all tags and attributes from any XML file thanks to Ruby and REXML

I once had to write a function that collects every tag in an arbitrary XML file, along with each tag's attributes.

This is tricky because most parsers need you to actually NAME the tag you’re looking for in order to get data from it.
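
To make the problem concrete, here is a minimal sketch of a name-based lookup using REXML's XPath support. The sample document and the `elements_named` helper are made up for illustration and are not part of my original code. The point is only that a query like this returns nothing for tags you did not think to ask for.

```ruby
require 'rexml/document'

# Hypothetical sample document, used only for this illustration.
SAMPLE_XML = <<~XML
  <library>
    <book id="1" lang="en"><title>Dune</title></book>
    <magazine issue="42"/>
  </library>
XML

# Hypothetical helper: returns the elements whose tag name we already know.
def elements_named(xml, tag_name)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//#{tag_name}")
end

# We find the books because we asked for "book" by name...
books = elements_named(SAMPLE_XML, 'book')
puts "book ids: #{books.map { |book| book.attributes['id'] }.join(', ')}"

# ...but a wrong guess finds nothing, and <magazine> goes unnoticed
# unless we happen to ask for it too.
puts "articles found: #{elements_named(SAMPLE_XML, 'article').length}"
```

With a file whose tag names are unknown in advance, there is nothing sensible to pass as `tag_name`, which is why the approach below reacts to tags as the parser meets them.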

Here’s how I finally managed to do it, thanks to REXML.

Many thanks to Lucian for introducing me to REXML and all the data structuring =)


# This is my main file, app.rb

# You will also need your XML file ready (here test.xml). The optional json_maker method writes its output to a JSON file (here tags_and_attributes.json).

require 'rexml/document'
require 'rexml/streamlistener'
require_relative 'tag'
require_relative 'repository'
include REXML
# we will parse the XML file thanks to these tools

class Listener
  include StreamListener
  @@repo = Repository.new(Hash.new)
  # let's generate a repository to store and order our data

  def tag_start(name, attributes)
    tag = Tag.new(name, attributes)
    # here we generate a Tag object for each tag we find
    @@repo.data(tag)
    # let's now move that object into our repo
  end

  def generator
    @@repo.generator
    # we can now use our generator to get our data
  end
end

listener = Listener.new
# the block form of File.open closes the file once parsing is done
File.open('test.xml') do |file|
  Parsers::StreamParser.new(file, listener).parse
end
listener.generator
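
If you want to see what the parser actually hands to `tag_start`, here is a small sketch that records every call for an inline XML string instead of a file. `RecordingListener` and `record_tags` are hypothetical names I'm using just for this example. It assumes only that REXML passes the tag name as a String and the attributes as a Hash of name => value.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that simply records every start tag it is handed.
class RecordingListener
  include REXML::StreamListener

  attr_reader :seen

  def initialize
    @seen = []
  end

  # Called by the parser once per opening (or self-closing) tag.
  def tag_start(name, attributes)
    @seen << [name, attributes]
  end
end

# Hypothetical helper: parses an XML string and returns [name, attributes] pairs.
def record_tags(xml)
  listener = RecordingListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.seen
end

record_tags('<a x="1"><b/><b y="2" z="3"/></a>').each do |name, attributes|
  puts "#{name}: #{attributes.keys.join(', ')}"
end
```

Notice that no tag name appears anywhere in the listener: every tag in the document triggers the callback, which is exactly what makes this approach work on any XML file.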


# This is my model, tag.rb

class Tag
  attr_accessor :name, :attributes

  def initialize(name, attributes)
    @name = name
    @attributes = attributes
  end
end


# And finally, here is my repository.rb!

require 'json'

class Repository
  def initialize(hash)
    @hash = hash
  end

  def data(tag)
    if @hash[tag.name]
      @hash[tag.name][:quantity] += 1
      # if we already know this tag name, we add 1 to its quantity
      attributes(tag)
      # and we merge in its attributes
      @hash[tag.name][:attributes].uniq! if @hash[tag.name][:attributes]
      # let's make sure we remove duplicates, if any
    else
      @hash[tag.name] = {quantity: 1}
      @hash[tag.name][:attributes] = tag.attributes.keys if tag.attributes.any?
      # if we don't know the tag yet, we create its entry, and its attributes if it has any
    end
  end

  def attributes(tag)
    if @hash[tag.name][:attributes] && tag.attributes.any?
      @hash[tag.name][:attributes] += tag.attributes.keys
      # if some attributes exist already, we add the new ones
    elsif tag.attributes.any?
      @hash[tag.name][:attributes] = tag.attributes.keys
      # if not, we create them
    end
  end

  def generator
    return puts 'no tags found' if @hash.empty?
    hash_sorted = @hash.sort_by { |_key, value| value[:quantity] }.reverse
    # now we can sort our hash by occurrences and report the most common tag!
    name, details = hash_sorted.first
    puts "most common tag is : #{name}"
    puts "occurrences found : #{details[:quantity]}"
    # a tag without attributes has no :attributes key, so fall back to an empty list
    (details[:attributes] || []).each { |att| puts "attribute : #{att}" }
    # json_maker
  end

  def json_maker
    # optional method: get a JSON file with all tags, occurrences and attributes
    File.open('tags_and_attributes.json', 'w') do |f|
      f.write(JSON.pretty_generate(@hash))
    end
  end
end
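
To get a feel for the shape of the data the repository ends up holding, here is a compact, self-contained sketch that builds the same kind of hash from an inline XML string and prints it as JSON. It is not the code above: `TallyListener` and `tally_tags` are hypothetical names, it swaps the if/else bookkeeping for a Hash default block and an array union, and it always stores an `:attributes` array (the repository above leaves that key out for tags that have no attributes).

```ruby
require 'json'
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: counts tags and collects their attribute names.
class TallyListener
  include REXML::StreamListener

  attr_reader :tally

  def initialize
    # A tag name seen for the first time starts at quantity 0 with no attributes.
    @tally = Hash.new { |hash, name| hash[name] = { quantity: 0, attributes: [] } }
  end

  def tag_start(name, attributes)
    entry = @tally[name]
    entry[:quantity] += 1
    # Array union (|) keeps each attribute name only once.
    entry[:attributes] |= attributes.keys
  end
end

# Hypothetical helper: returns { tag_name => { quantity:, attributes: } }.
def tally_tags(xml)
  listener = TallyListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.tally
end

sample = '<library><book id="1"/><book id="2" lang="en"/><magazine/></library>'
puts JSON.pretty_generate(tally_tags(sample))
```

Each key of the resulting hash is a tag name, with its number of occurrences and the attribute names seen on it. That is the same information `json_maker` writes to tags_and_attributes.json.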
