Check and Update a URL with Ruby

While checking and updating an existing client’s data, I needed to update the URLs in their database. Many of them still used http although the site had since moved to https. Other sites had moved to totally different URLs, and a few didn’t exist any more. Some entries lacked the http:// or www part because it was a legacy database with suboptimal validation. Time to do some housekeeping for those URLs.

I searched for “crawl” on unsplash.com and it came up with this photo by Alex Blăjan.

I needed a method that I could call with the existing URL and that returns the new URL, or nil if that URL doesn’t work any more. I’ll spare you the details, but this is easier said than done: redirections have to be handled properly over multiple steps. I tried a lot of different tools and gems for this job. After testing a couple of Stack Overflow solutions I stumbled upon the curb gem (https://github.com/taf2/curb), which uses libcurl.
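To give an idea of what "handled properly over multiple steps" means, here is a minimal sketch of following redirects by hand with Ruby's standard Net::HTTP. It is not the approach I ended up using, and the names HEAD_FETCHER and resolve_url are made up for this illustration. The fetcher is passed in as a parameter so the loop can be tried without touching the network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical default fetcher: sends one HEAD request and returns
# [status_code, location_header].
HEAD_FETCHER = lambda do |uri|
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: 3, read_timeout: 3) do |http|
    http.head(uri.request_uri)
  end
  [response.code.to_i, response['location']]
end

# Hypothetical helper: follows up to `limit` redirects by hand and
# returns the final URL, or nil if the chain ends in an error.
def resolve_url(url, limit: 5, fetch: HEAD_FETCHER)
  uri = URI.parse(url)
  limit.times do
    status, location = fetch.call(uri)
    return uri.to_s if (200..299).cover?(status)
    return nil unless (300..399).cover?(status) && location

    # The Location header may be relative, so resolve it against
    # the URL that was just requested.
    uri = URI.join(uri.to_s, location)
  end
  nil # gave up: too many redirects
rescue StandardError
  nil # DNS failure, timeout, malformed URL, ...
end
```

Even this short version has to deal with relative Location headers, redirect loops, timeouts and the switch between http and https. curb hands all of that bookkeeping to libcurl.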

With that I was able to solve the problem with just a few lines of code:

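require 'curb'

# Returns the URL that `url` finally resolves to, or nil if the
# request fails (DNS error, timeout, too many redirects, ...).
def checked_url(url)
  result = Curl::Easy.perform(url) do |curl|
    curl.head = true             # HEAD request: headers only, no body
    curl.follow_location = true  # let libcurl follow redirects
    curl.timeout = 3             # seconds
  end
  # Note: an HTTP error status such as 404 should not raise here;
  # check result.response_code if that matters for your data.
  result.last_effective_url
rescue StandardError
  nil
end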
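Some of the legacy entries had no http:// part at all. As far as I know libcurl guesses a protocol for such values and defaults to HTTP, so checked_url copes with them, but the guess can also be made explicit before the call. The following is only a sketch; with_scheme is a hypothetical helper name and the default of 'http' is an assumption.

```ruby
# Hypothetical helper: prepends a default scheme when the stored
# value has none. Returns nil for blank input.
def with_scheme(raw, default: 'http')
  value = raw.to_s.strip
  return nil if value.empty?

  # Anything that already starts with "scheme://" is left untouched.
  value.match?(%r{\A[a-z][a-z0-9+.-]*://}i) ? value : "#{default}://#{value}"
end
```

It could then be combined with the method above as checked_url(with_scheme(entry)), provided blank entries are skipped first.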

And here is the test. Let’s assume I want to check the URL nyt.com:

irb(main):042:0> checked_url('nyt.com')
=> "https://www.nytimes.com/"
irb(main):043:0>
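For the actual housekeeping, the result for each stored URL falls into one of three cases: unchanged, moved, or dead. Here is a rough sketch of how that sorting could look. plan_url_updates is a hypothetical name, and the checker is passed in as any callable that behaves like checked_url, so nothing here depends on the real database.

```ruby
# Hypothetical helper: sorts URLs into unchanged, updated and dead,
# based on what `checker` (e.g. method(:checked_url)) returns.
def plan_url_updates(urls, checker)
  plan = { unchanged: [], updated: {}, dead: [] }
  urls.each do |old_url|
    new_url = checker.call(old_url)
    if new_url.nil?
      plan[:dead] << old_url            # request failed
    elsif new_url == old_url
      plan[:unchanged] << old_url       # nothing to do
    else
      plan[:updated][old_url] = new_url # old URL => new URL
    end
  end
  plan
end
```

Called as plan_url_updates(urls, method(:checked_url)), it returns a hash that can be reviewed before anything is written back to the database.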

As always: If you need Ruby on Rails consulting => https://www.wintermeyer-consulting.de and follow me on https://twitter.com/wintermeyer