Defusing a Legacy Rails Application

Legacy projects that nobody maintains are like time bombs. You never know when something will break, and when it does, you don't have time to fix it. My oldest Rails app stopped working due to a software update and I had a hard time fixing it. That was when I decided to take a proactive approach: remove the many features that hadn't proven useful and rewrite the rest. It was an investment, but I believe it will save me a lot of pain in the future.

TL;DR: The app showed various statistics about Slovak domains. I basically replaced the Rails app with a static web page and made a script to update it daily. If anything breaks, fixing one Ruby script is a piece of cake compared to a 5-year-old app. And there are very few things to break.

Five years ago, when I started learning Ruby on Rails, I was attending a university course about web development. The teacher asked us to come up with ideas for our final projects. At first, I wanted to build a simple CMS, but he considered it too easy and didn't approve it. And that's how I came up with the crazy idea of crawling all Slovak domains and making statistics about the technologies they use. It was my first Rails project.

I started with a simple rake task to download the public list of Slovak domains and save all of them in the database. Then I wrote a piece of code to download all the home pages and look for fingerprints of frameworks and libraries. Then I added IP geolocation to find out where the domains were actually hosted. The final piece was a nice frontend to present all the data with a lot of charts and lists of domains filtered by registrar, holder, technology and location. You could also search for a particular domain and find out all the details about it. I deployed the app and made the data updates run periodically.
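To give an idea of what "looking for fingerprints" can mean, here is a minimal sketch in Ruby. It is not the original code: the `FINGERPRINTS` table, its patterns and the `detect_technologies` helper are all hypothetical, chosen only to illustrate matching a home page's HTML against a list of known markers.

```ruby
# Hypothetical fingerprint table: technology name => pattern looked for in
# the home page HTML. The patterns are illustrative, not the original ones.
FINGERPRINTS = {
  "WordPress"        => /wp-content|wp-includes/i,
  "Drupal"           => /sites\/(default|all)\/(files|themes|modules)/i,
  "jQuery"           => /jquery[\w.-]*\.js/i,
  "Google Analytics" => /google-analytics\.com|googletagmanager\.com/i
}.freeze

# Returns the names of all technologies whose pattern matches the HTML.
def detect_technologies(html)
  FINGERPRINTS.select { |_name, pattern| html.match?(pattern) }.keys
end

# A made-up home page used only to exercise the helper.
sample = <<~HTML
  <html><head>
    <link rel="stylesheet" href="/wp-content/themes/demo/style.css">
    <script src="/js/jquery-1.11.0.min.js"></script>
  </head></html>
HTML

puts detect_technologies(sample).inspect
```

A table like this is easy to extend, but it also shows why such stats are fragile: a site that renames its asset paths or bundles its scripts simply stops matching.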

Later on, I added a few more features like trends and growth predictions for domain registrars and holders. I created a new project to gather stats about worldwide domains and analyzed a sample of 3.7 million domains. Then I analyzed Spanish domains and .gov.uk domains.

There was one issue with the technology stats though. I did my best to find fingerprints of WordPress, Drupal, Rails, PHP, .NET, jQuery, Google Analytics, etc. However, there was no guarantee I hadn't missed something. Also, every piece of software constantly evolves and a lot of things can change. I added a caveat under each chart saying “These stats are not reliable, use them with caution!”.

How many people were interested in the statistics? Almost nobody. Well, from time to time someone used the technology statistics to show their client or boss or whoever that WordPress is the future, as it is (according to the unreliable stats) the most used CMS. Or a web hosting company would present the registrar growth stats to their customers. Someone definitely used the project, which is good for a school project, but doesn't validate the idea as a viable product. Although building a product was never my intention, I wanted to work on something that was, at least, useful.

In spite of that, it was a nice hobby project. I learned a lot of new things and kept it running for a few years. It didn't require much of my attention. Then things started to change and the project met the destiny of all legacy projects. In August, SK-NIC updated their SSL software and dropped support for older clients. I had to update too. A simple OpenSSL-related issue ended up with an upgrade of the whole OS to a new version. Fortunately, the app survived. In November, another issue came up. I don't remember why, but I had to upgrade MySQL and guess what? The app stopped working. I would have had to completely rewrite the rake task that downloads the lists of domains to make it work. Since I was too busy with other projects and also had all the source data backed up, I kept the site without updates for one month and started thinking about the future of the project. Issues like these would arise more often in the future and would be less fun to solve. And such an issue might come up at the least fortunate time, when I'd be too busy working on something else.

There are three things to do with a legacy project:

  1. Keep it running without software updates and pray to the software gods that nothing breaks. Occasionally, do the update and have time available to fix the issues that arise. And gradually rewrite parts of the code causing most issues.
  2. Completely rewrite the project and start maintaining the code. Use the latest libraries and your latest best practices. And again, never stop maintaining the code.
  3. Shut it down.

I decided to remove all the parts that were no longer used or were difficult to maintain, and completely rewrite the rest. So I started removing. Updating the technology and location stats involved downloading the whole Slovak Internet (home pages of all the domains, to be exact), the stats weren't reliable, and nobody was really that interested in them. After I removed them, I noticed that the domain lists and individual domain pages looked quite empty. Everything presented about a domain was just its registrar name, holder name and DNS server. And there are a dozen projects out there showing the same stuff. After removing the domain lists and pages, I realised that the whole site was just a few pages with charts. So I completely rethought the original design.

I built a single static page with all the important charts. It downloads a JSON file with the current data and shows it on the charts. The data is updated once a day. It's mostly computed directly from the domain list. Trend data requires keeping track of the previous values in a database. Putting it all together is quite expensive and the results would have to be cached by the backend. Unless there is no backend. Everything is static and the only dynamic part, the chart data, is updated daily by a script.

The static page is hosted by GitHub. The update script runs on the original VPS, computes all the chart data and pushes updates to the static page repository. The script runs once a day for about one minute and consumes ~40 MB of RAM (compare that to a Rails app that needs to run constantly and requires hundreds of megabytes). It only needs Ruby, OpenSSL, SQLite, Git and some standard OS stuff. When I decide to move it to a new server, I'll put the app inside Docker and make it even more OS-independent. There's hardly anything to break. The app can't be easier to maintain. I've hopefully found the point where 20% of the effort covers 80% of the features that were actually used. As simple as that.
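As a rough illustration of the "no backend" idea, the core of such an update script could look something like the sketch below. Everything in it is an assumption made for the example: the semicolon-separated `domain;registrar;holder` line format, the `registrar_counts` and `chart_data` helpers, and the JSON keys. The real script also tracks trends in SQLite and pushes the result to the page's repository, which is left out here.

```ruby
require "json"
require "date"

# Counts domains per registrar, most popular first.
# The "domain;registrar;holder" layout is an assumption for illustration.
def registrar_counts(lines)
  counts = lines.each_with_object(Hash.new(0)) do |line, acc|
    fields = line.strip.split(";")
    next if fields.size < 2 # skip blank or malformed lines
    acc[fields[1]] += 1
  end
  counts.sort_by { |name, count| [-count, name] }.to_h
end

# Builds the document the static page would download (hypothetical keys).
def chart_data(lines, date)
  counts = registrar_counts(lines)
  {
    "updated_on"    => date,
    "total_domains" => counts.values.sum,
    "registrars"    => counts
  }
end

# Made-up sample input standing in for the downloaded domain list.
lines = [
  "example.sk;Registrar A;Holder 1",
  "foo.sk;Registrar B;Holder 2",
  "bar.sk;Registrar A;Holder 3"
]

# A daily job would write this to a file in the static page's repository
# and then commit and push it; here it is only printed.
puts JSON.pretty_generate(chart_data(lines, Date.today.to_s))
```

The point of the design is that the page itself never computes anything: it only fetches one pre-built JSON file, so the expensive work happens once a day instead of on every request.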

If you used some features of the old version and now miss them, let me know. Everything I'm not ashamed of is open-sourced, so check the links on statistiky-domen.sk or my GitHub account. If you have any other ideas, I'll be thankful :)