Road to Mastery: Building an Open Source Package

Alexandre Olival
9 min read · Sep 3, 2020

Photo by NASA on Unsplash

Finally, it happened. Years of writing code in a slouched and definitely-bad-for-posture sitting position, preceded by a few more years of learning how to actually do it, culminated in this moment.

I stopped d̵o̵u̵b̵t̵i̵n̵g̵ ̵m̵y̵ ̵a̵b̵i̵l̵i̵t̵i̵e̵s̵ feeling too embarrassed to put anything out there and, together with a friend, built a PHP package that validates European national identification numbers and, where possible, extracts personal information from them.

The logo, in its full Papyrus Font + Royalty Free image edited in GIMP glory.

I thought I would tell you where the idea for it came from and share some technical insights. I also have a suggestion for you, dear reader, at the end. Let’s begin!

The Inception

Currently I’m working as a software engineer for Samlino.dk. It is a price comparison application as a service that finds you the best deals in Denmark for multiple services from broadband to personal loans.

Working with viking folk is fine, until you get to the performance reviews (depicted above).

Personal loans happen to be where I got the idea for the package. When the user opens the form to enter their personal and financial information in order to request a loan, the very first screen asks them for something called a “CPR” (short for “Centrale Personregister”).

“Curious”, I thought…

This is the Danish national identification number, and banks require it before we send a request for a loan. It identifies the citizen and allows us to check their age, so as not to give 25-dollar loans to twelve-year-olds wanting to buy Minecraft.

But hold on, they can know the citizen’s age!? Well yes.

Being Portuguese, this was surprising. Our national identification number can only be validated by an algorithm, but encodes no personal information on the citizen bearing it. However, as one can see by this Wikipedia article, this is not the case for all European numbers. Some even encode the region of the country in which the citizen was born or registered.

Anyway, in order to validate and extract the birthdate of whatever CPR was input by our customers, I used this neat npm package.

Taken from the package’s GitHub repository README.md

Getting our hands dirty

I quickly wondered how cool it would be to aggregate all the ways to validate and extract data from European identification numbers. One single package with a simple API in which you could input an identification number and a country, and it would return whether the number was valid and whatever citizen data could be extracted, if any. So I texted my friend and thus Socrates was born.

Why “Socrates”? A friend suggested that name because Socrates, apart from being a badass philosopher, was an internationalist ahead of his time. Here's one of his quotes (which happens to be one of my favourites ever):

I am a Citizen of the World, and my Nationality is Goodwill.

Armed with all the motivation we needed, and a package name, we quickly got to work.

However, as one would expect, attempting to find all the information soon became a challenge and opened a huge can of worms. Two major problems arose…

Problem #1: what constitutes a “national identification number”?

A quick look through that Wikipedia article, blessed be its existence, will lead you to conclude that not all countries view identification numbers the same way. Some have several, others have a single one. Sometimes the definition of what is the identification number is very very fuzzy.

As such, we relied on frequency of use, questioning citizens from those countries and cross-referencing the information in the article with whatever information we could find on Google. Many times we had to Google the information in the native language and then translate it. Which also segues nicely into the second problem:

Problem #2: how are they validated, and do they encode any information?

Oh boy.

At the very beginning of our journey, my friend and I knew that we would eventually hit several walls related to this. Unlike Denmark, Portugal or Spain, most countries lack an official source for the validation algorithm or for whether the number encodes personal information.

We ended up having to dig through Wikipedia (both in English and in the native language for each country), rummaging through old archived websites and forum posts and then cross-referencing with whatever actual examples of identification numbers we could find for a given country.

Some, we were lucky to find number generators for. Others, we had to use existing packages that would validate and also generate IDs we could then test with.

We got as far as having to image search identification documents for each country hoping (that is, praying) that at least some of the numbers were not invalid examples.

I admit that many of the validation algorithms we found have their origin now lost in the ether of my memory. It was a hard and grueling task.

But we nailed it.

…save for four countries.

So how does it work?

Inspired by this talk from Jeffrey Way, I thought the very first thing we should do technology-wise was figure out the public API for the package, as well as its internal structure.

Riffing with my friend amidst a now lost voice call.

We settled on sticking to two public methods:

validateId and getCitizenDataFromId

Simple, concise and descriptive enough, even a Java developer could figure it out 🕺 (I’m joking, please don’t J2EE me into oblivion). Both methods receive the ID as the first parameter, and the country’s two-letter ISO code as the second:

Socrates for everyone
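As a rough sketch of that call shape, here is a self-contained stand-in you can run without installing anything. Only the two method names and the argument order come from the package; `SocratesSketch`, `CitizenSketch` and the toy Danish rule (ten digits, a plausible date up front, gender from the last digit) are assumptions made for illustration and are not the package's actual validation logic.

```php
<?php
declare(strict_types=1);

// Hypothetical value object; the real package returns its own Citizen class.
final class CitizenSketch
{
    private string $gender;

    public function __construct(string $gender)
    {
        $this->gender = $gender;
    }

    public function getGender(): string
    {
        return $this->gender;
    }
}

// Hypothetical stand-in for the package entry point. Only the two method names
// and the (ID, country code) argument order come from the article.
final class SocratesSketch
{
    public function validateId(string $id, string $countryCode): bool
    {
        $this->assertSupported($countryCode);

        // Toy rule, NOT the real Danish algorithm: ten digits with an optional
        // hyphen, whose first six digits read as a plausible DDMMYY date.
        $digits = str_replace('-', '', trim($id));
        if (preg_match('/^\d{10}$/', $digits) !== 1) {
            return false;
        }

        $day = (int) substr($digits, 0, 2);
        $month = (int) substr($digits, 2, 2);
        $year = 2000 + (int) substr($digits, 4, 2); // century is ignored here

        return checkdate($month, $day, $year);
    }

    public function getCitizenDataFromId(string $id, string $countryCode): CitizenSketch
    {
        if (!$this->validateId($id, $countryCode)) {
            throw new InvalidArgumentException('The provided ID is invalid.');
        }

        // In a Danish CPR number an odd last digit means male, an even one female.
        $lastDigit = (int) substr(trim($id), -1);

        return new CitizenSketch($lastDigit % 2 === 1 ? 'male' : 'female');
    }

    private function assertSupported(string $countryCode): void
    {
        // This sketch only knows one country; the package supports many.
        if (strtoupper(trim($countryCode)) !== 'DK') {
            throw new InvalidArgumentException("Unsupported country code: {$countryCode}");
        }
    }
}

$socrates = new SocratesSketch();

var_dump($socrates->validateId('010190-1234', 'DK'));
echo $socrates->getCitizenDataFromId('010190-1234', 'DK')->getGender(), PHP_EOL;
```

The ID used here is a made-up number, not a real person's CPR.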

We wanted to give a special treat to the framework we both love, so you can also use the package the “Laravel way” with a Facade (if you’re on Laravel of course):

Socrates for ~web artisans~
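If you have never peeked behind a Facade, the idea is simply a static-looking call that gets forwarded to a regular object. The snippet below is a plain-PHP imitation of that mechanism, not Laravel's implementation and not the package's real Facade: `IdServiceSketch`, `SocratesFacadeSketch` and the placeholder check inside `validateId` are all hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical service holding the actual logic (placeholder check only).
final class IdServiceSketch
{
    public function validateId(string $id, string $countryCode): bool
    {
        // Placeholder: a non-empty ID and a two-letter country code.
        return trim($id) !== '' && preg_match('/^[A-Za-z]{2}$/', $countryCode) === 1;
    }
}

// Hypothetical facade: static calls are forwarded to one shared instance.
final class SocratesFacadeSketch
{
    private static ?IdServiceSketch $instance = null;

    public static function __callStatic(string $method, array $arguments)
    {
        // Laravel would resolve the object from its service container here;
        // this sketch just creates it lazily.
        self::$instance ??= new IdServiceSketch();

        return self::$instance->{$method}(...$arguments);
    }
}

var_dump(SocratesFacadeSketch::validateId('010190-1234', 'DK'));
```

The appeal is purely ergonomic: no instantiation at the call site, while the underlying class stays a normal, testable object.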

Behind the scenes, after sanitizing the input and making sure all is well, the code decides which validation or extraction class to instantiate per country by looking through an array which lists all the supported countries and their respective classes, keyed by their ISO code.

Those classes encapsulate the logic to validate, or extract information from, a national identification number.

Here is how we get an Extractor instance:

The factory class responsible for giving us the concrete implementation of the Extractor
The neat little array it goes through
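A minimal sketch of that lookup, assuming a class constant as the array and a static factory method. The interface name `CitizenInformationExtractor` is the one used in the package; the factory name, the method name `getExtractor`, the two extractor classes and the exception type are assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Marker version of the interface; its method is left out of this sketch.
interface CitizenInformationExtractor
{
}

// Hypothetical per-country classes (empty on purpose).
final class DenmarkExtractorSketch implements CitizenInformationExtractor
{
}

final class SwedenExtractorSketch implements CitizenInformationExtractor
{
}

final class ExtractorFactorySketch
{
    // Supported countries and their classes, keyed by ISO code.
    private const EXTRACTORS = [
        'DK' => DenmarkExtractorSketch::class,
        'SE' => SwedenExtractorSketch::class,
    ];

    public static function getExtractor(string $countryCode): CitizenInformationExtractor
    {
        // Sanitize first so ' dk ' and 'DK' resolve to the same entry.
        $countryCode = strtoupper(trim($countryCode));

        if (!isset(self::EXTRACTORS[$countryCode])) {
            throw new InvalidArgumentException("No extractor for country '{$countryCode}'.");
        }

        $class = self::EXTRACTORS[$countryCode];

        return new $class();
    }
}

echo get_class(ExtractorFactorySketch::getExtractor('dk')), PHP_EOL;
```

Supporting a new country then boils down to writing one class and adding one line to the array.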

Note: some of you may recognize a Software Pattern here, but I’ll refrain from going into detail on that, as it is not the point of the series.

You’ll notice that the factory return type is a generic CitizenInformationExtractor. For maximum flexibility, both the Extractor and Validator factories have their return type typed to an interface. That way, all we really need to worry about when implementing a new country is conforming to it. Here’s the interface to which all Extractor classes must conform. It’s probably what you would expect:
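In spirit, the contract looks something like the sketch below. The interface and `Citizen` names come from the package, but the method name `extract`, the shape of `Citizen` and the toy Swedish extractor are assumptions: it only reads the birth date from the 12-digit form (YYYYMMDD followed by four digits) and deliberately skips the checksum and other special cases.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for the package's Citizen class.
final class Citizen
{
    private DateTimeImmutable $dateOfBirth;

    public function __construct(DateTimeImmutable $dateOfBirth)
    {
        $this->dateOfBirth = $dateOfBirth;
    }

    public function getDateOfBirth(): DateTimeImmutable
    {
        return $this->dateOfBirth;
    }
}

interface CitizenInformationExtractor
{
    // Assumed method name; every country class must provide it.
    public function extract(string $id): Citizen;
}

// Hypothetical implementation for illustration only.
final class SwedenExtractorSketch implements CitizenInformationExtractor
{
    public function extract(string $id): Citizen
    {
        $digits = str_replace('-', '', trim($id));
        if (preg_match('/^\d{12}$/', $digits) !== 1) {
            throw new InvalidArgumentException('Expected a 12-digit number.');
        }

        $datePart = substr($digits, 0, 8);
        $dateOfBirth = DateTimeImmutable::createFromFormat('!Ymd', $datePart);

        // createFromFormat rolls impossible dates over (Feb 31 becomes March),
        // so compare the round-tripped value with the input.
        if ($dateOfBirth === false || $dateOfBirth->format('Ymd') !== $datePart) {
            throw new InvalidArgumentException('The date part is not a real date.');
        }

        return new Citizen($dateOfBirth);
    }
}

$extractor = new SwedenExtractorSketch();
echo $extractor->extract('19900101-1234')->getDateOfBirth()->format('Y-m-d'), PHP_EOL;
```

Because callers only ever see the interface, they never need to know which country class did the work.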

And what is Citizen? Just a POPO (Plain Old PHP Object) to make accessing the extracted citizen data across countries more consistent and ergonomic. There’s a really nasty tendency in the PHP world to avoid type safety and robustness (the perils of a dynamically typed language). At the very beginning, we too almost fell into the trap of having the method return a simple array. But I feel this:

Is infinitely better than this:

Why miss out on the benefits of autocompletion and robustness that come with a class? Tsk tsk
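To make the contrast concrete, here is a small side-by-side sketch. The `Citizen` class below is a simplified assumption of what such an object might look like (nullable fields, since not every country encodes everything), and the array keys are invented for the example.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for a typed result object.
final class Citizen
{
    private ?string $gender;
    private ?DateTimeImmutable $dateOfBirth;

    public function __construct(?string $gender, ?DateTimeImmutable $dateOfBirth)
    {
        $this->gender = $gender;
        $this->dateOfBirth = $dateOfBirth;
    }

    public function getGender(): ?string
    {
        return $this->gender;
    }

    public function getDateOfBirth(): ?DateTimeImmutable
    {
        return $this->dateOfBirth;
    }
}

// The object way: the IDE autocompletes the getters, the return types are
// declared, and a mistyped method name fails loudly with an Error.
$citizen = new Citizen('female', new DateTimeImmutable('1990-01-01'));
echo $citizen->getGender(), PHP_EOL;
echo $citizen->getDateOfBirth()->format('Y-m-d'), PHP_EOL;

// The array way: keys are just a convention. A typo such as 'gendr' is only
// noticed at runtime, and the date is whatever string someone put there.
$asArray = ['gender' => 'female', 'date_of_birth' => '1990-01-01'];
echo $asArray['gender'], PHP_EOL;
echo $asArray['date_of_birth'], PHP_EOL;
```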

This is pretty much the gist of it. We also added a request validator as an extra treat for Laravel devs, so you can validate incoming requests which have a National Identification Number using the validation rule we provide 🎉.

Remember, you can check the code and documentation here.

To use the package in your own project, simply pull it in using Composer:

composer require reducktion/socrates

Conclusion

Being able to build a package and give back to the community was a long-standing objective for me and my career. I definitely expect to bring more stuff out and keep supporting Socrates. Special thanks are in order for everyone who helped me and my friend and put up with our long hours of research and coding, especially during these strange times.

Thank you for reading. As always, be sure to check out my other posts and, if you’ll suffer me, the new personal website I built during the lockdown. Stay safe out there, build something (anything) and wear a mask.

Addendum

If you visit the repository, you will notice that it is not listed under my or my friend’s GitHub account, but rather under an organization we both created: Reducktion. This is Open Source and as such everyone is very welcome to contribute, especially if you can nail those last four countries that we could not implement. We would love to hear your feedback and suggestions.

If, however, you somehow resonate deeply with this quote from DHH:

The MIT license is often just lumped in with other open source licenses because of its compatibility with the likes of GPL or other copyleft licenses. That makes it seem like they’re just really flavors of the same thing, but they’re not. In many ways, I consider the MIT license to be as different from copyleft licenses like the GPL as it is from commercial proprietary software.

The MIT license to a large extent is the anti-license. The utopia of socialized programs, one that embraces the lack of marginal cost for software goods.

It’s an explicit rejection of the strong-property rights approach taken by both Gates and Stallman at their respective ends of the libertarian spectrum.

It’s the language of giving without expecting anything in return. It’s the language of sincere charity. A charity without strings attached, neither commercial nor reciprocal. With the risk of sounding sanctimonious, I read it as a pure projection of altruism.

It’s kinda funny to analyze the MIT license from this perspective, because I do remember feeling the pull of a primordial debt to the software community when I started Rails. A motion to give back now that I had something to give. I was born into the software community through the grace of open source, and now I had the opportunity to participate as a contributor, and it felt wonderful.

But it felt like that exactly because there was no sword hanging over my head. Nobody telling me that this is what I ought or had to do. No one expecting me to do it. So it was an act of volition rather than one of duty. A truly authentic choice.

That to me is freedom.

The freedom to first pursue self-actualization in making something in my image. The best I possibly knew how. Again, not as free labor, but as a literal labor of love. As an amateur in the original sense of the word.

(…)

To borrow a phrase from Stallman, “free labor” under free as in freedom, not free as gratis. Free from demands, free from debt, shame, and repayment.

(…)

So. To kick off this mindset, I’d like to borrow an ancient concept from the history of debt. The jubilee. I hereby declare a jubilee for all imagined debt or obligations you think you might owe me or owe the Rails community as a whole. Let no one call upon you to ever feel obligated to repay this vanquished debt. Contribute to the Rails community because it brings meaning to your life. Because writing Ruby sparks joy. Don’t participate if it doesn’t.

Either way, you’re whole and we’re square ✌️

A gratis tablet of dramamine for the nausea of our otherwise market-soaked lives along with an open invitation to make some socialized software together.

…then I think you should hit me up on Twitter and join our GitHub organization as well. Who doesn’t love working and hanging out with like-minded people? 😀
