I Built an Intelligent Tool for Extracting and Formatting Phone Numbers

Daniel Osi
4 min read · Dec 3, 2021

On Monday, while building a static website for a client in the UK, I needed to add a UK phone number to the contact section. I realized I didn’t quite know how to write it properly, so I did a quick Google search for “proper way to write a UK number” and then “UK phone number format”. I came across a bunch of articles from Quora, Medium and other sites attempting to explain it, and they did help.

Over the years I had come across various online tools that helped me do basic things like convert JPEGs to PDFs, compress PDFs, extract and validate emails, etc., so I kept looking through the search results for a tool that could properly format this phone number quickly. I found none. I did some further searches for “phone number formatter” and then “phone number extractors” and found that a few phone number extractors were ranking on Google. I decided to try some of them out and discovered that not only did they NOT do the job properly, a lot of them were slow and bloated with adware.

So I asked myself two questions:

Why couldn’t I find a decent phone number extraction and formatting tool online in a world filled with increasingly lazy, social media obsessed people (like me)?

Could the reason be that it is difficult to write a decent phone number extractor or formatting tool? Or that people have just chosen to overlook this problem? Or that it isn’t a problem worth solving?

Regardless, I decided to write one to solve my own problem (partly because that wasn’t the first time I had needed to format a phone number while working online). So I sat down with this problem and started breaking it down bit by bit.

A decent phone number extraction and formatting tool should be able to:

  1. Extract phone numbers from any imaginable piece of text, no matter how well or how badly arranged
  2. Capture phone numbers in the over 100 different ways that people write them
  3. Properly format and arrange the extracted phone number(s) according to the international E.164 format or recognized national conventions
  4. Do all this in less than 1 microsecond
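
To make the E.164 target in point 3 concrete, here is a minimal Python sketch (not the tool's actual PHP code). The `COUNTRIES` table and the `to_e164` function are names made up for this illustration, and the sketch assumes the caller already knows which country a national-format number belongs to. It only handles a leading `+` or a national trunk prefix; forms like `+44 (0)20 ...` are not handled.

```python
import re

# Hypothetical lookup table for this sketch:
# region -> (country calling code, national trunk prefix).
COUNTRIES = {
    "US": ("1", ""),
    "CA": ("1", ""),
    "GB": ("44", "0"),
    "AU": ("61", "0"),
    "NG": ("234", "0"),
}


def to_e164(number: str, region: str) -> str:
    """Return `number` as '+' followed by at most 15 digits (E.164)."""
    calling_code, trunk = COUNTRIES[region]
    digits = re.sub(r"[^0-9]", "", number)  # drop spaces, dashes, brackets
    if not digits:
        raise ValueError("no digits found")
    if number.strip().startswith("+"):
        full = digits  # already written in international form
    else:
        # National form: drop the trunk prefix, then add the calling code.
        if trunk and digits.startswith(trunk):
            digits = digits[len(trunk):]
        full = calling_code + digits
    if len(full) > 15:  # E.164 allows at most 15 digits
        raise ValueError("too long for E.164")
    return "+" + full


print(to_e164("020 7946 0958", "GB"))   # -> +442079460958
print(to_e164("(212) 555-0142", "US"))  # -> +12125550142
```

A table-driven approach like this keeps the per-country rules in data rather than in code, which is one way to add countries without touching the function.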

On dissecting this problem, I realized that it wasn’t going to be as easy as, say, an email extractor: there are well over 100 different ways that people write phone numbers, and different countries have their own dialing codes (both local and international) and their own conventions for writing them.

At first it seemed like a trivial task, but I soon realized that people write phone numbers in so many different ways it will make your head spin. After going through some listing sites like Jiji.ng, I found every combination imaginable: with or without the international prefix, number-to-letter substitution, intentional obfuscation, varying block sizes, brackets, spaces and dashes. These are easy for a human to recognize, but pretty hard for a computer.
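
As an illustration of the kind of cleanup these variations call for, here is a small Python sketch (again, not the tool's PHP code). The `normalize` function, the word list and the lookalike table are assumptions made for this example. It expects a candidate that has already been isolated from the surrounding text, because the lookalike substitution would mangle ordinary words.

```python
import re

# Hypothetical tables for this sketch.
NUMBER_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
WORD_PATTERN = re.compile(
    r"\b(" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE
)
# Letters commonly typed in place of digits.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})


def normalize(candidate: str) -> str:
    """Reduce one phone number candidate to digits, keeping a leading '+'."""
    keep_plus = candidate.lstrip().startswith("+")
    # 1. Spelled-out digits first, so "one" is not hit by the 'o' lookalike.
    text = WORD_PATTERN.sub(
        lambda m: NUMBER_WORDS[m.group(1).lower()], candidate
    )
    # 2. Letter-for-digit substitutions such as O -> 0.
    text = text.translate(LOOKALIKES)
    # 3. Drop brackets, spaces, dashes, dots and anything else.
    digits = re.sub(r"[^0-9]", "", text)
    return ("+" if keep_plus else "") + digits


print(normalize("(O8O) 123-4567"))      # -> 0801234567
print(normalize("+44 (20) 7946-0958"))  # -> +442079460958
```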

So I sat down to draft the algorithm in pseudocode first, before translating it to PHP. It attacks the problem using a combination of intelligent string manipulation and really beautiful regular expressions. While writing it, I realized how much I’d missed writing regular expressions and low-level programs (the last time I indulged was back in school, when we were studying computer science theory).
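
For readers who want a feel for the regex-plus-string-manipulation idea, here is a rough Python sketch. It is not the tool's actual pattern: the `CANDIDATE` expression, the `extract_candidates` function and the 10-to-15 digit window are all assumptions chosen for this example. The regex grabs anything that looks like a run of digits and separators, and a simple digit count then filters out dates and order numbers. The trade-off is that short local numbers without an area code are dropped too.

```python
import re

# Optional '+', optional '(', a digit, then at least five more digits or
# separators, ending on a digit. The lookarounds stop a match from starting
# or ending in the middle of a word or a longer number.
CANDIDATE = re.compile(
    r"(?<![\w+])\+?\(?\d[\d ().\-]{5,}\d(?!\w)", re.ASCII
)


def extract_candidates(text, min_digits=10, max_digits=15):
    """Return substrings of `text` that look like phone numbers."""
    found = []
    for match in CANDIDATE.finditer(text):
        raw = match.group(0)
        digit_count = sum(ch in "0123456789" for ch in raw)
        # Too few digits: probably a date or an ID. Too many: not E.164.
        if min_digits <= digit_count <= max_digits:
            found.append(raw)
    return found


sample = (
    "Call me on 0803-123-4567 or +44 20 7946 0958. "
    "Order #12345 shipped 2021-12-03."
)
print(extract_candidates(sample))
# -> ['0803-123-4567', '+44 20 7946 0958']
```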

Fast forward 3 days.

The final result does a great job of capturing a vast array of valid phone number formats while avoiding false positives. Supported countries at this time are the United States, the United Kingdom, Australia, Canada and Nigeria. I also added a module to extract phone numbers directly from supplied URLs, and a module to export the results to CSV and TXT files.
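
The export step is the simplest part. As a rough idea of what it involves, here is a Python sketch using the standard `csv` module; the function names and the two-column layout (`raw`, `e164`) are assumptions for this example, not the tool's real output format.

```python
import csv
import io


def numbers_to_csv(rows):
    """Serialize (raw, e164) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["raw", "e164"])  # hypothetical column names
    for raw, e164 in rows:
        writer.writerow([raw, e164])
    return buffer.getvalue()


def numbers_to_txt(rows):
    """Serialize only the formatted numbers, one per line."""
    return "".join(e164 + "\n" for _raw, e164 in rows)


rows = [("020 7946 0958", "+442079460958")]
print(numbers_to_csv(rows), end="")
# raw,e164
# 020 7946 0958,+442079460958
```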

While this system largely works (95% of test cases), it is not very sophisticated and has its limitations. A better approach would probably use machine learning and fancier algorithms, but since I didn’t want this to turn into a science project and I don’t know much about machine learning, I stuck with this approach.

If you’d like to try it out, it’s currently available for free at:

https://classes.ng/tools/phone-number-extractor/

Extractor landing page

To be honest, I don’t know if anyone is going to find this tool useful, but I figured that people who need some help with data scraping or digital marketing, or anyone who just needs to format a phone number properly, may find some use for it.

For the stack used, you’re probably expecting something fancy like React, Vue.js, Laravel or Node.js. Nah, it’s just plain PHP at the back and CSS (Bootstrap) at the front. Sorry to disappoint the React and Vue fanboys. I would probably rewrite it using something more modern if this tool becomes useful to anyone.

As a bonus, I added a good ol’ email extractor as well:

https://classes.ng/tools/email-extractor/

I’d like to hear any thoughts, opinions or suggestions for improvement on the algorithm and real-life use cases (not on the look or the tech stack used: I’m an old-fashioned dev, and critiquing me for not using React or Vue.js is NOT going to change my mind). Let me know if it solves your problem in any way.

You can always reach me at osi@classes.ng.

Thanks for reading.

My name is Daniel Osi and I’m the Founder, CEO & CTO of Classes.ng. I spend my time dreaming about new ideas, writing old-fashioned code and trying to catch up with modern trends.
