Sinch Blog

Detecting Phishing in URLs — How to approach the problem? (Part 1)

The days when spam emails were the main way to get scammed online are long gone. Nowadays there are many tools to filter spam emails. Moreover, people are better ‘trained’ to notice when an email looks like spam.

When it comes to the messages we get on our cellphones (through a messaging channel such as SMS), the scenario is different. The people behind this type of scam constantly change their approach to phishing attacks, and this volatility is one of the main reasons why people fall for these scams.

Approaches to phishing detection

We can split detection approaches into two main groups: (i) the database-oriented approach (reactive methods) and (ii) the heuristics-oriented approach (proactive methods).

They differ in how phishing detection occurs. Reactive methods mainly rely on lists, available from several internet services, of URLs that have been reported as phishing. Heuristic methods extract features from the website or URL and use heuristics to check the legitimacy of the page. Our approach is a proactive method.

The database-oriented approach

Also known as the traditional or list-based approach, this group encompasses phishing detection methods that rely on lists of labeled URLs.

Two types of lists are possible: allow lists, which contain URLs known to be safe, and block lists, which contain URLs that were reported as phishing.

This technique is known to have a very low false-positive rate, but the time factor is a disadvantage. List-based tools generally depend on other parties confirming that a URL sent to the database is phishing, so the time between submitting a URL and its confirmation may be enough for a phishing attack to occur. Moreover, phishing URLs are short-lived, so the attack might be complete by the time the URL is added to the list.
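To make the list-based idea concrete, here is a minimal sketch in Python. The list contents and the function names (`normalize_host`, `list_lookup`) are hypothetical and only illustrate the lookup; a real system would query a maintained allow/block list service rather than hard-coded sets.

```python
from urllib.parse import urlsplit

# Hypothetical example lists, for illustration only.
ALLOW_LIST = {"example.com"}
BLOCK_LIST = {"phishy-login.example.net"}


def normalize_host(url: str) -> str:
    """Return the lowercased hostname of a URL, without a leading 'www.'."""
    # urlsplit only recognizes the host when a scheme or a leading '//' is present.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def list_lookup(url: str) -> str:
    """Classify a URL as 'blocked', 'allowed', or 'unknown' by list lookup."""
    host = normalize_host(url)
    if host in BLOCK_LIST:
        return "blocked"
    if host in ALLOW_LIST:
        return "allowed"
    # Not on any list yet: a fresh phishing URL ends up here until it is reported.
    return "unknown"
```

The `"unknown"` result is where the time disadvantage shows up: a URL that nobody has reported yet cannot be flagged by a lookup alone.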

The heuristics-oriented approach

The problem of the delay between a phishing attack and its detection is mitigated by using a heuristics-oriented approach to phishing detection.

The idea behind this approach is the analysis of a set of characteristics of a website that will enable the prediction of the legitimacy of the page. This is usually done by a classification model — and this is where Artificial Intelligence (AI) can be of great use!

We can use Machine Learning, a branch of AI, to build models that will consume information about the websites we want to classify as legitimate or phishing. With these models embedded into applications, we can detect phishing in real time.
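As a rough illustration of what such a model does at prediction time, the sketch below scores a set of features with a logistic function. The feature names, weights, bias, and threshold are all hypothetical and picked by hand; a real model would learn its parameters from labeled legitimate and phishing examples, and this is not the model used at Sinch.

```python
import math

# Hypothetical, hand-picked weights for illustration only.
# Positive weights push the score towards "phishing".
WEIGHTS = {
    "host_is_ip": 2.0,
    "has_at_symbol": 1.5,
    "num_hyphens_in_host": 0.5,
    "uses_https": -1.0,
}
BIAS = -1.5


def phishing_probability(features: dict) -> float:
    """Map a feature dict to a score in (0, 1) using a logistic function."""
    # Missing features count as 0; booleans are converted to 0.0 or 1.0.
    z = BIAS + sum(w * float(features.get(name, 0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(features: dict, threshold: float = 0.5) -> str:
    """Label the features as 'phishing' or 'legitimate' given a threshold."""
    return "phishing" if phishing_probability(features) >= threshold else "legitimate"
```

Because scoring is just a weighted sum and one exponential, a model of this shape is cheap enough to run on every incoming URL, which is what makes real-time detection feasible.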

The possibilities are enormous and the choice of methods for phishing detection will depend on several factors, such as:

  • Processing time allowed (more complex heuristics will require more processing time).
  • The extent to which the methods may extract information from websites under local law (this will vary depending on GDPR guidelines in different countries).

When it comes to the set of features that we can use in those models, there are three main groups:

  • Content-based features (from the source code of the page)
  • URL-based features (looking specifically at the URL)
  • Visual similarity features (visual content of the website)
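To give a feel for the URL-based group, here is a small sketch that derives a few features from the URL string alone. The specific features and the name `url_features` are assumptions chosen for illustration, not the feature set used at Sinch.

```python
import re
from urllib.parse import urlsplit

# Matches hosts written as a dotted IPv4 address, e.g. 192.168.0.1.
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_features(url: str) -> dict:
    """Extract simple, illustrative features from a URL without fetching it."""
    # Add '//' when no scheme is given so that urlsplit can find the host.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ip": bool(IPV4_RE.match(host)),
        "uses_https": parts.scheme == "https",
    }
```

Features like these need no page download, which keeps processing time low compared with content-based or visual similarity features.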

Below is a summary of the approaches and their ramifications:

[Figure: summary diagram of the phishing detection approaches]

At Sinch, we use a URL-based approach with Machine Learning to help us detect whether a URL is phishing or legitimate. We process a huge volume of data per second, so we need a solution that is quick and able to handle large amounts of data. In the next post, we will go into more detail on how to build a machine learning model to do that.




