QUANTRIUM GUIDES

Digital Address Verification

Our approach to verify addresses digitally

Akul Mangal
Published in Quantrium.ai
7 min read · Aug 8, 2021


Why Digital Address Verification?

Address verification is a crucial step that enables businesses to confirm the addresses of their employees, vendors and customers. For example, a bank verifies your address before opening an account, and a mobile wallet provider verifies your address before raising your transaction limits. There are many other business and compliance use cases.

Physical address verification requires a person to visit the location and verify it in person. This process is time-consuming and hard to scale, because it depends on the availability of trusted manpower in that geographical location. Imagine that your business is growing multifold globally and this becomes a blocker to that growth.

Quantrium offers a solution to this problem with our Digital Address Verification Service, which removes this hurdle to scaling your business in a smart, digital and cost-effective manner.

Photo by Wesley Tingey on Unsplash

How do we do it?

Digital Address Verification validates a provided address using the address itself and a few photographs with GPS metadata. The service is powered by our AI models, which analyse all this information to validate the address.

Our technology uses a two-pronged approach:

  1. Geolocation Matching
  2. Text Address Matching

Geolocation Matching converts a text address into a geolocation, which is compared with the GPS location in the image metadata.

Text Address Matching converts the geolocation into a text address for string-to-string comparison. Each method provides a score, and the two scores are translated into a final score using our contextual parameter, the Population Density Proxy. The final score is the result of the address verification analysis model.

Geolocation Matching

We use geocoding to convert a text address into geo-coordinates using the Google API, and compare the result with the GPS location in the image metadata.

Multiple Search Results

The Google API returns a single pair of coordinates based on the topmost match in the Google Maps search results. Google does not always select the best address from the list of candidates. To resolve this issue, we get GPS coordinates for all the search results using the Google Autocomplete API and the Google Geocoding API, and evaluate each of them in our geo-coordinate match algorithm, which gives a match score for each.

We evaluate these search result coordinates one by one. At any instance we have 2 pairs of coordinates: those of the search result and those of the image (the reference point). For each search result, we use the Haversine formula to calculate the distance between the two.
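The distance step can be sketched in a few lines of Python. This is a minimal illustration of the Haversine formula, not Quantrium's code: the function names, the sample coordinates and the use of a mean Earth radius of 6,371 km are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def rank_candidates(reference, candidates):
    """Pair each candidate (lat, lon) with its distance to the reference, nearest first."""
    scored = [(haversine_m(*reference, *c), c) for c in candidates]
    return sorted(scored)


# Illustrative coordinates only (hypothetical, not taken from the article).
image_point = (19.0990, 72.8440)
search_results = [(19.1050, 72.8500), (19.0992, 72.8441), (19.0900, 72.8300)]
for distance, point in rank_candidates(image_point, search_results):
    print(f"{point}: {distance:.0f} m")
```

Sorting by distance only shows which search result lies closest to the photo; the actual match score also depends on the Population Density Proxy.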

The next step is to analyse each search result distance along with the Population Density Proxy and calculate the best match score using our custom model. Each input GPS location has a Population Density Proxy associated with it.

Population Density Proxy

Population Density Proxy is a variable indicating how densely populated the area around a particular address is, on a scale of 1–6. Here, 1 indicates an extremely densely populated urban address and 6 indicates a sparsely populated one.

Why Population Density Proxy?

Let’s say the distance between the 2 pairs of coordinates is 50 meters. In a prime, densely populated urban location, 50 meters is very significant because it can span several buildings and addresses. The same 50 meters in a sparsely populated area is far less significant. To accommodate these 2 scenarios and give context to the distance between coordinates, the Population Density Proxy is included in both the Text Address match and Geo-coordinate match scores.
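To make the idea concrete, here is a hypothetical scoring function in Python. The article does not describe the custom model, so everything here is an assumption: the exponential decay, the 50-meter base tolerance and the rule that the tolerance grows linearly with the proxy. It only shows how the same distance can yield different scores in different settings.

```python
import math

BASE_TOLERANCE_M = 50.0  # hypothetical tolerance for the densest category (proxy = 1)


def distance_score(distance_m, density_proxy):
    """Map a distance to a score in (0, 1], more forgiving in sparse areas.

    Assumed rule: tolerance = BASE_TOLERANCE_M * proxy, score = exp(-d / tolerance).
    """
    if not 1 <= density_proxy <= 6:
        raise ValueError("density_proxy must be between 1 and 6")
    tolerance_m = BASE_TOLERANCE_M * density_proxy
    return math.exp(-distance_m / tolerance_m)


# The same 50 m gap is penalised heavily in a dense area, lightly in a sparse one.
print(round(distance_score(50, 1), 3))
print(round(distance_score(50, 6), 3))
```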

Calculating Population Density Proxy

There are two ways to calculate the value of the Population Density Proxy:

  1. Population Density Classification based on Census Data
  2. Petrol Pump Approach

Population Density Classification based on Census Data:

In this approach, we extract the pincode from the text address and map it to a sub-locality and city using a dataset. We then look up the sub-locality and city in the 2011 census population density dataset to assign a category between 1–6 as a proxy for the population density.

The drawback of relying on a 2011 census dataset is that the population of many areas has changed drastically since then, which may lead to inaccurate results. Many places that were barely populated in 2011 may now be highly populated. To resolve this issue, we adopted another method of calculating the proxy variable: the petrol pump approach.
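The lookup chain (pincode, then sub-locality and city, then category) could look roughly like this in Python. The two dictionaries are tiny hypothetical stand-ins for the pincode dataset and the census dataset, and the category value is illustrative only. The regular expression assumes a six-digit Indian PIN code written without spaces.

```python
import re

# Six consecutive digits, first digit non-zero, not part of a longer number.
PINCODE_RE = re.compile(r"(?<!\d)[1-9]\d{5}(?!\d)")

# Hypothetical miniature datasets; real ones would be loaded from files.
PINCODE_TO_PLACE = {"400057": ("Vile Parle East", "Mumbai")}
DENSITY_CATEGORY = {("Vile Parle East", "Mumbai"): 1}  # illustrative value


def extract_pincode(address):
    """Return the first PIN code found in the address, or None."""
    match = PINCODE_RE.search(address)
    return match.group(0) if match else None


def census_density_proxy(address):
    """Return the 1-6 category for the address, or None if any lookup fails."""
    place = PINCODE_TO_PLACE.get(extract_pincode(address))
    return DENSITY_CATEGORY.get(place)


print(census_density_proxy("12, Nehru Road, Vile Parle East, Mumbai 400057"))
```

Returning `None` on a failed lookup leaves room to fall back to another method when the pincode is missing or not in the dataset.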

Petrol Pump Approach

In this approach we treat the density of petrol pumps as an indicator of population density. We have implemented 2 variants:

  1. We consider 3 search radii: 0.5 km, 1.5 km and 2.5 km. We count the petrol pumps within each radius using the Google Places API, calculate a weighted score and scale it to 1–6.
  2. We calculate the average distance between the GPS coordinates from the image and the nearest 5 petrol pumps. This value varies significantly between areas of different population densities.

Then, we calculate the final value of the Population Density Proxy based on a weighted aggregation of the above two approaches.
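A rough Python sketch of both variants and their aggregation follows. It takes the petrol pump coordinates as input (in practice they would come from a places search), and every number in it is an assumption: the radius weights, the saturation count, the 5 km cap and the equal weighting of the two variants are placeholders, not the values used in the actual model.

```python
import math

RADII_KM = (0.5, 1.5, 2.5)
RADIUS_WEIGHTS = (0.5, 0.3, 0.2)  # hypothetical: nearer pumps count for more


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def weighted_pump_count(ref, pumps):
    """Variant 1: weighted sum of pump counts within each search radius."""
    dists = [haversine_km(*ref, *p) for p in pumps]
    return sum(w * sum(d <= r for d in dists)
               for r, w in zip(RADII_KM, RADIUS_WEIGHTS))


def mean_nearest_km(ref, pumps, k=5):
    """Variant 2: average distance to the k nearest pumps (None if no pumps)."""
    dists = sorted(haversine_km(*ref, *p) for p in pumps)[:k]
    return sum(dists) / len(dists) if dists else None


def population_density_proxy(ref, pumps, saturation=10.0, max_km=5.0, count_weight=0.5):
    """Aggregate both variants into a 1-6 value (1 = densest). All scaling is assumed."""
    by_count = 6.0 - 5.0 * min(weighted_pump_count(ref, pumps) / saturation, 1.0)
    mean_km = mean_nearest_km(ref, pumps)
    by_distance = 6.0 if mean_km is None else 1.0 + 5.0 * min(mean_km / max_km, 1.0)
    return round(count_weight * by_count + (1 - count_weight) * by_distance)
```

With these placeholder settings, a location with no pumps nearby gets a proxy of 6, and one surrounded by many pumps gets a proxy of 1.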

Text Address Matching

In the text address matching approach, we convert the geolocation into a text address for string-to-string comparison. The following flow chart illustrates this approach.

How to convert Geo-coordinates to an address?

To convert geo-coordinates to an address, we use reverse geocoding: the process of converting a pair of coordinates into a human-readable text address. To accomplish this we use the Google Geocoding API.
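As a sketch of what a reverse geocoding call involves, the Python below builds a request URL for the Geocoding API's JSON endpoint using its `latlng` and `key` parameters, and parses a response of the documented shape (`results`, `formatted_address`, `address_components`). The helper names and the sample response are made up for illustration. The HTTP request itself is left out, so nothing here contacts Google.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def reverse_geocode_url(lat, lng, api_key):
    """Build the request URL for reverse geocoding one coordinate pair."""
    return f"{GEOCODE_URL}?{urlencode({'latlng': f'{lat},{lng}', 'key': api_key})}"


def parse_addresses(response):
    """Extract the formatted address and components (keyed by type) from each result."""
    addresses = []
    for result in response.get("results", []):
        components = {}
        for component in result.get("address_components", []):
            for type_name in component.get("types", []):
                components.setdefault(type_name, component["long_name"])
        addresses.append({
            "formatted_address": result.get("formatted_address"),
            "components": components,
        })
    return addresses


# Hand-written sample in the response's shape (not real API output).
sample = {"status": "OK", "results": [{
    "formatted_address": "Nehru Road, Vile Parle East, Mumbai 400057",
    "address_components": [
        {"long_name": "Vile Parle East", "short_name": "Vile Parle East",
         "types": ["sublocality_level_1", "sublocality"]},
        {"long_name": "Mumbai", "short_name": "Mumbai", "types": ["locality"]},
        {"long_name": "400057", "short_name": "400057", "types": ["postal_code"]},
    ],
}]}
print(parse_addresses(sample)[0]["components"]["postal_code"])
```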

Multiple Pin Drop

The text address returned by reverse geocoding is very sensitive to the exact location. You may have noticed that when you drop a pin/marker on Google Maps and shift it slightly, the corresponding address can change drastically. To allow a small margin of error, instead of dropping a single pin at the GPS coordinates from the image, we drop multiple pins around that point. We reverse geocode each of those points and evaluate all the corresponding addresses.
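One simple way to generate such pins is to place them on a small circle around the original point. The Python sketch below does that with a flat-Earth approximation, which is adequate for offsets of a few tens of meters. The 25-meter radius and the 8 pins are assumed values, since the article does not say how many pins are dropped or how far apart.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)


def pins_around(lat, lon, radius_m=25.0, count=8):
    """Return the original point plus `count` points evenly spaced on a circle."""
    pins = [(lat, lon)]
    for i in range(count):
        bearing = 2 * math.pi * i / count  # 0 = north, increasing clockwise
        d_north = radius_m * math.cos(bearing)
        d_east = radius_m * math.sin(bearing)
        d_lat = math.degrees(d_north / EARTH_RADIUS_M)
        # A degree of longitude shrinks with the cosine of the latitude.
        d_lon = math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
        pins.append((lat + d_lat, lon + d_lon))
    return pins


for pin in pins_around(19.0990, 72.8440):
    print(f"{pin[0]:.6f}, {pin[1]:.6f}")
```

Each generated pin would then be reverse geocoded and its address compared with the one entered by the user.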

Multiple Bucket Comparison

After we reverse geocode all the pin coordinates and get the corresponding text addresses, we perform string-to-string comparison. For more accurate matching, we split each address into 4 sections:

  1. City
  2. Pin code
  3. Sub-locality
  4. Street/Complex/Area/Place of Interest

A section can itself have several components. For example, when the Sub-locality section has 3 components (Vile Parle, Vile Parle East, Navpada), we divide it into 3 sub-sections, compare each of them with the sub-locality of the text address, and calculate a match score accordingly.

We compare the address entered by the user with each of the addresses obtained from the multiple pins, one by one and section by section. If a section has multiple subparts, we dynamically divide it into sub-sections and compare those. Depending on the match, each bucket gets a score out of 1.

Then, we take the cumulative score of all the buckets and pass it to a custom model along with the Population Density Proxy to get a final text address match score.
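The bucket comparison can be sketched as follows. This example uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, splits sub-sections on commas, and keeps the best-matching sub-section per bucket. Those choices, along with the bucket keys and the sample addresses, are assumptions made for illustration. The custom model that combines the cumulative score with the Population Density Proxy is not shown.

```python
from difflib import SequenceMatcher

BUCKETS = ("city", "pincode", "sublocality", "street")  # hypothetical key names


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def bucket_score(entered, candidate):
    """Score one bucket out of 1, comparing comma-separated sub-sections pairwise."""
    entered_parts = [p for p in entered.split(",") if p.strip()]
    candidate_parts = [p for p in candidate.split(",") if p.strip()]
    if not entered_parts or not candidate_parts:
        return 0.0
    return max(similarity(e, c) for e in entered_parts for c in candidate_parts)


def cumulative_score(entered, candidate):
    """Sum of the per-bucket scores for one reverse-geocoded address."""
    return sum(bucket_score(entered.get(b, ""), candidate.get(b, "")) for b in BUCKETS)


def best_pin_match(entered, pin_addresses):
    """Pick the pin address with the highest cumulative score."""
    return max(pin_addresses, key=lambda c: cumulative_score(entered, c))


entered = {"city": "Mumbai", "pincode": "400057",
           "sublocality": "Vile Parle East", "street": "Nehru Road"}
pins = [
    {"city": "Mumbai", "pincode": "400057",
     "sublocality": "Vile Parle, Vile Parle East, Navpada", "street": "Nehru Road"},
    {"city": "Mumbai", "pincode": "400056",
     "sublocality": "Vile Parle West", "street": "S V Road"},
]
print(cumulative_score(entered, best_pin_match(entered, pins)))
```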

Pyphonetics

The words in Indian addresses are transliterated from native languages, so their spellings are not standardised.

“Bangalore” and “Bengaluru” sound similar, yet they are spelled differently. How do we ensure a match for names like these?

In our model, we use the Python library Pyphonetics to get positive matches for names that sound the same.
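To show why phonetic matching helps, here is a small self-contained Soundex implementation in Python. It is a stand-in written for this example, not Pyphonetics itself. Soundex is one of several phonetic algorithms of the kind such a library provides, and the article does not say which algorithm the model uses.

```python
# Classic Soundex digit groups: similar-sounding consonants share a digit.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        if ch not in "hw":  # h and w do not break a run of identical codes
            previous = code
    return (encoded + "000")[:4]


def sounds_like(a, b):
    """True when both words share the same Soundex code."""
    return soundex(a) == soundex(b)


print(soundex("Bangalore"), soundex("Bengaluru"), sounds_like("Bangalore", "Bengaluru"))
```

Because vowels are dropped after the first letter, both spellings reduce to the same code even though a plain string comparison would treat them as different words.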

Custom NER Model

To split the address into components/sections, we currently use the Google API, but we have observed that it is not accurate in some cases. As an alternative, we are working on a custom Named Entity Recognition model, built with spaCy, to split the address into 6 components.

Text address matching is not as accurate and reliable as geo-coordinate matching, because variable factors, such as old versus new street names, can affect the results. Therefore, the text address match is treated as a supporting factor to the geo-coordinate match.

Output and Conclusion

The model analyses all the above scores and returns them, along with a map snapshot showing the image coordinates and the text address coordinates, in a PDF report. The report states whether the Digital Address Verification is:

  1. Approved
  2. Not approved but not rejected
  3. Rejected
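As a purely illustrative sketch of how two scores could be turned into one of these three statuses, the Python below combines them with a weight that favours the geo-coordinate match, in line with the text match being a supporting factor. The weight and both thresholds are invented for the example; the article does not disclose the real decision rule.

```python
def final_status(geo_score, text_score, geo_weight=0.7,
                 approve_at=0.75, reject_below=0.4):
    """Map two scores in [0, 1] to a verification status (hypothetical rule)."""
    combined = geo_weight * geo_score + (1 - geo_weight) * text_score
    if combined >= approve_at:
        return "Approved"
    if combined < reject_below:
        return "Rejected"
    return "Not approved but not rejected"


print(final_status(0.9, 0.8))
print(final_status(0.6, 0.5))
print(final_status(0.1, 0.2))
```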

Address verification is essential to many processes across industries. With everyone working remotely, KYC agencies, HR professionals and businesses are having a tough time conducting address verification.

The conventional methodology is tedious and time-consuming. Document collection, data arrangement, vendor management and regular follow-ups make the process even more difficult and inconvenient.

Quantrium’s AI Digital Address Verification service can verify the physical address of employees, customers, vendors and business partners from their smartphones, without anyone having to travel to the location. It’s a simple, effective and efficient solution to the address verification problem.
