ChatGPT for parsing addresses and geocoding

John R. Ballesteros
Feb 1, 2023


As stated by OpenAI, “We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.” ChatGPT is a hot topic these days, while geocoding, the process of producing coordinates for points of interest (POIs) and addresses, is an old but still challenging topic in geography because addresses are made of nested sub-parts. This post tries to answer questions such as: Can ChatGPT parse addresses in different formats, both the US standard and others? Can it produce coordinates directly?

Parsing addresses

The three most common tasks when working with addresses are:

  1. Tokenization: splitting an address into its component parts (‘tokens’). Each part should correspond to an element of the standard address structure.
  2. Generation: creating a new, meaningful address from an input, for example by correcting or completing it.
  3. Geocoding: finding the geographic coordinates (latitude, longitude) of an address or point of interest (POI).

Parsing addresses with ChatGPT

ChatGPT is a generative model, which means that every request to it may produce a different response; for addresses, that behavior would be problematic. The following examples show the results.
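One way to put a number on that variability is to send the same prompt several times and measure how often the most common answer comes back. This is a minimal sketch: `ask` is a hypothetical function that wraps whatever ChatGPT interface you use and returns the reply text, and replies are compared after collapsing whitespace only, so differently worded answers count as different.

```python
from collections import Counter
from typing import Callable


def answer_agreement(ask: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Return the share of runs that produced the most common answer.

    `ask` is a hypothetical callable that sends `prompt` to the model and
    returns its text reply. Whitespace is collapsed before comparing.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = Counter(" ".join(ask(prompt).split()) for _ in range(runs))
    _, top_count = answers.most_common(1)[0]
    return top_count / runs
```

A score of 1.0 means every run returned the same text; for a lookup task such as address parsing, lower values quantify the problem described above.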

Tokenization

How ChatGPT splits address components.

Example 1: “134 Niles St, Elizabeth, NJ 07202, EE. UU.”

When this address is typed directly into ChatGPT, the answer is:

“This is an address located in Elizabeth, New Jersey, United States of America. The street name is Niles St and the zip code is 07202.”

People can write the very same address in very different ways; see Example 2.

Example 2: “134 Niles St, Elizabeth, New Jersey, USA”

The answer is: “This is an address located in Elizabeth, New Jersey, United States of America. The street name is Niles St and the street number is 134.”
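Any program that consumes these answers faces the same issue: one address, many spellings. A common first step is to reduce each string to a normalized key before comparing. The sketch below assumes comma-separated input; its alias tables are hypothetical and deliberately tiny, and it ignores the ZIP code because Example 2 omits it.

```python
import re

# Hypothetical, deliberately tiny alias tables; a real system needs full lists.
STATE_ALIASES = {"new jersey": "nj", "nj": "nj"}
COUNTRY_ALIASES = {
    "usa": "us", "us": "us", "ee. uu.": "us", "eeuu": "us",
    "united states": "us", "united states of america": "us",
}


def address_key(address: str) -> tuple:
    """Build a comparison key from a comma-separated address (ZIP code ignored)."""
    key = []
    for part in address.split(","):
        part = " ".join(part.lower().split())           # lowercase, collapse spaces
        part = re.sub(r"\s\d{5}(?:-\d{4})?$", "", part)  # drop a trailing ZIP code
        part = STATE_ALIASES.get(part, part)
        part = COUNTRY_ALIASES.get(part, part)
        key.append(part)
    return tuple(key)
```

With these tables, the addresses of Example 1 and Example 2 produce the same key.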

However, when specific instructions are given to ChatGPT, for instance:

Example 3: ‘parse address: “134 Niles St, Elizabeth, NJ 07202, EE. UU.” ’

The answer is: “This address can be parsed as follows:

  • Street Number: 134
  • Street Name: Niles St
  • City: Elizabeth
  • State/Province: NJ
  • Postal Code: 07202
  • Country: EE. UU. (United States of America)”
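Because this answer puts one “Label: value” pair on each line, it can be mapped onto a record type. A minimal sketch, assuming the model keeps exactly these labels (it is not guaranteed to do so from one request to the next); the `ParsedAddress` class and its field names are illustrative, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedAddress:
    street_number: Optional[str] = None
    street_name: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    postal_code: Optional[str] = None  # kept as text so "07202" keeps its leading zero
    country: Optional[str] = None


# Labels as they appear in the answer above; other wordings are not handled.
LABELS = {
    "street number": "street_number",
    "street name": "street_name",
    "city": "city",
    "state/province": "state",
    "postal code": "postal_code",
    "country": "country",
}


def parse_labeled_answer(answer: str) -> ParsedAddress:
    """Fill a ParsedAddress from 'Label: value' bullet lines; other lines are skipped."""
    result = ParsedAddress()
    for line in answer.splitlines():
        line = line.strip().lstrip("•*-").strip()
        label, sep, value = line.partition(":")
        field_name = LABELS.get(label.strip().lower())
        if sep and field_name:
            setattr(result, field_name, value.strip().strip('”"'))
    return result
```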

Example 4: ‘tokenize this address: “134 Niles St, Elizabeth, NJ 07202, EE. UU.” ’

The answer is: “The address “134 Niles St, Elizabeth, NJ 07202, EE. UU.” can be tokenized as follows:

[“134”, “Niles St”, “,”, “Elizabeth”, “,”, “NJ”, “07202”, “,”, “EE. UU.”]”
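To use such a token list in code, the bracketed part has to be turned into real strings. The reply uses typographic quotes, so a hedged approach is to swap them for ASCII quotes and read the result as JSON. `parse_token_list` is a hypothetical helper; it will fail if the reply contains no brackets or if a token itself contains typographic quotes.

```python
import json


def parse_token_list(answer: str, drop_commas: bool = True) -> list:
    """Extract the bracketed token list from a ChatGPT reply as Python strings."""
    start, end = answer.index("["), answer.rindex("]")
    # The reply uses typographic quotes; swap them for ASCII quotes to get valid JSON.
    literal = answer[start:end + 1].replace("“", '"').replace("”", '"')
    tokens = json.loads(literal)
    # Optionally remove the bare "," tokens, which carry no address information.
    return [t for t in tokens if t != ","] if drop_commas else tokens
```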

Let’s see if ChatGPT can tokenize addresses in other countries:

Example 5: “Calle San Juan # 70 111, Medellín, Colombia, Sur América”

The answer is: “This is an address located in Medellín, Colombia, South America. The street name is Calle San Juan and the street number is 70 # 111.”

Example 6: tokenize this address: “Calle San Juan # 70 111, Medellín, Colombia, Sur América”

The answer is: “The address “Calle San Juan # 70 111, Medellín, Colombia, Sur América” can be tokenized as follows:

[“Calle San Juan”, “#”, “70”, “111”, “,”, “Medellín”, “,”, “Colombia”, “,”, “Sur América”]”

Example 7: tokenize “Carrera 70 # 44 - 111, Medellín, Colombia, Sur América”

The answer is: [“Carrera 70”, “#”, “44”, “-”, “111”, “,”, “Medellín”, “,”, “Colombia”, “,”, “Sur América”]
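For comparison, Colombian urban addresses can also be split deterministically with a regular expression. This sketch assumes the usual layout in which “Carrera 70 # 44 - 111” names the street (Carrera 70), the cross street (44) and a plate number (111). The list of street types is a small illustrative subset, and the group names (`via_type`, `via_id`, `cross`, `plate`) are arbitrary labels.

```python
import re
from typing import Optional

# Assumed layout: <street type> <street id> # <cross street> [-] <plate>.
# Only a few street-type spellings are listed; misspellings will not match.
COLOMBIAN_ADDRESS = re.compile(
    r"^(?P<via_type>calle|carrera|cl\.?|cra\.?|kr\.?|avenida|av\.?)\s+"
    r"(?P<via_id>[\w ]+?)\s*"
    r"(?:#|№|\b(?:no|num|nro)\b\.?)\s*"
    r"(?P<cross>\d+\w?)"
    r"(?:\s*-?\s*(?P<plate>\d+))?",
    re.IGNORECASE,
)


def parse_colombian(address: str) -> Optional[dict]:
    """Split the street part (text before the first comma) into its components."""
    street_part = address.split(",")[0].strip()
    match = COLOMBIAN_ADDRESS.match(street_part)
    return match.groupdict() if match else None
```

Under this assumed convention, the “70 111” in Example 5 would be read as cross street 70 and plate 111, while misspelled street types such as “Karrrera” are simply not matched.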

What about incomplete addresses, or addresses with misspelled words?

Example 8: tokenize “Karrrera 70 Num 44, Med, col”

The answer is: [“Karrrera 70”, “Num”, “44”, “,”, “Med”, “,”, “col”]
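ChatGPT kept “Karrrera” unchanged in its token list. If a downstream system expects canonical street types, a fuzzy match against a known vocabulary is one cheap, deterministic fix. A sketch with Python's standard `difflib`; the vocabulary and the 0.75 cutoff are assumptions to tune on real data.

```python
import difflib

# Hypothetical vocabulary of Colombian street types; extend as needed.
VIA_TYPES = ["calle", "carrera", "avenida", "diagonal", "transversal", "circular"]


def correct_via_type(word: str, cutoff: float = 0.75) -> str:
    """Return the closest known street type, or the word unchanged if none is close enough."""
    matches = difflib.get_close_matches(word.lower(), VIA_TYPES, n=1, cutoff=cutoff)
    return matches[0].capitalize() if matches else word
```

Abbreviations such as “Med” or “col” are too short to match this way; expanding them needs an explicit lookup table.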

Generation

How ChatGPT deals with incomplete addresses or addresses with errors.

Example 1: “134 Niles, Elizabet, New Jesey, US”

The answer is: “This is an address located in Elizabeth, New Jersey, United States. The street name is Niles and the street number is 134.”

Example 2: “13 Nile, Elizabet, N Jesy, US”

The answer is: “This is an address located in Elizabeth, New Jersey, United States. The street name is Nile and the street number is 13.”

Example 3: ‘Correct this address: “13 Nile, Elizabet, N Jersy, US” ’

The answer is: “The correct form of the address is: ‘13 Nile Street, Elizabeth, New Jersey, United States’.”

Example 4: “Callle San Juan Num 70 111, Medellín, Colombia, Sur América”

The answer is: “The parsed address is:

  • Street name: Calle San Juan
  • Street number: 70
  • Building or apartment number: 111
  • City: Medellín
  • Country: Colombia
  • Continent: South America

Note that the address is written in Spanish and “Sur América” refers to South America. The address refers to a building located on Calle San Juan, number 70, apartment 111, in the city of Medellín, Colombia, on the continent of South America.”

Example 5: “Karrera 70 № 44 111, discoteca la 70, med, col”

The answer is: “The parsed address is:

  • Street name: Carrera 70
  • Street number: 44
  • Building or apartment number: 111
  • Establishment: Discoteca La 70
  • City: Medellín
  • Country: Colombia

Note that the address is written in Spanish. The address refers to a building located on Carrera 70, number 44, apartment 111, which is the location of the establishment “Discoteca La 70”, in the city of Medellín, Colombia.”

Try these two cases for yourself:

Example 6: “Cl. 10 Num 43, hotel la 10, el poblado, mde”

Example 7: Complete this address: “Cl. 10 Num 43 med”

Geocoding

How ChatGPT produces geographic coordinates.

Example 1: “Locate this address: 134 Niles St, Elizabeth, New Jersey, USA”

The answer is: “‘134 Niles Street, Elizabeth, New Jersey, USA’ is a residential address located in the city of Elizabeth in Union County, New Jersey, United States. You can use online mapping tools such as Google Maps to get the exact location and obtain driving or walking directions.”

Example 2: ‘Geocode this address: “134 Niles St, Elizabeth, New Jersey, USA” ’

The answer is: “Geocoding is the process of converting an address into geographic coordinates, such as latitude and longitude. For the address “134 Niles St, Elizabeth, New Jersey, USA”, you can use a geocoding service or API, such as Google Maps Geocoding API or OpenStreetMap Nominatim, to obtain the exact geographic coordinates of the location.”
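ChatGPT points to a geocoding service instead of answering. Following that advice with OpenStreetMap Nominatim could look like the sketch below, which only builds the request and parses the JSON reply. `q`, `format` and `limit` are Nominatim search parameters; the helper names and the User-Agent string are placeholders, and Nominatim's usage policy (an identifying User-Agent, at most about one request per second) applies to real calls.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_geocode_request(address: str, user_agent: str) -> urllib.request.Request:
    """Prepare (but do not send) a Nominatim free-text search for one address."""
    query = urllib.parse.urlencode({"q": address, "format": "json", "limit": 1})
    # Nominatim's usage policy asks for a User-Agent that identifies your application.
    return urllib.request.Request(f"{NOMINATIM_SEARCH}?{query}",
                                  headers={"User-Agent": user_agent})


def first_result_coordinates(body: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) of the first search result, or None if nothing matched."""
    results = json.loads(body)
    if not results:
        return None
    # Nominatim returns lat/lon as strings in its JSON output.
    return float(results[0]["lat"]), float(results[0]["lon"])


# Usage (performs a network call):
# req = build_geocode_request("134 Niles St, Elizabeth, New Jersey, USA", "address-demo/0.1")
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(first_result_coordinates(resp.read().decode("utf-8")))
```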

Example 3: ‘Find coordinates for this address: “134 Niles St, Elizabeth, New Jersey, USA” ’

The answer is: “You can use online mapping services such as Google Maps or Bing Maps to obtain the coordinates for “134 Niles St, Elizabeth, New Jersey, USA”.

As an example, you can use the following link to find the coordinates on Google Maps: https://www.google.com/maps/place/134+Niles+St,+Elizabeth,+NJ+07208,+USA/@40.6680452,-74.2165255,17z/data=!3m1!4b1!4m5!3m4!1s0x89c258a01c8e7151:0x5c5c41b5b1c8caac!8m2!3d40.6680452!4d-74.2143368

The coordinates for “134 Niles St, Elizabeth, New Jersey, USA” are approximately 40.6680452 latitude and -74.2143368 longitude. Please note that the accuracy of the coordinates may vary depending on the source and method used to obtain them.”

Coordinates suggested by ChatGPT. They are about 2.5 km away from the actual coordinates of 134 Niles St, Elizabeth, NJ, USA.
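A gap like this one can be measured with the haversine formula, which gives the great-circle distance between two latitude/longitude points on a spherical Earth. The sketch below hard-codes the point ChatGPT suggested; the reference coordinates are not given in this post, so they must come from a trusted geocoder.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the formula assumes a sphere

# Point suggested by ChatGPT in Example 3.
CHATGPT_POINT = (40.6680452, -74.2143368)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoots above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Usage: ref_lat and ref_lon come from a trusted geocoder.
# print(haversine_km(*CHATGPT_POINT, ref_lat, ref_lon))
```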

CONCLUSIONS

  • ChatGPT does a great job of tokenizing addresses, not only in the US but also in other countries, such as Colombia in South America.
  • It had no problems tokenizing addresses written in Spanish.
  • ChatGPT got better results when the address was preceded by a specific instruction, such as “tokenize”, “correct”, or “parse”.
  • ChatGPT was able to correct spelling errors in addresses.
  • It cannot geocode addresses directly, and the coordinates it suggested via third-party services such as Google Maps were not correct (about 2.5 km off in the geocoding example).
  • ChatGPT restricted usage when more than 20 addresses were requested in less than an hour.
  • Response times for all the examples were above 1 second, which matters for applications where responses are needed within milliseconds.
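The 20-addresses-per-hour cap was observed behavior at the time of writing, not a documented constant, so treat the numbers below as assumptions. A client-side sliding-window throttle lets a batch job stay under such a cap instead of being cut off midway. `SlidingWindowLimiter` is a hypothetical helper, and the clock is injectable so the logic can be tested without waiting an hour.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_calls` calls in any `window_s`-second window."""

    def __init__(self, max_calls: int = 20, window_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self._calls = deque()  # timestamps of calls still inside the window

    def try_acquire(self) -> bool:
        """Record a call and return True if it is allowed now; otherwise return False."""
        now = self.clock()
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()  # this call has aged out of the window
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False
```

Repeated addresses can also be cached, for example by wrapping the request function in `functools.lru_cache`, so that the response time above 1 second is paid only once per distinct address.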



John R. Ballesteros

Ph.D. in Informatics, Assoc. Professor at the UN, Medellín, Colombia. GenAI Consultant & Researcher in AI & GIS. Serial entrepreneur: Navione Drone Services Co, Gisco Maps