Farm to Table with Placekey & Overture

Leveraging Placekey to Alleviate Arduous Places Research

Jarredparrett
6 min readMar 1, 2024

Abstract:

By joining FSIS Meat, Poultry and Egg Product Inspection Directory of about 6,900 rows to the 57 million rows of Overture Places on Placekey, I enriched 42.8% or 2,986 records with an exact Placekey match lending to attribution of phone, website, socials, emails and centroids, and for 61.54% or 4,293 records, I recorded an ‘Address Placekey’ match indicating a possible record associated to the location at the same address. Overall, a considerable uplift and introduction of a new places research approach that scales and can be run in the free Colab tier.

Colab Notebook: Overture and USDA Join by Placekey

No matter the role, we’ve likely all faced the challenge of inadequate places sources and the need to enrich them. It’s a common problem that stems from the pain that is research starting at a location name and address. The story goes: you find some type of source that contains location name and address, you then manually go row-by-row to find the values that correspond to each row and in the end (days to weeks later) you are left with your enriched document after all the effort.

Early in my career, I collaborated with a founder in agriculture, aiming to enhance protein supply by leveraging local farming and processing networks. For meat consumers, the journey from farm to table — via an abattoir — is often overlooked, until we probe deeper into the ‘where’ and ‘what’ of these intermediate stages. This was one of my first places tasks: understand the network that is the farm-to-table.

Where?

Where are these facilities?

I found myself on the USDA website reviewing the FSIS Meat, Poultry and Egg Product Inspection Directory — a resource providing all USDA-regulated meat, poultry, and/or egg inspection facilities with address, name, DBA, and activities of the location. It was the first step in understanding if the model proposed would actually work. Dropping these into Google Maps quickly answered the question of the proximity of these places to major metropolitan areas that may be consumers. So that covers the where but the question remains: what are these places like and does finding this out call for the the worlds most morbid road trip?

What?

“How do we understand what these facilities are actually like? Moreover, how can we contact them? Do they have a public storefront?”

I quickly found an inverse relationship between visibility on search engines and these facilities with many of them being only identifiable by a street view exploration. Often, after a considerable search effort with some local knowledge, I stumbled upon a stray listing on an aggregator site or a long-forgotten social media page — even for the best researchers, it could be difficult. Back then, I faced an effort to go location by location finding websites, contact info and any trace of these places — some even prompted a visit to the site to understand even more.

Placekey expedites this by providing a means of applying a unique entity ID to enable joins to seemingly unrelated sources.

Applying Placekeys is made easy by leveraging the placekey-py library’s management of api and bulk api calls with built in rate limiting. Below, I get values for ~7000 rows in a single function.

import pandas as pd
from placekey.api import PlacekeyAPI

df = pd.read_csv("/content/MPI_Directory_by_Establishment_Name.csv")
pk_api = PlacekeyAPI("api_key")

def get_placekeys(df):
location_infos = df.apply(lambda row: {
"location_name": row["Company"],
"street_address": row['Street'],
"city": row['City'],
"region": row['State'],
"postal_code": str(row['Zip']),
"iso_country_code": "US"
}, axis=1).tolist()

placekeys = pk_api.lookup_placekeys(location_infos)
return [pk.get('placekey') if pk and 'placekey' in pk else None for pk in placekeys]

df['placekey'] = get_placekeys(df)

With my data Placekey’d and the recent introduction of Placekey’s skinny mappings for data partners, linking Placekey identifiers to other sources’ IDs, simplifies data integration without needing additional enrichment from other sources. The Overture dataset includes POI features such as websites, social media profiles, phone numbers, and email addresses for about 57 million places. So lets dive in!

import polars as pl
mpi = pl.read_csv('/content/MPI_Directory_by_Establishment_Name_Plackey.csv')
mpi_placekeys = mpi["placekey"].to_list() + mpi["address_placekey"].to_list()
placekeys_to_retrieve =list(filter(lambda item: item is not None, (mpi["placekey"].to_list() + mpi["address_placekey"].to_list())))
overture = pl.scan_csv('/content/Overture_To_Placekey_Skinny_File.csv').filter(pl.col("placekey").is_in(placekeys_to_retrieve) | pl.col("address_placekey").is_in(placekeys_to_retrieve)).collect()
mpi_joined_overture_placekey = mpi.join(overture.unique(subset=['placekey']), "placekey", "left")
mpi_joined_overture_placekey_address = mpi.join(overture.unique(subset=['address_placekey']), "address_placekey", "left")
del mpi
del overture
# USDA values in Overture matched on location name and address: 42.8%
print(f"USDA values in Overture matched on location name and address: " + str(pct_not_null(mpi_joined_overture_placekey,"id")) + "%")
# USDA values in Overture matched on address: 61.54%
print(f"USDA values in Overture matched on address: " + str(pct_not_null(mpi_joined_overture_placekey_address,"id")) + "%")

First, pull in the latest release for Overture.

!aws s3 --no-sign-request cp s3://overturemaps-us-west-2/release/2024-02-15-alpha.0/theme=places/type=place/ /content/data/latest --recursive

Second, create a list of values from our join to retrieve as a scan filter for the Overture files.

import polars as pl
import os

ids_to_retrieve =list(filter(lambda item: item is not None, (mpi_joined_overture_placekey["id"].to_list() + mpi_joined_overture_placekey_address["id"].to_list())))
directory = '/content/data/latest'
dataframes = []
for filename in os.listdir(directory):
if filename.endswith('.parquet'):
file_path = os.path.join(directory, filename)
df = pl.read_parquet(file_path).filter(
pl.col("id").is_in(ids_to_retrieve)
)
dataframes.append(df)

combined_df = pl.concat([df for df in dataframes])

Finally, join back to our input source to derive the join with Overture values — expediting the research process.

mpi_joined_overture_placekey_with_overture = mpi_joined_overture_placekey.join(combined_df, "id", "left").filter(pl.col("id").is_not_null()).to_pandas()

Here are some example rows and quickly plotted on a map:

EstNumber EstablishmentID Company Street City State Zip Phone GrantDate Activities DBAs placekey address_placekey building_placekey id address_placekey_right building_placekey_right geometry bbox names version updateTime sources categories confidence websites socials emails phones brand addresses longitude latitude  
0 M17395 + P17395 6243 50th State Poultry 98-715 Kuahao Place Pearl City HI 96782 (808) 845-5902 07/07/2021 Meat Processing, Poultry Processing zzy-223@8m3-vqr-73q 225@8m3-vqr-x5z 225@8m3-vqr-x5z 08f46482d81281320347e0d4e50e8dd0 225@8m3-vqr-x5z 225@8m3-vqr-x5z b'\x00\x00\x00\x00\x01\xc0c\xbe\x97\n\x07\xc0\xbf@5dX\x8f\rN\x9d' {'minx': -157.9559374, 'maxx': -157.9559374, 'miny': 21.3919763, 'maxy': 21.3919763} {'primary': '50th State Poultry', 'common': None, 'rules': None} 0 2024-02-11 0:00:00 [{'property': '', 'dataset': 'meta', 'recordId': '112121433491817', 'confidence': None}] {'main': 'food_beverage_service_distribution', 'alternate': array(['food'], dtype=object)} 0.9746751083 ['http://50thstatepoultry.com'] ['https://www.facebook.com/112121433491817'] ['+18088455902'] {'names': {'primary': None, 'common': None, 'rules': None}, 'wikidata': None} [{'freeform': '98-715 Kuahao Pl', 'locality': 'Pearl City', 'postcode': '96782', 'region': 'HI', 'country': 'US'}] -157.9559374 21.3919763
1 V32006A 6163664 Frozen Assets Cold Storage 555 Northwest Avenue Northlake IL 60164 (708) 292-0950 03/06/2020 Certification - Export, Certification - Trichinae, Food Inspection, Identification - Meat, Identification - Poultry 1g6hsobcth@5sb-7y4-t35 222@5sb-7y4-t35 222@5sb-7y4-t35 08f275926429a2ca0355b5d47d5032d0 222@5sb-7y4-t35 222@5sb-7y4-t35 b'\x00\x00\x00\x00\x01\xc0U\xfa\xc0\xba\xdb2\xc1@D\xf5\xa8\xe5f\xf7\x86' {'minx': -87.9180133, 'maxx': -87.9180133, 'miny': 41.9192168, 'maxy': 41.9192168} {'primary': 'Frozen Assets Cold Storage', 'common': None, 'rules': None} 0 2024-02-11 0:00:00 [{'property': '', 'dataset': 'meta', 'recordId': '361552830949663', 'confidence': None}] {'main': 'storage_facility', 'alternate': array(['food_beverage_service_distribution'], dtype=object)} 0.6330845771 ['https://www.facebook.com/361552830949663'] {'names': {'primary': None, 'common': None, 'rules': None}, 'wikidata': None} [{'freeform': '555 Northwest Ave', 'locality': 'Northlake', 'postcode': '60164-1606', 'region': 'IL', 'country': 'US'}] -87.9180133 41.9192168
2 M8406 + P8406 + V8406 127325 Mennella'a Poultry 100 George Street Paterson NJ 7503 (973) 345-1300 07/20/2021 Meat Processing, Poultry Processing, Certification - Export 22y-223@627-tzc-bc5 22y@627-tzc-bc5 22y@627-tzc-bc5 08f2a1046622392a0380d40cea2d8b9d 22y@627-tzc-bc5 22y@627-tzc-bc5 b'\x00\x00\x00\x00\x01\xc0R\x89\xf6f\xfa\n\x1d@Dr\xc3\xb2q\xd7\xb4' {'minx': -74.1556642, 'maxx': -74.1556642, 'miny': 40.8965972, 'maxy': 40.8965972} {'primary': "Mennella's Poultry Co", 'common': None, 'rules': None} 0 2024-02-11 0:00:00 [{'property': '', 'dataset': 'meta', 'recordId': '213534490599', 'confidence': None}] {'main': 'chicken_restaurant', 'alternate': array(['butcher_shop', 'professional_services'], dtype=object)} 0.9746751083 ['http://www.mennella.com'] ['https://www.facebook.com/213534490599'] ['+19733451300'] {'names': {'primary': None, 'common': None, 'rules': None}, 'wikidata': None} [{'freeform': '100 George St', 'locality': 'Paterson', 'postcode': '07503-2319', 'region': 'NJ', 'country': 'US'}] -74.1556642 40.8965972

That’s about it, enriched places by a join and not an arduous search.

Next?

Well, for me, hopefully a farm of my own one day, but until then, exciting places work and continuing the mission of providing the universal identifier for places.

If you have any questions or significant sources you want a skinny mapping to, please email me at jparrett@safegraph.com. Also, make sure to sign up for the Placekey API to make your joins easy and get the most out of your data.

--

--