A quick glance at a map of the U.S.-Mexico border would suggest that Mexican food is likely to be relatively popular in southern Texas, New Mexico and California. And in fact, the concentration of Mexican restaurants corresponds closely with the population identifying as ethnically Hispanic or Latino.
Some business clustering matches real-world demographic evidence and intuition. Sometimes, however, our intuition, or a map, won’t help as much.
Want to know just how far north the proliferation of taco joints spreads, and where else in the country Mexican food is big? Or wondering what parts of the U.S. are hottest for sports bars or pasta restaurants? Distance from Italy will only get you so far. Yelp data, and Yelp data science, can help.
You’ve likely encountered business clusters like the ones in the above map, even if the type of business or place was different. Ever searched for food in a new place, only to find nothing for miles before suddenly coming across a handful of options all at once? The clustering of restaurants in the same small area is no coincidence; network benefits that arise from proximity encompass everything from rent and labor to supply chain costs. This effect is known as economies of agglomeration and is easy to spot within a city, particularly for high-visibility businesses such as restaurants. But can we see the effect across larger areas and for different business types?
To explore this question, I looked at where businesses in particular Yelp categories were located and whether and where there are statistically significant geographical hot spots for those types of businesses.
To do this, I first defined what I consider a hot spot to be. For this exercise, a hot spot is a group of ZIP codes that (a) is geographically contiguous, meaning each ZIP code shares a border, no matter how small, with another in the group (this is known as Queen contiguity, a term borrowed from the allowed movement of chess pieces), and (b) consists of ZIP codes that each have a similarly high percentage of all businesses in the given category (a quantity I’ll call PBC, for Percentage of Businesses in Category). I used ZIP codes to capture highly local concentrations of businesses in a way that bigger geographical units such as counties might not.
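To make the two ingredients of that definition concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code behind the maps: the function names `pbc_by_zip` and `queen_neighbors` are hypothetical, the business records are made up, and the square grid is a toy stand-in for real ZIP code polygons, which would need a GIS library to test for shared borders.

```python
from collections import Counter


def pbc_by_zip(businesses, category):
    """Percentage of Businesses in Category (PBC) for each ZIP code.

    `businesses` is an iterable of (zip_code, set_of_categories) pairs;
    this record layout is an assumption made for illustration.
    """
    totals = Counter()
    in_category = Counter()
    for zip_code, categories in businesses:
        totals[zip_code] += 1
        if category in categories:
            in_category[zip_code] += 1
    return {z: 100.0 * in_category[z] / totals[z] for z in totals}


def queen_neighbors(grid):
    """Queen-contiguity neighbors for areas laid out on a toy grid.

    Two cells are neighbors if they touch along an edge or at a corner,
    the same squares a chess queen could reach in one step.
    """
    neighbors = {}
    for r, row in enumerate(grid):
        for c, area in enumerate(row):
            found = set()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0):
                        continue  # skip the cell itself
                    if 0 <= rr < len(grid) and 0 <= cc < len(grid[rr]):
                        found.add(grid[rr][cc])
            neighbors[area] = found
    return neighbors


# Made-up records: two businesses in one ZIP code, one in each of two others.
sample = [
    ("78701", {"mexican"}),
    ("78701", {"bars"}),
    ("78702", {"mexican", "bars"}),
    ("10001", {"pizza"}),
]
sample_pbc = pbc_by_zip(sample, "mexican")
```

In this made-up sample, half of the businesses in 78701 are Mexican restaurants, so its PBC is 50. Note that PBC is a share, not a count, which is why a small town and a large city can end up with the same value.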
The critical question is: How do we know whether two neighboring areas are similarly rich in a given type of business, so that we have not just a warm speck, but a hot spot? That’s where we turn to a measure known as the local Moran’s I statistic. It computes, for each neighbor pair, how far each neighbor’s PBC value is from the average across all areas, as well as how similar the two neighbors’ PBC values are to each other. This allows us to identify all clusters of neighbors that have similar, and similarly high, PBC values. Keep in mind that two candidate clusters with similar PBC values could be very different: One could be made up of two small towns that each have 10 businesses, one of which is a museum; another cluster could be made up of two large cities that each have 1,000 businesses, 100 of which are museums. This method gives each area an equal chance of being labeled a hot spot, which may not always be intuitive.
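For readers who want to see the arithmetic, here is a small sketch of one common textbook formulation of the local Moran’s I, computed per area with row-standardized weights (each area’s neighbors count equally and their weights sum to one). It may differ in detail from the exact computation used for the maps, and the function name `local_morans_i` and the four-area example are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each area, using row-standardized weights.

    values:    dict mapping area -> PBC value
    neighbors: dict mapping area -> iterable of neighboring areas
    """
    n = len(values)
    mean = sum(values.values()) / n
    # Deviation of each area from the average across all areas.
    z = {area: v - mean for area, v in values.items()}
    m2 = sum(d * d for d in z.values()) / n  # variance of the values
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")

    result = {}
    for area, dev in z.items():
        nbrs = list(neighbors.get(area, ()))
        # Spatial lag: the average deviation among this area's neighbors.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result[area] = dev * lag / m2
    return result


# Four areas in a row: two with high PBC next to two with low PBC.
example_values = {"a": 10.0, "b": 10.0, "c": 0.0, "d": 0.0}
example_neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
example_i = local_morans_i(example_values, example_neighbors)
```

In this example, areas `a` and `d` both score 1.0 because each resembles its only neighbor, while `b` and `c` score 0.0 because their neighbors cancel out. A positive score alone does not mean "hot": `a` sits in a high-value cluster and `d` in a low-value one, so a hot spot also requires the area’s own value to be above the average.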
The next step is to determine whether these clusters are significantly nonrandom enough to be deemed true hot spots. But how do we know that a particular distribution of businesses is nonrandom? Even with what is known as complete spatial randomness (CSR) — basically, every business is assigned a random location in the country — we’d expect to observe some degree of clustering due to chance.
To get at this question, I took all the observed PBC values across different locales and randomly placed them on the map in 1,000 different ways. For each mapping, I calculated a separate Moran’s I statistic for each neighbor pair, which generates a distribution of possible values of the statistic under random placement. The true Moran’s I value can then be compared with the 1,000 randomly generated values to determine the statistical significance of the true placement of businesses.
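The reshuffling procedure can be sketched in a few lines of Python. This is a simplified illustration, not the code behind the maps: it assumes a per-area local Moran’s I with row-standardized weights, shuffles all values across all areas, and reports a one-sided pseudo p-value (the share of random mappings whose statistic is at least as large as the observed one). The names `local_i` and `permutation_p_values`, the seed, and the ten-area chain are hypothetical.

```python
import random


def local_i(values, neighbors):
    """Per-area local Moran's I with row-standardized weights."""
    n = len(values)
    mean = sum(values.values()) / n
    z = {a: v - mean for a, v in values.items()}
    m2 = sum(d * d for d in z.values()) / n
    out = {}
    for a, dev in z.items():
        nbrs = list(neighbors.get(a, ()))
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        out[a] = dev * lag / m2
    return out


def permutation_p_values(values, neighbors, n_permutations=1000, seed=0):
    """One-sided pseudo p-value per area from random reshuffles of the values."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    areas = list(values)
    observed = local_i(values, neighbors)
    pool = [values[a] for a in areas]
    at_least_as_large = {a: 0 for a in areas}
    for _ in range(n_permutations):
        rng.shuffle(pool)  # place the observed values on the map at random
        simulated = local_i(dict(zip(areas, pool)), neighbors)
        for a in areas:
            if simulated[a] >= observed[a]:
                at_least_as_large[a] += 1
    # Add one to numerator and denominator so a p-value is never exactly zero.
    return {a: (at_least_as_large[a] + 1) / (n_permutations + 1) for a in areas}


# Ten areas in a chain; the first three have a high PBC, the rest have none.
chain_values = {i: (50.0 if i < 3 else 0.0) for i in range(10)}
chain_neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
chain_p = permutation_p_values(chain_values, chain_neighbors)
```

In this toy chain, area 1 sits in the middle of the high-value run, so very few random mappings reproduce a statistic as large as the one observed there and its pseudo p-value comes out small. A real analysis would also need to account for the fact that thousands of ZIP codes are being tested at once.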
The resulting maps show, for each business category, the statistically significant geographical hot spots in the continental U.S. It is important to note that the high variability of population density in the U.S. means that smaller, high-density hot spots will not look as impressive on our map as would larger, less-dense hot spots. For example, if a large, sparsely populated part of Montana has a significantly high proportion of bookstores, it will look more visually striking than a small, densely populated part of Massachusetts with an equally high proportion of bookstores. A hot spot would look very different on a population-based cartogram with the size of each region determined by its population.
Still, you can discern plenty of interesting insights from the maps. Take, for instance, the hot spots that this method identifies for sports bars. Wisconsin — known for its die-hard allegiance to the Green Bay Packers and University of Wisconsin Badgers — is bathed in red. Seems like the country’s biggest sports fans can find a seat at the bar in the Midwest.
Cajun cuisine was born in the South after tens of thousands of French-Canadians immigrated to the U.S. — most of them to Louisiana — in the late 1700s, so it’s no surprise that this hearty cuisine with rural French roots still rules the region of its origin.
Residents of the deep South — also known as the Bible Belt — are known for their religious devotion; parts of the central U.S. are not far behind.
There’s no place like the central U.S. for farming and its associated equipage.
Pasta is most prevalent in the Northeast, where Italians first immigrated to the U.S. and settled, but it’s also growing in popularity in Florida.
There are pools aplenty in the warm climates of Arizona, California, Florida and Texas. That creates lots of business opportunities for people who clean them.
Where there’s snow, there are snow-removal businesses, and there’s plenty of snow in the Northeast and Midwest.
Sporting-goods stores cluster around some of the best destinations for outdoor sports in the country: the Rockies!
Beer gardens are popular in Texas and the Northwest, with other hot spots scattered around the country — not only in spots where the climate is hot enough to drink outside year-round.
Curious about a category you haven’t seen mapped yet? Check out our interactive with dozens more maps, covering categories including bubble tea and synagogues.
Graphics by The DataFace.