Exploring Neighborhood Coffee Shop Footprints

Comparing neighborhood coffee shop scenes in 31 U.S. cities

Alex Shannon
data.tale()
Oct 17, 2018


Introduction

It is a morning ritual. It is a vice. It is a necessity. It is the relentless jitter in your tapping leg. The warm happiness spreading through your veins on a cold winter morning. Coffee has bean (er… been) one of humanity’s favorite brews since its use first became widespread in the 16th Century. It is consumed in countless ways and in countless places, but since its origins, coffee has been inextricably tied to the institution which grew up alongside it, the coffee shop.

Mudhouse, a local coffee shop in the author’s hometown

Coffee shops can be found in about as many varieties as the drink itself, often reflecting the culture of the surrounding neighborhood; cozy nooks and open mic nights, coffee and donuts on the run, an agreeable meet-up spot for an informal business meeting. Indeed, one is hard-pressed to think of an institution so ubiquitous, yet so wildly varied, through which to capture a hint of the essence of a neighborhood. That is what I have set out to do here — use coffee shop data as a means of comparing neighborhood coffee shop scenes; any conclusions drawn are necessarily speculative and should be taken with a grain of salt¹, but I believe the methods used can provide useful insight into the American coffee shop culture, and hopefully provide some guidance to coffee-lovers on what neighborhoods they ought to explore (or avoid) in their quest for the perfect cup (or the perfect place to drink one). In what follows, I will outline a methodology for deriving a unique ‘coffee shop footprint’ for neighborhoods in 31 U.S. cities, ultimately using these ‘footprints’ to group and compare the coffee scenes in these neighborhoods.

Coffee Shop History — Part 1: Origins

While the primary concern of this post deals with analyzing neighborhood coffee shop footprints, in researching this post, I found the history of the institution too fascinating not to share. These asides delve into that history. They are purely for reader enjoyment, and can be skipped without missing anything of the primary purpose of the article.

Human coffee consumption likely began in Ethiopia’s Kaffa region sometime during the 15th century — apocryphal tales abound of goat-herds observing their unusually excitable tribes after grazing them in coffee fields, or of traveling Sufi mystics seeing flocks of high-strung birds chirping around the plants as the inspiration for the discovery. While the true story may never be known, consumption first began with chewing the beans, but brewing methods quickly became predominant.

Percolating from the Ethiopian Highlands to the Middle East, the new beverage faced initial scrutiny (it drew many comparisons to the outlawed alcohol), but was eventually embraced with a firm coffee shop culture, not unrecognizable to the modern coffee-shop-goer, quickly taking shape around it. 17th Century French traveler Jean Chardin gave a lively description of the Persian coffeehouse scene:

“People engage in conversation, for it is there that news is communicated and where those interested in politics criticize the government in all freedom and without being fearful, since the government does not heed what the people say. Innocent games… resembling checkers, hopscotch, and chess, are played. In addition, mollas, dervishes, and poets take turns telling stories in verse or in prose. The narrations by the mollas and the dervishes are moral lessons, like our sermons, but it is not considered scandalous not to pay attention to them. No one is forced to give up his game or his conversation because of it. A molla will stand up in the middle, or at one end of the qahveh-khaneh, and begin to preach in a loud voice, or a dervish enters all of a sudden, and chastises the assembled on the vanity of the world and its material goods. It often happens that two or three people talk at the same time, one on one side, the other on the opposite, and sometimes one will be a preacher and the other a storyteller.”

Reaching, with impeccable timing, Italian shores via Venetian trading ships, coffee’s global ascent dawned rapidly alongside the great age of European exploration².

Scene depicting an Ottoman Coffee Shop

Some Notes on the Data

To my knowledge, there is no open repository of U.S. coffee shop data, so I did my best to create one (all data and code used for this article can be found here³). Data for the coffee shops were pulled from Yelp’s API, and neighborhood boundaries were drawn from Zillow, as these tend to capture organic communities better than zip codes, census blocks, etc. The Yelp data was spatially merged with the Zillow neighborhoods, and statistics were compiled from there. The selection of cities attempts to capture a variety of sizes, geographic locations, densities, and urban cultures. Data is restricted to U.S. cities, as there exists some level of homogeneity in American coffee culture and in business zoning and regulations, which is an important assumption when performing clustering.

The selection of attributes used to analyze the neighborhoods attempts to address a variety of dimensions; they are defined and rationalized as follows:

  • density is the count of coffee shops divided by area of the neighborhood; density plays a key role in the choice and accessibility to coffee shops
  • mean price is used to capture general expensiveness of coffee in a neighborhood
  • mean rating looks at the average Yelp ratings for coffee shops in a given neighborhood, a rough proxy for quality
  • mean review count and st dev review count as percentage of total reviews look at the average Yelp review count per store. Review counts tend to follow a rough power law distribution, with a few locations having a disproportionate number of reviews, so the standard deviation is also taken into account and normalized against the mean (e.g. a standard deviation of 10 for a mean of 12 expresses something very different than a standard deviation of 10 for a mean of 2,000)
  • percent bakeries and percent fancy donuts categories attempt to capture bakeries and donut shops (the ‘fancy’ can simply be read as “not-named-Dunkin”) that also serve as places for people to get coffee. These are not exclusive groupings; for example, some stores are grouped as both fancy donuts and third wave
People gathered at a local coffee shop in Detroit
  • percent corporate attempts to capture the percentage of ‘corporate’ coffee chains in a neighborhood. The idea here is that these are places where you could hold a business meeting (e.g. “Hey, let’s grab a coffee at Starbucks and talk about those reports.” is a phrase that doesn’t sound terribly unusual; replace ‘Starbucks’ with ‘7-Eleven’ and you might get some looks), but that focus less on providing the highest quality coffee and sit more in the middle ground of quality and convenience. Starbucks, Peet’s, and Caffè Nero are prime examples. For a full list, refer to the code.
  • percent grab-and-go attempts to capture locations where convenience is the key factor. Dunkin’ Donuts is the most represented in this group, but McDonald’s, 7-Eleven, and Tim Hortons are also prime examples. I had a few intense debates as to whether or not to drop 7-Eleven and McDonald’s from the analysis, as coffee is not their primary focus, but I was persuaded by a number of people for whom these are their primary coffee sources, and I am now firmly of the conviction that they belong here.
  • percent third wave identifies coffee chains that have 4 or more stores in the sample, yet focus on producing a high-quality product. A more detailed description can be found here.
  • percent tea and juice identifies stores that sell tea and juice; there is some wide variety, ranging from Taiwanese boba shops to juice bars to fine British tea houses. These are grouped together primarily because they are all tangential to coffee shops, and often serve coffee, but are not quite a coffee shop. They are included in the analysis as I believe looking at these does provide a better understanding of the coffee culture of a neighborhood.
  • percent independent captures the rest of the bunch — any store with under 4 locations that is not tea or juice is included here. This may be an upscale local coffee shop or a gritty diner. This categorization is admittedly not perfect, but the prevalence of locally-run institutions does likely reflect neighborhood culture.
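As a rough illustration of how attributes like these could be compiled from shop-level records, here is a minimal pandas sketch. It is not the code used for this article: the column names, the tiny `shops` table, and the `area` values are all hypothetical, and for simplicity it assumes each shop carries exactly one category (the real groupings are not exclusive).

```python
import pandas as pd

# Hypothetical shop-level table: one row per coffee shop after the spatial merge.
shops = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "B", "B"],
    "category": ["independent", "corporate", "independent", "grab-and-go", "corporate"],
    "price": [2, 2, 1, 1, 2],
    "rating": [4.5, 3.5, 4.0, 3.0, 3.5],
    "review_count": [120, 40, 20, 10, 30],
})
# Hypothetical neighborhood areas (e.g. in square miles).
area = pd.Series({"A": 1.5, "B": 0.5}, name="area")

grouped = shops.groupby("neighborhood")
footprint = pd.DataFrame({
    "density": grouped.size() / area,                      # shops per unit area
    "mean_price": grouped["price"].mean(),
    "mean_rating": grouped["rating"].mean(),
    "mean_review_count": grouped["review_count"].mean(),
    # Standard deviation of review counts, normalized against the mean.
    "rel_std_review_count": grouped["review_count"].std()
    / grouped["review_count"].mean(),
})

# Share of each category within a neighborhood (rows sum to 1).
shares = pd.crosstab(shops["neighborhood"], shops["category"], normalize="index")
footprint = footprint.join(shares.add_prefix("pct_"))
print(footprint.round(2))
```

Each row of `footprint` is then one neighborhood's vector of attributes, which is the form the later scaling and clustering steps expect.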

Some seemingly obvious metrics are left out. Coffee shops per capita and coffee shops per job, alongside some mixture of price and neighborhood income would be useful additions, but data on these are not readily available for the neighborhood boundaries defined here. It was decided that using naturally forming neighborhoods was more important than including these variables and switching to something more contrived, such as zip codes.

A quick glance at the correlation between the different metrics supports a few intuitions. The strongest positive correlation in the data is between percent independent coffee shops and rating, while the strongest negative correlation is between percent grab-and-go coffee shops and rating, supporting the intuition that neighborhoods with a lot of local coffee shops tend to have better quality joe than neighborhoods with a lot of 7-Elevens. Neighborhoods with many independent shops also tend to get more reviews, and neighborhoods with many corporate coffee shops tend, unsurprisingly, to be overpriced.
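For readers who want to try this kind of correlation check themselves, here is a small sketch using pandas. The data below is synthetic and deliberately constructed so that ratings rise with the share of independent shops; it is a stand-in for illustration only, and the column names are my own assumptions rather than those in the article's code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
pct_independent = rng.uniform(0, 1, n)

# Synthetic neighborhood table (NOT the article's data).
footprint = pd.DataFrame({
    "pct_independent": pct_independent,
    "pct_grab_and_go": (1 - pct_independent) * rng.uniform(0, 1, n),
    "mean_rating": 3 + pct_independent + rng.normal(0, 0.2, n),
})

corr = footprint.corr()  # pairwise Pearson correlation

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

strongest_pos = pairs.idxmax()
strongest_neg = pairs.idxmin()
print("strongest positive:", strongest_pos, round(pairs.max(), 2))
print("strongest negative:", strongest_neg, round(pairs.min(), 2))
```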

Coffee Shop History — Part 2: The Enlightenment

Scholars credit the introduction of coffee into Western Europe as one of the key enabling factors of the European Enlightenment. Serving as a safe alternative to the beer and wine of the day (water was often unsafe due to poor sanitation), the beverage’s stimulating effects, in stark contrast to alcohol, enticed sharp thinking and intelligent conversation. To facilitate and enhance such propensities, a new type of public space emerged throughout Europe, the coffee shop.

Along with the beans, a similar culture of conversation and community emerged in 17th Century England. Before the days of universal street numbering, mail could be delivered to the local coffee house. News was shared, events planned, publications written. Button’s Coffee House in London kept a large, white marble lion’s head on its walls; anyone could feed its wide jaws with limericks, stories, or thought-pieces, the best of which were selected for the shop’s weekly publication “The Roarings of the Lion.” The London Stock Exchange, one of the world’s oldest, traces its origins back to meetings at Jonathan’s Coffee House in Change Alley. We can see the modern coffee shop beginning to take form in the city:

“The contrast between coffee and alcoholic drinks was reflected in the decor of the coffee-houses that began to appear in European cities, London in particular. They were adorned with bookshelves, mirrors, gilt-framed pictures and good furniture, in contrast to the rowdiness, gloom and squalor of taverns. According to custom, social differences were left at the coffee-house door, the practice of drinking healths was banned, and anyone who started a quarrel had to atone for it by buying an order of coffee for all present. In short, coffee-houses were calm, sober and well-ordered establishments that promoted polite conversation and discussion.”

An Overview of the American Coffee Shop Scene (in 31 Cities)

Before diving into a neighborhood analysis, let’s take a glance at the cities themselves. First, looking across all 31 cities, we find that the top 4 stores by count (Starbucks, McDonald’s, Dunkin’ Donuts, & 7-Eleven) vastly outnumber smaller chains. Indeed, these 4 stores make up over 36% of coffee shops in these 31 cities. While this is significant, other industries, such as the beer industry, exhibit even more disproportionate concentrations.

Starbucks can be a rather polarizing store for coffee-lovers, but its presence is ubiquitous across the U.S. and globally. Is there anything that can be gleaned about a city from the fraction of coffee shops that are Starbucks? Are these the most corporate cities in America? The most bland? The most convenient? I’ll leave that up to the reader to decide, but here’s a glance at the percentage of Starbucks making up each city’s total coffee shop footprint:

Starbucks, America’s most prominent coffee chain, accounts for roughly 17% of the coffee shops in the cities in this study, though its presence varies widely.

Newton, Massachusetts not only holds the record for highest percentage of Starbucks; it also takes home the prize for highest percentage of Dunkin’ Donuts, closely followed by neighboring Cambridge and Boston (in that order). Let’s take a quick look at some basic footprints of these three donut-loving municipalities:

Radar Chart of Greater-Boston Municipalities; endpoints represent each type’s percentage makeup of total coffee shops

Dunkin’s influence is apparent in the pull of grab-and-go stores in each municipality; Cambridge tends to be the hipster of the group with the greatest percentage of independent shops, while Newton is the mainstream neighbor, sticking mostly with corporate and grab-and-go shops. We can also see that the region has a stark absence of third wave coffee shops and tea and juice shops.

Let’s move south a bit and look at New York City’s boroughs to see if any similar patterns emerge:

Radar Chart of New York City Boroughs Coffee Shop Makeup

New York City’s boroughs also appear to live up to their reputations, with Brooklyn being the most independent of the bunch and Manhattan being similar, but more corporate (and more third wave). Queens and Staten Island have remarkably similar makeups, while the Bronx is slightly more in the grab-and-go direction. Overall, New York City’s coffee scene skews more independent than Boston’s. Let’s take a glance at a few other U.S. cities before moving on to a more granular analysis of neighborhoods:

Radar Plots of San Francisco, Portland, and Pittsburgh coffee shop makeups

The West Coast, it seems, is host to a more independent demographic of coffee shops than the Northeast. Nearly 75% of the coffee shops in San Francisco and Portland are classified as independent, while Pittsburgh, our Rust-Belt representative, has the most third wave coffee shops as a percentage of total coffee shops among the cities studied.

That’s a brief glimpse into an endlessly interesting dataset. I hope to do more analysis here, and I hope others do as well. But for now, let’s zoom in on particular neighborhoods and see if we can come up with a system for comparison.

Comparing Neighborhoods

While we’ve seen that cities themselves exhibit different and fairly discernible coffee shop footprints, the differences between neighborhoods within a city can be just as dramatic, if not more so. For the purposes of comparison and classification, only neighborhoods containing 10 or more coffee shops are included from here on out, as data from smaller neighborhoods are subject to excessive amounts of noise (in the statistical sense of the term).

Stark contrasts are clear when looking at the data from a bird’s eye view. For example, two bordering Denver neighborhoods display remarkably contrasting types of coffee shops; the Central Business District is dominated by a mix of corporate and grab-and-go vendors, while the hipper Five Points neighborhood, containing the city’s RiNo arts district, solely consists of independent stores.

We see a similar, if not quite so clear, contrast when looking at Brooklyn’s layout; independent stores dominate the neighborhoods, while corporate stores pop up in business districts, with grab-and-go shops arising fairly evenly all over the map in this subset.

These overviews provide a reassuring intuition that neighborhoods do indeed follow some sort of ‘coffee shop footprint’ pattern. As we begin to attempt to form groupings of these neighborhoods, we will also want to take into consideration our other metrics, such as density, price, rating scores, and rating counts to further provide qualitative differentiation between the different neighborhoods.

Three different clustering methods will be used on normalized versions of the data (a min-max scaling was selected, as the values are all positive, and few outliers are so large as to affect the scaling). First, a hierarchical approach with agglomerative, complete-link clustering will allow us to zoom in and out as we please while looking at neighborhood similarities. Then K-means clustering will be used to identify groupings of different neighborhood types; as the selection of the number of groupings is largely subjective, we will also use a Gaussian mixture model to cluster the neighborhoods and do a manual sanity-check to see if our methods make sense and if there are any important considerations that we may need to bear in mind when interpreting these results.
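For reference, min-max scaling maps each attribute column to the range [0, 1] via (x − min) / (max − min). A minimal NumPy sketch follows; the helper name `min_max_scale` and the toy matrix are mine, and sending constant columns to 0 is my own assumption about how to handle a zero range.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1] using (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # assumption: a constant column is mapped to all zeros
    return (X - lo) / span

# Toy example: two attributes on very different scales.
X = np.array([[2.0, 10.0],
              [4.0, 30.0],
              [6.0, 20.0]])
scaled = min_max_scale(X)
print(scaled)
```

After scaling, an attribute measured in review counts and one measured in fractions contribute on comparable terms to the distances that the clustering algorithms rely on.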

For the hierarchical clustering, a ‘complete linkage’ method was employed, meaning that initially every point forms its own cluster (in 11-dimensional space, based on scaled versions of the attributes defined earlier), and clusters are gradually linked together based on their most dissimilar members until one group remains; this makes intuitive sense for our purposes, as we want groupings to be resistant to dissimilar members. We can then choose a cut-off somewhere in the process, drawing a horizontal line to produce a number of groupings convenient for our purposes of understanding and generally classifying the different types of neighborhoods.
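To make the procedure concrete, here is a small sketch using SciPy's `linkage` and `fcluster`. The data is a synthetic stand-in (three tight blobs in 11 dimensions), and cutting the tree into exactly 3 groups is an arbitrary choice for the example, not the cut-off used in this article.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic stand-in for the scaled data: 3 tight blobs of 10 points each, 11 attributes.
centers = np.array([0.2, 0.5, 0.8])
X = np.vstack([c + rng.normal(0, 0.02, size=(10, 11)) for c in centers])

# Complete linkage: the distance between two clusters is the distance
# between their two most dissimilar (farthest-apart) members.
Z = linkage(X, method="complete", metric="euclidean")

# "Drawing the line": cut the tree so that at most 3 flat clusters remain.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

The same linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree.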

We can use a dendrogram to visualize the results. Given that there are 308 neighborhoods with more than 10 coffee shops in our dataset, it’s pretty huge. Happy scrolling:

Hierarchical Clustering of 308 American Neighborhoods by Coffee Shop Footprint (all neighborhoods analyzed here have more than 10 coffee shop locations)

Assuming you’ve made it to the bottom of that plot, you can see that some clear patterns have emerged! At the top, we see that the Chinatowns of New York and San Francisco are right next to each other. At the bottom, we find airports and corporate downtowns. Many neighborhoods that border each other geographically are also next to each other in these groupings (implying that geographic contrasts are not always as stark as the Denver example we looked at earlier). This seems to be good news for this methodology.

Potential things that may warrant closer inspection are the ‘outliers’ of the group — for example, New Orleans’ French Quarter and Central City East (better known as ‘Skid Row’) in Los Angeles seem to be unlike any other neighborhoods (and also relatively unlike each other). The French Quarter is an anomaly, and a wonderful one at that, in the U.S. — due to a strong French influence and a heavy amount of tourism, it is rightfully tough to find a good comparison. Likewise, Skid Row is an outlier in U.S. neighborhoods, containing one of the largest stable homeless populations in the country while also being isolated from much of greater Downtown LA. That these two outliers in our analysis seem to be outliers in other ways is a good sign — indeed, it would be slightly concerning if, say, Skid Row and Midtown Manhattan were grouped together.

Radar Charts of the groupings defined by our hierarchical clustering (colors match those on the dendrogram above); Data has been normalized between 0 and 1, so percentages will not necessarily sum to 100, but do provide an accurate, scaled measure by which to gauge feature prominence.

While our hierarchical clustering seems to provide intuitive results when viewed through the lens of the dendrogram’s neighboring neighborhoods, comparing the larger groupings (represented by the colors in the chart) can provide a clearer understanding of what characteristics similarly-grouped neighborhoods share, and give a better intuition as to the logic behind these clusters.

A few interesting patterns stand out (e.g. if you’re looking for a good donut, go to one of the ‘firebrick’ neighborhoods; for tea, try ‘forestgreen’); however, we see that some of the groups are remarkably similar. ‘darkviolet’ is essentially the same as ‘darkorange’ if some third wave coffee shops were thrown in. ‘sandybrown,’ ‘steelblue,’ and ‘teal’ all display a similar mix of characteristics, with the emphasis on one or the other trait being the main differentiating factor. Are these differences significant enough to constitute unique groupings? Or do they overfit the data, and we’d be better off consolidating into fewer groups if we wanted to fit our model to new neighborhoods? Or do we not have enough groups, and there’s data frustratedly sitting behind these radar charts just waiting to be released? Clustering is as much an art as it is a science, requiring a detailed knowledge of context in addition to calculation. As our aim is mostly exploratory, for our purposes, this will suffice.

Coffee Shop History — Part 3: The Modern Coffee Shop

From its origins in the Middle East, through the Enlightenment, the French Existentialists and beyond, conversation has been at the heart of coffee shop culture. The modern coffee shop keeps this tradition alive, but a new dynamic has taken hold since the turn of the millennium: laptops and free WiFi are now mainstays at many, if not most, modern coffee shops. The quality of coffee has taken a large leap up from 18th Century London, where the brew was “likened to a ‘syrup of soot and the essence of old shoes’ while others were reminded of oil, ink, soot, mud, damp and shit” (hardly an appealing description, but they still had customers). And corporate coffee shops have become a huge presence, thanks to Starbucks and its competitors. In this aside, we’ll look at the dawn of Starbucks, WiFi, and the Third Wave.

Starbucks opened its first storefront in Seattle in the early 70s, primarily as a high-quality roaster and distributor in the area (for the first year of operation, the only coffee available at the shop came in the form of free samples). The founders were University of San Francisco grads, and during their time there had become acquainted with Alfred Peet, founder of Peet’s Coffee. Starbucks began to catch on in the Seattle area, and a few stores opened up in the coming years. In 1984 the Starbucks owners bought out their old friend and mentor’s coffee business, acquiring Peet’s and selling the Starbucks store to a former manager, Howard Schultz. Under Schultz, Starbucks began a rapid expansion (from 1987 to 2007, they averaged opening 2 new stores per day), becoming the ubiquitous global powerhouse they are today.

Starbucks is not the only big story in recent coffee history; the faction of roasters and coffee shops placing greater emphasis on quality of beans, preparation, presentation, and experience of the beverage, which has emerged over the past 3 decades, is collectively referred to as ‘the Third Wave.’ Roasters such as Intelligentsia, Stumptown, and Counter Culture have led the way, opening up stores of their own, but also providing high-quality beans to independent coffee shops.

Finally, the introduction of free WiFi to coffee shops (an introduction from which the author is currently benefiting) has dramatically changed the nature of interaction within coffee shops. Initially hotly debated (sometimes successfully stifled), WiFi in coffee shops can be seen as here to stay. The impact of WiFi on coffee shop footprints is worthy of a future post, but the demand for public places to drink coffee and use WiFi is undeniable, giving rise to an entirely new business-model, the co-working space. Working from a coffee shop can be an intensely rewarding, productive experience, adding a new dynamic to what coffee shops contribute to the world. However, even amongst the crowds of the headphoned, keyboard-slapping work-junkies, conversation still persists.

WiFi has quickly become a mainstay in coffee shops

While a hierarchical clustering may be our most interpretable model, due to the fuzzy nature of such clusterings, let’s now take a look at two clustering algorithms of a different nature: K-means, and Gaussian Mixture Models. These algorithms are not quite as transparent as hierarchical clustering, but through comparing these radically different methods, we can attain greater robustness in our findings.

K-means is an incredibly prominent clustering algorithm, using k number of center-points and adjusting them until roughly-optimal clusters are found. It works well with high-dimensional data (which we have), but also assumes that there is some spherical nature to the clusters (which we’re not quite sure of), and assigns a hard value based on distance to cluster centers. The value of k, as with our hierarchical clustering, is a number assigned by the user, and not some divinely discoverable value, and thus methods must be used to attempt to determine some satisfactory value for k.
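One common heuristic for picking k is to fit K-means over a range of values and compare silhouette scores. The sketch below uses scikit-learn on synthetic data with three obvious blobs; the silhouette criterion is my choice for illustration and is not necessarily how k was settled on in this article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic data with three well-separated blobs (NOT the article's data).
centers = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.8]])
X = np.vstack([c + rng.normal(0, 0.03, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Silhouette: how much closer points sit to their own cluster than to the next one.
    scores[k] = silhouette_score(X, km.labels_)

best_k = max(scores, key=scores.get)
print(scores, "-> best k:", best_k)
```

On data that varies along a continuum rather than in tidy blobs, the scores tend to be much flatter across k, which is part of why the choice ends up being a judgment call.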

A Gaussian Mixture Model (GMM) is more robust to non-spherical data than k-means and also comes with the handy ability to return cluster likelihood values rather than simply hard clustering the data. It lacks some of k-means’ interpretability, but will serve as a useful metric for comparison — if both return similar types of clusters, that is promising. If they are wildly different, it is usually time to go back to the drawing board.
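The difference between hard and soft assignment is easy to see in code. This sketch uses scikit-learn's `GaussianMixture` on two synthetic, elongated (non-spherical) blobs; the data and parameter choices are illustrative assumptions, not the article's setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two elongated blobs: wide along x, narrow along y.
a = rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.1], size=(100, 2))
b = rng.normal(loc=[0.0, 1.0], scale=[1.0, 0.1], size=(100, 2))
X = np.vstack([a, b])

# "full" covariance lets each component take its own elliptical shape.
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      n_init=5, random_state=0).fit(X)

hard = gmm.predict(X)        # one label per point, as K-means would give
soft = gmm.predict_proba(X)  # per-point membership probabilities for each component
print(soft[:3].round(3))
```

Points near the boundary between two components show up with probabilities well away from 0 and 1, which is exactly the information a hard clustering throws away.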

Before running these clustering algorithms, I performed Principal Component Analysis (PCA) on the data, reducing our 11-dimensional dataset to 5 dimensions. This number was chosen because it takes 5 dimensions to explain ~83% of the variance in the data; 4 dimensions explain just under 80%, and would have been a fine choice as well, while anything above 5 dimensions could result in the model overfitting or placing excess weight on unimportant variables. The first 2 & 3 dimensions are plotted below, colored by the labels assigned by running each algorithm against the 5-D dataset. The data is grouped into 7 clusters, again somewhat arbitrarily, but this seemed to be the Goldilocks, just-right value that neither lumped together overly massive groups (which don’t provide much insight!) nor left a handful of small 2–3 count outliers (which are often perfectly reasonable to have, as in our hierarchical clustering, but narrowing the number of clusters allows for a bit more robustness in the model).

Principal Components plotted in 2 & 3 dimensions, colors represent the clusters assigned by each algorithm (color is only a differentiating factor here — there is no ‘scale’ or particular meaning behind the assigned colors other than to differentiate them from other groups). Axes are inherently unit-free.
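For anyone wanting to reproduce the "how many components" decision, a cumulative explained-variance check is one way to do it. The sketch below uses scikit-learn's `PCA` on synthetic 11-dimensional data driven by 3 latent factors, so it will settle on far fewer components than the 5 used here; the 0.83 threshold simply borrows the figure quoted above and is an assumption about how one might encode that choice.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic 11-dimensional data generated from 3 latent factors plus a little noise.
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 11)) + rng.normal(0, 0.05, size=(300, 11))

# Fit with all components to see how the explained variance accumulates.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches the threshold.
threshold = 0.83
n_components = int(np.searchsorted(cumulative, threshold) + 1)

X_reduced = PCA(n_components=n_components).fit_transform(X)
print(cumulative.round(3), n_components, X_reduced.shape)
```

The reduced matrix is what would then be handed to K-means and the GMM.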

These plots immediately reveal two things: first, that there is a significant amount of variation in neighborhood coffee shop footprints (as seen in the spread of the data), and second, that this variation tends to happen along a more-or-less continuous space. That is, there are no clear clusters that tend to ‘attract’ neighborhoods; rather, neighborhoods differ, but more along a spectrum than by falling into discrete groupings. This is an interesting finding, and useful to know as a city planner observing the coffee shop culture of neighborhoods (knowing that if you wanted to shape that culture, it would likely require a slow shift in any one direction rather than a quick snap after reaching a tipping-point) or if you worked for Starbucks and wanted to develop a strategy for future store placement.

Radar charts of the clusters formed by GMM and K-Means algorithms. Similar clusters seem to have been identified by both algorithms, and these are shown as horizontal neighbors.

Let’s now examine our clusters to see if there are any similarities in the type of groups formed.

Lo and behold! There do appear to be similarities almost across the board. Looking at the table of radar charts on the left, we can readily observe that GMM and K-means, while slightly differing in the details, identified largely similar groups among the data.

The first group (GMM 1 & K-means 7) is characterized by a high number of independent coffee shops and their high average ratings. One could guess that these include some of the hipster coffee capitals of the world, and indeed, Williamsburg, Brooklyn, SF’s Mission District, and Seattle’s Fremont neighborhood all fall into this category for both algorithms.

The second grouping (GMM 2 & K-means 6) is slightly more pricey and corporate than its hipster neighbors, but still has a high influx of independent shops and high ratings. East Cambridge (home to MIT) and Capitol Hill in Washington, DC fall into this category.

The third grouping is highly similar to the second, with slightly more bakeries being the clearest differentiator. University City in Philadelphia and Downtown Portland fall into this group.

And so these matching patterns continue, providing a good bit of reassurance that the models are seeing similar things in the data. The one exception is the final grouping, which is highly corporate in K-means, but rather spread out in GMM. This is likely a mix of outliers — it includes a mix of mostly central business districts and airports in both groupings.

We’ve had some success using multiple methods to develop an intuition and a framework through which to analyze and compare the coffee shop scenes in U.S. cities. The radar charts above show that similar groupings were uncovered by GMM and K-means. How do these compare to the groupings from the hierarchical clustering that we performed earlier?

To visualize a comparison, I created a symmetrical square matrix with values ranging from 0–3 (0 if two neighborhoods share no clusters from any of the algorithms; 3 if they share clusters in all of the algorithms). The resulting structure was larger than our humongous dendrogram, and unfortunately Medium cannot render the finer details of the full-sized image (you can find it on my GitHub), but glancing at the colors below (0 corresponds to white), you can get a sense that while there are denser and less-dense pockets of groupings, a significant amount of matching is going on. General reassurance is great, but we also want insight. Below this giant figure, there are more granular views of 4 neighborhoods and their encircling 20 neighbors that fall into very different densities in the clustering: Brooklyn’s Bed Stuy neighborhood is bunched in a tightly-packed group. San Francisco’s Nob Hill and Manhattan’s Morningside Heights fall nearer the middle of the pack, with some dense bunching, but also greater variation. Boston’s Chinatown finds itself in a more lonely bunch, with many of its closely-related neighborhoods not necessarily all that alike one another.

Comparing all 3 methods¹⁰
A selection of 4 neighborhoods, plotted as a cluster of their 20 ‘most similar’ neighborhoods. Dark blue squares denote matching in all 3 algorithms; light squares matching in 0 (with the exception of the Bed Stuy plot, where it actually matches in 2). We can see some variety in the density of the clusters here, moving from Bed Stuy, a neighborhood in a highly-clustered space, to Chinatown where most of the closely related neighborhoods aren’t that related to one another.
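The 0–3 agreement matrix described above can be sketched in a few lines of NumPy. The function name `agreement_matrix` and the toy label lists are mine, made up for illustration; note that cluster IDs do not need to match across algorithms, since only "same cluster or not" is counted.

```python
import numpy as np

def agreement_matrix(*labelings):
    """Count, for every pair of items, how many labelings put them in the same cluster."""
    labelings = [np.asarray(labels) for labels in labelings]
    n = len(labelings[0])
    total = np.zeros((n, n), dtype=int)
    for labels in labelings:
        # Boolean n-by-n matrix: True where items i and j share a cluster.
        total += (labels[:, None] == labels[None, :]).astype(int)
    return total

# Toy labels for 4 neighborhoods from three hypothetical clusterings.
hier = [1, 1, 2, 2]
kmeans = [0, 0, 0, 1]
gmm = [5, 5, 6, 7]

A = agreement_matrix(hier, kmeans, gmm)
print(A)
```

The diagonal is always 3 (every neighborhood agrees with itself), so only the off-diagonal entries carry information.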

Ideas for Further Investigation and Concluding Thoughts

Hopefully this has been a fun, unique look into coffee shops and the neighborhoods that they help shape, one that has made you think a bit about an institution that many people mindlessly visit on a daily basis. There is so much data to explore, and I have only touched the surface. If you know how to code, please dig into the data and share your findings; if you are an expert in the subject matter and have any suggestions, please reach out. A few things keep this exploration confined to the U.S.: availability and uniformity of neighborhood shape-files, the assumption that people use Yelp similarly across the U.S.¹¹, and general cultural attitudes towards coffee shops and business regulations and zoning in the country. None of these issues are insurmountable: perhaps sources other than Yelp can be pulled in; neighborhood shape-files doubtlessly exist for most cities, or can be made and adjusted for these purposes; zoning laws and coffee culture impact a neighborhood’s coffee shop footprint, so methods can either be used to soften their impacts, or perhaps they are things we want to identify to begin with. I don’t know exactly what future analyses will dig up, but there’s doubtlessly much to be discovered.

Further analysis need not be limited to coffee shops, either. Coffee shops may be one of the best indicators of a neighborhood ‘footprint’ in this regard, due to the sheer number of them and the variety within them. But to rattle off a few more… gas stations? pharmacies? grocery stores? lunch spots? restaurants generally?

I hope this post, in a way my nerdy ode to the coffee shop, provides a small glimpse into how these wonderful institutions both shape and are shaped by their surrounding neighborhoods. I hope it suggests that there are methods by which to quantify and compare coffee shops, leading to insights both intuitive and delightfully not-so. And finally, I hope that, beyond all analyses, this post inspires the reader to visit a local coffee shop, be it a cozy, independent micro-roaster or a Dunkin’ Donuts off the interstate, to sit down and reflect, if only for a brief moment, on how wonderful it is that such an institution exists for all to enjoy. ☕

FOOTNOTES

1. ^ or a splash of milk

2. ^ European explorers introduced coffee to many of the regions in which its production currently flourishes, such as Brazil, Vietnam, Colombia, and Indonesia (the world’s top 4 coffee-producing nations). Alongside coffee, they also introduced many new and unpleasant diseases, systematic oppression, and asparagus.

3. ^ feel free to use any data or code; if you do so, a link to this article would be appreciated. Cheers!

4. ^ Due to limitations in the number of calls available in the free version of Yelp, this data was gathered over quite a few days (in early October 2018), with multiple individual calls made for each neighborhood, and then a filtering mechanism used to remove duplicates and anything not serving coffee. From there, multiple other filters were used, along with a fair bit of manual cross-checking, to ensure that most of the thousands of locations analyzed are primarily coffee shops (a category which is itself ambiguous). Doubtless, a few examples slipped through. If anyone finds such an example, please email me at acs882 {a.t} nyu dot edu, and I will update and send you my eternal gratitude.

5. ^ https://www.dailysabah.com/history/2018/07/20/coffeehouses-in-ottoman-society

6. ^ https://www.economist.com/christmas-specials/2003/12/18/the-internet-in-a-cup

7. ^ Probably because they threw all of their tea into the harbor.

8. ^ Though we do see some close-to-discrete groups begin to emerge when we filter for only neighborhoods with more than 50 coffee shop locations:

Discrete groups start to form when we look only at neighborhoods with >50 coffee shop locations; however, the data here is too sparse, and the groups not clear enough, for any conclusions to be drawn.

9. ^ Apologies for not going through all 308 neighborhoods, but this post is already long enough! I hope to create some sort of interactive visualization tool for this in the near future so that you can explore at your own leisure.

10. ^ I can’t help thinking of the glitchy Pokémon, MissingNo, when I see this plot.

11. ^ This is likely not true. I’d imagine that, all else aside, tourist hotspots like Manhattan receive qualitatively different reviews than, say, suburban Dallas, but this is a topic worthy of a study all its own.
