How to: Asian American Dot Density Map
A step-by-step guide to making a dot density map of Asian American subgroups from the Census.
Step 1 Acquiring Census Tract geography and Census race data by tract
- I used TIGER files directly from the Census, which are available by state. I downloaded each state and merged them in QGIS. This seems like a lot of work, but it was just a matter of getting a link downloader extension for your browser, dumping all the downloads into 1 folder, and importing them all into QGIS, merging the layers in 1 step. Making a nationwide file this way takes between 5 and 10 minutes and can be done while watching TV.
Note: Tracts are the smallest geography that this table of Asian groups is published in. The race dot map examples you may have seen previously use block or block group level data, which are the smallest geographies that the “Race” category is published in.
- I downloaded the Census table for the “Asian Alone Or In Any Combination By Selected Groups” from Social Explorer which gives me the subset of Asian groups. I reduced the size of the file by removing some of the columns in this dataset, leaving the GEOID and population counts.
- The census geography was already loaded into QGIS from merging state layers, so now we just have to add the Census data. We do this by going to Layer>>Add Layer>>Add Delimited Text Layer and selecting our Census data file.
- Next I joined the newly added dataset to the tract geography by GEOID. Click on the tract geography layer, open the Properties panel, and select Joins in the menu on the left. Add a new join by clicking the + sign in the bottom left corner, then fill out the pop-up window fields with the name of the census data layer and the 2 column names to join on, 1 from each layer.
Step 2a Generating a dot for every Asian
All the dots for each geography have to be generated in 1 go, and not by subgroup, to prevent dots from overlapping. I started by splitting all tracts into east and west, then generated dots for the sum of all subgroup populations in 1 go using Vector>>Research Tools>>Points in Polygons, selecting the total population column as the number of points for each polygon. The dots were then exported into .geojson files.
Step 2b Applying a subgroup id randomly to each dot.
Since all the dots for each tract were generated in 1 go, they do not have subgroup identifiers attached to them yet. I believe advanced GISers would have done steps 2a and 2b together with ease. I am not one, and so I did not. Instead, I used Python to parse through the resulting files and randomly assign each dot a nationality code, while making sure that each census tract's makeup is maintained. This script will be available in the project repo.
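To make the idea concrete, here is a minimal sketch of one way to do the random assignment. It is not the script from the project repo. The field names "GEOID" and "subgroup", the function name, and the shape of the inputs are all assumptions for illustration. It assumes each dot already carries the GEOID of the tract it was generated in, and that the per-tract counts add up to the number of dots in that tract.

```python
import random
from collections import defaultdict


def assign_subgroups(dots, tract_counts, seed=0):
    """Give each dot a subgroup code so every tract keeps its exact makeup.

    dots: list of dicts, each with a "GEOID" key (hypothetical field name).
    tract_counts: {GEOID: {subgroup_code: population}} (hypothetical shape).
    """
    rng = random.Random(seed)

    # Group dot indices by the tract they fall in.
    by_tract = defaultdict(list)
    for i, dot in enumerate(dots):
        by_tract[dot["GEOID"]].append(i)

    for geoid, indices in by_tract.items():
        # One label per person, e.g. {"A": 2, "B": 1} -> ["A", "A", "B"].
        labels = [
            code
            for code, n in tract_counts[geoid].items()
            for _ in range(n)
        ]
        if len(labels) != len(indices):
            raise ValueError(
                f"tract {geoid}: {len(indices)} dots but {len(labels)} people"
            )
        # Shuffling the labels randomises which dot gets which code
        # while leaving the per-tract totals untouched.
        rng.shuffle(labels)
        for i, code in zip(indices, labels):
            dots[i]["subgroup"] = code
    return dots
```

Shuffling a full list of labels, rather than drawing a random code for each dot, is what keeps the tract totals exact instead of only approximately right.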
Step 3 Converting resulting geojson to a map tile for online use.
To better control the zoom level visibility of my dots, and to reduce the size of my files, I used tippecanoe in my terminal to convert each .geojson file to a .mbtiles file. Once converted, these files were uploaded as tilesets in Mapbox Studio, and 22 million dots were loaded into a new style. I followed the steps in the Mapbox help topic here.
Step 4 Loading and styling the resulting map.
No advanced Mapbox styling here. I relied on a scale based on zoom level for the radius, and a dictionary for the color of each dot. The look I wanted most closely resembles the first example from the Cooper Center, so opacity and size were adjusted to that effect. Color coding is very challenging here: with 23 categorical values, differentiation is impossible without an underlying orienting principle. I tested out color schemes based on language systems, physical proximity of groups in Asia, and interactions in the United States. Finally, with the expert eye and aid of our research assistant Adeline at the Center for Spatial Research, we reached a final palette. The final colors are a hybrid, and the solution is happily messy: a logical orienting principle based on proximity, edited by completely subjective fine-tuning.
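As an illustration of the radius scale and color dictionary, here is a rough sketch that builds the paint properties for a circle layer as a Python dict and prints them as JSON. The subgroup codes, colors, zoom stops, opacity, and the "subgroup" property name are all made up for the example and are not the values used on the map. The expression forms ("interpolate" on zoom, "match" on a property) are standard Mapbox style expressions.

```python
import json

# Hypothetical subgroup codes and colours; the real map has 23 categories.
GROUP_COLORS = {
    "chinese": "#e41a1c",
    "filipino": "#377eb8",
    "vietnamese": "#4daf4a",
}


def build_circle_paint(colors, fallback="#999999"):
    """Return a Mapbox-style paint dict for a circle layer."""
    # "match" compares the dot's property against each code in turn.
    color_expr = ["match", ["get", "subgroup"]]
    for code, color in colors.items():
        color_expr += [code, color]
    color_expr.append(fallback)  # used when no code matches

    return {
        # Radius grows linearly with zoom between the listed stops
        # (zoom 4 -> 0.5 px, zoom 12 -> 2 px; assumed values).
        "circle-radius": ["interpolate", ["linear"], ["zoom"], 4, 0.5, 12, 2.0],
        "circle-color": color_expr,
        "circle-opacity": 0.7,  # assumed value
    }


paint = build_circle_paint(GROUP_COLORS)
print(json.dumps(paint, indent=2))
```

Keeping the colors in one dictionary makes it easy to reuse the same palette for the legend.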
Step 5 Adding some initial interactivity.
There were many things I wanted to know about this dataset — composition and concentration are the most pressing questions, but comparison, change, and absence were also important. For this initial map, the interaction I built has just 4 elements to highlight composition and concentration.
The first element is the Mapbox geocoder, so that users can navigate to different places with search.
I used the built-in Mapbox geocoder from this simple example.
Only 2 options were updated from the code found at the link above. These 2 changes are:
1. Limiting search results to the United States. This is done by adding countries: 'us' to the geocoder options. Instructions were found here.
2. The geocoder was added not to its default location but to a div to fit the layout of the webpage. This is also extremely straightforward, and the examples for doing so can be found here.
The second is a color legend for the map that doubles as a filter to isolate groups. It allows comparison of countrywide group composition with the group composition of particular tracts. This is done by building a simple HTML table that lists the subgroups and color codes them. When a group is clicked, a filter is set on the map to show only that group. The code for this is explained in the Mapbox help guide here. Specifically, it uses this line: map.setFilter(layerName, ['==', key, value]); where the layerName is the dots layer of my map, the key is the subgroup key from the census, and the value is the name of the group.
A third element is a map popup, showing population composition for a particular tract when the mouse hovers over that tract. This is done using a combination of Mapbox's built-in code for map.on('mouseenter',…) and CSS to position the popup as an HTML <div> element. A very similar example can be found here.
The final bit of interaction is a list of points of interest for users to navigate around the map. I found that I gravitated to large cities when exploring the map and in the process missed concentrations of particular populations, especially in smaller non-coastal cities. The list of places of interest is made by finding the top 10 counties with the highest count of a particular subgroup. I did this by first downloading a nationwide county level file, again from Social Explorer, for the same table I used for generating the tract level dots. I then used Python to iterate through the counties and output the top 10 counties for each of the 23 subgroup codes.
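The county ranking can be sketched roughly as follows. This is an illustration rather than the project's script: the function name, the "county" key, and the row layout (one dict per county, with one population count per subgroup code) are assumptions about how the Social Explorer file might be loaded.

```python
import heapq


def top_counties(rows, subgroup_codes, n=10):
    """Return the n counties with the highest count for each subgroup.

    rows: list of dicts with a "county" key plus one count per subgroup
          code (hypothetical layout).
    """
    ranking = {}
    for code in subgroup_codes:
        # nlargest avoids sorting the whole county list for every subgroup.
        best = heapq.nlargest(n, rows, key=lambda r: r[code])
        ranking[code] = [(r["county"], r[code]) for r in best]
    return ranking
```

For example, top_counties(rows, ["A", "B"], n=10) returns a dictionary with one ranked list of (county, count) pairs per code.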
Then I also added places with tract level concentrations that may be diluted by using the larger county geography as the base unit for ranking. I did so in Python by ranking all tracts by each of the subgroup codes and taking the top 100 tracts in terms of number of persons for a group. I then checked to see if any of the 100 tracts in a given group belonged to the same county. When a particular county had many tracts with high concentrations of a particular population, I included that county as well. This is fairly straightforward geographically speaking because tracts are nested in counties. To get the county that a tract belongs to, just take the first 5 digits of the full 11 digit tract FIPS code.
I downloaded the countrywide counties file from the Census TIGER data portal. Using QGIS, I plotted and exported the centroid of each county and used the centroids to build a map layer in Mapbox Studio. Each county centroid is labeled with the county name on the map, in addition to the built-in Mapbox place names data layer.
To use the concentration data I created in the above steps, an HTML dropdown menu was created to allow selection of each subgroup. When a subgroup is selected, the centroid data is used again, in combination with the Mapbox flyTo function, to recenter the map on the centroid of one of the counties of interest. You can find a good example of how to use it here.
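Here is a small sketch of that tract-to-county check, under some assumptions: each tract is a dict with an 11 digit "GEOID" string and a count per subgroup code, and "many tracts" is turned into a hypothetical min_tracts threshold, since the cutoff I used was a judgment call rather than a fixed number.

```python
import heapq
from collections import Counter


def counties_with_tract_clusters(tracts, code, top_n=100, min_tracts=3):
    """Find counties holding several of the top tracts for one subgroup.

    tracts: list of dicts with an 11 digit "GEOID" string and a count
            per subgroup code (hypothetical layout).
    min_tracts: assumed cutoff for "many tracts in the same county".
    """
    top = heapq.nlargest(top_n, tracts, key=lambda t: t[code])
    # The first 5 digits of a tract GEOID are the 2 digit state FIPS
    # code followed by the 3 digit county FIPS code.
    per_county = Counter(t["GEOID"][:5] for t in top)
    return {fips: k for fips, k in per_county.items() if k >= min_tracts}
```

The GEOID has to be read as a string here, otherwise leading zeros in state codes such as 06 are lost and the 5 digit slice points at the wrong county.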
There is much more to be done, but I am hoping to share this project as a public basemap shortly for others to build on top of. You will find all code and data, along with many other open source projects, at the GitHub account for the Center for Spatial Research.