# Crowd Estimates at the Capitol Riot Using PyTorch and Streamlit

Jan 14 · 5 min read

Although experts have said that it is difficult to estimate the size of the crowd without aerial photos of the riot, the one estimate everyone can agree on is that there were few police compared to the number of rioters.

Using Streamlit and CSRNet, developed by Yuhong Li, Xiaofan Zhang, and Deming Chen, let’s put together a tool to estimate the number of protesters in any given photo. This tool can help create descriptive analyses comparing the ratio of police to crowd for this event as well as past gatherings (e.g. the Black Lives Matter protests from the summer of 2020).

# Project Goals

The first step in any project, whether it’s a yearlong one or just an afternoon like this one, is to lay out the puzzle pieces and clarify to your partners as well as yourself what the goal is: what is this project meant to accomplish?

This means taking stock of the tools at your disposal and the number of hours you want to spend on each phase of your project, and checking whether they align with your goal(s). While it’s good to think about these three aspects (tools, time, and goals) together, if you’re stuck, forget the code and start with your goal. It’s the driving force that can bring you to the finish line, and when it is missing, it’s the keystone whose absence keeps your project in limbo.

## Evaluating Feasibility of a Project

My initial concept for this project was to quickly determine the ratio of police to crowd in any given photo. However, when looking through the news, I found that many photos of the Capitol Hill rioters did not show the police in rigid lines, as I had pictured from the video footage I had seen from the summer, when troops and police advanced on BLM protesters and Trump got his photo-op in front of St. John’s Church.

This immediately nixed the idea of separating the police from protesters with a lasso or multi-point polygon selection in OpenCV or Bokeh. I decided to reduce the time the project would take and instead focus on quickly deploying a working crowd estimator that counts police and crowd alike.

Before diving into the crowd estimator code itself, let’s look at some Streamlit functions to help with the dashboard’s appearance.

Although Streamlit isn’t the most flexible option out there in terms of appearance, injecting custom CSS is a great tool at your disposal.

```python
import streamlit as st


# Local CSS: read a stylesheet from disk and inject it into the page
def local_css(file_name):
    with open(file_name) as f:
        st.markdown('<style>{}</style>'.format(f.read()),
                    unsafe_allow_html=True)


# Hide Hamburger in Top Right and Footer
def hide_hamburger():
    hide_streamlit_style = """
                <style>
                #MainMenu {visibility: hidden;}
                footer {visibility: hidden;}
                </style>
                """
    st.markdown(hide_streamlit_style, unsafe_allow_html=True)


local_css('style.css')
hide_hamburger()
```

Using Streamlit’s `file_uploader`, users can import local images. In the following code snippet, I resize the image and make sure it is right-side up (an issue for some images).

```python
import streamlit as st
from PIL import Image, ImageOps

uploaded_file = st.file_uploader(label="")
if uploaded_file is not None:
    image_in = Image.open(uploaded_file)

    # make sure image is right side up (apply the EXIF orientation first,
    # so the width used below is the displayed width)
    image_in = ImageOps.exif_transpose(image_in)

    # resize image if too large
    basewidth = 500  # width we'll create the height off of
    wpercent = basewidth / float(image_in.size[0])
    hsize = int(float(image_in.size[1]) * wpercent)
    # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
    image_in = image_in.resize((basewidth, hsize), Image.LANCZOS)
```
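The resize arithmetic can also be pulled out into a small helper that is easy to check on its own. The sketch below is an illustration, not code from the dashboard: `scaled_size` is a hypothetical name, and unlike the snippet above it leaves images that are already narrower than the base width untouched, which is one way to read the “resize image if too large” comment.

```python
def scaled_size(width, height, base_width=500):
    """Return (new_width, new_height) that keeps the aspect ratio.

    Hypothetical helper: images already narrower than base_width are
    returned unchanged so they are never upscaled.
    """
    if width <= base_width:
        return width, height
    # scale the height by the same factor as the width, at least 1 pixel
    new_height = max(1, round(height * base_width / width))
    return base_width, new_height


print(scaled_size(1000, 800))  # (500, 400)
print(scaled_size(400, 300))   # (400, 300)
```

The returned tuple can be passed straight to `Image.resize`.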

Finally, `st.markdown` lets us customize text in the dashboard. Make sure to pass `unsafe_allow_html=True`; otherwise the HTML tags are shown as plain text instead of being rendered.

```python
st.markdown(
    "<p class='upload-text' style='font-size:20px;padding-top:50px;'>Crowd Estimate</p>",
    unsafe_allow_html=True,
)
```

# CSRNet & PyTorch

There were two issues I saw up front when deciding which tools to use for this project: 1) CSRNet is written in Python 2 and uses a CUDA-enabled version of PyTorch, and 2) PyTorch pushes repositories over the 500 MB slug size limit when deploying on Heroku.

## CSRNet & Python 2

CSRNet is written in Python 2, which Streamlit has not supported since February 2020. I used the futurize tool (from the `future` package) to convert CSRNet’s scripts to Python 3 and, when running the model training script, addressed any additional compatibility issues (e.g. importing the submodule `spatial` explicitly). In my initial “laying-out-tools” stage, I had come across this article, which has a step-by-step on converting CSRNet to Python 3.
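A typical example of the kind of compatibility issue involved is integer division. This is a general Python 2 vs. Python 3 difference, not a fix listed in this article: in Python 2, `/` on two integers floors the result, while in Python 3 it returns a float, which breaks code that expects an integer size. The helper name and the downsampling factor of 8 below are assumptions for illustration.

```python
def downsampled_shape(height, width, factor=8):
    """Shape of a map downsampled by `factor` (hypothetical helper)."""
    # Python 2: `height / factor` floors when both are ints.
    # Python 3: `/` always returns a float, so `//` keeps the sizes integral.
    return height // factor, width // factor


rows, cols = downsampled_shape(768, 1024)
print(rows, cols)  # 96 128
```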

## CSRNet & CUDA & Heroku

The bigger issue is that CSRNet uses a CUDA-enabled version of PyTorch. While the CPU-only builds of PyTorch are smaller and friendlier to Heroku deployment (500 MB max slug size), I’m using a pretrained crowd estimator model from the step-by-step article cited above, which was built with a CUDA-enabled version of PyTorch. While PyTorch has good documentation on how to save and load models across GPU and CPU, it’s trouble I’d rather avoid.
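For reference, the CPU route could look roughly like the sketch below. It relies on the `map_location` argument of `torch.load`, which remaps tensors saved from a CUDA device onto the CPU. The function name is hypothetical, and the `"state_dict"` key is an assumption about how the checkpoint was saved; this is not code from the project.

```python
import torch


def load_checkpoint_on_cpu(checkpoint_file, model):
    """Load weights saved on a GPU machine into `model` on the CPU.

    Assumes the checkpoint is a dict that stores the weights under a
    "state_dict" key; adjust the key if your file is laid out differently.
    """
    # map_location remaps tensors saved from CUDA devices onto the CPU
    checkpoint = torch.load(checkpoint_file, map_location=torch.device("cpu"))
    model.load_state_dict(checkpoint["state_dict"])
    model.eval()  # switch to inference mode
    return model
```

`checkpoint_file` can be a path or an open binary file, and the model passed in must have the same architecture as the one that was trained.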

Instead of converting the model for CPU-only PyTorch or re-running the script to train the model using a CPU-only version of PyTorch (virtualenv this 💯), I decided to keep the project goal as a local dashboard that estimates crowds.

Streamlit also offers a service to deploy dashboards and share them publicly. It limits project sizes to 800 MB, and you can request access here.

## LFS & GitHub

One last thing that might trip people up in the deployment process is uploading models to GitHub. The model in this project was over 100 MB, which is the maximum file size GitHub accepts in a push. This article lays the process out well; the tl;dr version is below.

1. `git lfs install`
2. `git lfs track "*.tar"`
3. Commit the files
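To see ahead of time which files need LFS tracking, a small script can list anything above the size limit. This is a minimal sketch rather than part of the original project: `find_large_files` is a hypothetical helper, and the 100 MiB threshold is an assumed stand-in for the 100 MB limit mentioned above.

```python
import os

# Assumed threshold: the article cites a 100 MB per-file limit
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit_bytes=GITHUB_LIMIT_BYTES):
    """Return sorted (path, size) pairs for files under `root` above the limit."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # do not descend into git's own object store
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks, which may be broken
            size = os.path.getsize(path)
            if size > limit_bytes:
                large.append((path, size))
    return sorted(large)
```

Calling `find_large_files(".")` from the repository root returns the files that would need a `git lfs track` pattern before committing.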

# Planning Out Projects & Preemptive Troubleshooting

I decided to write an article about this project specifically because it illustrates the need for solid project scoping and for aligning goals and tools before touching the code.

And while these code snippets are specific to projects revolving around Streamlit, PyTorch, Heroku, and quick AI-based dashboards in general, I hope the article is helpful in illustrating the importance of setting goals and doing your research before diving into the code. Better to learn it via Medium than the hard way lol.

About Theo: Founder of Basil Labs, a big data consumer intelligence startup that helps organizations quantify where consumers go and what they value.

Love music, open data policy and data science. For more articles, follow me on Medium. And if you’re passionate about data ethics, open data and/or music, feel free to add me on Twitter or LinkedIn.

Written by

## Theo Goetemann

Theo Goetemann. Founder @Basil Labs. #AI #OpenData #ConsumerIntelligence https://twitter.com/theo_goe

## Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
