Crowd Counting: Approaches, Use Cases and Importance (Part 1)

Suraiya
Secure and Private AI Writing Challenge
12 min read · Aug 19, 2019

Context:

I am writing this article as part of the Crowd Counting project, which came out of a study group called sg_wonder_vision. This study group is in turn an initiative by fellow students of the “Udacity Facebook Secure and Private AI” course.

I am not new to math, machine learning, security, socket programming, or Python in general, so the course was not overwhelming for me. One thing I would like to highlight: you get deep, global-scale experience with the kind of people-handling scenarios discussed in HBR and similar publications, because you work with people at different phases of their lives, with different communication styles, varied awareness of other people’s cultures, boundaries, and choices, and different levels of stress-management skill. To be effective as a data scientist, or in any role at the intersection of technology and business, we have to deal with diverse stakeholders. Participating in group initiatives attached to a course like this is a valuable experience and can be highlighted as part of our value proposition as professionals.

Even though the course content was not overwhelming for me, it was still an interesting time of juggling various courses, work timelines, summer activities, and family commitments as a working parent. Some links shared by other scholars were extremely valuable and saved me hours of searching. I still wish I could have spent more time reading those resources and attended more activities to make the most of this collaborative learning effort and help each other.

That said, I am looking forward to applying what I learned about federated learning at work. Backed by the course content, I also created a strategy to influence some private and public sector decision makers and data owners, aimed at reducing barriers to building machine learning models collaboratively and propelling forward the movement of data-driven insights and policies. I am excited about the Lunch and Learn talk on encrypted and federated learning I am giving this fall in front of public and private sector stakeholders. I also intend to highlight the learning outcomes this fall at a youth conference in Canada involving approximately four hundred high school students who are curious about advances in AI and technology for the betterment of lives globally.

As part of the course I made some minor code fixes in one advanced federated learning example here; these fixes allowed me to run the example without errors. Another study group I belong to (consisting mostly of students in the PST timezone) would like to turn this code into a real-life example of collaboration between a university research center and hospitals, as per this diagram.

Crowd Counting Concept, Use Cases, Importance

Crowd counting is the task of counting or estimating the number of entities/objects in a crowd. One of my fellow team members wrote an article on this. I prefer “entity/object count” over the typical term “people count”, and “object counting” over “crowd counting”. That way we can apply the same concepts and techniques, with little adjustment, to counting animals, trees, delivery packages, bottles, vehicles, etc., and more easily see further real-life applications of automated counting.

In this series of articles on object counting I will highlight the benefits of automating object counting mostly from business stakeholders’ perspectives. The success or failure of initiatives depends significantly on having project sponsors and support from various stakeholders and “economic buyers”. If they do not see the value, we will not have much success improving citizens’ lives, no matter how elegant and technically sound the proposed solutions are.

Using humans to do the counting may not be feasible in every case. For example, frequently sending people to remote, hard-to-reach areas to check regulatory compliance is not feasible in terms of time and cost, and hiring enough people for compliance checks may not be feasible either.

Steps For Crowd Counting:

  • Data Acquisition:

In an initiative to automate crowd counting, images and videos from cameras, drones, or satellites can be used. I shall write about the resolution and coverage of different image collection devices another day.

The collected images may also be available on various open data portals for researchers to analyze.

  • Loading input/data — load the data in memory.

When we are dealing with a lot of images, generators and limited prefetching are very useful.
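
The generator idea can be sketched in a few lines of Python. The file names and the loader function here are stand-ins (a real loader would use something like `cv2.imread`); with TensorFlow, `tf.data.Dataset` offers the same laziness plus a `.prefetch()` call that overlaps loading with training.

```python
def image_batches(paths, batch_size, load_fn):
    """Lazily yield batches of loaded images instead of reading
    everything into memory at once."""
    batch = []
    for path in paths:
        batch.append(load_fn(path))  # load one image at a time
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                        # final, possibly smaller batch
        yield batch

# Example with a stand-in loader (a real one would decode the file):
fake_paths = [f"img_{i}.jpg" for i in range(7)]
batches = list(image_batches(fake_paths, batch_size=3, load_fn=lambda p: p))
# 7 paths with batch_size 3 -> batches of sizes 3, 3, 1
```

Because nothing is loaded until a batch is requested, memory use stays bounded by the batch size rather than the data set size.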

  • Clean the data (to reduce the garbage-in, garbage-out situation) and pre-process it.

In this process we omit images where the image quality is poor.

In the pre-processing phase we may resize images, convert color images to grayscale, and reshape the data so that various algorithms can handle it.
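
A minimal sketch of those three operations using only NumPy; in practice one would call library routines such as `cv2.resize` and `cv2.cvtColor`, and the 64×64 target size is just an illustrative assumption:

```python
import numpy as np

def preprocess(img, size=(64, 64)):
    """Convert an RGB image to grayscale, crudely resize it by index
    sampling, scale to [0, 1], and add a channel axis so a model
    receives shape (H, W, 1)."""
    # Luminance-weighted grayscale conversion
    gray = img[..., 0] * 0.299 + img[..., 1] * 0.587 + img[..., 2] * 0.114
    # Nearest-neighbour resize via index sampling (cv2.resize would
    # normally do this with proper interpolation)
    rows = np.linspace(0, gray.shape[0] - 1, size[0]).astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, size[1]).astype(int)
    resized = gray[np.ix_(rows, cols)]
    return (resized / 255.0)[..., np.newaxis].astype(np.float32)

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
x = preprocess(img)
# x.shape == (64, 64, 1), values scaled into [0, 1]
```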

  • Segment the image — the process of automatically analyzing an image and putting a bounding box or contour around items in it.
  • Classify/label the segments that contain our object of interest (e.g., “tree” vs. not a “tree”).

There are algorithms that do classification and segmentation together, called semantic segmentation: the algorithm is not only putting boundaries where there may be something interesting, it is also looking for particular types of items while drawing the boundary, attaching meaning (semantics) to that bounding box or contour.

  • Counting: In the simple case of a still image (instead of video frames), we count the objects detected in that image (optionally broken down by label/class type). A video can be thought of as a collection of images shown quickly one after another, so for video footage we consider the sequence of frames that form the video and analyze one frame at a time, processing each frame the way we process a still image. If objects move in and out of frames (the “tracking” scenario), we additionally have to consider which object has left the frame at which time if we want to be rigorous in counting/estimating.
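
For the still-image case, the counting step reduces to tallying detections per class. A minimal sketch, assuming an upstream detector has already produced (label, bounding box) pairs; the labels and boxes below are illustrative:

```python
from collections import Counter

def count_objects(detections, label_of_interest=None):
    """Count detected objects in one frame, overall and per class.
    Each detection is assumed to be a (label, bounding_box) pair
    produced by an earlier segmentation/classification step."""
    per_class = Counter(label for label, _ in detections)
    if label_of_interest is not None:
        return per_class[label_of_interest]
    return sum(per_class.values()), dict(per_class)

frame = [("car", (10, 10, 50, 40)),
         ("truck", (60, 12, 120, 48)),
         ("car", (130, 8, 170, 42))]
total, by_class = count_objects(frame)
# total == 3, by_class == {"car": 2, "truck": 1}
```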

Real Life Examples of Object Counting:

Transportation, Environment, Policy, Resource Deployment

Speaking from personal experience: back in 2017 I was tackling the low-level technical and analytics side of a computer vision project to identify types of vehicles on roads in my province and potentially count them by type. The vehicle counting was done using the data set from https://images.drivebc.ca/bchighwaycam/pub/html/www/index-Northern.html. The idea was to come up with a vehicle density index by type for a road segment, which can be used in predicting road wear and tear, environmental pollution, etc. This in turn would help the government and policy makers plan and deploy resources for road repair, creating alternate routes, setting levies, etc.

During development of the first proof of concept I used OpenCV and the Orange framework for image segmentation and classification. Using traditional, non-deep-learning techniques like SVM and Naive Bayes, the classification accuracy was ~64%. Later we switched to deep learning and used a CNN to see how it would perform. Initially we tried to build the model from scratch, but convergence was very slow on a traditional CPU-based system on Azure, and on top of that the number of examples was not enough for learning. My coworker compiled TensorFlow with optimization flags on, which gave around a 46%–54% speed-up in processing. We did not have access to GPUs (a business constraint), so training was still lengthy. Around fall 2017 I read about transfer learning, used an available pre-trained model, and was able to train with a smaller data set; the process converged faster.

During transfer learning I applied “unfreezing” only to the top few layers, and I restricted the number of classes at the softmax layer instead of using all the classes/labels the pre-trained model was trained to handle. I used this model to classify already-segmented vehicles. For segmentation I initially still used a traditional bounding box or contour detection approach, and the segmented images were passed to the CNN-based classification component. In a later iteration of that experiment I did semantic segmentation (like what SegNet does), which tackles both segmentation and labeling (classification). With more than two classes the classification accuracy still suffered; once I modified the model to detect only two classes, the accuracy rose to 81%. And thanks to transfer learning we did not need weeks of training to build the vehicle classification model.
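
To illustrate the idea of freezing a pre-trained base and training only a restricted softmax head, here is a toy NumPy sketch. The random projection standing in for the pre-trained convolutional layers and the synthetic labels are purely illustrative; in Keras the equivalent would be setting `layer.trainable = False` on the base layers and attaching a new `Dense` softmax with the reduced class count.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained base": a random projection standing in for the
# convolutional layers of a real pre-trained network. It is never updated.
W_base = rng.normal(size=(16, 12)) * 0.5

def base_features(x):
    return np.tanh(x @ W_base)        # frozen feature extractor

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy data; labels are constructed to be learnable from the frozen features.
X = rng.normal(size=(200, 16))
y = (base_features(X)[:, 0] > 0).astype(int)

# New head: a softmax restricted to 2 classes, the only part we train.
W_head = np.zeros((12, 2))
for _ in range(500):                  # plain gradient descent on the head
    F = base_features(X)
    p = softmax(F @ W_head)
    W_head -= 1.0 * F.T @ (p - np.eye(2)[y]) / len(X)

acc = (softmax(base_features(X) @ W_head).argmax(axis=1) == y).mean()
```

The point of the sketch is the division of labor: only `W_head` receives gradient updates, which is why transfer learning converges on small data sets where training from scratch does not.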

For training I used a mix of data collected from Google using a crawler, data from this contest, data from here, etc. For semantic segmentation under a supervised learning scheme I needed to annotate items in the images, and the tools labelImg and FIAT were very handy for that.

Before we had the large data set, we used basic augmentation techniques (like rotation and shearing) to create more images from the base images.
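
A minimal sketch of generating variants from one base image; flips and 90-degree rotations stand in here for the full rotation/shearing transforms, which in practice would come from e.g. `cv2.warpAffine` or Keras’s `ImageDataGenerator`:

```python
import numpy as np

def augment(img):
    """Yield extra training images derived from one base image using
    simple geometric transforms."""
    yield img                 # original
    yield np.fliplr(img)      # horizontal mirror
    for k in (1, 2, 3):       # 90/180/270 degree rotations
        yield np.rot90(img, k)

base = np.arange(12, dtype=np.uint8).reshape(3, 4)
variants = list(augment(base))
# 1 original + 1 flip + 3 rotations = 5 images from one base image
```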

At some point, to avoid out-of-memory issues, I used generators and prefetching to handle the performance problems.

The troubles we faced were as follows:

  • Due to different camera positioning, we got images from the side or from the front.
  • Lighting and reflection changed with day or night conditions, rainy or snowy days, etc.
  • Some road segments had a low frequency of camera captures per day.
  • The images were low resolution (deliberately, to protect travelers’ privacy).

Tweaking the model on our end was the right thing to do instead of leveraging third-party APIs. A big reason is that those third-party APIs are built for generic cases, not tailored to our needs. During the exploration phase we also found that the big third-party players in this field trained on high-resolution images; the effect was that, through their APIs, our low-resolution segmented image of a bus was labeled as a harmonica, whereas our system labeled such segments properly.

We tackled the data imbalance problem before training.
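
One common way to tackle imbalance, sketched below, is inverse-frequency class weights fed into the loss function (resampling the minority class is another option); the label counts here are hypothetical:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights: rarer classes get larger
    weights so the loss is not dominated by the majority class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

labels = ["car"] * 90 + ["truck"] * 10
w = class_weights(labels)
# "truck" (the rare class) gets 9x the weight of "car"
```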

One major limitation of our system was incorrect labeling of small areas (containing a mix of snow patch, shadow and parking signs) as vehicles.

Even though the detection part improved, the counting part was still not very reliable, because we were treating each frame of the video stream as a still image. Also, due to the cameras’ capture frequency, vehicles moving in and out of a scene were not accounted for properly. For more reliable estimates we would have to set up a per-road-segment model instead of a generic one. But in the end the idea was to have a rough estimate, not an exact count of vehicles, and in that sense it was a reasonable proof of concept.

Farm and/or Urban Animal Monitoring and Control

Below I am summarizing this article.

Tracking is the detection of an object across video frames where the item of interest is not still. The way tracking works is: first, a detection algorithm puts bounding boxes around objects in a frame and assigns an identification number (id) to each detected object. If in the next frame the object with a particular id is no longer detected, it is assumed to have gone out of the frame or scene. For detection, YOLO (You Only Look Once) is a commonly used algorithm, and there are many pre-trained models following the YOLO architecture that identify particular types of objects, like people, cows, etc. One can use a pre-trained model if it is trained to detect one’s object of interest; if not, one has to tweak the existing model or train a brand new one. In the next phase (“tracking”), one commonly used algorithm is Simple Online and Realtime Tracking (SORT). It can track multiple objects, but the objects already have to be detected by the prior step (like YOLO) in the pipeline.
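
The id bookkeeping described above can be sketched with a greedy IoU matcher. This is only a toy stand-in for SORT, which additionally uses a Kalman filter for motion prediction and the Hungarian algorithm for assignment; the boxes and threshold here are illustrative:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def track_and_count(frames, iou_threshold=0.3):
    """Greedy IoU tracker over per-frame detections: a detection keeps
    the id of the best-overlapping box from the previous frame,
    otherwise it gets a fresh id. Returns total distinct objects seen."""
    next_id, prev = 0, {}            # prev: id -> box in previous frame
    for boxes in frames:
        cur, unmatched = {}, dict(prev)
        for box in boxes:
            best = max(unmatched.items(),
                       key=lambda kv: iou(kv[1], box),
                       default=None)
            if best and iou(best[1], box) >= iou_threshold:
                cur[best[0]] = box   # same object, keep its id
                del unmatched[best[0]]
            else:
                cur[next_id] = box   # new object enters the scene
                next_id += 1
        prev = cur                   # ids left unmatched have left the scene
    return next_id

frames = [
    [(0, 0, 10, 10)],                    # object A appears
    [(2, 0, 12, 10), (50, 50, 60, 60)],  # A moves, object B appears
    [(50, 52, 60, 62)],                  # A leaves, B stays
]
# track_and_count(frames) == 2 distinct objects across the clip
```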

As per the author, his group had more success identifying small objects by using Mask R-CNN and restricting the detection model to only two classes. That matches the outcome of my own past experiments.

In some towns like mine, the numbers of deer and rabbits are increasing at an alarming rate, making it harder for local homeowners and small farmers to grow their own vegetables (which is encouraged). Roughly counting the deer and rabbit populations from various images, and eventually turning that into some type of index, would help with proper resource planning and deployment for deer/rabbit control. The market size and cost effectiveness are still to be studied.

Agricultural Automation and Precision Agriculture

Data can be collected about the time of year wheat seed was planted, the amount of seed used, seedling density, final grain yield, etc. By analyzing relationships among these observations we can suggest the optimum volume of seed to use based on planting time. In general, the theory is that if planting happens on time or early, a lower amount of seed produces a high yield; in that scenario, increasing seed density per square unit of land would adversely affect grain yield. For a late-planting scenario, however, increasing the volume of seed can compensate for the late start and still produce a high yield. In this paper, a way to automatically analyze images from wheat fields to count seedlings (which can be correlated to grain yield down the road) is proposed. The paper also covers wheat spike counting using computer vision, which can be mapped to wheat yield.

In this paper, a deep-learning-based automatic analysis of images from crop fields to estimate crop pest density is proposed. As crop mites are very tiny, traditional manual/visual inspection by humans to estimate mite infestation is very inaccurate. This automatic pest monitoring can be used to predict plant disease and to propose optimum steps and resource allocation for pest control.

Precisely monitoring and controlling crop yield (by creating optimal growing conditions) is very important for feeding the growing population with limited land and resources.

Assessing Popularity of Event, Regulatory Compliance and Safety

A crowd counting mechanism can be used to count participants in a protest or the crowd coming to an event. The estimate can be used to control the inflow of people at an event, to prevent overcrowding and stampedes (very common during Hajj), and to plan evacuation routes. Our Crowd Counting project team’s core experiment and report highlights this use case. The same mechanism can be used for congestion monitoring and proposing alternate routes as well.

In developing countries, among non-profit services that run on donated money or government grants, there are cases where the executives involved generate fake reports showing a lot of clients for the free education, health care, or other services they offer, whereas in real life they are not providing any such service in a consistent manner; they even hire fake cohorts on the day of inspection. If images are collected on random working days from the surroundings of such organizations and analysis does not show much activity, funding and support for those organizations can be stopped.

Similarly, it is very common for people in some developing countries not to abide by transportation safety codes and to overload passenger ferries, ships, etc. On many occasions this overcrowding is the cause of drownings and the loss of precious lives, as greedy boat and ship owners allow as many people on board as they can. By collecting and inspecting camera captures through a third party (one without conflicts of interest or corruption) and estimating crowd inflow/outflow from the images, possible violations can be spotted and officials alerted to take a closer look.

In a developed country, the application of crowd counting would not be about fighting fraudulent use of donated money or grants. It would be about monitoring the efficiency of an operation compared to other operations of the same kind, in terms of clients entering the facility and overall service/response time. This comparison report can be sent to the agencies (without disclosing confidential information), and agencies with below-average performance can use it to improve quality of service and keep getting grants and/or maintain their permits to operate.

In the next part of this article I shall highlight use cases of object detection and counting in health care (disease detection), the hospitality sector, space, inventory management, etc. Additionally, I shall share the size of each market segment where possible.

Word of Hope/Concluding Remarks:

As we rely on machines and algorithms to automate and speed up various processes, we should still leverage human intelligence and keep humans in the loop. Automation would reduce service bottlenecks and human burnout by using machines to tackle repetitive, time-consuming tasks, letting human beings do what they are good at. With automation, people would have more free time for relationship building (giving their clients and the people around them the gift of time and attention) and for scientific research and discovery. Leveraging AI, humans would have a chance to provide services and produce consumables at a rate that keeps up with a growing and aging population.
