CrowdSense: Live Room Occupancy Indicator

Norman Keyes
Generative Design Course
May 10, 2024

Norman Keyes, Mohammad Hossein Zowqi, Nicholas Richards, Xinyan He | Generative Design | Spring 2023 | GSAPP | Columbia University

Diagram demonstrating how the projection indicates occupancy level

| Introduction

Maximum or over-capacity crowds at popular nightclubs create the potential for mass casualties in the event of a fire. Given the popularity of these spaces of assembly and the leniency of bouncers and security staff toward overcrowding, an interactive occupancy-tracking system could help mitigate this risk and enhance the experience of these venues.

To address this, our project counts the number of people entering a place of assembly. Occupancy data is tracked using a webcam and reflected through the lighting, which changes as the room approaches its occupancy limit. We use a color gradient from green to red, with each color representing a particular threshold of occupancy. A visual projection of the blob detection functions as an installation within the room, showing how the tracking is determined.
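The green-to-red gradient can be sketched as a simple interpolation from occupancy count to an indicator color. This is an illustrative version; the actual thresholds and capacity used in the project may differ:

```python
def occupancy_color(count, capacity):
    """Return a BGR color interpolated from green (empty) to red (full)."""
    ratio = min(count / capacity, 1.0)   # clamp at full capacity
    red = int(255 * ratio)
    green = int(255 * (1.0 - ratio))
    return (0, green, red)              # OpenCV uses BGR channel ordering
```

An empty room maps to pure green, a full (or over-full) room to pure red, with intermediate counts blending between the two.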

Based on our scope, we used Python and TouchDesigner to track facial data using blob detection. Several libraries, including OpenCV, MediaPipe, and DeepFace, handle the face and hand detection.

Logic diagram illustrating how the video feed is processed

| Methodologies

This Python script leverages several libraries to detect and track faces and hands from a webcam feed. The program incorporates several stages, including initialization, detection, tracking, and filtering based on the number of unique faces. Here is a detailed explanation of the code, including its dependencies and an overview of its potential applications and limitations.

Ware Lounge in Avery Hall: smartphone and projector are mounted on opposite sides of the central column

| Camera setup and placement

To conduct a prototype test, we used Ware Lounge as the place of assembly. The primary input source is a smartphone attached to the north column of Ware Lounge with a 3D-printed mount anchored to the column. The phone’s camera is oriented toward the entrance door, creating a detection zone for people entering the room. This input is streamed over WiFi through vdo.ninja.com and routed through an OBS virtual camera, creating a live feed of the entryway.

A projector is mounted to the same column to produce the output feed; it faces the north wall of the room to display the projection. The live feed from the smartphone camera captures unique facial IDs and processes them through a Python script, as shown in the flowchart above. The colored overlay detections are fed into TouchDesigner, where an augmented video feed based on the occupant count is created. This feed is sent to the projector and displayed on the wall of the room.

Screenshot of video feed after processing with Python OpenCV Library

| Libraries and Dependencies

1. OpenCV (`cv2`): Used for video capture, image processing, and displaying the output video feed. It is a versatile library for computer vision tasks.

2. MediaPipe (`mediapipe as mp`): Provides robust solutions for hand and face detection, including pre-trained models for identifying critical landmarks on faces and hands.

3. NumPy (`numpy as np`): Essential for numerical operations, such as calculating the Euclidean distance between face embeddings.

4. OS (`os`): Used for handling file paths and directory creation.

5. Datetime (`datetime`): Helps generate unique filenames based on the current timestamp.

6. DeepFace (`deepface`): Used for face recognition tasks, specifically for extracting face embeddings to identify unique faces.

7. Logging (`logging`): Facilitates logging for debugging purposes.

| Stage-by-Stage Explanation

1. Initialization:

  • The script starts by setting up logging to record debug information.
  • Initializes MediaPipe solutions for hand and face mesh detection, specifying parameters like `max_num_hands`, `max_num_faces`, and detection/tracking confidence levels.
  • Defines drawing styles for visualizing detected landmarks on faces and hands.

2. Setup Paths and Video Capture:

  • Determines the home directory and sets up a save path for recording video files.
  • Initializes video capture from the default camera and prepares variables for video recording and counting unique faces.

3. Utility Functions:

  • `get_unique_filename()`: Generates a unique filename using the current timestamp.
  • `apply_color_filter(image, color)`: Applies a color filter to the image based on the specified RGB values.
  • `is_new_face(face_representation, known_representations, threshold)`: Checks if a detected face is new by comparing its embedding to known face embeddings using Euclidean distance.

Face and hand detection with additional swapping feature
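The three utility functions can be sketched as below. The filename prefix, blend weights, and distance threshold are illustrative assumptions:

```python
from datetime import datetime

import numpy as np

def get_unique_filename(prefix="capture", ext="avi"):
    # Timestamped name, e.g. capture_20240510_153012.avi
    return f"{prefix}_{datetime.now():%Y%m%d_%H%M%S}.{ext}"

def apply_color_filter(image, color):
    # Blend a solid color over the frame (70/30 weights are assumptions).
    overlay = np.zeros_like(image)
    overlay[:] = color
    return (0.7 * image + 0.3 * overlay).astype(np.uint8)

def is_new_face(face_representation, known_representations, threshold=10.0):
    # A face is "new" if its embedding is farther than `threshold`
    # (Euclidean distance) from every embedding seen so far.
    emb = np.asarray(face_representation)
    return all(np.linalg.norm(emb - np.asarray(k)) > threshold
               for k in known_representations)
```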

4. Main Processing Loop:

  • Continuously reads frames from the webcam.
  • Converts each frame to RGB format for compatibility with DeepFace.
  • DeepFace is used to detect faces and generate face embeddings.
  • Filters out false positive detections and updates the list of unique faces if a new face is detected.
  • Utilizes MediaPipe to detect and draw landmarks for hands.
  • Checks for closed hands and starts or stops video recording accordingly.
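The unique-face logic of this loop can be followed on its own if the detection call is factored out as a parameter. Here `detect_embeddings` is a stand-in for the DeepFace call, and the threshold is an assumption:

```python
import numpy as np

def count_unique(frames, detect_embeddings, known=None, threshold=10.0):
    """Feed frames through a detector, keeping only embeddings that are
    farther than `threshold` from every embedding seen so far."""
    known = [] if known is None else known
    for frame in frames:
        for emb in detect_embeddings(frame):
            if all(np.linalg.norm(np.asarray(emb) - np.asarray(k)) > threshold
                   for k in known):
                known.append(emb)          # a new unique face
    return len(known)
```

In the real loop the detector returns DeepFace embeddings per frame; the same face reappearing across frames stays within the threshold of its stored embedding and is not counted twice.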

5. Handling Unique Faces and Display Messages:

  • Counts the number of unique faces detected.
  • Applies color filters and overlays messages on the video feed based on the number of unique faces.
  • Displays the processed video feed with annotations for unique faces and other information.

6. User Controls and Cleanup:

  • Listens for key presses to reset the unique face count or stop recording.
  • Releases video capture and writer resources and closes all OpenCV windows upon exit.
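The key handling can be sketched as a small dispatch function; in the real loop `key` would come from `cv2.waitKey(1) & 0xFF`:

```python
def handle_key(key, known_faces):
    """Return False when the loop should exit; 'r' resets the face count."""
    if key == ord('r'):          # reset the unique-face count
        known_faces.clear()
    return key != ord('q')       # 'q' ends the loop; cleanup then runs:
                                 # cap.release(); cv2.destroyAllWindows()
```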

| Main Applications and Future Development

This script is designed primarily for environments where monitoring and controlling the number of individuals is crucial. Potential applications include:

  • Event Management: Monitoring the number of attendees at an event.
  • Security Systems: Enhancing security by identifying and tracking unique individuals.
  • Retail Analytics: Analyzing customer behavior and managing store occupancy.
  • Smart Homes/Offices: Automating responses based on the presence and number of individuals.

| Prototype for Further Development

This script serves as a prototype for more advanced systems that can incorporate additional features such as:

  • Integration with Access Control Systems: Automatically restricting entry based on occupancy limits.
  • Alert Systems: Sending notifications when the number of individuals exceeds a predefined threshold.
  • Behavior Analysis: Extending detection to analyze body language or gestures for enhanced interaction.
  • Cloud Integration: Storing data on the cloud for real-time monitoring and historical analysis.

| Current Limitations

Despite its capabilities, this script has several limitations:

  • Face Recognition Accuracy: The reliability of face recognition depends on the quality of the embeddings and the threshold for determining new faces.
  • False Positives: The system may still generate false positives, especially in varied lighting conditions or with occlusions.
  • Real-Time Performance: Processing multiple faces and hands in real-time can be computationally intensive, potentially leading to delays.
  • Recording Logic: The current method for starting/stopping recordings based on closed hands might need refinement for practical scenarios.
  • Scalability: Handling a large number of unique faces or extended periods of video recording may require optimization and additional resources.

Post production done in TouchDesigner

| Post Production

After the video passes through the detection script, it can be augmented to create a more compelling and immersive experience. TouchDesigner is software that specializes in integrating and manipulating live signals for live performances. Here, the webcam footage is distorted through a series of noise feedback loops to create a mesmerizing swirling effect.

| Conclusion

In summary, this script provides a foundational implementation for real-time face and hand detection, with practical applications in various fields. However, further development and optimization are required to address its limitations and enhance its functionality for more sophisticated use cases.

| Appendix

Python Script:

--

--