PEEPS — A Full-stack Development Journey: Building a Web Application to Host Person Re-identification (ReID) Algorithms

Hey there 👋, we’re Edwin and Edmund, both interns at DSTA’s Digital Hub. Edwin is a final-year Computer Engineering student at NUS, and Edmund is a third-year Computer Science student at NTU. Throughout our stint, we were guided by AI Engineers from DSTA’s Digital Hub Programme Centre: Ying Hui, Ernest, and Kimberlyn.

TL;DR: bringing an idea from Concept to Capability

In this article, we dovetail our interest in deploying computer vision algorithms with full-stack development, journeying through the development of PEEPS — a web application designed to host computer vision algorithms. Within our stint, we built and deployed features for in-house person re-identification (ReID), which matches individuals across different cameras or locations in video streams and has applications in surveillance and security.

Example of ReID | Video used for ReID: https://youtu.be/7nIQv3K_-Qc

Brainstorming features for a security & surveillance web application

What would users REALLY want? We embarked on multiple cycles of research and wireframing to unearth the latent needs of users of a security & surveillance web application. One useful reference was the OODA loop, which outlines the phases of responding to a situation or potential threat: Observe, Orientate, Decide, Act. Two features that we shortlisted from our research are highlighted here:

1. Adding a Target Chip and Camera | Targets taken from: https://youtu.be/7nIQv3K_-Qc

The above shows an intuitive way for users to add Targets and Cameras: users upload an image of their target, and PEEPS crops the target out using the largest bounding box as a guide, streamlining the task for an enhanced user experience.

2. Starting ReID tasks and alert notifications | Video used for ReID: https://youtu.be/7nIQv3K_-Qc

Here, we unveil the experience of selecting specific targets and cameras for each task. Observe as tasks run and real-time alert notifications appear on the dashboard, offering a glimpse into the potential of PEEPS.

Full-stack Development: modular thinking to design for scale

Full-stack development isn’t just about the frontend (what you see) or the backend (where the magic happens). It is an involved process of peeling back the layers across the product that influence the entire user experience: everything from data handling, load times, and access to features, to prompts, warnings, and user-agency features. We’re all about adopting a modular approach, housing each part in separate Docker containers to boost scalability and ease of maintenance.

Architecture diagram of PEEPS

The frontend is our canvas, built using React and styled with Sass, Bootstrap, and MUI. To serve it all up, we’ve got NGINX stepping in as the web server. The frontend’s role is pivotal in shaping smooth user interactions and creating interfaces that make sense and, more importantly, enhance rather than hinder user workflows for security and surveillance operations. Imagine the agony of trying to track an intruder across cameras in your home, but spending most of your energy and time wrestling with clunky, buggy software!

The backend is the brains of the operation, powered by Kafka, FastAPI, and WebSockets. It’s all about data communication, analysis, and dishing out those awesome computer vision algorithms. You’ll find key components like the Frame Extractor, Results Processor, and Ray Serve for all things CV. Kafka’s messaging magic efficiently ties these components together.
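To make the wiring concrete, here is a minimal sketch of how two components could talk over Kafka using the kafka-python client. The broker address, topic names, and message fields are illustrative assumptions, not PEEPS’s actual configuration.

```python
import json

from kafka import KafkaConsumer, KafkaProducer

# Producer side: the Frame Extractor publishes frame references for the CV pipeline.
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("frames", {"task_id": "task-001", "frame_key": "frame:task-001:42"})
producer.flush()

# Consumer side: the Results Processor subscribes to the model's outputs.
consumer = KafkaConsumer(
    "reid-results",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # e.g. {"task_id": "task-001", "match_score": 0.91}
```

Because components only need to agree on topic names and message schemas, any container can be swapped out or scaled without touching the others.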

We also need to store and manage our data properly. For that, we use a combination of file systems, Redis, and MongoDB. Redis is an in-memory database that provides fast access to data such as the frames for our CV pipeline. MongoDB is a document-oriented database that stores structured data such as the different tasks that a user runs. Later on, we’ll discuss how we redesigned our storage for fast image retrieval.
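To give a flavour of the document model, here is a minimal pymongo sketch of how a ReID task might be stored and queried. The collection name and document fields are illustrative assumptions rather than PEEPS’s actual schema.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongo:27017")
db = client["peeps"]

# A task document: which targets to look for, and on which cameras.
db.tasks.insert_one({
    "task_id": "task-001",
    "targets": ["target-abc"],
    "cameras": ["cam-lobby", "cam-carpark"],
    "status": "running",
})

# The dashboard can then query all running tasks in one call.
for task in db.tasks.find({"status": "running"}):
    print(task["task_id"], task["cameras"])
```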

Initial design & development sprints

We’re big on prototyping to nail down how things should look and work. A good prototype keeps the team on the same page about the application’s requirements. It also sparks meaningful discussions, enabling us to envision how the application should come to life. Below, we open up about the initial design struggles, the battle against inconsistency, and the hunt for reusable components. Our evolving Figma designs tell the story of how we conquered these challenges.

Here’s a glimpse of the project’s initial designs:

Initial system tutorial landing page
Initial project Figma design for the homepage
Initial project Figma design for the Live ReID page
Initial project Figma design for the page where users upload their target image chip
Initial project React homepage
Initial project React page to add ReID targets
Initial project React page to add new cameras

The screenshots above show the initial Figma designs and initial React pages that paved the way for our internship project. What are your thoughts on these initial designs?

Of course, there’s always room for improvement. For instance, you’ll notice inconsistencies between the Figma and frontend designs. Our initial design might not be the most user-friendly or intuitive. Questions arise: What does an alert look like when a target is detected? Which cameras qualify as “Cameras of interest” on the Figma ReID page? As we delved into the repository’s codebase, we also identified a lack of reusable components.

After the first rounds of design & development sprints

Updated Figma Design

Through Figma’s prototyping feature, a whole new level of clarity emerges. Users can now follow the flow between pages, gaining a comprehensive grasp of the application’s intricacies. In the video above, we showcase how users can effortlessly initiate a ReID task, handpicking their preferred targets and cameras of interest and running them within an available task. Furthermore, catch a glimpse of the alert system, demonstrating how it gracefully handles target detections within the camera feeds. Our refined design not only looks better but also enhances user understanding and interaction.

Frontend Development: Bringing Life To Our Design

As seen from our demo earlier, creating user-friendly interfaces is all about blending React with supporting JavaScript libraries and frameworks for a delightful user experience. We also learnt the importance of responsive design for universal compatibility, a must for keeping users engaged.

Backend Development: Serving APIs and Implementing CV Algorithms

Moving on, we focus on our backend infrastructure setup and API implementation. FastAPI and WebSockets are employed for communication between the frontend and backend.
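As an illustration, here is a minimal FastAPI sketch pairing a REST endpoint with a WebSocket endpoint that pushes alerts to the dashboard as they occur. The routes, payload shapes, and in-process queue are assumptions made for the sketch, not PEEPS’s actual API.

```python
import asyncio

from fastapi import FastAPI, WebSocket

app = FastAPI()
alert_queue: asyncio.Queue = asyncio.Queue()  # filled elsewhere, e.g. by the Results Processor

@app.get("/tasks")
async def list_tasks():
    # REST endpoint: the frontend fetches task state over plain HTTP.
    return [{"task_id": "task-001", "status": "running"}]

@app.websocket("/ws/alerts")
async def alerts(websocket: WebSocket):
    # WebSocket endpoint: the server pushes alerts without the client polling.
    await websocket.accept()
    while True:
        alert = await alert_queue.get()
        await websocket.send_json(alert)
```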

Computer vision algorithms were integrated using Ray Serve, chosen for its framework-agnostic nature, scalability features, and native GPU support. Components such as the Target Cropper, Detector, and Embedder were implemented, to name a few.
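To show what a Ray Serve component looks like, here is a minimal deployment sketch. The replica count, GPU fraction, and load_reid_model stub are hypothetical placeholders, not our production settings.

```python
from ray import serve

def load_reid_model():
    """Hypothetical loader standing in for the in-house ReID model."""
    return lambda crops: [[0.0] * 512 for _ in crops]  # dummy embeddings

@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 0.5})
class Embedder:
    def __init__(self):
        # Each replica loads the model once and reuses it across requests.
        self.model = load_reid_model()

    async def __call__(self, request):
        crops = await request.json()  # batched person crops from upstream
        return {"embeddings": self.model(crops)}

# Ray Serve handles replication, routing, and GPU placement for us.
serve.run(Embedder.bind())
```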

Target Cropper is responsible for cropping a tight bounding box around the target that the user selected in the frontend. This way, we can isolate the target from the rest of the image and focus on its features and attributes.
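A minimal sketch of that largest-bounding-box heuristic, assuming the detector has already returned person boxes in (x1, y1, x2, y2) pixel coordinates:

```python
from PIL import Image

def crop_largest_box(image: Image.Image, boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    """Crop the uploaded image to the detection with the largest area,
    which we take to be the user's intended target."""
    x1, y1, x2, y2 = max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    return image.crop((x1, y1, x2, y2))
```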

Detector uses an object detection algorithm called YOLOv7, which can detect objects of various sizes and categories in real time. However, YOLOv7 may struggle with small objects that are hard to distinguish from the background. To overcome this challenge, our mentor used a technique called Slicing Aided Hyper Inference (SAHI), which divides the image into smaller slices and applies YOLOv7 on each slice. This way, we can improve the accuracy and recall of small object detection. You can learn more about SAHI from this article.

Visualisation of SAHI detecting small objects | Source: https://github.com/obss/sahi
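For a sense of how sliced inference looks in code, here is a minimal sketch using the sahi library. We use its off-the-shelf YOLOv5 wrapper as a stand-in, since wiring up YOLOv7 requires a custom detection-model wrapper; the weights path, slice sizes, and thresholds are illustrative.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load a detector behind SAHI's common interface (YOLOv5 as a stand-in here).
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov5",
    model_path="yolov5s.pt",
    confidence_threshold=0.4,
    device="cuda:0",
)

# Slice the frame into overlapping tiles, detect on each tile, and merge results.
result = get_sliced_prediction(
    "frame.jpg",
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

for pred in result.object_prediction_list:
    print(pred.category.name, pred.bbox.to_xyxy(), pred.score.value)
```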

Embedder is the “ReID magic”, which is responsible for creating a numerical representation of each cropped person in every frame. This numerical representation, also called an embedding, can be compared to the embedding of the target image to determine if it is the person we are looking for.
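Conceptually, the matching step reduces to comparing embedding vectors. Here is a toy numpy sketch; the MATCH_THRESHOLD value is an illustrative placeholder that would, in practice, be tuned on validation data.

```python
import numpy as np

MATCH_THRESHOLD = 0.7  # hypothetical; tuned empirically in a real system

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embeddings, ignoring their magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_target(candidate_emb: np.ndarray, target_emb: np.ndarray) -> bool:
    """Does this cropped person look like the target we are hunting for?"""
    return cosine_similarity(candidate_emb, target_emb) >= MATCH_THRESHOLD
```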

Backend Development: Results Processor and Database Management

The Results Processor, a crucial component, is designed to throttle and transform Ray’s results for seamless integration with external vendors. The implementation employs the asyncio library, showcasing the advantages of asynchronous programming. This approach enables our program to continue its operation while awaiting the completion of our throttling task.
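Here is a minimal asyncio sketch of that throttling idea: results arrive on a queue, and at most one is forwarded per interval while the rest of the program keeps running. The transform and send_to_vendor stubs are hypothetical stand-ins for the real schema mapping and vendor call.

```python
import asyncio
import time

def transform(result: dict) -> dict:
    """Stand-in for reshaping Ray's output into the vendor's schema."""
    return {"task": result.get("task_id"), "score": result.get("match_score")}

async def send_to_vendor(payload: dict) -> None:
    """Stand-in for the real vendor call (e.g. an HTTP POST)."""
    print("sent:", payload)

async def throttle_results(results: asyncio.Queue, min_interval: float = 1.0) -> None:
    """Forward at most one result per min_interval seconds downstream."""
    last_sent = 0.0
    while True:
        result = await results.get()
        now = time.monotonic()
        if now - last_sent < min_interval:
            continue  # drop excess results instead of flooding the vendor
        await send_to_vendor(transform(result))
        last_sent = now
```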

Choosing the right database technology, with design considerations for efficient querying, is important:

Initially, our approach involved storing frames within the file system, which came with a set of drawbacks. The computer vision pipeline we employed was quite intricate, comprising numerous distinct layers. When each of these layers needs to read frames, the file system’s longer retrieval times are amplified, introducing substantial overhead. Recognizing this challenge, we made a pivotal decision to investigate an in-memory database, Redis. This strategic shift enabled us to cache our images directly in RAM, resulting in significantly reduced read times. The outcome? A marked reduction in overall overhead and improved efficiency!
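In code, the caching layer can be as simple as keyed byte strings with a time-to-live. This redis-py sketch is illustrative; the key scheme and TTL are assumptions rather than our actual configuration.

```python
import redis

r = redis.Redis(host="redis", port=6379)

def put_frame(task_id: str, frame_no: int, jpeg_bytes: bytes, ttl: int = 60) -> None:
    """Cache a JPEG-encoded frame in RAM; stale frames expire automatically."""
    r.set(f"frame:{task_id}:{frame_no}", jpeg_bytes, ex=ttl)

def get_frame(task_id: str, frame_no: int) -> bytes | None:
    """Any pipeline layer can fetch the same frame without touching disk."""
    return r.get(f"frame:{task_id}:{frame_no}")
```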

Future work — Scalability

We conclude this article by discussing scaling options for optimising the application’s performance. We considered both vertical and horizontal scaling. Vertical scaling involves adding more resources, like CPU and RAM, to an existing machine, whereas horizontal scaling involves adding more machines to the system. Vertical scaling has advantages such as lower cost and simpler maintenance, but also drawbacks such as a single point of failure and limited flexibility. Horizontal scaling, on the other hand, offers more redundancy and scalability, but requires more upfront cost. Given more time, we could look into using Kubernetes to further improve our scaling capabilities.

Conclusion

Our key takeaway from this internship was the importance of iterative user feedback in driving development and user-centric design. We appreciated how our DSTA mentors taught us the value of continuous improvement and user testing in refining the user experience.

Edwin (middle) and Edmund (right) together with their mentor Ernest (left)
