Behind the Line of Sight: A Software Perspective
Our aim was to prototype a smart pedestrian crossing, in collaboration with Direct Line and Makerversity, that could improve drivers' line of sight in order to reduce accidents.
Breaking Down the Task
The concept for the prototype was to have the LED strips go through different animations and lighting patterns depending on who was standing on the zebra crossing, or in its vicinity, at the time. This would be accomplished by applying live object detection to real-time video footage.
The history and background of the project are outlined on our projects page, but, having started the project on a Monday, we were due to demonstrate the prototype to assorted press and the other residents of Somerset House at an exhibition on Friday afternoon. The schedule was tight!
While the hardware team set up the rigging that housed the LED strips, the software team divided up the programming tasks for the project. We had to interface with programmable LED strips, which would react to a stream of images coming from two webcams that we planned to attach to the poles on either side of the crossing.
We initially divided up the problem into two parts:
- Making the LED light animations programmable. Different lights and animations were to be displayed for different states of the crossing, e.g. Nobody There, Someone on Pavement, Someone Crossing
- Setting up the webcam so that it would feed a stream of images into an object detection program, which would check whether any detected people were on the pavement or in the crossing area.
For controlling the lights, we used BiblioPixel, a library that could program the animations for the LED strips. Although we were able to implement some sophisticated animations such as waves and smooth colour transitions, we ultimately went with the following lighting scheme:
- Nobody There: off
- Someone on Pavement: flashing amber
- Someone Crossing: solid red
This ensured that people with epilepsy and/or colour blindness could still unambiguously distinguish between each of the states.
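The mapping from crossing state to lighting routine can be sketched as a simple lookup. This is a minimal illustration only — the state names mirror the list above, but the routine descriptors are hypothetical stand-ins for the BiblioPixel animations the project actually drove:

```python
from enum import Enum, auto

class CrossingState(Enum):
    """The three occupancy states the lights respond to."""
    NOBODY_THERE = auto()
    ON_PAVEMENT = auto()
    CROSSING = auto()

# Hypothetical routine descriptors; the real project ran
# BiblioPixel animations rather than these plain dicts.
LIGHTING_SCHEME = {
    CrossingState.NOBODY_THERE: {"colour": None, "flashing": False},
    CrossingState.ON_PAVEMENT: {"colour": "amber", "flashing": True},
    CrossingState.CROSSING: {"colour": "red", "flashing": False},
}

def routine_for(state: CrossingState) -> dict:
    """Look up the lighting routine for a given crossing state."""
    return LIGHTING_SCHEME[state]
```

Keeping the scheme as data rather than branching logic made it easy to tweak colours and patterns while the design was still in flux.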
Due to time constraints, we had to rely as much as possible on existing object detection implementations. TensorFlow's Object Detection API, using an R-CNN architecture, was proposed early in our search.
We managed to get a basic demo up and running on the CPUs of our laptops and found that it could process images at a rate of 10 frames per second, which was good enough for our needs. Once we understood the shape of the data the object detection API was returning (essentially a pair of points representing a bounding rectangle, a label for the detected object, and a certainty score, i.e. something like `[{coords: [Point, Point], certainty: Number, label: Label}, …]`), we were able to break down the task more concretely:
- Filter down the list of detected objects into those that are just people
- Filter down the list according to some certainty threshold
- Apply collision detection for these people on the pavement and crossing area
- Figure out what LED lighting routine to run, based on the highest ‘priority’ collision
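The first two steps amount to a couple of list filters over the detection records. A rough sketch, where the field names mirror the shape described above and the 0.6 threshold is purely illustrative:

```python
PERSON_LABEL = "person"

def people_above_threshold(detections, threshold=0.6):
    """Keep only detections that are labelled as people
    and whose certainty clears the threshold."""
    return [
        d for d in detections
        if d["label"] == PERSON_LABEL and d["certainty"] >= threshold
    ]

# Illustrative detections: a confident person, a dog,
# and a low-certainty person that should be discarded.
detections = [
    {"coords": [(10, 20), (50, 120)], "certainty": 0.91, "label": "person"},
    {"coords": [(200, 30), (260, 90)], "certainty": 0.88, "label": "dog"},
    {"coords": [(300, 40), (340, 130)], "certainty": 0.35, "label": "person"},
]
```

Only the first detection survives both filters; the remaining records are then handed off to the collision-detection step.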
A test rig was set up in the adjoining space, where we could mount an external webcam. Colleagues happily volunteered to walk around the space to generate video footage that could be fed into our program to test its person-detection capabilities.
Feeding the stream into our program, we were able to confirm that the model was good enough to recognise people in ideal conditions. We also identified the need to be able to easily annotate regions of what the webcam was capturing in order to define the pavement and crossing areas.
It also confirmed our assumption that we would need a dual-camera setup — one on either side of the road, as one camera’s field of view wasn’t quite wide enough to cover the entire length of the crossing.
Implementation
We were able to glue together bits from the following libraries in order to achieve what we needed for the person detection pipeline:
- OpenCV provided the video capture from the webcam as a stream of images. It also had handy drawing facilities for configuring the coordinates of the crossings and pavements
- Shapely, a geometry library, provided the collision detection functions (i.e. checking whether the crossing or pavement regions intersect with the feet of a detected person)
- ReactiveX provided stream primitives in order to filter and coordinate events coming from our webcam streams
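The Shapely side of the collision check can be sketched as follows. The region coordinates here are made up for illustration — in the project they were annotated over the webcam image using OpenCV's drawing facilities — and approximating a person's feet as the bottom-centre of their bounding box is one plausible heuristic, not necessarily the exact one used:

```python
from shapely.geometry import Point, Polygon

# Illustrative region coordinates in image space (y grows downwards).
CROSSING = Polygon([(0, 100), (400, 100), (400, 200), (0, 200)])
PAVEMENT = Polygon([(0, 200), (400, 200), (400, 260), (0, 260)])

def feet_of(box):
    """Approximate a person's feet as the bottom-centre of their bounding box."""
    (x1, y1), (x2, y2) = box
    return Point((x1 + x2) / 2, max(y1, y2))

def locate(box):
    """Classify where a detected person is standing."""
    feet = feet_of(box)
    if CROSSING.intersects(feet):
        return "crossing"
    if PAVEMENT.intersects(feet):
        return "pavement"
    return "elsewhere"
```

Checking the crossing region first gives it priority when a person straddles the boundary, which matches the "highest-priority collision" rule above.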
With a single webcam, we could allocate a pool of threads to consume from a queue of images that the webcam was producing (i.e. a one-producer, N-consumer problem). With multiple (two) streams, coordinating them so that only the most up-to-date collision from each stream was used was less trivial. This was where ReactiveX's stream operators helped greatly; the CombineLatest operator ensured that the observer (our LED controller) would always have the most up-to-date pair of events from the streams to work with.
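To make the CombineLatest semantics concrete, here is a minimal, single-threaded sketch of the behaviour we relied on — not the RxPY operator itself, just its contract: nothing is emitted until every source has produced at least one value, and after that every new value is paired with the latest value from the other sources:

```python
class CombineLatest:
    """Toy illustration of CombineLatest semantics for n sources."""

    def __init__(self, n_sources, observer):
        self.latest = [None] * n_sources
        self.seen = [False] * n_sources
        self.observer = observer

    def on_next(self, source_index, value):
        """Record a value from one source; emit once all sources have fired."""
        self.latest[source_index] = value
        self.seen[source_index] = True
        if all(self.seen):
            self.observer(tuple(self.latest))

emitted = []
combiner = CombineLatest(2, emitted.append)
combiner.on_next(0, "pavement")   # camera A fires; camera B still silent, no emission
combiner.on_next(1, "crossing")   # camera B fires; first complete pair emitted
combiner.on_next(0, "nobody")     # camera A updates; paired with camera B's latest
```

In the real pipeline the observer was the LED controller, which picked the highest-priority collision out of each emitted pair.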
Demo-ing
On Thursday afternoon, we began to set up the stage at Somerset House, where we would hold a mini exhibition for the assorted press, the members of Makerversity and our sponsors. We were also nervously waiting to see whether the second webcam would arrive in time, before eventually managing to pick one up in Argos, of all places.
On the day of the final demo, we made sure that both the webcams were calibrated so that there was an overlap in the middle of the crossing to ensure that there were no blindspots.
The exhibition went smoothly; one testament to the object detection's robustness was that someone had to stand on the crossing while the interviews were taking place to stop the flashing amber lights from disturbing the video recording. It was one of those rare occasions where a cardboard cutout of a person would have really come in handy.
Concluding Thoughts
Leveraging the libraries of Python’s thriving hardware and machine learning ecosystem, we were able to get a prototype up and running in a constrained period of time. We did have some ideas for future improvements:
- Setting up some test infrastructure to codify how our observable pipeline reacts to streams of images
- Introducing a ‘decay’ factor, so that whenever someone stepped off the crossing, it would go to flashing amber instead of transitioning straight to the unoccupied state
- Making the object detection more sophisticated, e.g. filter for other obstructions such as pets on the crossing, and training the model on objects that are likely to come up around crossings
Nonetheless, it had been an encouraging result given our time limitations, and had generated many ideas for future products and research.
Resources
A summary of the main libraries and frameworks used:
- OpenCV — Python bindings for OpenCV
- Shapely — Geometry library for collisions in Cartesian space
- TensorFlow — Numerical computation library
- ReactiveX — API for asynchronous programming using observable streams
Even though it wasn’t used in the final program, Jupyter also deserves a mention as our go-to tool for literate Python development.