Incentivised Multi-Target, Multi-Camera Tracking with Untrusted Cameras (Part 1)
(Update: I later realized that the algorithm described in this blog post had exploits. It was necessary to change almost everything in order to make it robust enough to allow complete strangers to cooperate. In satisfying those constraints, the algorithm became something for which it was clear that having a central server/authority was unnecessary. But the article is kept here for reference. It’s just the earliest form of the idea that later became the theorem and algorithm described on the Grassland website)
In November 2015, I was being driven along the Ottawa River Parkway when, at the corner of Island Park Drive, I wanted to know whether I had seen the sedan to our right (license plate number [XXXX XXX]) before and, if so, when. I have a near-photographic memory for certain types of information, so the fact that I couldn’t place this particular license plate tortured me for several years afterwards.
So I built software to record the activity of my street through a web camera in my living room window and convert the people and cars into animated renderings on a live “SimCity” using a 3D version of OpenStreetMap. On my “SimCity”, I can then identify, rewind, search and observe the paths of anyone within its field of view. To get more neighbours involved, and so improve accuracy, I designed an algorithm that allows and incentivizes untrusted, non-overlapping cameras to join a multi-target, multi-camera tracking (MTMCT) network of arbitrary size, and that places both the cameras and the neighbours in two distinct forms of Nash Equilibrium.
What follows is a reply thread of an email I sent to some of my engineer and mathematician friends, colleagues and to Prof. Carlo Tomasi at Duke University whose team created the benchmark for multi-target, multi-camera tracking.
So far it seems to them that the only strategy for a rational, self-interested node is to cooperate and contribute to the network and that, despite some initial misgivings, this might actually be to the social good.
You can read our attempts to reason from the thesis and find “Exploits”. But I believe the “Answers” show that the algorithm:
- Precludes any imbalance we attempted
- Improves correlation clustering by narrowing the search space to people entering or exiting private family dwellings
- Creates incentives and network effects that result in a perpetual increase of camera coverage and density
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
I wanted to get your feedback on an algorithm I'm working on for a deep learning model I've been playing around with to track the activity of my street through a web camera in my window. The model is inspired by this paper. I’ve thrown the track paths up onto Mapbox so I can have a little “SimCity” with 3D buildings where the people and cars are my neighbours. It's only public data since I'm not looking into people's houses. But I want to get more data.
So I'd like to allow anyone in my neighborhood to join the network without my having to trust their camera or see their video feed. Nodes can just download the neural network model weights and run the inference software locally, which uploads only the tracking data of identified objects to my server (this scales better than the alternative). So I need to incentivize good behaviour and make bad behaviour so resource intensive as to be unprofitable. I call the algorithm Observation/Prediction With Confidence Ranking.
I've been trying to get other engineers to show me what problems, if any, whether programmatic or social, there may be in this cooperative and probabilistic game. I've included their attempted Exploits and my Answers. So far it seems to them that the only strategy for a rational, self-interested node is to cooperate and contribute to the network and that, despite some initial misgivings, this might actually be to the social good.
But maybe you can see something we can't.
Observation/Prediction With Confidence Ranking
Instead of the cameras just passively observing objects (pedestrians and cars) to be tracked, they must also make predictions about the path vector just beyond their field of view (FOV), but only up to the next node [although nodes don't know the location of other nodes; only my server knows their locations (latitude and longitude)]. Those predictions are in turn confirmed or denied by the observations of adjacent nodes, when and if the identified object passes into their FOV. Those adjacent nodes then rank the previous node based on its predictions. The total of these ranks becomes the "predicting" node's Confidence Rank, which each adjacent node uses to assign a weight to the "predicting" node's observations (think of it like Google's PageRank). Nodes that make poor predictions and don't self-correct have their observations increasingly discounted by their neighbours until they're "silenced". Nodes that have lots of neighbours and learn to make better and better predictions are assigned a high weight and thus have a louder "voice" in the network.
To incentivize my neighbours not only to join the network but also to ensure their feed is accurate and unobscured, the Confidence Rank will serve as one of several parameters by which the network will reward each node with an amount of cryptographic tokens. This token lets you subscribe to my "SimCity" web app but also lets you query the network for a history of activity. The richness of this data will get better over time, as each node can train the software to identify different types of activity (walking, running, biking, eating, talking, shopping, entering a store, leaving a store etc.) and different types of cars, and submit these new weights to me.
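To make the ranking idea concrete, here is a minimal sketch of how such a tally might work. It is an illustration under stated assumptions, not the actual algorithm: the class name `ConfidenceLedger`, the logistic mapping from rank to weight, the starting rank of 0 and the silencing threshold are all hypothetical choices. The email only says that ranks are totalled, that they weight a node's observations and that persistently wrong nodes end up "silenced".

```python
import math
from collections import defaultdict


class ConfidenceLedger:
    """Hypothetical tally of Confidence Ranks (illustrative only)."""

    def __init__(self, silence_threshold=-2.0):
        # Assumption: every node starts with a rank of 0.0.
        self.rank = defaultdict(float)
        # Assumption: at or below this rank a node is "silenced".
        self.silence_threshold = silence_threshold

    def weight(self, node):
        """Weight that neighbours give to this node's observations."""
        r = self.rank[node]
        if r <= self.silence_threshold:
            return 0.0  # silenced: its observations and votes count for nothing
        # Assumption: squash the rank into (0, 1) with a logistic curve.
        return 1.0 / (1.0 + math.exp(-r))

    def record_outcome(self, predictor, observer, confirmed):
        """An adjacent observer confirms or denies a predictor's prediction."""
        vote = 1.0 if confirmed else -1.0
        # The observer's vote is scaled by the observer's own weight, so
        # well-ranked nodes have a louder "voice" (the PageRank analogy).
        self.rank[predictor] += vote * self.weight(observer)


ledger = ConfidenceLedger()
for _ in range(3):
    ledger.record_outcome("node_a", "node_b", confirmed=True)
for _ in range(5):
    ledger.record_outcome("node_c", "node_b", confirmed=False)
```

In this toy run, `node_a` gains weight from three confirmed predictions, while `node_c` is denied often enough to cross the threshold and drop to a weight of zero.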
Discouraging Dishonest Nodes
The other parameters upon which a node's token reward is contingent are the distance that its identified objects travel through an unbroken chain of nodes, how varied the other nodes that observe those objects are, and how many objects it actually identifies. This requires multiple nodes spread out over long distances to also confirm the identity and paths of these objects and be confident enough to make a prediction that their own neighbours will corroborate.
So for a node to risk being dishonest, it would have to have each node in an unbroken chain of nodes also risk losing their own reward just to reward an arbitrary node far away from it.
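As a rough illustration of how these parameters could interact, the sketch below combines them into a single score. The `Track` structure, the function name `node_reward`, the multiplicative form and the logarithmic damping are all assumptions made for the example; the email does not specify a formula. The point it tries to show is that an object seen by only one node, however often, earns that node nothing.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    """One identified object's path (hypothetical structure)."""
    chain: List[str]   # ordered IDs of the nodes that observed the object
    distance_m: float  # distance travelled through that unbroken chain


def node_reward(node: str, tracks: List[Track], confidence_weight: float) -> float:
    """Toy reward: confidence x distance x observer diversity x object count."""
    mine = [t for t in tracks if node in t.chain]
    if not mine:
        return 0.0
    distance = sum(t.distance_m for t in mine)
    # Diversity: how many *other* nodes corroborated this node's objects.
    peers = {n for t in mine for n in t.chain if n != node}
    # Assumption: log damping so no single factor dominates, and a product
    # so that zero corroborating peers means zero reward.
    return (confidence_weight
            * math.log1p(distance)
            * math.log1p(len(peers))
            * math.log1p(len(mine)))


tracks = [
    Track(chain=["n1", "n2", "n3"], distance_m=300.0),
    Track(chain=["n1", "n4"], distance_m=120.0),
    Track(chain=["fake"], distance_m=500.0),  # seen by nobody else
]
honest = node_reward("n1", tracks, confidence_weight=0.9)
isolated = node_reward("fake", tracks, confidence_weight=0.9)
```

Here `isolated` comes out as zero because no other node corroborates the object, which mirrors the argument that a dishonest node needs a whole chain of accomplices to profit.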
Each node also uploads random still images along with their accompanying activations to my server for verification.
What Do You Think?
So do you think there's anything wrong with my algorithm? Without a stranglehold on the weighted majority of the network, could it be gamed as far as you can tell? I know it seems Big Brother-esque, but Big Brother seems to me to be based on an information asymmetry. If everyone's naked to everyone else, then no one is.
...I feel like you could set up a fake node that has a moderately high rank just by inferring data from nodes close to you and relaying it. Timing of your system is going to be really critical, so you might need to blind the data or add some time lags to prevent fake nodes that just infer data and play with latency.
... nodes don't know where other nodes are. Only the main server does... The main server relays the result of the prediction from the [next] Observer back to the Predictor (which means a node would want to be conservative in its predicted path vector, since the longer it is the more likely it is to be off) and vice versa. [ So even if a fake node could infer what traffic is around another node, the data it must pass requires identification via matching vector representations. This would be practically impossible without actually setting up a real node ].
In most cases nodes will not have overlapping POV, so what's stopping you from constantly voting against your neighbor in order to silence them? If I set up a series of webcams down the road in front of my house and then downvote everyone who isn't from my subnet then I could control voting across a specific chunk of space.
But that would be to no avail. Remember, it's an Observation/Prediction game. If a dishonest node is not declaring its own observations about legitimate objects predicted by adjacent nodes, then it can't make a prediction about them either. So it's shooting itself in the foot. To make a prediction about an object's next path vector, you must have observed it.
If it doesn't acknowledge the real objects it sees that adjacent nodes predicted then what else can it do to get a ranking? Declare fake Observations/Predictions? But no adjacent node to it has a reason to intentionally acknowledge something fake unless it has a guarantee that it can in turn get a significant number of nodes in an unbroken chain after it to agree they saw it as well.
For nodes acting purely in self-interest, dishonesty doesn't make sense. Unless they control most of the nodes, they're better off acknowledging what they see when they actually see it. And by each node acting in their own self interest, the network gets better at seeing what's real and what isn't.
Also, the fact that nodes don't always overlap is why we're making predictions. The feedback from those predictions helps nodes and the entire network learn what paths objects are taking when they can't see them.
Assuming I have the above info correct, I'm not sure you've added anything. You are no longer sharing raw video data, true. You're still sharing the information that people actually care about, which is what/who/when things are moving around in their neighborhood.
I never wanted to share video data. Video surveillance is uninteresting and lifeless. What I want is to add more live data onto my interactive 3D map of Ottawa.... But instead of it being a digital ghost town, for every car that's driving down the real Bank Street there's a virtual car driving down my virtual one. You can click on it and it'll tell you all kinds of information about its history. Or every time my neighbour goes out to mow his lawn, there's a little avatar moving around in front of the house on that map. You can also rewind the clock and see what happened at all times before.
To me, this is really, really fun. You'd have a bird's-eye view of EVERYTHING.
...the identity vector of each person lets each camera conduct what's called a Re-ID of the person. So that's the big problem faced by the models I've seen on campuses and in public places. Re-ID problems stem from trying to tie multiple vectorized image inputs (different angles, different lighting, different clothes) to a single person. Facebook has a leg up on this problem with photos tagging the same face in multiple lighting conditions. But tracking cameras have to Re-ID from not just a face but a body even when their back is turned.
Also, a neighbor mowing his grass wouldn't be crossing between nodes, right? Will something be displayed on the map if it remains within one node boundary?
You're right, he wouldn't be crossing between nodes. But we could be pretty sure that my neighbour is at his house and in the vicinity of the node alleging to view him, for a few reasons.
- A chain of nodes would have tracked him up to his house some time before.
- The last node in that chain is the node viewing him.
- That node has a pretty high rank, because its capacity for what's called Re-ID on that household is extremely high. Why?
  - The node is in a residential area.
  - It's seen members of this household come and go longer than anyone.
  - Through k-means distance, and maybe some help from the operator, it knows there are only 5 people in that dwelling, so the search space is very tractable.
  - It's seen them at almost every angle and in every piece of clothing they own.
This node is my neighbour's "Node Familias" [so to speak]. If it thinks he's there, he's probably there. If it thinks he's not around, he's probably not around.
So this "Node Familias" is the solution. The "Facebook tag" is their house. Before they leave the house the node knows what they're wearing. It just binds that vector to their ID.
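The reason a small household makes Re-ID tractable can be sketched as a nearest-neighbour lookup against a tiny gallery of stored identity vectors. This is only an illustration under assumptions: the function names, the use of cosine similarity, the 0.8 threshold and the three-dimensional toy vectors are all hypothetical, and a real model would produce much longer embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reidentify(query, gallery, min_similarity=0.8):
    """Return the household member whose stored vectors best match `query`.

    `gallery` maps a person ID to the identity vectors the residential node
    has collected for them (different angles, lighting, clothes).
    Returns None when nobody is similar enough (threshold is an assumption).
    """
    best_id, best_sim = None, min_similarity
    for person_id, vectors in gallery.items():
        for vec in vectors:
            sim = cosine(query, vec)
            if sim >= best_sim:
                best_id, best_sim = person_id, sim
    return best_id


# Toy gallery for a dwelling with two known residents.
household = {
    "resident_1": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "resident_2": [[0.0, 1.0, 0.0]],
}
match = reidentify([0.95, 0.05, 0.0], household)
stranger = reidentify([0.0, 0.0, 1.0], household)
```

With only a handful of people in the gallery, each with many stored views, a match is cheap to compute and hard to confuse, whereas a node on a busy street would face a far larger and sparser gallery.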
It also helps balance out the power dynamics of this game. Nodes in high traffic areas will add more objects to the chain, but only at later stages. And by themselves, their Re-ID accuracy is very low. They're completely dependent on the identification power of residential nodes.
Yeah, but you're still talking about some serious invasion of privacy. It depends on exactly how the system works and in particular when it starts / stops displaying an object, but imagine that I put up the map and then set FRAPS to video capture everything so that I can check history. I can now follow a car from driveway to destination, letting me make a very strong guess as to who is in that car and therefore keep track of individual people. And "well, but there won't be enough nodes for a complete routing" doesn't suffice -- once this product goes into effect, *someone* is going to realize that all they need to do is set up cheap webcams and they have opticon coverage of the city with no government oversight. Might be the city, the police, or a government that does it, but someone will.
Answer: You don't need to use FRAPS.... That's the whole point! :) It already comes with history. That's what I meant before by "rewind the clock". The rest of your concerns are exactly what my system "wants" to happen. [ We know that people, even in the early days, went to great lengths to mine Bitcoin. This is a cooperative game where the reward is a feedback loop of not just money but omniscience. So there's no telling what people might do. But...] Let me clarify the raison d'être.
I'll start with a few news articles [You can read them if you want but it's the same old story. Their titles are self explanatory]:
Hedge funds use this data to do things like match the number of people who entered Walmart last quarter to Walmart's earnings and then use next month's tracking data to predict Walmart's share price. They pay millions of dollars to have exclusive access to this data in order to gain an advantage on the stock market and have knowledge others don't.
Not only do we not know our information is out there, we don't even know how far the data has gone.
This ignorance is dangerous. Case in point.
And the time would fail me before I could cover Cambridge Analytica and other think tanks with countless other news stories, ad infinitum. It'll never stop.
The real problem is the asymmetry of this informational topology. A small group of people have an enormous amount of power because they know more about us than we even know about ourselves, let alone them.
The network is called Grassland. And for good reason. Grass is a fairly recent addition to the planet; and despite the fact that most incumbent vegetation towers over it, blocking it from the sun, it's been so successful as to have spread to every continent on the planet, even Antarctica. This is owing in no small measure to two very important properties. It's extremely combustible, even known to spontaneously combust; and its tolerance for heat is higher than almost any other form of vegetation; it will be growing again within 48 hours of a forest fire while most other forms of vegetation are dead. It's nature's great equalizer.
Grassland does the same thing. Its ideology is not about hiding or concealing information. It's about leveling the playing field for everyone. Large data harvesting companies are like trees whose power comes from their ability to hoard enormous amounts of our own data and rent seek on it. Monoliths that cast a shadow over us. Their gravitational pull obscures and warps the dimensions of our institutions. A good government can't prevent this because either they'll be stifling innovation or their actions amount to useless political gestures because technology will have run halfway around the world before legislation can even get out of bed. And a bad government benefits immensely from monoliths. So the only solution is to burn the forest. Everyone sees everything. Advantage then comes from innovation, not rent seeking.
Take for example fake news, or even news in general. It all depends on an asymmetry of information. They have something, information that you don't have. But you're not just getting the information; you're not just getting what they give you. You're getting what isn't given, what they deemed to be unimportant. In epistemology this is the ominous ramification of the Jean-Paul Sartre Coffee Without Cream joke. The element that is removed from what you're getting still forms part of the fundamental nature of what you're getting. The waitress is right; coffee without milk is different from coffee without cream inasmuch as the rests are just as much a part of the music sheet as the notes.
But Grassland gives you a frame of reference that is dispassionate, omniscient and (as I'm trying to verify) mathematically secure against falsehoods. Its reward system even rewards those nodes that exhibit the least bias. If you had this a few years ago, Americans would just zoom in on the map to the restaurants mentioned in the Pizzagate conspiracy, rewind the clock and see EVERYTHING that ever happened and EVERYONE that has come in and out of that restaurant from a bird's-eye view.
Oh, another thought: will the system deal with things in the sky? That assumes that some of the cameras are pointed up, but there's no reason they couldn't be. Would the software handle that?
There is a reason they wouldn't be pointed upwards. Unless adjacent nodes are also tracking the sky, and the nodes adjacent to them are also tracking the sky, and so on, no such node is going to be part of the network, because it can't be part of the same chain of moving entities that the rest of the network is a part of. That node will be doing O/Ps (Observations/Predictions) based on objects no adjacent node can O/P. So, as I mentioned before, other nodes will just discredit that node until it's completely "silenced": essentially dead to the rest of the network and earning nothing.
The system is easier to understand if you don't think of nodes as independent points but as having an existential necessity to be part of that chain.
My only concern would be if there's actually a market, which is something I simply don't have the knowledgebase to judge. Finding some way to do at least a preliminary validation on that should be the next step -- and note that in this case I'm using "market" to mean "people who want to be involved", regardless of if there's money to be made on your part or theirs.
If my team and I can convince just a small group of proselytes that we've no choice but to "burn the forest", it'll grow from there with no advertising. Even those who don't get the ideological underpinnings will join just for the token reward and bring in others. [The more it grows, the better the data will be; which translates to higher payouts to participating nodes.] This should form a feedback loop. (Assuming the mathematical game we're playing has no unpatchable exploits)
Even those who are in it just for monetary gain should become "Beliebers" if the corollary to Upton Sinclair's "It is difficult to get a man to understand something, when his salary depends on his not understanding it" is just as true.
[ "burn the forest" ] ...I get that part but I still see the potential privacy problems. I can definitely see this becoming a thing, but I can't think of the specific use cases, although I don't think that matters; the use cases will naturally follow... You definitely need to anonymize the data, like people's faces and license plates. Are you going to limit people to only see/use data from their immediate vicinity? Or can I browse around my city, for example, to see interesting statistics?
.. I think your fears are a bit of an overreaction. Here's why. I have this set up in my [living room] window. It sees my neighbour but it also sees me. My neighbours can see as much of me as I can of them... [ Let's say I'm a bad person (a stalker, burglar or what have you) and I surmise I can use Grassland to my advantage. But then, it must also follow that my neighbours would not only be able to see everything I did leading up to the event but after it as well, and also where I am at this moment. We're now in a Nash Equilibrium. It's mutually assured destruction. We must be good to each other. So even if this expands beyond my neighborhood, I see no reason to obscure information.
To be fair, even cities like London have cameras everywhere yet have very high crime rates. But these camera systems support "dumb", disparate video feeds whose information only flows one way ---into a bottleneck. Thus they can never be part of the social fabric. They don't produce salient narratives of someone's life to the community. And their corrective power has all the delayed exhortation of a credit card bill. ]
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —