Is #SuttonBinShame a bad data problem?

Sutton Council handed its waste and recycling collection service over to outsourcing firm Veolia at the beginning of April. Veolia’s contract required it to operate a substantially new service in the borough aimed at increasing the recycling rate and reducing the amount of waste going to landfill. This has meant more bins for different types of waste and different collection cycles to those previously operated.

It’s fair to say that the new service has been an unmitigated disaster for everyone concerned, not least residents such as myself. My own bins haven’t been collected on time or in the right way since the beginning of the month. Elsewhere, there are numerous reports of collections being missed for weeks at a time, as well as recycling being mixed in with general waste, undermining residents’ confidence that there’s any point separating their waste at all.

This has led to countless gripes on social media under the hashtag #SuttonBinShame, stories in the Sutton Guardian and London Evening Standard, and even a question to the prime minister from Sutton and Cheam MP Paul Scully.

So what’s causing the underlying problems and who’s to blame? There’s a lot of finger pointing going on from all sides, but this comment from resident Richard Johnson reported in the Sutton Guardian caught my eye:

Mr Johnson says at a council meeting on April 20 Veolia’s general manager Scott Edgell told the council the list of properties it had been given was inadequate.
Mr Johnson added: “Scott Edgell stood up in front of seven councillors and said the list Veolia had was inaccurate, out of date and woefully insufficient.
“There’s no way the collections can work with a mismatched list, but none of the councillors said a word when they could’ve gone ‘okay, there’s a massive problem here, let’s sort it’.
“I’ve even volunteered to go round with someone from Veolia and show them where the bins are so they’re on the radar.”

The hypothesis that many of the problems could have been caused by bad data is a reasonable one. It appears that many homes, perhaps even whole streets or blocks of flats, aren’t on the collection rounds that the bin crews use. There are three different types of addresses with different collection requirements, and it’s very likely that some addresses are misclassified. The dustcart drivers use tablet computers in the cab and presumably the routes they drive are logged and checked using a GPS system. On the back of this, Sutton Council’s contract with Veolia specifies a range of what are effectively financial penalties if Veolia misses too many collections. So verifying what has been done and what has not been done is essential. But of course that can only be measured against what everyone thinks Veolia should be doing. If the base data is incorrect and incomplete, you’ll see missed collections, perhaps in significant numbers, for homes that aren’t on the system at all.
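To make that kind of check concrete, here is a minimal sketch of reconciling a reference list of properties against the list the collection rounds are built from. Everything in it is an assumption for illustration: the property identifiers, the placeholder type names (type_a, type_b, type_c) and the reconcile function are invented, and nothing here reflects how Sutton’s or Veolia’s systems actually store their data.

```python
def reconcile(reference, rounds):
    """Compare a reference property list with the collection rounds.

    Both arguments are dicts mapping a (hypothetical) property ID to a
    (hypothetical) property type.
    """
    ref_ids, round_ids = set(reference), set(rounds)
    return {
        # Properties that exist but are on no collection round.
        "missing": sorted(ref_ids - round_ids),
        # Properties on a round that the reference list doesn't know about.
        "unexpected": sorted(round_ids - ref_ids),
        # Properties on both lists but recorded with different types.
        "misclassified": sorted(
            pid for pid in ref_ids & round_ids
            if reference[pid] != rounds[pid]
        ),
    }


# Invented sample data.
reference = {"P001": "type_a", "P002": "type_b", "P003": "type_c", "P004": "type_a"}
rounds = {"P001": "type_a", "P002": "type_a", "P005": "type_b"}

report = reconcile(reference, rounds)
for category, ids in report.items():
    print(category, ids)
```

The point of the sketch is that all three lists need looking at before the service starts. A property in the missing list will never generate a logged missed collection, because as far as the system is concerned it doesn’t exist.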

And this is the case that Veolia’s general manager Scott Edgell is making: Sutton Council provided them with bad data, so they can’t be blamed if collections get missed.

I’ve got no wish to absolve Sutton Council from any failings that are genuinely theirs, but Scott Edgell’s argument isn’t convincing even if his premise is true. And his premise is almost certainly true.

I work with data. I don’t think I’m revealing any trade secrets by saying here that anyone receiving data from anyone else, especially from an entirely separate organisation, needs to treat it like toxic waste. Other people’s data is fundamentally not to be trusted from the outset. Your default assumption is that the data you’ve been given is incorrect and incomplete. Assume that the data you’ve been given is not just a bit wrong, but so wrong that using it will kill people. (In many contexts, this is in fact literally true.) It’s not just bad data. It’s harmful data. And so you do nothing with it until you are confident that it has been checked and cleaned to the point where it’s safe to use.

Because saying “We screwed up the service because we were given bad data by our client” is never an excuse. It should never even be an explanation. It is always the responsibility of those receiving data to ensure that it’s fit for purpose, even where the purpose is providing a service to the data provider. To think otherwise is to accept that your organisation runs on a “garbage in, garbage out” principle, if you’ll excuse the pun. But competent organisations and people don’t operate like that. It’s a good thing that Veolia isn’t running hospitals or air traffic control. They don’t, do they?

Bad data from third parties is the rule not the exception. You always, always, always check and clean it yourself. Always. In 99% of cases that’s absolutely necessary to get it to the point where it’s fit to use. In the other 1%, you still have a responsibility to check.
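As a rough illustration of what the most basic checking looks like, the sketch below runs a few sanity tests over incoming records before anything else is allowed to use them. The field names, the allowed type values and the check_records function are all hypothetical, and real checks would go much further (verifying against an authoritative address source, for example).

```python
# Hypothetical field names and type values, invented for this sketch.
REQUIRED_FIELDS = ("property_id", "address", "property_type")
ALLOWED_TYPES = {"type_a", "type_b", "type_c"}


def check_records(records):
    """Return a list of (record index, problem description) tuples."""
    problems = []
    seen_ids = set()
    for index, record in enumerate(records):
        # Every required field must be present and non-blank.
        for field in REQUIRED_FIELDS:
            if not str(record.get(field) or "").strip():
                problems.append((index, f"missing {field}"))
        # The same property must not appear twice.
        pid = record.get("property_id")
        if pid:
            if pid in seen_ids:
                problems.append((index, "duplicate property_id"))
            seen_ids.add(pid)
        # The property type must be one we know how to collect from.
        ptype = record.get("property_type")
        if ptype and ptype not in ALLOWED_TYPES:
            problems.append((index, f"unknown property_type {ptype!r}"))
    return problems


# Invented sample data.
records = [
    {"property_id": "P001", "address": "1 Example Road", "property_type": "type_a"},
    {"property_id": "P001", "address": "1 Example Rd", "property_type": "type_a"},
    {"property_id": "P002", "address": "", "property_type": "type_d"},
]

problems = check_records(records)
for index, problem in problems:
    print(f"record {index}: {problem}")
```

An empty problems list doesn’t prove the data is good. It only means the data has passed the checks you thought to write.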

So how likely is it that Sutton Council gave Veolia bad data? In my experience, about 99.999% likely. There are two reasons for this:

  1. Other people’s data is nearly always bad in itself.
  2. Other people’s data has been created for their purposes not yours.

I’ve worked with Sutton Council’s data in various areas and that data is generally bad. But this doesn’t make Sutton Council exceptionally bad. It makes them average. The average organisation has poor data handling practices, so the typical data you’ll get from any organisation (including in-house data from your own organisation) is bad. Treat it like toxic waste.

But even if someone else’s data is fine for their purposes, it’s almost certainly not fine for yours, even if you think you’re going to do the same thing with it. Two organisations running different computer systems to do apparently the same thing will find that their systems have significant differences in the scope and semantics of their data.

Semantics are hard: Your site rates movies on a 1–5 star basis. My site rates them thumbs up/thumbs down. That other site also uses thumbs up/thumbs down but leaving a rating is required not optional. Not being able to resolve the differences between these kinds of semantics sensibly can cost people their jobs or their lives. Never assume that you know what data means until you’ve checked that your hypotheses are good.
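Here is a small sketch of the ratings example, to show where the decisions hide. The threshold used (4 or 5 stars counts as thumbs up, 1 or 2 as thumbs down) is purely an assumption of mine, as are the function names. The interesting cases are the ones with no faithful translation: a 3-star rating, or no rating at all going into a system where a rating is required.

```python
from typing import Optional


def stars_to_thumb(stars: Optional[int]) -> Optional[bool]:
    """Convert a 1-5 star rating to thumbs up (True) or down (False).

    Returns None where there is no faithful equivalent: either no
    rating was left, or the rating was a middling 3 stars.
    """
    if stars is None:
        return None
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating out of range: {stars}")
    if stars >= 4:  # assumed threshold, not a standard
        return True
    if stars <= 2:
        return False
    return None


def stars_to_required_thumb(stars: Optional[int]) -> bool:
    """Convert for a site where leaving a rating is mandatory.

    Refuses to guess: a missing or middling rating is an error that a
    person has to resolve, not something to default silently.
    """
    thumb = stars_to_thumb(stars)
    if thumb is None:
        raise ValueError("no faithful thumbs rating for this input")
    return thumb


for stars in (5, 3, 1, None):
    print(stars, "->", stars_to_thumb(stars))
```

Silently mapping the awkward cases to thumbs down (or up) would let the import run without errors, which is exactly how plausible-looking but wrong data gets into a system.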

The new waste and recycling service Veolia is operating is substantially different to the one the council previously provided in-house. There’s no good reason to think that Sutton Council’s data would be fit for Veolia’s purposes without extensive checking and cleaning even if it was fit for Sutton’s.

Sutton Council has budgeted £480,000 for IT integration work on handing over the waste and recycling collections to Veolia. It appears that for some reason the council decided to bear all these costs themselves, including those at Veolia’s end. The budget seems excessively large for what you might reasonably anticipate this work to entail, but even so: How much of this money was spent on data cleaning by Veolia? What did Veolia do to ensure that the data they were working with was correct and complete well in advance of starting the service? Because not only is that absolutely Veolia’s responsibility, it seems that we taxpayers are picking up the bill for it too.