CFA & MFB response stats: use with care

EBA Truth
7 min read · Nov 7, 2017


In typical fashion, Coalition MPs and the media have merrily gone about drawing conclusions that are not supported by the actual data on response times recently published by the CFA and MFB.

A mathematically prudent interpretation of the data leads to the following conclusions:

  • 20 CFA brigades in urban areas are failing to meet performance standards; and
  • There is no reliable evidence that any MFB station is failing to meet performance standards.

In addition, there are a number of flaws in the data collection process, all of which are likely to act to conceal the extent of problems at volunteer brigades.

Together these findings support the conclusion that reform is needed in order to support public safety and firefighter safety in regional centres and outer metropolitan Melbourne.

Don’t believe me? No worries. As always, I’ll lead you through the facts.

Stats 101

(Skip this section if you’re happy that measured success/failure rates come with uncertainty due to small sample statistics. I’m not trying to teach grandma to suck eggs, I’m just pre-empting the nay-sayers.)

First of all, let’s understand how statistical inference about response/fail-rate data works. Imagine we’re about to play two-up and I want to test whether the coins are fair, i.e. not loaded. I toss a penny ten times, and it lands on tails only three times. I expected five times. Does that mean the coin is loaded?

No, it doesn’t. I expect five tails on average, because I expect a fair coin to land on tails with a probability of one half (50%). But as everyone knows, random chance means a fair coin may land on tails a different number of times. In fancier terms, I can say that the underlying proportion of tails from a fair coin is 50%, but the sample proportion can differ due to sampling noise.

In this case my sample proportion is 30% — 3 tails out of 10 trials — but I’d be a fool to accuse you of loading the coin. I know from experience — and the graph above shows — it’s not very unusual to get 3 or fewer tails (or 3 or fewer heads) from a fair coin. I’d want to toss the coin a lot more times than 10 before I was confident a sample proportion of 30% tails was reliable evidence the coin was loaded.

Mathematically, I can put numbers on my confidence using confidence intervals. If I did this, I would find a 95% confidence interval of 9%-61% for the underlying proportion of tails produced by that coin. Oversimplifying slightly, this means I believe there is a 95% probability that the true underlying proportion of tails yielded by that coin in the long run is between 9% and 61%. So it could be a fair coin (a 50% underlying proportion of tails), and I'd be a fool to accuse you of loading it.

Now let’s say I’m still suspicious, so I toss the coin 100 times instead, and get 30 tails: a sample proportion, again, of 30%. Now I’m more confident you’re hustling me, because I have more data. If I calculated a 95% confidence interval, I’d find it was 22%-40%. A fair coin, at 50%, doesn’t fall within my 95% confidence interval. Now I can reliably accuse you of cheating. There’s still a risk I’m wrong, but at less than a 5% chance (1 in 20), it’s a risk I’m fairly happy to take.
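If you want to check these numbers for yourself, here’s a minimal Python sketch (scipy assumed) that computes the Jeffreys interval, the same method suggested below for checking the fire service data:

```python
from scipy.stats import beta

def jeffreys_ci(successes, trials, confidence=0.95):
    """Jeffreys confidence interval for an underlying proportion.

    Takes the lower and upper tail quantiles of the Beta posterior
    implied by the Jeffreys prior, with the usual edge-case adjustments.
    """
    a = successes + 0.5
    b = trials - successes + 0.5
    tail = (1 - confidence) / 2
    lower = 0.0 if successes == 0 else beta.ppf(tail, a, b)
    upper = 1.0 if successes == trials else beta.ppf(1 - tail, a, b)
    return lower, upper

# 3 tails in 10 tosses: roughly 9% to 61%, so a fair coin (50%)
# is entirely plausible.
print(jeffreys_ci(3, 10))

# 30 tails in 100 tosses: roughly 22% to 40%, which excludes 50%.
print(jeffreys_ci(30, 100))
```

The Jeffreys interval behaves well at small sample sizes, which matters here given how few calls some stations attend.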

The same considerations apply to the CFA and MFB data. We can’t just take the recorded success/failure rates at face value. They are affected by sampling noise. We have to work with confidence intervals if we want to draw any reliable conclusions.

The Data

Because they are based on the responses of fire stations to as few as 10 emergency calls, the confidence intervals around performance compliance rates in the CFA and MFB response data are large.

Here’s how the MFB data look with calculated confidence intervals shown:

(Those lacking trust are encouraged to download the data and plug it into an online calculator for the Jeffreys confidence interval.)

In all cases, the data are consistent with meeting or exceeding the MFB’s performance standard of attendance in under 7.7 minutes, 90% of the time. Yes, some confidence intervals come close to excluding the 90% standard on the low side, but none actually exclude it, so the apparent shortfall is not statistically significant: it could be real, or it could be the result of statistical fluctuations.
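To make that decision rule concrete, here’s a minimal sketch of the test being applied (scipy assumed; the station and its call counts are hypothetical, not taken from the MFB data):

```python
from scipy.stats import beta

STANDARD = 0.90  # attendance in under 7.7 minutes, 90% of the time

def jeffreys_upper(successes, trials, confidence=0.95):
    """Upper limit of the Jeffreys confidence interval for a proportion."""
    if successes == trials:
        return 1.0
    tail = (1 - confidence) / 2
    return beta.ppf(1 - tail, successes + 0.5, trials - successes + 0.5)

# Hypothetical station: 14 of 18 calls attended in time, a sample
# compliance rate of about 78%. That looks like a failure at face value.
upper = jeffreys_upper(14, 18)

# We only declare a reliable failure if even the top of the confidence
# interval sits below the 90% standard; otherwise the shortfall could
# just be sampling noise.
if upper < STANDARD:
    print(f"reliable evidence of failure (upper bound {upper:.0%})")
else:
    print(f"not statistically significant (upper bound {upper:.0%})")
```

On numbers like these the upper bound lands well above 90%, which is exactly why a low sample rate on its own isn’t proof of failure.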

Yes, the confidence intervals are large in many cases. That’s because the sample sizes are small, which makes the data less reliable. Including stations with even smaller call samples (as Brad Battin called for in Parliament, accusing the government of a cover-up) would only make matters worse, introducing data so unreliable as to be totally uninformative. The only way to get a better handle on performance is to gather more data for each station, for example by including other call types. (Contrary to another of Battin’s goofball conspiracy theories, this wouldn’t skew the results either way overall, but it would shrink the confidence intervals.)
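To see why more data is the only real fix, here’s a quick sketch (same scipy assumption as above) of how the interval tightens as the call sample grows, holding the sample compliance rate at 90%:

```python
from scipy.stats import beta

def jeffreys_ci(successes, trials, confidence=0.95):
    # (edge cases successes == 0 or successes == trials omitted for brevity)
    tail = (1 - confidence) / 2
    a, b = successes + 0.5, trials - successes + 0.5
    return beta.ppf(tail, a, b), beta.ppf(1 - tail, a, b)

# Hold the sample compliance rate at 90% and watch the interval
# tighten as the call sample grows.
for trials in (10, 20, 50, 100, 500):
    successes = round(0.9 * trials)
    lo, hi = jeffreys_ci(successes, trials)
    print(f"{trials:4d} calls: {lo:.0%} to {hi:.0%}")
```

With only 10 or 20 calls the interval spans tens of percentage points; by a few hundred calls it narrows to a few points either side of 90%.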

In any case, the MFB data are good enough to reveal a statistically significant difference from the CFA data.

The CFA provided a greater variety of data than the MFB. Here I am concentrating on Hazard Class 2, which requires a response in under 8 minutes, 90% of the time. The CFA reported on both its Customer Service Delivery Standard, based on the time taken for any vehicle to arrive, and its Primary Service Delivery Standard, based on the time taken for a vehicle to arrive from the station primarily responsible for calls at the given location. For fair comparison to the MFB data, I will look only at the more lenient Customer SDS.

Those data show, with genuine statistical significance, that the CFA is failing to meet its performance benchmarks in many suburbs and regional centres:

Having taken into account the limitations of the data by using confidence intervals, we are in a position to reliably conclude that the following CFA brigades are failing to meet Customer SDS: Castlemaine, Drysdale, Lara, Edithvale, Hampton Park, Phillip Island, Wonthaggi, Sale, Chirnside Park, Lilydale, Mooroolbark, Upwey, Caroline Springs, Epping, Wyndham Vale, Sebastopol, Benalla, Churchill, Moe and Newborough.

Unsurprisingly, of the 20 CFA brigades we can reliably identify as not meeting performance standards, 19 are crewed exclusively by volunteers. This is not a reflection on volunteers themselves, who surely do their utmost to serve the community. It is just what you would expect when your response model relies upon people being available to drop what they’re doing and be at the fire station within 2.5 minutes. (2.5 minutes to get to your car and drive to the station, 90 seconds to get dressed and out the door, and 4 minutes driving to the scene in the fire truck: 8 minutes all-up.) It’s not realistic for major urban areas in 2017. Those brigades that manage it regardless, like Narre Warren, must go to extraordinary lengths, and probably have some valuable knowledge to pass on to other brigades.

One of the struggling stations, Caroline Springs, is an integrated brigade, incorporating volunteers and professional firefighters. (Its inclusion in this article should provide some confidence I haven’t cooked the books.) Firefighters familiar with the circumstances at Caro tell me that Caroline Springs CFA serves a very large area, bordering four different MFB stations along its eastern boundary. The northern part of its area is served by a satellite station crewed by volunteers. To reach the extremes of their area, Caroline Springs professional firefighters face a drive of well over 10 minutes. It doesn’t matter how quickly they’re out the door: they are bound to fail the 8-minute response standard in those parts of their response area. This is an area where reforms would ensure better service delivery.

For fair comparison with the MFB data, I haven’t looked at the CFA Primary SDS data here, but it, too, has lessons to offer. There are a number of brigades with very poor Primary SDS, but acceptable Customer SDS, because other stations regularly cover for them. But a safe and effective response requires two adequately crewed fire trucks. This may not be the case if the primary brigade is failing to respond.

The Victorian Government’s fire services reforms offer mechanisms to address all of these problems, if only the politicians would get out of the way and let our fire services get on with the job of improving public safety and firefighter safety. Those reforms are not just about paving the way for better service delivery through more professional firefighters, or about putting an end to bitter disputation. They are also about supporting volunteer brigades to improve their performance, and they are about flexibility.

Yep: flexibility. Turning the usual assumptions upside down, the CFA is inflexible not because of the involvement of a union, but because of the involvement of volunteers. The MFB maintains a uniformly high standard of fire cover and consistent firefighter safety not just by crewing fire stations 24/7, but also by continually adjusting its response to meet changing circumstances. This ranges from the minor (updating dispatch rules and tables) to the major (adding, deleting or moving fire stations). Professional firefighters and the union support these changes because they improve firefighter safety and public safety.

In contrast, when the CFA tries to make such improvements, it frequently meets a wall of resistance on account of volunteer patch mentality. In a forthcoming article I’ll highlight a number of excerpts from submissions to the fire services reform inquiry that provide examples of where this has occurred. Here’s one to get you started.

Before I sign off: whilst these data do prove that many CFA brigades are not meeting performance standards, there are some caveats on the quality of the data. All of them tend to mask problems in service delivery, particularly by volunteer brigades, so the true situation is likely even worse than the statistics indicate. I itemised some of these caveats in a Twitter thread:
