This widely-promoted trial claimed to show that 4-day and 5-day work weeks are equally productive. It didn’t.

As usual, the media failed miserably at reporting on a social science study.

In 2018, a New Zealand financial advising firm trialled switching from a 5-day to a 4-day work week over a two-month period.

A media push around the 19th of February 2019 promoted the idea that the trial had found workers accomplished as much work in 4 days as they previously did when working 5 days.

In particular it was a hit on Hacker News and The Guardian. But it was also covered in Fast Company, Fast Company again, ABC, NewsTalk ZB, WTSP, Fortune Magazine, MyBroadband, Women’s Agenda, LifeHacker and presumably others.

The media lapped up this idea. But the evidence for it in this trial is weak.

The news articles link to a ‘white paper’, but this report and most of the trial’s website say little about the evidence on reduced working hours and productivity. Instead, the white paper is an advocacy guide to changing your company from a 5-day to a 4-day work week, and a plug for the firm that ran the trial.

To find that evidence you’d have to dig through to pages 8 and 9 of Prof Jarrod Haar’s research report. There are screenshots of those pages below so you can read and decide for yourself.

(Note that this report and a separate qualitative report by Dr Helen Delaney also claim that working 4 days a week for the same pay made staff happier and more enthusiastic about their jobs in various ways. That seemed sufficiently plausible that I didn’t bother to check how good the evidence for it is.)

While ideally output would have been measured objectively — widgets produced or something — in this case it is based on a survey of 28 to 34 supervisors, who were asked about their team’s job performance before and during the trial. We are told: “Job performance was examined using a very standard construct: in-role performance, which basically reflects the way the supervisor see’s their employee team/s doing their job.”

While the report doesn’t say how in-role performance was measured, I emailed Prof Haar and found out that supervisors were asked to score their teams on a scale of 1 to 6 on these three questions:

  1. Meets formal performance requirements of the job
  2. Fulfills responsibilities specified in job description
  3. Performs tasks that are expected of him or her

This measurement showed no change. That is, supervisors on average reported that their team’s ‘in-role performance’ was unchanged.

Supervisor surveys also reported improvements in ‘attendance behaviours’, proactive helping of other staff, employee behaviour towards customers, and perhaps creativity.

There’s nothing wrong with this as far as it goes, but is it compelling evidence that total output remained the same after work hours declined 20%? I think not. Here are some reasons, and you may be able to think of others:

  1. While the questions are certainly relevant to assessing staff output, they don’t explicitly ask whether output remained the same. That would seem a natural thing to do if that’s what you wanted to know. (Though even if supervisors thought that was the case, the answer might not be reliable.)
  2. We don’t know whether the questions asked drew to mind an hourly productivity level or a total level of output. If supervisors thought about the former, then this is no evidence for the conclusion at all.

    I suspect that when asked about ‘tasks that are expected’ it will be hard to avoid judging performance relative to what people can reasonably be expected to accomplish in the time given, even if subconsciously.

    And think about the two questions regarding “formal performance requirements” and “responsibilities specified in job description”. If a staff member was previously going above and beyond what they are strictly required to do in their contract (as most of us do) but now accomplishes less than that, adapting to the 4-day work week by doing only what is technically required, they should score equally well on these questions.
  3. For the same reason that the people running the trial couldn’t objectively measure how much work was getting done — if they could, they presumably would have done so — supervisors might not have been able to do so either.
  4. Supervisors may not want to give an answer that would strongly imply that their hourly salary, and that of their staff, should decline 20%.
  5. The head of this firm appears to be a vocal public advocate in favour of a 4-day work week. Staff may not want to disagree with his views. Or the advocacy within the firm may have convinced them that productivity shouldn’t decline, and given the difficulty of measuring how much work is actually getting done, that’s the impression they report.
  6. Supervisors may not want to report that their teams are performing poorly, or getting less work done, especially if other teams are going to report that they’re doing just fine.
  7. These comments from supervisors suggest that hours worked may well not have actually declined by 20% in any case: “Some days were busier due to having a day out of the office”, and “They also didn’t mind longer hours if it meant having a day off.”
  8. Let’s say that hours worked actually only declined 10%, and that people worked a bit harder such that total output only declined 5%. Would we expect supervisors to be able to observe a 5% decline in work accomplished over 2 months with the naked eye? I would guess not.
  9. Staff were having their productivity scrutinised to an unusual degree during this trial. This kind of attention is known to temporarily lift how much people get done.
  10. Maybe staff could increase their hourly productivity to make up for working fewer hours over two months, but wouldn’t be able to keep this up indefinitely, especially as output expectations gradually adjusted to the new shorter work week. Maybe staff at this firm were working particularly unproductively before this change. Or this is practical for this sort of job, but not others (to be fair, they clearly flag this issue of external validity in their reports).
  11. More technically, staff didn’t have to contend with the lower level of physical capital and infrastructure the country might be able to sustain if everyone worked 20% less.
  12. Added: A reader notes that before the trial started, staff reviewed all their tasks and identified what could be automated, done more efficiently or eliminated. So this trial might show that a productivity drive combined with shorter work hours can hold output stable. But a productivity drive without shorter work hours might increase output, so there would still be a trade-off.
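
To make the arithmetic behind points 2 and 8 concrete, here is a minimal Python sketch. The function names are my own, and the 10% and 5% figures are just the hypothetical from point 8, not data from the trial. It only illustrates how total output, hours worked and hourly productivity relate to each other.

```python
def hourly_productivity_change(hours_ratio: float, output_ratio: float) -> float:
    """Fractional change in output per hour, given new/old ratios of hours and total output."""
    if hours_ratio <= 0:
        raise ValueError("hours_ratio must be positive")
    return output_ratio / hours_ratio - 1.0


def total_output_change(hours_ratio: float, hourly_ratio: float) -> float:
    """Fractional change in total output, given new/old ratios of hours and hourly productivity."""
    return hours_ratio * hourly_ratio - 1.0


# Headline claim: hours fall 20% while total output stays the same.
headline = hourly_productivity_change(hours_ratio=0.80, output_ratio=1.00)

# Hypothetical from point 8: hours fall 10%, total output falls 5%.
scenario = hourly_productivity_change(hours_ratio=0.90, output_ratio=0.95)

# Point 2: if supervisors judged hourly productivity as "unchanged"
# and hours really fell 20%, total output would be down by a fifth.
unchanged_hourly = total_output_change(hours_ratio=0.80, hourly_ratio=1.00)

print(f"Hourly gain needed for the headline claim: {headline:.1%}")
print(f"Hourly gain in the point 8 scenario: {scenario:.1%}")
print(f"Total output change if hourly productivity is flat: {unchanged_hourly:.1%}")
```

In other words, the headline claim requires staff to get 25% more done per hour. A survey answer of ‘no change’ is consistent with either that or a 20% fall in total output, depending on which quantity supervisors had in mind.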

I tend to find social science experiments unconvincing — someone who is less skeptical by nature might read the above and still find the result suggestive.

Whatever the final verdict, as usual, the media totally failed while reporting on a piece of social science:

  1. Not a single journalist chased up the data.
  2. Not a single journalist critiqued or even explained the method used to arrive at the conclusion.
  3. Not a single journalist asked for details of the questions in the survey, which aren’t on the study’s website.
  4. They mostly didn’t compare to any previous research on this question.
  5. Instead they offered breathless support for a surprising claim based on weak evidence.
  6. They displayed complete credulity in the face of a media push that was focussed on advocacy and free publicity over impartial evaluation of the evidence.

Millions probably saw these stories, especially the one in The Guardian, but as far as I know I’m the only one to dig into the issue. What fraction of people who see the original claim will see this or any other critique? 0.1%?

One lesson is not to assume that substantial attention to a study will lead to problems being noted and corrected. People very rarely take the time to email researchers directly to figure out what’s really going on.

I should clarify that despite the above I think this is a cool experiment, and the results are interesting. I’m glad it happened. While I expect output to go down when people work 4 days rather than 5, I’m not certain of that, and it seems worth testing. A 4-day work week may be sensible even if output does decline somewhat.

But I do wish that the project had not gone to the press without scrutinising the strengths and weaknesses of its trial.

I should also add that I have no indication that the academics involved, Prof Jarrod Haar and Dr Helen Delaney, have acted in bad faith. They are apparently working on a paper which will presumably outline the evidence and its weaknesses in more detail. Prof Haar was forthcoming in answering my questions about the study, which I appreciate.

It’s entirely understandable that businesses and advocates take advantage of the media’s gullibility to promote their agenda when they can. It’s journalists who need to do better, and do more than republish other people’s questionable press releases.

Until they do, don’t believe mass media reporting of social science unless you know and trust the specific author.

The data provided in the experiment

[Images: screenshots of the relevant pages of the research report]

Written by

I research the world’s most pressing problems and how to solve them at More about me:
