Why your EHR has a downtime when DST ends and how to fix it (and why hospitals don’t fix it).
I used to implement EHR software. It’s how I started my career in health tech. I think there are a lot of things about EHRs that are bad. I believe it was really important to digitize health data in the US but many of the side effects of that process were toxic. I think doctors and other clinicians have plenty of reasons to be skeptical about whether or not EHR vendors are acting in their best interest. It’s part of the reason I really enjoy having a job that allows me to help innovators with the hard technical and compliance problems in healthcare so they can get down to making stuff that helps patients and clinicians.
But here’s one flavor of EHR criticism I can’t abide. Over the course of this week, I saw complaints on Twitter about Epic or Cerner downtime due to Daylight Saving Time (DST) ending this weekend. It even received an article in Kaiser Health News, with quotes from the former president of the AMA.
“Apple and Google seem to have dealt with these challenges long ago.” Of course, in practice the problem is not quite that simple. It’s important not to treat the symptoms of a problem; we must treat the cause instead. I’m going to walk through diagnosing why we have DST problems in healthcare, how we can fix them, and why we don’t. There’s some tech talk in here, but nothing that a layperson should struggle with.
How DST Timestamps just “work” in the rest of the world.
Time zones grew organically as the world quickly became more connected through trains and the telegraph in the 1800s. While the definition of a time zone and modifiers like DST have changed over time, the basic concept is that you have one “true” time and that all other time zones are an offset of that “true” time. The “true” time is generally considered to be UTC (Coordinated Universal Time), also commonly referred to as GMT (Greenwich Mean Time).¹ Computers can communicate with each other across time zones by indicating their offset from UTC in the timestamps they send each other. There are a few formats in which this can be done, but a common one is what is known as an ISO-8601 timestamp, which looks like this:
2018-11-04T16:51:31.146Z (Current Time, UTC, indicated by the “Z”, which is short for Zulu Time, another way to refer to UTC)
2018-11-04T10:51:31.146-0600 (Current Time, Central Time Zone while observing Central Standard Time, a -0600 offset)
Computers and the applications on them may either send timestamps with the offset or convert them to UTC before sending them downstream. Using UTC timestamps across disparate systems and applications is how we keep timestamps and time management as sane as possible.
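To make that concrete, here is a minimal Python sketch (standard library only) that parses the two example timestamps and shows they describe the same instant. The variable names are mine and purely illustrative; nothing here is taken from any EHR or vendor code.

```python
from datetime import datetime, timezone

# The two example timestamps from above, written with plain ASCII hyphens.
utc_text = "2018-11-04T16:51:31.146Z"
central_text = "2018-11-04T10:51:31.146-0600"

# %z accepts both "Z" and "-0600" style offsets (Python 3.7+).
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

utc_time = datetime.strptime(utc_text, ISO_FORMAT)
central_time = datetime.strptime(central_text, ISO_FORMAT)

# Offset-aware datetimes compare by instant, not by wall-clock digits.
same_instant = utc_time == central_time

# Converting to UTC before sending downstream means every system
# compares like with like, whatever its local clock shows.
normalized = central_time.astimezone(timezone.utc)

print(same_instant)
print(normalized.isoformat())
```

The point of the sketch is that once the offset travels with the timestamp, the receiver never has to guess which clock the sender was looking at.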
How do computers keep their time up to date so that we don’t have time skew? That comes from another innovation known as NTP: Network Time Protocol. NTP is a mechanism in which computers ping a remote server to ensure that their internal clock is operating as expected. While NTP was included in Unix in 1985, it did not receive massive adoption until it was implemented in Linux in 1997 with “chrony” and packaged as part of Windows 2000 in, well, the year 2000. This rise in NTP synchronization technology came into play as consumers were getting onto the “World Wide Web”, or the internet as we call it today. Now it’s very common for desynchronized computers to be unable to authenticate to other applications, since checking timestamps and nonces is the easiest way to prevent a hack known as a “replay attack”. Most modern technology manages time using these technologies and that’s why Google and Apple have products that “just work” during DST.
Why should we care about these dates? It’s important to place all historical events in the context of when they occurred. The data protocol we use to exchange most healthcare data, even today, predates many of these advances in the implementation of time management in modern computing.
HL7v2: 30 Years Strong in 2019!
In 1989, the same year that Tim Berners-Lee was inventing the “World Wide Web”, Health Level Seven (or HL7 for short) published version 2.0 of the HL7 specification. After years of custom interoperability formats, HL7v2 was the first common data standard in healthcare. And, it was popular. It very quickly saw adoption in helping early HIS and EHR systems synchronize data between each other. HL7v2 was designed to be a standard for integration inside hospital walls; after all, all other networks for exchanging data at this point were mostly custom by design. Adoption of new health data protocols for critical health data and healthcare workflows mostly stagnated after the adoption of HL7v2 in the United States, so even though it seems dated (ha!) it is still a very contemporary technology used in every hospital today.
Bringing this back to timestamps, this is what a timestamp looks like in an HL7v2 message:
200601241424 — Common HL7v2 timestamp in yyyyMMddHHmm format
Recalling our timestamps from earlier, you’ll notice something missing. That’s right, it’s the UTC offset. To be fair to the creators of HL7v2, v2 does specify a format in which an offset can be added to a timestamp. However, this was an optional implementation.² Datica has over 200 connections to health systems in the United States and I have never seen a production HL7v2 message with an offset in a timestamp. Ever. This is logical. After all, healthcare computing in the 90s was mostly basic and was greatly supported by paper and people processes. A hospital system that spanned or shared HIT across states probably seemed like a quaint concept at the time.³ How can the lack of an offset cause all of these problems? It’s time to introduce a new concept called defensive programming.
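Before getting to defensive programming, here is a rough Python sketch of what a missing offset means for the receiver. `parse_hl7_minutes` is a hypothetical helper of my own, not part of any HL7 library, and it only handles the yyyyMMddHHmm form shown above plus an optional +/-ZZZZ suffix.

```python
from datetime import datetime, timedelta, timezone

def parse_hl7_minutes(text, assumed_utc_offset_hours):
    """Parse a yyyyMMddHHmm timestamp with an optional +/-ZZZZ suffix.

    Hypothetical helper: when the sender omits the offset, the receiver
    has to assume one, and that assumption is where DST mistakes creep in.
    """
    if len(text) > 12 and text[12] in "+-":
        # The sender told us the offset, so no guessing is needed.
        return datetime.strptime(text, "%Y%m%d%H%M%z")
    naive = datetime.strptime(text, "%Y%m%d%H%M")
    assumed = timezone(timedelta(hours=assumed_utc_offset_hours))
    return naive.replace(tzinfo=assumed)

# The same twelve digits, read by a receiver that assumes daylight time
# (-5 in Central) versus one that assumes standard time (-6).
as_cdt = parse_hl7_minutes("201811040130", -5)
as_cst = parse_hl7_minutes("201811040130", -6)
gap = as_cst - as_cdt

# With an explicit offset, the receiver's assumption is ignored.
explicit = parse_hl7_minutes("201811040130-0500", -6)

print(gap)
print(explicit == as_cdt)
```

Two receivers reading the identical message end up an hour apart, which is exactly the kind of disagreement the downtime is meant to avoid.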
Defensive Programming is the concept by which programmers should proactively try to anticipate software problems through application design. While some of these may be purely technical in nature, e.g. running out of memory allocated to the application, most defensive programming efforts revolve around preventing incorrect inputs to applications. Clinicians probably see some of these gates in order entry in the EHR, where users can’t order a prescription with 1 million pills.
One of the more common Defensive Programming concepts revolves around ensuring the validity of data provided to applications by upstream systems. Since an app cannot guarantee that data given to it by 3rd party applications is correct, the app should discard or throw errors when invalid data is given to it. A common validity check is ensuring that users are not asked to do “tasks” in the past (or, unrealistically far in the future). While it’s popular to claim that EHRs are a billing system (and perhaps they are more so in the Ambulatory setting), in the Inpatient setting they are mostly a task management system. The EHR is instructing users to draw blood, administer medications or even indicating to therapists when to round on patients. While the documentation of the task lives in the EHR, the outcomes or ancillary jobs required to complete the task often live in other systems. A blood draw is scheduled in the EHR, but the accession number is generated by a 3rd party LIS system. Medications are tracked in the EHR MAR, but clinicians can’t get the medication for a patient (without an override) unless it’s been sent to a cabinet on the floor. All of these different applications rely on the EHR to give them valid data, but determine validity based upon their own application logic. As such, most of these applications and hardware will discard input with an old timestamp. For example, if the EHR creates a message to draw a lab at 1:01AM when DST changes clocks but the Lab System or Instrument still believes that it is 2:01AM, the Lab System may discard the task to draw the lab since you can’t schedule a lab draw for the past. In the best case scenario there would be a notable and visible error in the lab system to adjudicate the problem; in the worst (and not wholly unlikely) case the error would be silent and the lab draw would be missed. While all applications should ideally set their clocks for DST at the same time, this is a problem with old applications.
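The lab draw example can be sketched in a few lines of Python. This is an illustration of the general pattern, not code from any real lab system; `accept_task` and its five-minute grace window are assumptions I made up for the example.

```python
from datetime import datetime, timedelta

def accept_task(scheduled_local, receiver_clock_local,
                grace=timedelta(minutes=5)):
    """Return True unless the task is scheduled in the receiver's past.

    Hypothetical defensive check: both arguments are offset-less local
    times, just like the HL7v2 timestamps most interfaces carry.
    """
    return scheduled_local >= receiver_clock_local - grace

# The EHR has already fallen back and stamps the order 1:01AM.
ehr_order_time = datetime(2018, 11, 4, 1, 1)

# The lab system never fell back, so its clock reads 2:01AM.
stale_lab_clock = datetime(2018, 11, 4, 2, 1)
rejected_case = accept_task(ehr_order_time, stale_lab_clock)

# A lab system that fell back at the same moment has no complaint.
synced_lab_clock = datetime(2018, 11, 4, 1, 1)
accepted_case = accept_task(ehr_order_time, synced_lab_clock)

print(rejected_case, accepted_case)
```

Nothing in the check is wrong on its own terms. It only fails because the two systems disagree about what time it is, and the message carries no offset to settle the argument.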
“That Belongs in a Museum!”⁴
While it’s amazingly popular to refer to all HIS applications as the EHR, there is often a large volume of applications that support workflows behind the scenes of the EHR. Most physicians don’t know what these systems are because they are never the end users of them; usually it’s some tech or analyst responsible for managing whatever esoteric workflow in some “niche” system. It might be inventory management or even Lab Instrumentation systems that have worked just fine for the last 18 years. Hospitals often use software or hardware like this long after support ends for the applications themselves. So, assuming that some well meaning programmer may have built effective DST support for the application in 2000, there’s one big problem with management of DST in the United States.
Daylight Saving Time is a man-made construct. It changed in 2005.
In 2005, George W. Bush signed into law a broad energy bill which expanded DST by four weeks. Since 2007, DST begins three weeks earlier and ends one week later. So, even if a well meaning application developer shored up some of their application time logic in preparation for Y2K, their application would have broken in 2007 when DST ended on the first weekend in November instead of the last weekend of October like usual. This of course is a conquerable problem for applications under current support (like modern EHRs) but would have broken any software written pre-2007 that automatically handled DST.
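To see how an application with the old rule baked in drifts out of step, here is a small Python sketch that computes the fall-back date under both rules. The function names are mine; the rules themselves (last Sunday of October before 2007, first Sunday of November since) are the ones described above.

```python
import calendar
from datetime import date

def sundays(year, month):
    """All Sundays in the given month, in calendar order."""
    days_in_month = calendar.monthrange(year, month)[1]
    return [
        date(year, month, day)
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() == calendar.SUNDAY
    ]

def dst_end_pre_2007_rule(year):
    """Old US rule: DST ends on the last Sunday of October."""
    return sundays(year, 10)[-1]

def dst_end_current_rule(year):
    """US rule since 2007: DST ends on the first Sunday of November."""
    return sundays(year, 11)[0]

old_rule_2018 = dst_end_pre_2007_rule(2018)
new_rule_2018 = dst_end_current_rule(2018)

# Number of days an unpatched application is an hour off every fall.
days_out_of_step = (new_rule_2018 - old_rule_2018).days

print(old_rule_2018, new_rule_2018, days_out_of_step)
```

An application that hard-coded the old rule falls back a week early, so for that week it disagrees with every correctly configured system it talks to.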
This is why we have downtimes.
So, organizations now have 3 flavors of applications which connect to EHRs:
- Applications entirely unaware of DST that operate on local time.
- Old applications that handle DST the wrong way.
- New applications that do handle DST correctly (but send data using “correct” timestamps that the first two flavors consider invalid).
Orgs could, of course, try to replace all old applications with new shiny applications. Good luck getting a hospital to replace an old system that’s been working well for twenty years. Orgs could try to modify interfaces to be timestamp dependent based on the interface, but that’s a lot of custom logic that can become fragile as they replace software or if DST changes again in the future.⁵ Or, orgs have the perceived lesser of evils where they turn systems off for an hour so that they ensure that they don’t send messages with “past” timestamps to downstream systems that don’t process them because they believe they are invalid. That’s how we ended up with the disappointing but understandable status quo.
On Flowsheets and Medication Administration.
One of the topics mentioned in the KHN article other than downtimes was some idiosyncrasies around flowsheet and medication management. For items like flowsheets, you see instructions requiring values in flowsheets or the MAR be set for 1:00AM and then 1:01AM (what would have been 2:01 AM). This is kind of goofy, but what is the better standard operating procedure?
A) One or two days a year, require a bit of documentation nuance from users.
B) Always allow users to immediately document or indicate that they gave two doses of medication for a patient at the same time (dangerous!) or potentially mess up the flowsheet documentation used to calculate volume for IV-based treatment by having two drastically different values for remainder of IV bag used at the same time (which one is correct?)
I can agree that this is weird and that there is perhaps better UX that could be used during this period of time, but as the Johns Hopkins CMIO notes, their organization has bigger operational problems than this one.
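For the technically curious, the underlying ambiguity is easy to demonstrate. The Python sketch below uses two fixed offsets that I have labeled CDT and CST for illustration; a real system would use its own time zone database rather than hard-coded offsets.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Central time, assumed here for illustration.
CDT = timezone(timedelta(hours=-5), "CDT")
CST = timezone(timedelta(hours=-6), "CST")

# Two doses documented an hour apart on the night DST ends.
first_dose = datetime(2018, 11, 4, 1, 30, tzinfo=CDT)
second_dose = datetime(2018, 11, 4, 1, 30, tzinfo=CST)

# On the wall clock (and in an offset-less chart) they look identical.
same_wall_clock = (
    first_dose.strftime("%Y%m%d%H%M") == second_dose.strftime("%Y%m%d%H%M")
)

# With the offsets attached, the real gap between them is recoverable.
elapsed = second_dose - first_dose

print(same_wall_clock, elapsed)
```

Without the offset, a flowsheet has no way to tell which 1:30AM came first, which is why the workaround of documenting at 1:00AM and then 1:01AM exists.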
How to fix it (and why we don’t).
The problem with DST downtime doesn’t inherently lie with EHR vendors. It’s a global problem of interoperability. How do I know that Epic supports sending timezone offsets with events? I can see them being set in FHIR messages generated by Epic in their public sandbox. Instead of complaining globally about EHR vendors, we need to demand improved efforts around interoperability from all vendors in healthcare. While we will probably never replace all HL7v2 systems and interfaces, the more connectivity we have between HIS systems that use modern data standards like FHIR, the better. While it’s easy to say that “interoperability is a business problem, not a tech problem”, we need to ensure that we are fixing problems in interoperability that don’t necessarily have clear or compelling use cases. We haven’t been fixing problems with DST since it’s likely always been at the bottom of the barrel for most hospitals, behind demonstrating meaningful use, improving reimbursement or managing patient safety. There are some organizations that have found the value of solving this problem. As the KHN article notes, Johns Hopkins and Cleveland Clinic have sought to minimize the impact of DST by requiring vendors to be diligent about making workflows work during the cutover from DST to standard time. Our customers, vendors ranging from startups to the world’s largest life sciences companies, had zero errors last night due to DST ending. And, as the on call support staff last night, I can tell you that I was able to enjoy my extra hour of sleep sans pager from 200 health systems asking why we were having DST problems.
We are all in this together.
Part of the reason I wrote this post was because I wanted to shine a light on a small subsection of what most folks in health tech are dealing with on a day to day basis. I understand the frustration from clinicians, but indicating that it is incompetence and not priority or politics causing these outages is a dangerous path to progress. I’ve worked with smart people in HIT trying their hardest from Madison, WI to Madison, MS. The only way we can improve things in HIT is by being transparent about our challenges and working together to improve the underlying systems which power healthcare in the United States.
- Though, it’s worth noting that GMT is not the same thing as UTC. GMT is based on mean solar time, while UTC is kept by atomic clocks and nudged with leap seconds to stay within 0.9s of solar time. Also, yes, it’s not CUT. It became UTC as a compromise between the French and English abbreviations of the standard. These are a few of the reasons why time zones are hilariously complicated.
- From the HL7v2.2 implementation guide, 12/1/1994: “The HL7 Standard strongly recommends that all systems routinely send the time zone offset but does not require it. All HL7 systems are required to accept the time zone offset, but its implementation is application specific.” From my experience, that “implementation” is to commonly just ignore that offset.
- This is anecdotal, but I remember that Epic built more robust procedures for multi-timezone support for Sisters of Mercy, which began implementing Epic in 2007. Predating SOM, Kaiser had its own different system for managing multiple instances of Epic (Intraconnect), which is mostly unique amongst Epic customers in the modern era.
- “We’re going to have the biggest DST. Longer than any other president’s DST. That’s why DST now ends on the second weekend of November.”