A Timely Update to the Bolt Protocol
Our main mission in the Drivers team is to offer an idiomatic and consistent developer experience, regardless of the target programming language and Neo4j deployment topology.
Consistency manifests itself in the concepts and API the various drivers expose, but also in the way they behave.
The main way we ensure this is through our shared suite of acceptance tests, also known as TestKit.
While adding date- and time-related tests to TestKit, my colleague Rouven noticed something strange going on.
Neo4j History and Bolt Protocol 101
Before diving into the core of the issue, let us recap what purpose the Bolt protocol serves.
Skip this and the next section if you are already familiar with the protocol.
Historically, Neo4j started as a JVM-only embedded database.
Indeed, it would only run co-located with your application, sharing the same environment (shared heap, shared garbage collection cycles, …).
Around Neo4j 1.0 (ca. 2010), a REST API was introduced and Neo4j could be deployed as a standalone server with the default port 7474 that you all know and love.
Clustering appeared shortly after.
A few years later (ca. 2015 — early 2016), Neo4j 3.0 came out and its brand new binary application protocol appeared (with the default port 7687). The Bolt protocol was born, as well as the Bolt server component and a set of official drivers for Java, Python, and JavaScript implementing it (.NET and Go came along a bit later.)
In a nutshell, the Bolt protocol specifies how clients and servers interact via specific Bolt messages, exchanging data following the PackStream format.
You can learn more about it here.
DateTime* Structures
Among these data structures, you can find DateTime and DateTimeZoneId.
They both encode a localized point in time, defined in seconds and nanoseconds.
They can be:
- Cypher query parameters
- Results of Cypher queries when invoking one of the temporal functions
- Returned nodes’ and/or relationships’ date/time properties
They differ by the way they localize the point in time:
- DateTime specifies the offset from UTC in seconds.
- DateTimeZoneId specifies the localization with a timezone name.
Let’s illustrate how they work.
Take 1970-01-01T02:15:00.000000042+01:00 as a DateTime.
The corresponding UTC time is 1970-01-01T01:15:00.000000042Z (Z denotes UTC).
The number of seconds since the Unix epoch is 1 hour and 15 minutes, i.e. 4500 seconds.
The offset is 1h, i.e. 3600 seconds.
The localized number of seconds is 4500+3600, i.e. 8100 seconds.
The resulting DateTime will therefore be as follows:
{ seconds: 8100, nanoseconds: 42, tz_offset_seconds: 3600 }
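The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper name encode_localized_datetime and the dictionary layout are assumptions made for this example, not part of any driver API. Python's datetime stops at microsecond precision, so the nanosecond part is passed separately.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_SECOND = timedelta(seconds=1)


def encode_localized_datetime(moment, nanoseconds):
    """Hypothetical helper: compute the localized DateTime fields."""
    # Seconds elapsed since the Unix epoch, measured in UTC.
    utc_seconds = (moment - EPOCH) // ONE_SECOND
    # Offset from UTC, in whole seconds.
    offset_seconds = moment.utcoffset() // ONE_SECOND
    return {
        "seconds": utc_seconds + offset_seconds,  # localized seconds
        "nanoseconds": nanoseconds,
        "tz_offset_seconds": offset_seconds,
    }


# 1970-01-01T02:15:00.000000042+01:00
plus_one_hour = timezone(timedelta(hours=1))
moment = datetime(1970, 1, 1, 2, 15, tzinfo=plus_one_hour)
encoded = encode_localized_datetime(moment, 42)
print(encoded)
```

The seconds field comes out as 4500 + 3600 = 8100, matching the structure shown above.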
Let’s do the same with DateTimeZoneId and 1970-01-01T02:15:00.000000042[Europe/Paris].
The UTC offset for this timezone at that point in time is +1 hour.
Therefore, the UTC time is 1970-01-01T01:15:00.000000042Z.
From there, the same computations as above occur and the resulting DateTimeZoneId is as follows:
{ seconds: 8100, nanoseconds: 42, tz_id: "Europe/Paris" }
Back to Sweden
Rouven noticed a problem with the following point in time: 1980-09-28T02:30:00[Europe/Stockholm], i.e. September 28, 1980 at 02:30 AM in the Europe/Stockholm timezone.
You can try it in a program by running the following Cypher query RETURN datetime("1980-09-28T02:30:00[Europe/Stockholm]") and extracting the results.
Did something happen on September 28, 1980 in Sweden that could cause an issue?
(Sweden — it has to be said — has some form in this area, as Jon Skeet famously pointed out.)
Here I Come to Save the Day(light)!
The answer is yes!
In 1980, Sweden started to implement Daylight Saving Time (DST), also known as Summer Time.
DST consists of shifting the clock at different times of the year to adjust waking hours to daylight hours.
The clock is usually advanced by one hour in spring, thus creating a gap.
For example, a country could decide to implement time shifts at 2 AM: after 1:59:59 AM, the clock moves forward to 3:00:00 AM — 2:15:00 AM for instance does not occur.
The clock is usually set back by one hour in autumn, thus creating an overlap.
Following the same example, after 2:59:59 AM, the clock is set back to 2:00:00 AM: 2:15:00 AM, for instance, occurs twice, with different UTC offsets.
The Ambiguity of DateTimeZoneId
Let’s try to convert 1980-09-28T02:30:00[Europe/Stockholm] to a DateTimeZoneId as represented in the Bolt protocol.
First, we need to determine the corresponding UTC time.
On that specific day, after the first occurrence of 2:59:59 AM, the clock was set back to 2:00:00 AM.
Therefore, 2:30:00 AM occurred twice, with two different UTC offsets.
Since this time falls within an overlap, we cannot know which offset to use!
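To make the collision concrete, here is a small Python sketch. It models the two occurrences with fixed offsets (+02:00 for summer time, +01:00 for standard time) instead of a time zone database, and the helper localized_seconds is a hypothetical name for the "UTC seconds plus offset" computation described earlier.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_SECOND = timedelta(seconds=1)


def localized_seconds(moment):
    """Hypothetical helper: seconds since the epoch plus the UTC offset."""
    return (moment - EPOCH + moment.utcoffset()) // ONE_SECOND


# The two instants at which Stockholm clocks showed 02:30 on 1980-09-28.
summer_time = datetime(1980, 9, 28, 2, 30, tzinfo=timezone(timedelta(hours=2)))
winter_time = datetime(1980, 9, 28, 2, 30, tzinfo=timezone(timedelta(hours=1)))

legacy_summer = localized_seconds(summer_time)
legacy_winter = localized_seconds(winter_time)

# Both instants yield the same localized value...
print(legacy_summer, legacy_winter)
# ...even though they are one hour apart in reality.
print(winter_time - summer_time)
```

Both calls return the same number, so a structure carrying only that value and the timezone name cannot tell the two instants apart.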
To make matters worse, most languages will silently resolve this datetime.
A Go program that builds this point in time and prints its UTC offset reports 1 hour when run on my machine.
It prints: The offset is: 3600s.
The Go API time.Date explicitly documents:
Date returns a time that is correct in one of the two zones involved in the transition, but it does not guarantee which.
The following Python program shows that the programmer has a little more control over the localization process (notice the second localize call and its parameter is_dst):
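As a hedged illustration of that kind of control, here is a sketch that uses the standard library's zoneinfo module rather than localize and is_dst: the fold attribute selects between the two occurrences of an ambiguous wall-clock time. It assumes Python 3.9+ and an IANA time zone database available on the machine.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

stockholm = ZoneInfo("Europe/Stockholm")

# fold=0 (the default) selects the first occurrence of an ambiguous time,
# fold=1 selects the second one.
first = datetime(1980, 9, 28, 2, 30, tzinfo=stockholm)
second = first.replace(fold=1)

# Same wall-clock time, different offsets, different UTC instants.
print(first.utcoffset(), first.astimezone(timezone.utc).isoformat())
print(second.utcoffset(), second.astimezone(timezone.utc).isoformat())
```

Note that the default silently picks the first occurrence, so the choice is still made for you unless you set fold explicitly.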
Overall, the ambiguity remains and may cause bugs further down the line.
Time for a Fix
The root cause of this issue is that the seconds field of DateTimeZoneId includes the offset we sometimes cannot resolve.
One way to resolve the ambiguity is to encode the seconds field of DateTimeZoneId as UTC time (as well as the seconds field of DateTime, for consistency’s sake).
Indeed, UTC time is monotonic (it only ever grows) and is therefore non-ambiguous.
Going back to the dreaded September 28, 1980 at 2:30 AM in the Europe/Stockholm timezone, the seconds field of UTCDateTimeZoneId (the UTC-encoded replacement of DateTimeZoneId) will either be:
- 338949000 seconds, i.e. 1980-09-28T02:30:00+02:00
- Or 338952600 seconds, i.e. 1980-09-28T02:30:00+01:00
No more ambiguity — problem solved!
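The two values can be checked with a short Python sketch. The helper decode_utc_seconds is a hypothetical name, and the offset is passed in explicitly here as a simplifying assumption; a real client would derive it from the timezone name and the UTC instant.

```python
from datetime import datetime, timedelta, timezone


def decode_utc_seconds(utc_seconds, offset_seconds):
    """Hypothetical helper: turn UTC epoch seconds into local wall-clock time."""
    zone = timezone(timedelta(seconds=offset_seconds))
    return datetime.fromtimestamp(utc_seconds, tz=zone)


first = decode_utc_seconds(338949000, 7200)   # summer time, +02:00
second = decode_utc_seconds(338952600, 3600)  # standard time, +01:00

print(first.isoformat())
print(second.isoformat())
```

Both results show 02:30 on the wall clock, yet each one maps back to a distinct seconds value, which is what the localized encoding could not express.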
Correction Timeline
The UTC-aware structures are available in all the Neo4j 5 releases.
They are also available in Neo4j 4.4 releases starting with 4.4.12, provided the driver requests them and the server accepts the request.
That way, older drivers connecting to newer servers, or vice versa, will continue to work with the existing datetime structures, thus not causing any disruption.