DBT-Athena: Iceberg: NOT_SUPPORTED: Timestamp precision (3) not supported for Iceberg. Use “timestamp(6)” instead

Life-is-short--so--enjoy-it
5 min read · Feb 14, 2024


I researched why the error was raised and checked the compatibility issue.

NOT_SUPPORTED: Timestamp precision (3) not supported for Iceberg. Use “timestamp(6)” instead

Intro

In the past, I encountered the same error when I built a DBT model that uses an Iceberg Table. At the time, I focused solely on making it work so I could move on, without researching why it happened.

This time, I decided to research why it happened and whether there were any regression impacts I needed to worry about.

This post is a note based on the research I did about the timestamp precision error in Iceberg + DBT + AWS Athena.


What is Timestamp precision?

At first, I wasn’t sure what timestamp precision was for.

Timestamp precision refers to the level of detail or granularity captured in timestamp data.

For example, timestamps with these precisions mean:

  • precision 3: millisecond
  • precision 6: microsecond
  • precision 9: nanosecond

By default, Athena supports timestamps with millisecond precision ( precision 3 ), meaning it can accurately represent timestamps down to the millisecond level (e.g., YYYY-MM-DD HH:MM:SS.SSS).
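As a minimal sketch (the literal value is made up, and Athena engine v3’s Trino-style syntax is assumed), the number of fractional digits shows the precision directly:

SELECT
    TIMESTAMP '2024-02-14 10:15:30.123'    AS ms_event_time,  -- precision 3: three fractional digits
    TIMESTAMP '2024-02-14 10:15:30.123456' AS us_event_time   -- precision 6: six fractional digits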

What is the Timestamp precision of an Iceberg Table?

I got curious about the timestamp precision that Iceberg uses.

Based on Iceberg’s official documentation, Iceberg’s timestamp precision is 6 ( microsecond ).

Iceberg’s Timestamp precision — https://iceberg.apache.org/spec/#primitive-types

So, what was the problem I had?

Basically, the DBT model I built created an Iceberg Table from source data stored in a Hudi Table.

The timestamp in the Hudi Table has millisecond precision.

I built the DBT model for the Iceberg Table with code like the model below. This model tried to read the millisecond timestamp from the Hudi Table ( hudi_event_table ) and write it to the Iceberg Table.

Since the Iceberg Table’s timestamp precision is microsecond, the AWS Athena query failed because the timestamp precision didn’t match.

That was why Athena printed the error message below.

NOT_SUPPORTED: Timestamp precision (3) not supported for Iceberg. Use “timestamp(6)” instead

{{
    config(
        table_type='iceberg',
        materialized='incremental',
        incremental_strategy='append',
        s3_data_naming='schema_table_unique',
        on_schema_change='append_new_columns',
        post_hook=[
            'VACUUM {{ remove_double_quote( this ) }};'
        ],
    )
}}

SELECT
    obj_id,
    event_type,
    event_timestamp
FROM hudi_event_table
{% if is_incremental() %}
WHERE event_timestamp > ( SELECT max(event_timestamp) FROM {{ this }} )
{% else %}
WHERE event_timestamp >= DATE_TRUNC('month', date_add('month', -12, now()))
{% endif %}

How to cast the timestamp precision in AWS Athena?

Although Athena’s default precision for timestamp is millisecond, Athena also supports custom timestamp formats and resolutions, allowing you to specify different levels of precision depending on your data requirements.

You can adjust the precision of timestamps in Athena by configuring the format and resolution of your timestamp data during data ingestion or by specifying custom formats in your queries.

CAST(event_timestamp AS TIMESTAMP(6)) AS event_timestamp

Below is the updated DBT model with the cast applied.

{{
    config(
        table_type='iceberg',
        materialized='incremental',
        incremental_strategy='append',
        s3_data_naming='schema_table_unique',
        on_schema_change='append_new_columns',
        post_hook=[
            'VACUUM {{ remove_double_quote( this ) }};'
        ],
    )
}}

SELECT
    obj_id,
    event_type,
    CAST(event_timestamp AS TIMESTAMP(6)) AS event_timestamp
FROM hudi_event_table
{% if is_incremental() %}
WHERE event_timestamp > ( SELECT max(event_timestamp) FROM {{ this }} )
{% else %}
WHERE event_timestamp >= DATE_TRUNC('month', date_add('month', -12, now()))
{% endif %}

How do timestamp values with different precisions look in Athena?

Like I shared, I learned how to change the timestamp precision. After that, I got curious how timestamp values with different precisions look in Athena.

The microsecond precision has six decimal places, and the millisecond precision has three decimal places.

Yep, Athena was able to read the different timestamp precisions properly.

Timestamp precision in Iceberg Table
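For reference, here is a hypothetical query along the same lines, reusing hudi_event_table from the model above (the output values in the comments are illustrative):

SELECT
    CAST(event_timestamp AS TIMESTAMP(3)) AS event_ts_ms,  -- e.g. 2024-02-14 10:15:30.123
    CAST(event_timestamp AS TIMESTAMP(6)) AS event_ts_us   -- e.g. 2024-02-14 10:15:30.123000
FROM hudi_event_table
LIMIT 10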

Any data loss when changing precision?

As you might notice, there is a chance that precision can be lost during timestamp casting.

  • millisecond to microsecond: the extra microsecond digits are padded with zeros.
  • microsecond to millisecond: the microsecond digits are dropped.

When casting timestamp precision in AWS Athena or any other data processing system, potential data loss can occur if you reduce the precision of the timestamp.

For example, if you have timestamps with millisecond precision (YYYY-MM-DD HH:MM:SS.SSS) and you cast them to a format with second precision (YYYY-MM-DD HH:MM:SS), any millisecond-level information will be truncated, leading to loss of precision. Similarly, if you cast timestamps with nanosecond precision to millisecond precision, the nanosecond-level information will be lost.
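Here is a small sketch of both directions with a made-up value; note that the dropped digits cannot be recovered by casting back up, since the fraction is only padded with zeros:

SELECT
    CAST(TIMESTAMP '2024-02-14 10:15:30.123456' AS TIMESTAMP(3)) AS down_to_ms,
    -- -> 2024-02-14 10:15:30.123 ( the microsecond digits are lost )
    CAST(CAST(TIMESTAMP '2024-02-14 10:15:30.123456' AS TIMESTAMP(3)) AS TIMESTAMP(6)) AS back_to_us
    -- -> 2024-02-14 10:15:30.123000 ( padded with zeros, not restored )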

It’s important to be aware of potential data loss when casting timestamp precision, as it can affect the accuracy of your analysis or calculations. Before performing any casting operations, you should carefully consider the level of precision required for your use case and ensure that you don’t inadvertently lose valuable information.

If maintaining timestamp precision is critical for your analysis or calculations, you should avoid casting timestamps to lower precision formats unless absolutely necessary. Alternatively, you can consider using higher precision timestamp formats or preserving the original timestamp precision throughout your data processing pipeline to minimize the risk of data loss.

In my case, I have several DBT models that refer to models using different precisions due to the mixed table types, such as Hudi, Hive, and Iceberg. After I checked the DBT models I have, I found out that none of the datasets had any values in the millisecond or microsecond digits. Therefore, it was safe to change the timestamp precision back and forth.
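If you want to make the same check yourself, a hedged sketch like the query below (reusing hudi_event_table and event_timestamp from the model above; adjust the names for your own datasets) counts how many rows actually carry sub-second detail:

-- Rows whose timestamp carries any sub-second information.
SELECT COUNT(*) AS rows_with_subsecond
FROM hudi_event_table
WHERE event_timestamp <> DATE_TRUNC('second', event_timestamp)

If the count is zero, the dataset has no sub-second detail to lose, so casting between precision 3 and 6 is safe for it.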

How about reading a timestamp in an Iceberg Table and writing to a non-Iceberg Table ( like Hive )?

It’s really a valid and common case.

The use case is reading a microsecond timestamp and writing it as millisecond.

What should we do?

The timestamp value has to be cast to millisecond precision, like below.

CAST(event_timestamp AS TIMESTAMP(3)) AS event_timestamp
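As a rough sketch of that direction (the source table name iceberg_event_table and the Hive-side config are assumptions for illustration, not taken from this post), the DBT model would cast on the way out:

{{
    config(
        table_type='hive',
        materialized='incremental',
        incremental_strategy='append',
    )
}}

-- table_type='hive' is the assumed non-Iceberg target config here
SELECT
    obj_id,
    event_type,
    -- Iceberg stores microsecond precision; cast down to millisecond for the Hive-style table
    CAST(event_timestamp AS TIMESTAMP(3)) AS event_timestamp
FROM iceberg_event_table  -- hypothetical Iceberg source table name

With that cast in place, the write path matches the millisecond precision that Athena uses for Hive-style tables by default.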
