Creating a 'MAXVALUE' Partition of an Oracle Table in BigQuery

Abhik Saha
Google Cloud - Community
4 min read · Apr 15, 2023

One frequent problem when migrating data from an Oracle data warehouse to BigQuery on Google Cloud is the movement of partitioned tables.

In Oracle, there is a type of range partition known as the MAXVALUE partition, which stores the records that do not fall into any partition defined during table creation or added later.

There is also a limitation in Oracle: the range-partition key column is not supposed to be NULL. To work around this, developers put a default high date such as '9999-12-31 23:59:59' in that column so that the data stays in the highest (MAXVALUE) partition.
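A typical Oracle range-partitioned table with a MAXVALUE catch-all partition looks like the sketch below (table, column, and partition names are illustrative, not from the original article):

```sql
-- Illustrative Oracle DDL: rows whose order_closed_date does not fit
-- p_2022 or p_2023 (including the '9999-12-31' default) land in p_max.
CREATE TABLE all_orders_table (
  order_num         VARCHAR2(10),
  order_status      VARCHAR2(10),
  order_date        DATE,
  order_closed_date DATE DEFAULT TO_DATE('9999-12-31 23:59:59',
                                         'YYYY-MM-DD HH24:MI:SS')
)
PARTITION BY RANGE (order_closed_date) (
  PARTITION p_2022 VALUES LESS THAN (TO_DATE('2023-01-01', 'YYYY-MM-DD')),
  PARTITION p_2023 VALUES LESS THAN (TO_DATE('2024-01-01', 'YYYY-MM-DD')),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);
```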

When a package or procedure later updates the partition column to a lower date (such as SYSDATE), the record is supposed to move to the lower partition if one exists for that date. This relocation does not happen automatically in Oracle unless row movement is enabled on the table, and afterwards the table statistics are typically refreshed manually with a gather-stats call (which takes the schema name, the table name, the partition granularity, etc.).
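The Oracle side of this can be sketched as follows; the schema and table names are illustrative assumptions, and `DBMS_STATS.GATHER_TABLE_STATS` is the standard statistics-gathering procedure:

```sql
-- Allow Oracle to physically relocate rows whose partition key changes;
-- without this, an UPDATE that crosses partitions raises ORA-14402.
ALTER TABLE all_orders_table ENABLE ROW MOVEMENT;

-- After updating the partition column, refresh optimizer statistics.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'SALES_SCHEMA',      -- schema name (illustrative)
    tabname     => 'ALL_ORDERS_TABLE',  -- table name (illustrative)
    granularity => 'PARTITION'          -- gather per-partition stats
  );
END;
/
```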

Oracle MAXVALUE partition

In BigQuery, this is quite straightforward: data is moved between partitions automatically in the background when partition-column values are updated.

BigQuery also allows writing data to a specific partition. For example, you might want to load historical data or adjust for time zones. You can use any valid date between '0001-01-01' and '9999-12-31'. However, DML statements cannot reference dates prior to '1970-01-01' or after '2159-12-31'.
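As a sketch, an INSERT is routed to the partition that matches the value in the partition column; the table and column names below assume a table partitioned on `order_closed_date`:

```sql
-- Sketch: BigQuery routes each row to the partition matching its
-- partition-column value. This row lands in the 2020-01-15 partition.
INSERT INTO `sample_project.manual_input.all_orders_table`
  (order_num, order_status, order_date, order_closed_date)
VALUES
  ('2-100', 'CLOSED', DATETIME '2020-01-10', DATETIME '2020-01-15');
```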

To work around the NULL limitation, we can use the __NULL__ partition in BigQuery, where no artificial high date is required.

Let me demonstrate the same with an example.

Suppose you have a table that contains both OPEN and CLOSED orders. The partition column is the order closed date, and it is a NULLABLE column. For OPEN orders the order closed date is NULL, and for CLOSED orders it holds a specific timestamp.

The sample table can be created with the code given below.

CREATE OR REPLACE TABLE
  `sample_project.manual_input.all_orders_table`
PARTITION BY
  DATE_TRUNC(order_closed_date, DAY)
OPTIONS (
  partition_expiration_days = 365,
  require_partition_filter = FALSE
) AS
WITH base_cte AS (
  SELECT
    '2-101' AS order_num,
    'OPEN' AS order_status,
    CAST('2022-10-01' AS DATETIME) AS order_date,
    CAST(NULL AS DATETIME) AS order_closed_date
  UNION ALL
  SELECT
    '2-102' AS order_num,
    'OPEN' AS order_status,
    CAST('2022-10-01' AS DATETIME) AS order_date,
    CAST(NULL AS DATETIME) AS order_closed_date
  UNION ALL
  SELECT
    '2-103' AS order_num,
    'CLOSED' AS order_status,
    CAST('2022-10-01' AS DATETIME) AS order_date,
    CAST('2023-04-15' AS DATETIME) AS order_closed_date
  UNION ALL
  SELECT
    '2-104' AS order_num,
    'CLOSED' AS order_status,
    CAST('2022-10-01' AS DATETIME) AS order_date,
    CAST('2023-04-10' AS DATETIME) AS order_closed_date
)
SELECT
  *
FROM
  base_cte;

After we execute the above statement, the partitioned table is created in BigQuery.

all_orders_table created
Table schema

We can check the number of records in each partition using the query below.

SELECT
  * EXCEPT (table_catalog)
FROM
  `manual_input.INFORMATION_SCHEMA.PARTITIONS`
WHERE
  table_name = 'all_orders_table';

We get the below screenshot as output.

INFORMATION_SCHEMA.PARTITIONS is a metadata view that contains partition-level information

Note that there are two rows in the __NULL__ partition. Now let us update the OPEN order rows using the query below. It sets the order closed date to the current date wherever the order closed date is NULL.

UPDATE
  `manual_input.all_orders_table`
SET
  order_closed_date = CURRENT_DATETIME(),
  order_status = 'CLOSED'
WHERE
  order_closed_date IS NULL;
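The resulting rows can be inspected with a simple select over the sample table, for example:

```sql
-- Inspect all rows after the update; every order should now be CLOSED
-- with a non-NULL order_closed_date.
SELECT
  order_num, order_status, order_date, order_closed_date
FROM
  `manual_input.all_orders_table`
ORDER BY
  order_num;
```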

We get the below output after we query the table.

Query result

Also, if we query the INFORMATION_SCHEMA.PARTITIONS view again, we get the following output. The __NULL__ partition no longer exists, and the two rows have automatically moved to partition ID '20230415', which is today's date, demonstrating that BigQuery performs the shuffling automatically.

Auto shuffled rows

It’s important to note that shuffling records in a partitioned table can be resource-intensive and can impact query performance. Therefore, it’s recommended to shuffle data only when necessary and to limit the amount of data being shuffled whenever possible.
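One way to enforce that discipline is BigQuery's `require_partition_filter` table option (set to FALSE in the sample table above), which makes the engine reject any query or DML statement that lacks a filter on the partition column; a sketch:

```sql
-- Require every statement against this table to filter on the partition
-- column, preventing accidental full-table scans and shuffles.
ALTER TABLE `manual_input.all_orders_table`
SET OPTIONS (require_partition_filter = TRUE);
```

With this in place, the earlier UPDATE still works because its `order_closed_date IS NULL` predicate prunes the statement to the __NULL__ partition alone.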

Data Engineer @Accenture India || Writes about Bigquery, Cloud Function, GCP, SQL || LinkedIn ID: https://www.linkedin.com/in/abhik-saha-919646108/