Creating a 'MAXVALUE' Partition of an Oracle Table in BigQuery

Abhik Saha
Google Cloud - Community
4 min read · Apr 15, 2023

One frequent problem when migrating data from an Oracle data warehouse to BigQuery on Google Cloud is the movement of partitioned tables.

In Oracle, there is a type of range partition known as the MAXVALUE partition, which stores the records that do not fall into any partition defined during table creation or added later.

There is also a limitation in Oracle: the range-partition key column is not supposed to be NULL. To work around this, developers put a default high date such as '9999-12-31 23:59:59' in that column so that the data stays in the highest (MAXVALUE) partition.
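A typical Oracle range-partitioned table with a MAXVALUE catch-all partition looks like the sketch below (table, column, and partition names are illustrative, not from the original article):

```sql
-- Illustrative Oracle DDL: rows whose order_closed_date does not fit
-- p_2022 or p_2023 (including the '9999-12-31' default) land in p_max.
CREATE TABLE all_orders_table (
  order_num         VARCHAR2(10),
  order_status      VARCHAR2(10),
  order_date        DATE,
  order_closed_date DATE DEFAULT TO_DATE('9999-12-31 23:59:59',
                                         'YYYY-MM-DD HH24:MI:SS')
)
PARTITION BY RANGE (order_closed_date) (
  PARTITION p_2022 VALUES LESS THAN (TO_DATE('2023-01-01', 'YYYY-MM-DD')),
  PARTITION p_2023 VALUES LESS THAN (TO_DATE('2024-01-01', 'YYYY-MM-DD')),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);
```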

When a package or procedure later updates the partition column to a lower date (such as SYSDATE), the record is supposed to move to the lower partition if one exists for that date. This relocation does not happen automatically in Oracle unless row movement is enabled on the table, and afterwards the table statistics are typically refreshed manually with a gather-stats call (which takes the schema name, the table name, the partition granularity, etc.).
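The Oracle side of this can be sketched as follows; the schema and table names are illustrative assumptions, and `DBMS_STATS.GATHER_TABLE_STATS` is the standard statistics-gathering procedure:

```sql
-- Allow Oracle to physically relocate rows whose partition key changes;
-- without this, an UPDATE that crosses partitions raises ORA-14402.
ALTER TABLE all_orders_table ENABLE ROW MOVEMENT;

-- After updating the partition column, refresh optimizer statistics.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'SALES_SCHEMA',      -- schema name (illustrative)
    tabname     => 'ALL_ORDERS_TABLE',  -- table name (illustrative)
    granularity => 'PARTITION'          -- gather per-partition stats
  );
END;
/
```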

Oracle MAXVALUE partition

In BigQuery, this is quite straightforward: data is moved between partitions automatically in the background when partition-column values are updated.

BigQuery also allows writing data to a specific partition. For example, you might want to load historical data or adjust for time zones. You can use any valid date between '0001-01-01' and '9999-12-31'. However, DML statements cannot reference dates prior to '1970-01-01' or after '2159-12-31'.
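As a sketch, an INSERT is routed to the partition that matches the value in the partition column; the table and column names below assume a table partitioned on `order_closed_date`:

```sql
-- Sketch: BigQuery routes each row to the partition matching its
-- partition-column value. This row lands in the 2020-01-15 partition.
INSERT INTO `sample_project.manual_input.all_orders_table`
  (order_num, order_status, order_date, order_closed_date)
VALUES
  ('2-100', 'CLOSED', DATETIME '2020-01-10', DATETIME '2020-01-15');
```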

To work around the NULL limitation, we can use the __NULL__ partition in BigQuery, where no artificial high date is required.

Let me demonstrate the same with an example.

Suppose you have a table that contains both OPEN and CLOSED orders. The partition column is the order closed date, and it is a NULLABLE column. For OPEN orders the order closed date is NULL, and for CLOSED orders it holds a specific timestamp.

The sample table can be created with the code given below.

CREATE OR REPLACE TABLE
  `sample_project.manual_input.all_orders_table`
PARTITION BY
  DATE_TRUNC(order_closed_date, DAY)
OPTIONS (
  partition_expiration_days = 365,
  require_partition_filter = FALSE
) AS
WITH base_cte AS (
  SELECT
    '2-101' AS order_num,
    'OPEN' AS order_status,
    CAST('2022-10-01' AS DATETIME) AS order_date,
    CAST(NULL AS DATETIME) AS order_closed_date
  UNION ALL
  SELECT
    '2-102' AS order_num,
    'OPEN' AS order_status,
    CAST('2022-10-01' AS DATETIME) AS order_date,
    CAST(NULL AS DATETIME) AS order_closed_date
  UNION ALL
  SELECT
    '2-103' AS order_num,
    'CLOSED' AS order_status,
    CAST('2022-10-01' AS DATETIME) AS order_date,
    CAST('2023-04-15' AS DATETIME) AS order_closed_date
  UNION ALL
  SELECT
    '2-104' AS order_num,
    'CLOSED' AS order_status,
    CAST('2022-10-01' AS DATETIME) AS order_date,
    CAST('2023-04-10' AS DATETIME) AS order_closed_date
)
SELECT
  *
FROM
  base_cte;

After we execute the above statement, the partitioned table is created in BigQuery.

all_orders_table created
Table schema

We can check the number of records in each partition using the query below.

SELECT
  * EXCEPT (table_catalog)
FROM
  `manual_input.INFORMATION_SCHEMA.PARTITIONS`
WHERE
  table_name = 'all_orders_table';

We get the below screenshot as output.

INFORMATION_SCHEMA.PARTITIONS is a metadata view that contains partition-level information

Note that there are two rows in the __NULL__ partition. Now let us update the OPEN order rows using the query below. It sets the order closed date to the current date wherever the order closed date is NULL.

UPDATE
  `manual_input.all_orders_table`
SET
  order_closed_date = CURRENT_DATETIME(),
  order_status = 'CLOSED'
WHERE
  order_closed_date IS NULL;
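The resulting rows can be inspected with a simple select over the sample table, for example:

```sql
-- Inspect all rows after the update; every order should now be CLOSED
-- with a non-NULL order_closed_date.
SELECT
  order_num, order_status, order_date, order_closed_date
FROM
  `manual_input.all_orders_table`
ORDER BY
  order_num;
```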

We get the below output after we query the table.

Query result

Also, if we query the INFORMATION_SCHEMA.PARTITIONS view again, we get the following output. The __NULL__ partition no longer exists, and the two rows have automatically moved to partition ID '20230415', which is today's date, demonstrating that BigQuery performs the shuffling automatically.

Auto shuffled rows

It’s important to note that shuffling records in a partitioned table can be resource-intensive and can impact query performance. Therefore, it’s recommended to shuffle data only when necessary and to limit the amount of data being shuffled whenever possible.
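One way to enforce that discipline is BigQuery's `require_partition_filter` table option (set to FALSE in the sample table above), which makes the engine reject any query or DML statement that lacks a filter on the partition column; a sketch:

```sql
-- Require every statement against this table to filter on the partition
-- column, preventing accidental full-table scans and shuffles.
ALTER TABLE `manual_input.all_orders_table`
SET OPTIONS (require_partition_filter = TRUE);
```

With this in place, the earlier UPDATE still works because its `order_closed_date IS NULL` predicate prunes the statement to the __NULL__ partition alone.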

Data Engineer @Accenture India || Writes about Bigquery, Cloud Function, GCP, SQL || LinkedIn ID: https://www.linkedin.com/in/abhik-saha-919646108/