BigQuery create partitioned table and load CSV with Apps Script

Stéphane Giron
4 min read · Jan 2, 2020


BigQuery is the famous Google database for storing huge amounts of data and querying them fast. For a recent project I had to use it to store some logs, so here is the code I used with Google Apps Script to create the table and load a CSV file from Google Drive.

Before you start

To begin in the BigQuery world, you will need a Google Cloud Platform project, and you will have to enable the BigQuery API and the Drive API for our code.

Then, to store your data, you will need to create a ‘Dataset’ and a ‘Table’. We will cover that in the next sections.

Important: BigQuery is made to store data, not to modify it, meaning you cannot easily delete or update rows (this is not completely true, but it is better to keep it in mind). If you want to modify data, it is often simpler to delete the table and recreate it.

That is why a partitioned table is interesting: each partition behaves like a kind of virtual table. You load your data into a specific partition, and if you need to upload the data again you only have to delete that partition, not the whole table.
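
As a quick illustration of the idea (a sketch only; the table and dataset names here are placeholders, the real IDs are defined later in the article), a daily partition is addressed by appending a $YYYYMMDD decorator to the table ID:

// Hypothetical illustration: addressing a single daily partition.
// 'new_dataset.my_table' is a placeholder, not the table created below.
var fullTableId = 'new_dataset.my_table';     // the whole table
var partitionId = fullTableId + '$20191123';  // only the data loaded for 2019-11-23
// Loading into or deleting 'partitionId' touches that day only;
// the rest of the table is left untouched.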

In your Apps Script project, use this appsscript.json file:

{
  "timeZone": "Europe/Paris",
  "dependencies": {
    "enabledAdvancedServices": [{
      "userSymbol": "Drive",
      "serviceId": "drive",
      "version": "v2"
    }, {
      "userSymbol": "BigQuery",
      "serviceId": "bigquery",
      "version": "v2"
    }]
  },
  "exceptionLogging": "STACKDRIVER",
  "oauthScopes": [
    "https://www.googleapis.com/auth/script.scriptapp",
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/script.external_request",
    "https://www.googleapis.com/auth/spreadsheets"
  ]
}
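
To confirm the setup works before going further, a small sanity check (my addition, not part of the original project) can simply list the datasets of the project through the BigQuery advanced service:

// Hypothetical sanity check: list existing datasets to confirm the
// BigQuery advanced service and the GCP project are reachable.
function checkSetup() {
  var projectId = 'ID_PROJECT_GCP'; // same project ID as in the next section
  var datasets = BigQuery.Datasets.list(projectId);
  Logger.log(JSON.stringify(datasets.datasets || [])); // empty array if no dataset yet
}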

Create BigQuery Dataset and Table with Apps Script

For our example we will partition our table by date (ref). Fill in the global variables with your information.

// Replace values with your data.
var projectId = 'ID_PROJECT_GCP'; // Your GCP project ID
var datasetId = 'new_dataset'; // Use _ rather than -
var datasetDescription = 'My new dataset';
var tableId = 'new_dataset_entries'; // Use _ rather than -
var tableName = 'My Table';
var tableDescription = 'Partitioned table with all my data.';

// Execute this function to create the dataset and the table.
function createTable() {
  // Schema of the table: three columns.
  var tableFields = [
    {"description": "Unique Id",    "name": "id",   "type": "STRING", "mode": "REQUIRED"},
    {"description": "Date event",   "name": "date", "type": "DATE",   "mode": "REQUIRED"},
    {"description": "Type of data", "name": "type", "type": "STRING", "mode": "NULLABLE"}
  ];

  // First we create the dataset.
  var dataset = BigQuery.newDataset();
  dataset.setDatasetReference(BigQuery.newDatasetReference()
      .setDatasetId(datasetId)
      .setProjectId(projectId))
    .setDescription(datasetDescription);

  BigQuery.Datasets.insert(dataset, projectId);

  // Second we create the table.
  var table = BigQuery.newTable()
    .setDescription(tableDescription)
    .setId(tableId)
    .setFriendlyName(tableName)
    .setTimePartitioning({"type": "DAY"}) // Daily (ingestion-time) partitioning
    .setTableReference(
      BigQuery.newTableReference()
        .setDatasetId(datasetId)
        .setProjectId(projectId)
        .setTableId(tableId)
    ).setSchema(
      BigQuery.newTableSchema().setFields(tableFields)
    );

  try {
    BigQuery.Tables.insert(table, projectId, datasetId);
    Logger.log('BigQuery Table created');
  } catch (e) {
    Logger.log('Error creating table: ' + e.message);
  }
}

In this example we create a simple table with 3 columns: two are ‘REQUIRED’ and the last one is ‘NULLABLE’, meaning it can be empty.

For the date column, be careful: BigQuery has both a DATE and a TIMESTAMP type. If you want to store date-time values, you have to choose TIMESTAMP.
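
For instance, a variant of the schema with a date-time column would look like this (a sketch; the created_at column name is just an example, not part of the original table):

// Hypothetical variant of tableFields with a date-time column:
// DATE stores '2019-11-23', TIMESTAMP stores '2019-11-23 14:30:00 UTC'.
var tableFieldsWithTimestamp = [
  {"description": "Unique Id",    "name": "id",         "type": "STRING",    "mode": "REQUIRED"},
  {"description": "Event time",   "name": "created_at", "type": "TIMESTAMP", "mode": "REQUIRED"},
  {"description": "Type of data", "name": "type",       "type": "STRING",    "mode": "NULLABLE"}
];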

Partitioned table created with Apps Script

Load your CSV to BigQuery with Apps Script

For this example our data is in a Google Sheet, so we will export the sheet as CSV and load the file to BigQuery. Be careful with the date format in the sheet: dates must be in YYYY-MM-DD format.
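
If your sheet stores real Date values, one way to make sure they export as YYYY-MM-DD is to set the number format of the date column before exporting (my suggestion, not part of the original workflow; 'SHEET_ID' and the range 'B2:B' are placeholders):

// Hypothetical helper: force a date column to export as YYYY-MM-DD in the CSV.
function formatDateColumn() {
  var sheet = SpreadsheetApp.openById('SHEET_ID').getSheets()[0];
  sheet.getRange('B2:B').setNumberFormat('yyyy-mm-dd'); // the CSV export follows the displayed format
}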

function loadCsvFile() {
  var partition = getDollarsPartition(); // e.g. '$20191123'
  var urlSheets = 'https://docs.google.com/spreadsheets/d/SHEET_ID/edit#gid=0';
  var split = urlSheets.split('/');
  var fileId = split[split.length - 2];
  Logger.log(fileId);

  // Export the Google Sheet as CSV through the Drive API export link.
  var file = Drive.Files.get(fileId);
  var url_csv = file.exportLinks['text/csv'];
  var req = UrlFetchApp.fetch(url_csv, {
    method: "GET",
    headers: {
      Authorization: 'Bearer ' + ScriptApp.getOAuthToken()
    }
  });
  var blob = req.getBlob().setContentType('application/octet-stream');

  // Load job targeting the daily partition (tableId + '$YYYYMMDD').
  var job = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: tableId + partition
        },
        skipLeadingRows: 1 // skip the header row of the CSV
      }
    }
  };
  job = BigQuery.Jobs.insert(job, projectId, blob);

  var jobId = job.jobReference.jobId;

  // Poll the job until it is done, with exponential back-off.
  var sleepTimeMs = 100;
  var queryResults = BigQuery.Jobs.get(projectId, jobId);
  while (queryResults.status.state != 'DONE') {
    Utilities.sleep(sleepTimeMs);
    sleepTimeMs *= 2;
    Logger.log('Wait ' + sleepTimeMs + ' ms.');
    queryResults = BigQuery.Jobs.get(projectId, jobId);
    Logger.log(queryResults.status.state);
  }

  if (queryResults.status.errorResult) {
    var errors = queryResults.status.errors;
    errors.forEach(function (error) {
      Logger.log('**** ERROR ****');
      Logger.log(JSON.stringify(error));
    });
  } else {
    Logger.log('Success !!');
  }
}

// Returns the partition decorator for the current day, e.g. '$20191123'.
function getDollarsPartition() {
  var date = new Date();
  var txt = date.getFullYear().toString() + zero(date.getMonth() + 1).toString() + zero(date.getDate()).toString();
  return '$' + txt;
}

// Left-pads a one-digit number with a zero.
function zero(val) {
  if (val < 10) { return '0' + val; }
  return val;
}

The function getDollarsPartition() returns the current date in the format required to load the CSV into a specific partition of the table.

The ID of our table is new_dataset_entries. As we use a date-partitioned table, we have to load the CSV into the table ID new_dataset_entries$partition_date. Our program will run daily, so each day we will load the data into the table new_dataset_entries$YYYYMMDD; on November 23rd, 2019 it would be new_dataset_entries$20191123.
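
As a side note (an assumption on my side, not used in the original script), the load configuration also accepts a writeDisposition field; setting it to 'WRITE_TRUNCATE' on a partition decorator makes a re-run replace that day's data instead of appending to it:

// Hypothetical variant of the load configuration used above
// (same global variables; 'partition' comes from getDollarsPartition()).
function buildTruncateLoadJob(partition) {
  return {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: tableId + partition // e.g. new_dataset_entries$20191123
        },
        skipLeadingRows: 1,
        writeDisposition: 'WRITE_TRUNCATE' // overwrite this partition only, not the whole table
      }
    }
  };
}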

In the BigQuery interface the partitioning is completely transparent. In the future, if a rework is needed, we will be able to delete the data loaded on a specific day and re-load only that day without affecting the whole table.
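
For instance, a query can still target a single day through the _PARTITIONTIME pseudo column (a sketch using the Jobs.query method of the advanced service; the date is just an example):

// Hypothetical example: count the rows loaded in the 2019-11-23 partition.
function queryOneDay() {
  var request = {
    query: 'SELECT id, type FROM `' + projectId + '.' + datasetId + '.' + tableId + '` ' +
           "WHERE _PARTITIONTIME = TIMESTAMP('2019-11-23')",
    useLegacySql: false
  };
  var result = BigQuery.Jobs.query(request, projectId);
  // For a small table the query completes within the default timeout.
  Logger.log(result.totalRows + ' rows in this partition');
}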

Delete a partitioned table with Apps Script

It is quite simple :-), here is the code.

function deletePartitionedTable() {
  var partition = getDollarsPartition(); // partition decorator for the current day
  // Call the BigQuery REST API directly to delete only this partition.
  var url = 'https://bigquery.googleapis.com/bigquery/v2/projects/' + projectId + '/datasets/' + datasetId + '/tables/' + tableId + partition;
  var response = UrlFetchApp.fetch(url, {
    method: "DELETE",
    headers: {
      Authorization: 'Bearer ' + ScriptApp.getOAuthToken()
    }
  });

  // A successful delete returns an empty body.
  var result = response.getContentText();
  if (result == '') {
    Logger.log('Delete success !');
  } else {
    Logger.log(result);
  }
}
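
Putting it all together, a daily rework could simply chain the two functions of this article (a sketch of my own; it assumes today's partition already exists and the trigger setup is left out):

// Hypothetical wrapper: wipe today's partition and reload it from the sheet.
// Useful if the daily job has to be replayed after a bad import.
function reloadToday() {
  deletePartitionedTable(); // removes the current day's partition only
  loadCsvFile();            // loads the CSV back into the same partition
}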

Get the code

You can get the full code on GitHub: link
