Analytics Vidhya
Published in

Analytics Vidhya

Using NPM Library in Google BigQuery UDF

Javascript UDF’s are cool and using with NPM library is a whole new world to explore!

Originally published at https://asyncq.com

Background

One of the main reason to build ETL pipeline was to do data transformation on data before loading into the data warehouse. The only reason we were doing that because data warehouses were not capable to handle these data transformations due to several reasons such as performance and flexibility.

In the era of modern data warehouses like Google BigQuery or SnowFlake, things have changed. These data warehouses can process terabyte and petabyte data within seconds and minutes. Considering this much improvement, now performing data transformation within a data warehouse make more sense. Hence to create common transformation logic via UDF (user-defined functions). In this blog, we will see how can we utilize the power of javascript UDF and NPM library to generate data in BigQuery.

What is UDF?

From Google Cloud Documentations:

A user-defined function (UDF) lets you create a function by using a SQL expression or JavaScript code. A UDF accepts columns of input, performs actions on the input, and returns the result of those actions as a value.

You can define a UDFs as either persistent or temporary. You can reuse persistent UDFs across multiple queries, while temporary UDFs only exist in the scope of a single query

The below function is a simple example of creating a temporary UDF.

CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS r"""
return x*y;
""";

WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x, y, multiplyInputs(x, y) as product
FROM numbers;

Package Existing NPM library

  • We will package credit card generator npm library using webpack.
  • Create a card-number-generator folder and install the creditcard-generator library. Create webpack config file to create webpack file.
  • We need to edit webpack.config.js and mention about entry point for the library and output path.
  • Run below command will generate webpack file at card-number-generator/dist/main.js
  • After packaging, we will upload the packaged file to the google cloud storage bucket.

Using Packaged NPM library In BigQuery UDF

  • We will use packaged npm credit card generator library in our Javascript UDF by passing it as OPTIONS parameter.
  • We will use one of the methods called GenCC(“Amex”)
  • We will also create another UDF just to generate a random customer id.
  • Now we can use the above two udfs to generate our table with customer_id and credit card number. Additionally, in order to generate N numbers of rows, we can use any public or authorized table with LIMIT.
  • In the below query, we are using bigquery public dataset to generate 100 records.
  • Let’s summarize all the information and our entire code looks like below

if you execute the above query it will generate 100 records as shown below.

Conclusion

This blog only covers the surface area and shows what’s possible when you use javascript UDF with the existing NPM library. It's a whole new world for the ELT paradigm. Please do share your experience about using UDF and NPM libraries in the comment section.

Liked this blog ? Find more @ : https://asyncq.com/
Please subscribe to my youtube channel for tech related contents.

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Recommended from Medium

Node.js Streams in Action

Xpansion Airdrop Event — Invite Playmates to receive ~350WAX

Thinking in React — React

🎉 Announcing Angular Hot Toast — The Best Angular Toast in Town

Seeing Javascript objects as functions totally changed the way I see mapping over an object!

Over The Wire — Natas 3 4 5

Why and How to Use Component Composition in React

Usefull MongoDB Queries

MongoDB Usefull Queries

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Sam

Sam

Google Cloud Certified Professional Data Engineer. Find more blogs: https://asyncq.com/

More from Medium

Working with JSON data in BigQuery

Airflow: Create Custom Operator from MySQL to PostgreSQL

Automate Your Data Warehouse with Airflow on GCP

GCP Big Query — Writing my first Python Connector