Supercharge BigQuery with BigFunctions
Framework to build a governed catalog of powerful BigQuery functions
BigQuery is Google’s petabyte-scale data warehouse that lets you analyze massive amounts of data quickly and easily, without needing to manage your own servers.
BigFunctions is a framework to build a governed catalog of powerful BigQuery functions, including 100+ open-source functions to supercharge BigQuery that you can call directly (no install) or redeploy in your catalog.
In Part 1 of this series, we’ll explore BigFunctions and how to easily get started calling Open Source BigFunctions directly from BigQuery.
What is/are BigFunctions?
BigFunctions is an open-source framework that allows organizations to build a governed catalog of BigQuery functions, also known as UDFs (user-defined functions). UDFs are functions you create yourself within BigQuery, using SQL or JavaScript, to extend its functionality for your specific needs.
Why create functions?
- Extend functionality: BigQuery offers a lot of built-in functions, but there might be cases where you need something specific that isn’t available. UDFs let you create your own custom logic to manipulate data in the way you need. For instance, you could create a UDF to calculate a specific industry metric not covered by a built-in function.
- Improve code reusability: UDFs can be reused throughout your BigQuery queries, saving you time and effort. Instead of writing the same complex calculation multiple times, you can define it once as a UDF and then call it wherever needed. This makes your code more concise and easier to maintain.
- Promote consistency: UDFs can help ensure consistent application of business logic across your data analysis. By creating a central UDF for a specific calculation, you can guarantee everyone is using the same definition. This can be especially important in large teams or organizations.
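To make the points above concrete, here is a minimal sketch of a SQL UDF. The function name gross_margin_pct, the sales rows, and the metric itself are hypothetical and chosen only for illustration; they are not part of BigFunctions.

```sql
-- Hypothetical temporary UDF: gross margin as a percentage of revenue.
-- safe_divide returns null instead of raising an error when revenue is 0.
create temp function gross_margin_pct(revenue float64, cost float64)
returns float64
as (
  safe_divide(revenue - cost, revenue) * 100
);

-- Made-up sample rows, defined inline so the script runs without any table.
with sales as (
  select 'widget' as product, 120.0 as revenue, 90.0 as cost union all
  select 'gadget', 80.0, 100.0
)
select
  product,
  gross_margin_pct(revenue, cost) as margin_pct
from sales;
```

A temporary function only exists for the duration of the script. Defining the logic once and sharing it as a persistent function in a dataset is what gives a team a single definition of the metric.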
Why BigFunctions?
BigFunctions provides the framework to build and manage a governed catalog of UDFs at scale. According to the BigFunctions GitHub project, it offers something for each of the following user groups:
As a data-analyst
You’ll have new powers! (such as loading data from any source or activating your data through reverse ETL).
As an analytics-engineer
You’ll feel at home with BigFunctions style which imitates the one of dbt (with a yaml standard and a CLI). You’ll love the idea of getting more things done through SQL.
As a data-engineer
You’ll easily build software-engineering best practices through unit testing, cicd, pull request validation, continuous deployment, etc. You will love avoiding reinventing the wheel by using functions already developed by the community.
As a central data-team player in a large company
You’ll be proud of providing a governed catalog of curated functions to your 10000+ employees with mutualized and maintainable effort.
As a security champion
You will enjoy the ability to validate the code of functions before deployment thanks to your git validation workflow, CI Testing, binary authorization, etc.
As an open-source lover
You’ll be able to contribute so that a problem solved for you is solved for everyone.
Reposted from the BigFunctions GitHub project: https://unytics.io/bigfunctions/#2-why-bigfunctions
Using BigFunctions
BigFunctions consists of over 100 Open Source functions that can be called directly from BigQuery without hosting any additional code or services. In addition, BigFunctions provides a framework to download and deploy functions in your own Google Cloud environment, enabling a governed catalog of curated functions that conform to your organization’s requirements.
In this article, we’ll explore how to call the Open Source functions directly from SQL. We’ll explore creating your own BigFunctions catalog in a future article.
Calling BigFunctions from BigQuery in your Google Cloud project
The BigFunctions project hosts ready-to-use Open Source functions in public datasets, one for each BigQuery region and multi-region. Let’s try a query that calls a function called faker in the US multi-region. The faker function generates fake data of type what, localized with the locale parameter (using the faker Python library).
faker(what, locale)
In BigQuery Studio, issue the following query:
select bigfunctions.us.faker("name", "en_US") as fake_name
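Because faker is called like any other SQL function, it can also be applied once per row. The sketch below reuses the same arguments as the query above and assumes the function is evaluated separately for each generated row; the row_id column and the count of five are arbitrary choices.

```sql
-- Generate five rows, then call faker once per row.
select
  row_id,
  bigfunctions.us.faker("name", "en_US") as fake_name
from unnest(generate_array(1, 5)) as row_id
order by row_id
```

This pattern can be handy for seeding a small test table without touching real customer data.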
Let’s try another query with the is_email_valid function, which returns true when a valid email is provided.
is_email_valid(email)
In BigQuery Studio, issue the following query:
select bigfunctions.us.is_email_valid('jake@cloudjake.com')
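In practice you would more often apply the function to a column than to a single literal. Here is a minimal sketch, assuming a hypothetical set of contacts defined inline; the second address is made up and deliberately malformed.

```sql
-- Hypothetical inline data; replace the CTE with your own table.
with contacts as (
  select 'jake@cloudjake.com' as email union all
  select 'not-an-email'
)
select
  email,
  bigfunctions.us.is_email_valid(email) as is_valid
from contacts
```

The is_valid column can then be used in a where clause to keep or quarantine rows.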
Explore ready-to-use Open Source functions
Currently, there are over 100 ready-to-use Open Source functions in the BigFunctions project. Since anyone can propose a new function, this number keeps growing. To explore the available functions, check out the Catalog of Open Source functions:
- 🧠 AI
- 💬 Notify
- 🛢 Get data
- 🚀 Export
- 1️⃣ Transform numeric
- ✨ Transform string
- 🌐 Transform geo data
- 📆 Transform date
- {…} Transform json
- […] Transform array
- 🧠 Machine learning
- 🌐 Graph
- 🔨 Convert data format
- 👀 Explore
- 🔨 Utils
Star the BigFunctions project
For quick access to the growing list of BigFunctions directly in BigQuery, we can add the BigFunctions project to our Explorer view in BigQuery Studio by adding it as a starred project.
In BigQuery Studio, click the +ADD link in the top right of the Explorer pane.
In the Add pane, scroll down and click Star a project by name.
Enter the project name bigfunctions and click Star.
You should now see the project bigfunctions in the BigQuery Studio Explorer pane as a starred project.
Expand the bigfunctions project to expose the dataset for each region/multi-region. Within each dataset, you can browse the full list of ready-to-use Open Source functions.
Summary
In this article, we explored the BigFunctions project and how to leverage BigFunctions to supercharge BigQuery with over 100 ready-to-use Open Source functions.
In the next part of this series, we’ll explore how to leverage BigFunctions in your Google Cloud project to build a governed catalog of powerful BigQuery functions for your organization!
Reference
BigFunctions Website — https://unytics.io/bigfunctions/
BigFunctions GitHub — https://github.com/unytics/bigfunctions