Media CDN Custom Dashboard using BigQuery and Looker Studio

Gauravmadan
Google Cloud - Community
8 min read · Nov 28, 2023

Authors: Gaurav Madan and Raghav Gandotra

Introduction

Media CDN is Google Cloud’s media delivery CDN solution. Media CDN complements Cloud CDN, which is Google Cloud’s web acceleration solution. Media CDN is optimized for high-throughput egress workloads, such as streaming video and large file downloads. Media CDN uses Google’s global edge-caching infrastructure to serve your content as close to your users as possible. By using Google’s infrastructure to serve content, you can reduce load on your origin infrastructure. Media CDN lets you easily fetch content from publicly accessible HTTP endpoints. You can use Media CDN with your existing origin infrastructure, whether the content is hosted within Cloud Storage, in another cloud, or within your on-premises infrastructure. Read more about Media CDN here.

This document guides you through the steps to deploy a Media CDN custom dashboard in the Google Cloud console using a log sink, BigQuery and Looker Studio.

Pre-requisites

  1. A Media CDN service is running and generating the required logs. Logging is enabled or disabled in the Media CDN service configuration, where you can also set the log sample rate (0 to 1).
  2. In this document, we route the Media CDN-specific logs to a BigQuery dataset (instead of the default Logs Explorer).
  3. Administrator access to a BigQuery dataset and to Looker Studio.
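For reference, logging is controlled by the `logConfig` block of the EdgeCacheService resource. A minimal sketch is shown below; `my-service` is a hypothetical service name, and the field values should be adapted to your deployment:

```yaml
# Hypothetical EdgeCacheService fragment: enable logging and sample 100% of requests.
name: my-service
logConfig:
  enable: true
  sampleRate: 1.0   # fraction of requests logged, 0 to 1
```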

Step 1 : Create BigQuery Dataset

  1. Navigate to the BigQuery screen in the Google Cloud console. Under your BigQuery project, click “Create dataset”.
  2. Provide a dataset name (e.g. cdn_logs) and a data location. Keep the remaining values at their defaults and create the dataset.
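If you prefer SQL over the console UI, the dataset can also be created with a DDL statement. A sketch, where `my-project` is a placeholder project ID and “US” is an example location:

```sql
-- Create the dataset that will receive the Media CDN logs.
CREATE SCHEMA `my-project.cdn_logs`
OPTIONS (location = "US");
```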

Step 2 : Create Sink to route log data to BigQuery

  1. Navigate to Logging -> Log Router
  2. Click on Create Sink
  3. Create the sink by providing the following details. Configure an inclusion filter based on your requirements.
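As an example, an inclusion filter that captures only Media CDN request logs might look like the following (a sketch based on the log ID that produces the `edgecache_googleapis_com_edge_cache_request` table; adjust it to your requirements):

```
log_id("edgecache.googleapis.com/edge_cache_request")
```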

4. The above step creates a “request” table in the BigQuery dataset. Ensure that your Media CDN service is running and generating logs.

5. Validate the sink by running a SELECT query on the “request” table in BigQuery.

IMPORTANT: Make sure you use BigQuery “partitioned tables” to improve query performance and better control your BigQuery costs by reducing the number of bytes read by a query.
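For example, a validation query on a partitioned “request” table can restrict the partitions scanned. A sketch, assuming the sink was created with partitioned tables so the `_PARTITIONTIME` pseudocolumn is available:

```sql
-- Read only the last day's partition instead of scanning the full table.
SELECT
  httpRequest.remoteIp,
  receiveTimestamp
FROM
  `<project_name>.<dataset_name>.edgecache_googleapis_com_edge_cache_request`
WHERE
  _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
LIMIT 10;
```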

Documentation: Please refer to configuring the sink document for more information.

Step 3 : Optimizing Data Source

  1. Once the log sink has been created in the Log Router, ensure that your Media CDN service starts generating logs and check that these logs are available in the BigQuery table. You can go to the BigQuery tab in the Google Cloud console and check whether the table (edgecache_googleapis_com_edge_cache_request) has been created and logs are being stored.

The schema of the BigQuery table may look something like the following:

2. The log table contains a number of fields for the service. For our dashboard, we only need specific fields. The fields used for our dashboard are:

  • remoteIp
  • requestTime
  • clientregioncode
  • clientcity
  • clientasn
  • requestUrl
  • userAgent

3. We will create additional fields for our dashboard. The additional fields are:

  • autonomous_system_organization
  • browserName
  • osName
  • hostname
  • urlPath

NOTE 1: By default, the service logs provide information only about the client’s “ASN” and not the client’s “Autonomous System Organization”. To map an “ASN” to its “Autonomous System Organization”, you can use a third-party database — the MaxMind GeoIP2 Database. For demo purposes, we have used a free version — MaxMind’s GeoLite2 Free Geolocation Database.

NOTE 2: By default, the user agent details in the logs come in the following format: “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36”. To extract browserName and osName, the userAgent string is parsed using a JavaScript UDF (user-defined function).

NOTE 3: The logs provide a request URL, from which the hostname and urlPath are extracted using NET functions and regular expressions.
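As a quick illustration of NOTE 3, the following shows how NET.HOST and the regular expression split a sample URL (a sketch with a made-up URL):

```sql
-- Splits "https://media.example.com/videos/master.m3u8" into host and path.
SELECT
  NET.HOST('https://media.example.com/videos/master.m3u8') AS hostname,  -- media.example.com
  REGEXP_EXTRACT('https://media.example.com/videos/master.m3u8',
                 r'(?:[a-zA-Z]+://)?(?:[a-zA-Z0-9-.]+)/{1}([a-zA-Z0-9-./]+)') AS urlPath;  -- videos/master.m3u8
```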

4. [Free Version Setup] Fetching Autonomous System Organization against ASN:

  • Follow this to create a free account on MaxMind to download GeoLite2 Free ASN Database
  • For this demo, the CSV file format was downloaded. Only “GeoLite2-ASN-Blocks-IPv4.csv” is used to get the ASN and Autonomous System Organization details
  • Go to BigQuery console -> Select dataset where log table is present -> Navigation Menu -> Create Table
  • Keeping the other settings at their defaults, click “Create Table”
  • The created table has the following fields: “network” (IP address), “autonomous_system_number” and “autonomous_system_organization”. Sample below -
  • There are multiple entries of the same “autonomous_system_number” against different “network” (IP address) values. For the dashboard, only “autonomous_system_number” and “autonomous_system_organization” are required, so we create a new table to store only the relevant data
  • Open the query editor and run the following query:
SELECT
  DISTINCT(autonomous_system_number),
  autonomous_system_organization
FROM
  `<project_name>.<dataset_name>.client-asn-main`
  • Unique entries of “autonomous_system_number” are fetched in the result against respective “autonomous_system_organization”. Save the results to a new table
  • Click on “Save Results” -> “BigQuery table” -> Select the dataset where the log table is present -> Give any name to the table, e.g. client-asn-info. A sample is shown in the snippet below
  • Click on “Export” and a new table with relevant data is created. This table will be used for mapping using joins
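As an alternative to the console upload, the CSV can be loaded with a DDL statement. A sketch, assuming the file was first uploaded to a hypothetical Cloud Storage bucket `gs://my-asn-data/`:

```sql
-- Load the GeoLite2 ASN CSV into a table, skipping its header row.
LOAD DATA OVERWRITE `<project_name>.<dataset_name>.client-asn-main`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://my-asn-data/GeoLite2-ASN-Blocks-IPv4.csv']
);
```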

5. [Setting up JavaScript UDFs] Parsing user agent string to fetch Browser Name and OS Name:

  • For the parsing, JavaScript code from an open-source GitHub repository has been used — a function that parses the user agent string and returns the browser name and OS name
  • Download the js file from here
  • Go to the “Cloud Storage” console. Create a Cloud Storage bucket (Follow default options)
  • Upload the downloaded js file in the newly created bucket
  • Once uploaded, copy the “gsutil URI” of the object; it will be used in the query that creates the JavaScript UDF to reference the JavaScript file. An example gsutil URI is “gs://media-cdn-ua-js/ua-parser.min.js”
  • Now, go back to the BigQuery console. Create a new BigQuery dataset for storing the JavaScript UDF. Give it a name, e.g. mediacdnudf
  • Open a new tab to run a query to create javascript udf. Run the following query:
CREATE FUNCTION mediacdnudf.parse_userAgent(ua STRING)
RETURNS STRUCT<
  browserName STRING,
  osName STRING
>
LANGUAGE js AS """
  let uainfo = UAParser(ua);
  return {
    browserName: (uainfo.browser.name == null) ? "Other" : uainfo.browser.name,
    osName: (uainfo.os.name == null) ? "Other" : uainfo.os.name
  };
"""
OPTIONS (
  library="<gsutil URI of bucket object copied in above steps>"
);

Running the above query creates a permanent JavaScript UDF “parse_userAgent” in the dataset “mediacdnudf”. You just need to pass the userAgent string from the logs to this UDF to fetch the browser name and OS name. Sample below
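A sample call to the UDF, using the example user agent string from NOTE 2 (with ua-parser-js, the result should have browserName “Chrome” and osName “Mac OS”):

```sql
-- Returns a STRUCT<browserName STRING, osName STRING>.
SELECT
  mediacdnudf.parse_userAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'
  ) AS ua;
```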

6. Finally, run the following query to extract relevant data to create the Looker Studio dashboard:

SELECT
  httpRequest.remoteIp,
  FORMAT_TIMESTAMP("%c", DATETIME(receiveTimestamp, "Asia/Calcutta")) AS requestTime,
  jsonpayload_v1_edgecachelogentry.clientregioncode,
  jsonpayload_v1_edgecachelogentry.clientcity,
  jsonpayload_v1_edgecachelogentry.clientasn,
  clientasninfo.autonomous_system_organization,
  mediacdnudf.parse_userAgent(httpRequest.userAgent) AS ua,
  NET.HOST(httpRequest.requestUrl) AS hostname,
  REGEXP_EXTRACT(httpRequest.requestUrl, r'(?:[a-zA-Z]+://)?(?:[a-zA-Z0-9-.]+)/{1}([a-zA-Z0-9-./]+)') AS urlPath
FROM
  `<project_name>.<dataset_name>.edgecache_googleapis_com_edge_cache_request`
JOIN
  `<project_name>.<dataset_name>.client-asn-info` AS clientasninfo
ON
  CAST(jsonpayload_v1_edgecachelogentry.clientasn AS INT64) = clientasninfo.autonomous_system_number
WHERE
  httpRequest.remoteIp IS NOT NULL

7. Create “View” from the results of the above query:

  • Click on “Save Query”. Select “Save copy as view”
  • Select the dataset where you want to create the view. Choose the same dataset where the logs are stored.
  • Give name to the view, e.g. “media-cdn-log-view”
  • Click on “Save”
  • The view is created with the relevant data fields

Why did we create a view?

  • This is mainly a cost-saving exercise
  • Our dashboard is based on a selected set of columns in the BigQuery table, so we picked only those columns and parked them in a view
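The same view can also be created directly with DDL instead of the “Save copy as view” UI, by wrapping the extraction query from step 6. A sketch, shortened here to two columns for brevity:

```sql
-- Create the view in the same dataset as the logs; in practice the SELECT body
-- is the full extraction query from step 6.
CREATE VIEW `<project_name>.<dataset_name>.media-cdn-log-view` AS
SELECT
  httpRequest.remoteIp,
  jsonpayload_v1_edgecachelogentry.clientregioncode
FROM
  `<project_name>.<dataset_name>.edgecache_googleapis_com_edge_cache_request`
WHERE
  httpRequest.remoteIp IS NOT NULL;
```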

Step 4 : Build Looker Studio Dashboard

  1. Click on the link below to view the Looker Studio Media CDN dashboard template:

https://lookerstudio.google.com/c/reporting/a2ac916d-aabc-4307-a171-5f16c4d3aa6c/preview

2. Since this is a Looker Studio dashboard template, you need to add your own data source. Click on “Use my own data” to add it.

3. Select “BigQuery” as the data source. Select respective project, dataset and view. Click “Add” to add the data source to the dashboard template.

4. This creates a working copy of the Looker Studio dashboard template with your respective data source which you can further customize.

5. You can edit the data source settings to change data freshness parameters and other fields. In “Edit” mode, click on “Edit data source” in the Data tab (see below). Customize the data freshness and other parameters if required and click on “Done”.

In the snapshot above, the freshness is 12 hours. This field can be set to as low as 1 minute.

Alternatively, you can click the three dots in the top-right corner and select “Refresh Data” whenever you want. This reruns the “View” query and fetches the latest data from the table.

IMPORTANT: Make sure the schema of the dashboard’s data source matches the template’s data source.

IMPORTANT: By default, data freshness is set to 12 hours. You can change this parameter as required; more frequent updates give you fresher data, but may also slow performance and increase query costs for paid data services such as BigQuery.

IMPORTANT: In the free version of Looker Studio, the “Auto Refresh” option is not available at this point; it is available to Looker Studio Pro users only. Check here for more information.
