How to get your Google Analytics data with the Google Analytics Reporting API

M. Cagri AKTAS
6 min read · Mar 2, 2024


If you need reporting for your website, you will probably end up using Google Analytics. Beyond that, with consumption growing so quickly, analyzing your data to understand your audience has become an inevitable necessity. Google helps here with the Google Analytics Reporting API, which lives on Google Cloud; this API lets you pull all of the data you see in Google Analytics.

Article Steps:

  1. Activating API on Google Cloud.
  2. Creating Service Accounts on Google Cloud.
  3. Assigning a key to the Service Account.
  4. Establishing the connection between Google Analytics and Google Analytics Reporting API through Service Account linkage.
  5. Creating a Python script.
  6. GA-Dev-Tools.
  7. Scheduling with Airflow.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

  1. Activating API on Google Cloud.

First, we activate the API we will use to fetch data from Google Analytics.

Google Cloud → Left Menu → APIs & Services → Library → Search Bar: Google Analytics Reporting API → Enable

2. Creating Service Accounts on Google Cloud.

Google Cloud → Left Menu → IAM & Admin → Service Accounts → (+) Create Service Account → fill in the Service Account Name field (Google generates the service account ID automatically) → Done

You can skip the optional steps 2 and 3 of the creation wizard (granting project and user access).

3. Assigning a key to the Service Account.

After creating the service account, we need to assign a key to this service account.

Google Cloud → Service Accounts → the ellipsis (…) at the end of the newly created account's row → Manage Keys → Add Key → Create New Key → JSON format → Create

The key file is downloaded automatically. Don't forget that the key is personal: never share it, because anyone who has it can access your data and potentially manipulate it.

Copy your service account e-mail, e.g. googledatareportingapi@test.gserviceaccount.com, because we'll add it to Google Analytics in the next step.

4. Establishing the connection between Google Analytics and Google Analytics Reporting API through Service Account linkage.

In Google Analytics: gear icon (Admin) at the bottom left → Property access management → (+) at the top right → Add users → paste your service account e-mail → assign a role (if your data is sensitive, I suggest sticking to the Viewer role) → Add (top right)

Then you must check your Property details. You’ll see the Property ID on the top left side.

Finally, you must obtain two things: one is your .json format key, and the second one is your Property ID.

5. Creating a Python script.

First, we need to set up our client module (it is installed from PyPI as google-analytics-data). Don't forget to rename the .json key file we created in step 3 to "testmedium.json", and to replace the placeholder property ID in the script with your own.

# Requires: pip install google-analytics-data pandas
import os
import itertools
from datetime import datetime, timedelta

import pandas as pd
from google.api_core import exceptions
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest

# Point the client at the service-account key we created in step 3.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'testmedium.json'

client = BetaAnalyticsDataClient()

# Report window: from 60 days ago up to yesterday.
start_date = (datetime.now() - timedelta(days=60)).strftime("%Y-%m-%d")
end_date = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")

request_api = RunReportRequest(
    property="properties/123456789123456",  # <-- your Property ID
    dimensions=[
        Dimension(name="date")
    ],
    metrics=[
        Metric(name="activeUsers"),
        Metric(name="totalUsers"),
        Metric(name="newUsers"),
        Metric(name="active1DayUsers")
    ],
    date_ranges=[DateRange(start_date=start_date, end_date=end_date)],
)

try:
    response = client.run_report(request_api, timeout=30)
except exceptions.DeadlineExceeded:
    print("Deadline exceeded.")

Next, we write a function that turns the API response into a DataFrame and fixes the date column's data type and format.

def query_data(api_response):
    # Column names for the dimensions and metrics as returned by the API.
    dimension_headers = [header.name for header in api_response.dimension_headers]
    metric_headers = [header.name for header in api_response.metric_headers]

    dimensions = []
    metrics = []

    # One list of values per dimension / metric column.
    for i in range(len(dimension_headers)):
        dimensions.append([row.dimension_values[i].value for row in api_response.rows])

    for i in range(len(metric_headers)):
        metrics.append([row.metric_values[i].value for row in api_response.rows])

    # Build the DataFrame: one row per date, one column per header.
    headers = list(itertools.chain.from_iterable((dimension_headers, metric_headers)))
    data = list(itertools.chain.from_iterable((dimensions, metrics)))
    df = pd.DataFrame(data).transpose()
    df.columns = headers

    # The API returns dates as YYYYMMDD strings; convert and sort, newest first.
    df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
    df = df.sort_values(by=["date"], ascending=False)

    # Keep the date column as the first column.
    df.insert(0, "date", df.pop("date"))

    return df

final_data = query_data(response)

### WRITE TO POSTGRESQL
def write_to_database(df):
    from sqlalchemy import create_engine

    # Adjust database, user, password, host and port to your own setup.
    engine = create_engine('postgresql://postgres:3541@localhost:5433/postgres')

    # Append the daily report to the testdata_daily table.
    df.to_sql('testdata_daily', engine, if_exists='append', index=False)

write_to_database(final_data)

The write_to_database function writes our data to PostgreSQL; we'll use SQLAlchemy, with psycopg2 as the PostgreSQL driver.

Of course, you need to adjust the database, user, password, host, and port parameters according to your own database.
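
If you prefer not to hard-code the connection details, one option is to read the connection string from an environment variable. A small sketch (the variable name GA_DB_URL is just an example):

import os
from sqlalchemy import create_engine

# Fall back to the example connection string if the variable is not set.
db_url = os.environ.get("GA_DB_URL", "postgresql://postgres:3541@localhost:5433/postgres")
engine = create_engine(db_url)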

Don't forget that you need a table in your database. If you wish, you can check my other article about installing PostgreSQL and pgAdmin with Docker.
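
For reference, a minimal sketch of such a table created with SQLAlchemy; the table and column names mirror the DataFrame above, the column types are assumptions, and pandas' to_sql can also create the table for you if it does not exist:

from sqlalchemy import create_engine, text

engine = create_engine('postgresql://postgres:3541@localhost:5433/postgres')

with engine.begin() as conn:
    # Quoted identifiers keep the mixed-case column names used by the API.
    conn.execute(text("""
        CREATE TABLE IF NOT EXISTS testdata_daily (
            "date"            DATE,
            "activeUsers"     TEXT,
            "totalUsers"      TEXT,
            "newUsers"        TEXT,
            "active1DayUsers" TEXT
        )
    """))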

6. GA-Dev-Tools.

You can customize data retrieval according to your needs by examining the parameters of the Google Analytics Reporting API. Simply log in with your Google account through the provided link. Then, find the parameter you want and click on the box to its left. In the interface above, it will indicate whether this parameter is a Dimension or Metric, and it will tell you what name to add to the Python script. Descriptions of the parameters are available in the documentation.

When you check the documentation, you’ll find nearly 250 parameters available. By examining the most requested pages, advertisements, tags, etc., efficient analyses can be derived, especially when combined with Machine Learning techniques.
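
For example, once you find pagePath (a dimension) and screenPageViews (a metric) in the explorer, a request for the most viewed pages of the last 30 days could look like the sketch below. It reuses the client and placeholder property ID from above; the API also accepts relative dates such as "30daysAgo" and "yesterday".

from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest

# Reuses the `client` object created earlier in the script.
request_pages = RunReportRequest(
    property="properties/123456789123456",  # <-- your Property ID
    dimensions=[Dimension(name="pagePath")],
    metrics=[Metric(name="screenPageViews")],
    date_ranges=[DateRange(start_date="30daysAgo", end_date="yesterday")],
)
pages_response = client.run_report(request_pages, timeout=30)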

7. Scheduling with Airflow.

After preparing the desired data, you can execute the scripts automatically every day to write the data into the database.

# On Airflow 2+, this import can also be: from airflow.operators.bash import BashOperator
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
from airflow import DAG

start_date = datetime(2024, 2, 2)

default_args = {
    'owner': 'cagri',
    'start_date': start_date,
    'retries': 1,
    'retry_delay': timedelta(seconds=30)
}

# Run the pipeline once a day; catchup=False skips back-filling old runs.
with DAG('google_data', default_args=default_args, schedule_interval='@daily', catchup=False) as dag:
    t0 = BashOperator(
        task_id="googledata_start",
        bash_command="echo 'script is starting...'",
        retries=1,
        retry_delay=timedelta(seconds=30)
    )

    # The scripts run inside the spark_client container; point
    # GOOGLE_APPLICATION_CREDENTIALS at your own .json key file.
    t1 = BashOperator(
        task_id="googledata_daily",
        bash_command="docker exec spark_client bash -c 'export GOOGLE_APPLICATION_CREDENTIALS=/opt/examples/putyourjsonfilename.json && python3 /opt/examples/googledata_daily.py'",
        retries=1,
        retry_delay=timedelta(seconds=30)
    )

    t2 = BashOperator(
        task_id="googledata_hourly",
        bash_command="docker exec spark_client bash -c 'export GOOGLE_APPLICATION_CREDENTIALS=/opt/examples/putyourjsonfilename.json && python3 /opt/examples/googledata_hourly.py'",
        retries=1,
        retry_delay=timedelta(seconds=30)
    )

    t3 = BashOperator(
        task_id="googledata_end",
        bash_command="echo 'script has ended...'",
        retries=1,
        retry_delay=timedelta(seconds=30)
    )

    # Task order: start -> daily report -> hourly report -> end
    t0 >> t1 >> t2 >> t3

Output!

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Thank you for reading; feel free to ask me anything about the article…

Cheers in Peace ^^
