How to Filter Your Data in GA4 (Google Analytics 4) Using Python

Mayuran
3 min read · Dec 29, 2023

In this article, we will explore how to apply filters when extracting data from GA4 using Python. Filtering at request time reduces the amount of data that has to be extracted, because the API returns only the rows that meet the conditions we specify.

Four filter types are covered here: stringFilter, inListFilter, numericFilter, and betweenFilter. Filters can be applied to dimensions (through dimension_filter) and to metrics (through metric_filter), and you pick the filter type that suits your use case. The examples below show how to use them.
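To make the four filter types concrete, here is a minimal sketch of how each one looks in the REST (JSON) form described in the reference at the end of this article. It uses plain Python dictionaries only; the between range of 10 to 100 and the variable names are illustrative assumptions, not values from a real report.

```python
import json

# REST-style (camelCase) bodies for the four filter types.
# int64 values are written as strings, as is usual for int64 fields in JSON.
string_filter = {
    "fieldName": "sessionDefaultChannelGroup",
    "stringFilter": {"matchType": "EXACT", "value": "Paid Social"},
}
in_list_filter = {
    "fieldName": "eventName",
    "inListFilter": {"values": ["page_view", "purchase", "view_item"]},
}
numeric_filter = {
    "fieldName": "sessions",
    "numericFilter": {
        "operation": "GREATER_THAN",
        "value": {"int64Value": "1000"},
    },
}
# Illustrative range only: sessions between 10 and 100.
between_filter = {
    "fieldName": "sessions",
    "betweenFilter": {
        "fromValue": {"int64Value": "10"},
        "toValue": {"int64Value": "100"},
    },
}

FILTERS = {
    "stringFilter": string_filter,
    "inListFilter": in_list_filter,
    "numericFilter": numeric_filter,
    "betweenFilter": between_filter,
}

# Each filter is wrapped in a FilterExpression under the "filter" key.
for name, body in FILTERS.items():
    print(name, json.dumps({"filter": body}))
```

In the Python client used in the rest of this article, the same fields appear in snake_case (for example field_name and string_filter).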

Example 1: Extract data using the stringFilter for a specific dimension. In this case, we filter the sessionDefaultChannelGroup dimension so that only rows with the value ‘Paid Social’ are returned.

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Filter,
    FilterExpression,
    Metric,
    RunReportRequest,
)


def run_report_with_filter(property_id="YOUR-GA4-PROPERTY-ID"):
    # Credentials are picked up from the environment by default.
    client = BetaAnalyticsDataClient()

    request = RunReportRequest(
        property=f"properties/{property_id}",
        dimensions=[
            Dimension(name="sessionDefaultChannelGroup"),
            Dimension(name="sessionSourceMedium"),
            Dimension(name="sessionCampaignName"),
            Dimension(name="date"),
        ],
        metrics=[Metric(name="eventCount"), Metric(name="sessions")],
        date_ranges=[DateRange(start_date="yesterday", end_date="today")],
        # Keep only rows whose channel group is "Paid Social".
        dimension_filter=FilterExpression(
            filter=Filter(
                field_name="sessionDefaultChannelGroup",
                string_filter=Filter.StringFilter(value="Paid Social"),
            )
        ),
    )
    response = client.run_report(request)
    # print_run_report_response is a helper assumed to be defined elsewhere.
    print_run_report_response(response)


if __name__ == "__main__":
    run_report_with_filter()
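The examples in this article call print_run_report_response, which is not defined here. As a rough stand-in, the sketch below flattens a response into a list of dictionaries. The name rows_to_dicts is hypothetical, and the fake response is built from SimpleNamespace objects purely so the snippet runs without credentials; it only mimics the dimension_headers, metric_headers, and rows attributes of a real response.

```python
from types import SimpleNamespace


def rows_to_dicts(response):
    """Flatten a RunReportResponse-like object into a list of dicts."""
    dim_names = [header.name for header in response.dimension_headers]
    met_names = [header.name for header in response.metric_headers]
    records = []
    for row in response.rows:
        # Pair each header name with the value in the same position.
        record = {n: v.value for n, v in zip(dim_names, row.dimension_values)}
        record.update({n: v.value for n, v in zip(met_names, row.metric_values)})
        records.append(record)
    return records


# Stand-in for a real API response (assumption: made-up values).
fake_response = SimpleNamespace(
    dimension_headers=[SimpleNamespace(name="sessionDefaultChannelGroup")],
    metric_headers=[SimpleNamespace(name="sessions")],
    rows=[
        SimpleNamespace(
            dimension_values=[SimpleNamespace(value="Paid Social")],
            metric_values=[SimpleNamespace(value="42")],
        )
    ],
)

for record in rows_to_dicts(fake_response):
    print(record)
```

Values are kept as strings here, so convert metrics to int or float before doing arithmetic on them.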

Example 2: Extract data using the inListFilter for one dimension. Here, we filter the eventName dimension so that only rows matching any of the values 'page_view', 'purchase', or 'view_item' are returned.

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Filter,
    FilterExpression,
    Metric,
    RunReportRequest,
)


def run_report_with_filter(property_id="YOUR-GA4-PROPERTY-ID"):
    client = BetaAnalyticsDataClient()

    request = RunReportRequest(
        property=f"properties/{property_id}",
        dimensions=[
            Dimension(name="sessionDefaultChannelGroup"),
            Dimension(name="sessionSourceMedium"),
            Dimension(name="eventName"),
            Dimension(name="date"),
        ],
        metrics=[Metric(name="eventCount"), Metric(name="sessions")],
        date_ranges=[DateRange(start_date="yesterday", end_date="today")],
        # Keep rows whose event name is any one of the listed values.
        dimension_filter=FilterExpression(
            filter=Filter(
                field_name="eventName",
                in_list_filter=Filter.InListFilter(
                    values=[
                        "page_view",
                        "purchase",
                        "view_item",
                    ]
                ),
            )
        ),
    )
    response = client.run_report(request)
    # print_run_report_response is a helper assumed to be defined elsewhere.
    print_run_report_response(response)


if __name__ == "__main__":
    run_report_with_filter()

Example 3: Here’s how to combine multiple dimension filters with an and_group expression. We filter the sessionDefaultChannelGroup dimension for the value ‘Organic Search’ and the eventName dimension for the value ‘next_button_click’; a row must satisfy both conditions to be returned. Depending on the use case, an or_group can be used instead, in which case a row needs to satisfy only one of the conditions. Both groups take a FilterExpressionList, so we import that class as well.

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Filter,
    FilterExpression,
    FilterExpressionList,
    Metric,
    RunReportRequest,
)


def run_report_with_filter(property_id="YOUR-GA4-PROPERTY-ID"):
    client = BetaAnalyticsDataClient()

    request = RunReportRequest(
        property=f"properties/{property_id}",
        dimensions=[
            Dimension(name="sessionDefaultChannelGroup"),
            Dimension(name="sessionSourceMedium"),
            Dimension(name="eventName"),
            Dimension(name="date"),
        ],
        metrics=[Metric(name="eventCount"), Metric(name="sessions")],
        date_ranges=[DateRange(start_date="yesterday", end_date="today")],
        # A row is kept only if every expression in the and_group matches.
        dimension_filter=FilterExpression(
            and_group=FilterExpressionList(
                expressions=[
                    FilterExpression(
                        filter=Filter(
                            field_name="sessionDefaultChannelGroup",
                            string_filter=Filter.StringFilter(value="Organic Search"),
                        )
                    ),
                    FilterExpression(
                        filter=Filter(
                            field_name="eventName",
                            string_filter=Filter.StringFilter(value="next_button_click"),
                        )
                    ),
                ]
            )
        ),
    )
    response = client.run_report(request)
    # print_run_report_response is a helper assumed to be defined elsewhere.
    print_run_report_response(response)


if __name__ == "__main__":
    run_report_with_filter()
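The or_group variant mentioned in Example 3 has the same shape as the and_group. As a minimal sketch, the snippet below builds the REST (JSON) form of an or_group from plain dictionaries; the helper names string_filter and or_group are hypothetical and exist only for this illustration.

```python
import json


def string_filter(field_name, value):
    """Build a FilterExpression holding a single string filter."""
    return {"filter": {"fieldName": field_name, "stringFilter": {"value": value}}}


def or_group(*expressions):
    """Build a FilterExpression that matches if any sub-expression matches."""
    return {"orGroup": {"expressions": list(expressions)}}


dimension_filter = or_group(
    string_filter("sessionDefaultChannelGroup", "Organic Search"),
    string_filter("eventName", "next_button_click"),
)

print(json.dumps(dimension_filter, indent=2))
```

With the Python client, the equivalent is FilterExpression(or_group=FilterExpressionList(expressions=[...])), using the same inner FilterExpression objects as in the and_group example.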

Example 4: Here’s how to use dimension and metric filters together. In this example, we use a stringFilter and a numericFilter. The dimension filter requires eventName to exactly match the value ‘next_button_click’. The metric filter then keeps only rows whose sessions metric value is greater than 1000.

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Filter,
    FilterExpression,
    Metric,
    NumericValue,
    RunReportRequest,
)


def run_report_with_filter(property_id="YOUR-GA4-PROPERTY-ID"):
    client = BetaAnalyticsDataClient()

    request = RunReportRequest(
        property=f"properties/{property_id}",
        dimensions=[
            Dimension(name="sessionDefaultChannelGroup"),
            Dimension(name="sessionSourceMedium"),
            Dimension(name="sessionCampaignName"),
            Dimension(name="date"),
        ],
        metrics=[Metric(name="eventCount"), Metric(name="sessions")],
        date_ranges=[DateRange(start_date="yesterday", end_date="today")],
        # Dimension filter: event name must match exactly.
        dimension_filter=FilterExpression(
            filter=Filter(
                field_name="eventName",
                string_filter=Filter.StringFilter(
                    match_type=Filter.StringFilter.MatchType.EXACT,
                    value="next_button_click",
                ),
            )
        ),
        # Metric filter: applied to aggregated rows, like a SQL HAVING clause.
        metric_filter=FilterExpression(
            filter=Filter(
                field_name="sessions",
                numeric_filter=Filter.NumericFilter(
                    operation=Filter.NumericFilter.Operation.GREATER_THAN,
                    value=NumericValue(int64_value=1000),
                ),
            )
        ),
    )
    response = client.run_report(request)
    # print_run_report_response is a helper assumed to be defined elsewhere.
    print_run_report_response(response)


if __name__ == "__main__":
    run_report_with_filter()

In the example above, we passed an additional parameter named match_type to StringFilter. It controls how the filter value is compared with the field value and accepts the following options:

  • EXACT: Ensures an exact match with the string value.
  • BEGINS_WITH: Matches if the string value begins with the specified characters.
  • ENDS_WITH: Matches if the string value ends with the specified characters.
  • CONTAINS: Checks if the string value contains the specified characters.
  • FULL_REGEXP: Matches if the regular expression matches the entire string value.
  • PARTIAL_REGEXP: Matches if the regular expression matches any part of the string value.
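To illustrate how these match types differ, here is a small local emulation in plain Python. It is only a sketch of the semantics described in the list above, not the API's implementation: the function name matches is hypothetical, Python's re module stands in for whatever regular expression engine the service uses, and case sensitivity options are ignored.

```python
import re


def matches(match_type, pattern, value):
    """Emulate a StringFilter match type locally (illustration only)."""
    if match_type == "EXACT":
        return value == pattern
    if match_type == "BEGINS_WITH":
        return value.startswith(pattern)
    if match_type == "ENDS_WITH":
        return value.endswith(pattern)
    if match_type == "CONTAINS":
        return pattern in value
    if match_type == "FULL_REGEXP":
        # The expression must cover the whole value.
        return re.fullmatch(pattern, value) is not None
    if match_type == "PARTIAL_REGEXP":
        # The expression may match anywhere inside the value.
        return re.search(pattern, value) is not None
    raise ValueError(f"Unknown match type: {match_type}")


channel = "Paid Social"
for match_type, pattern in [
    ("EXACT", "Paid Social"),
    ("BEGINS_WITH", "Paid"),
    ("ENDS_WITH", "Social"),
    ("CONTAINS", "id So"),
    ("FULL_REGEXP", "Social"),
    ("PARTIAL_REGEXP", "Social"),
]:
    print(match_type, repr(pattern), matches(match_type, pattern, channel))
```

The last two rows show the practical difference: the pattern "Social" fails as a FULL_REGEXP against "Paid Social" but succeeds as a PARTIAL_REGEXP.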

In NumericFilter, we used a parameter called operation. It selects the comparison applied to the numeric value and supports the following operations:

  • EQUAL
  • LESS_THAN
  • LESS_THAN_OR_EQUAL
  • GREATER_THAN
  • GREATER_THAN_OR_EQUAL
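As a quick local illustration of these operations, the sketch below maps each operation name to the matching comparison from Python's operator module and applies it to a few made-up session counts. The names OPERATIONS and passes are hypothetical, and the numbers are sample values, not report output.

```python
import operator

# Operation names from the list above, mapped to Python comparisons.
OPERATIONS = {
    "EQUAL": operator.eq,
    "LESS_THAN": operator.lt,
    "LESS_THAN_OR_EQUAL": operator.le,
    "GREATER_THAN": operator.gt,
    "GREATER_THAN_OR_EQUAL": operator.ge,
}


def passes(operation, metric_value, threshold):
    """Return True if metric_value satisfies the operation against threshold."""
    return OPERATIONS[operation](metric_value, threshold)


session_counts = [250, 1000, 4200]  # sample values (assumption)

strictly_greater = [n for n in session_counts if passes("GREATER_THAN", n, 1000)]
greater_or_equal = [n for n in session_counts if passes("GREATER_THAN_OR_EQUAL", n, 1000)]

print(strictly_greater)
print(greater_or_equal)
```

Note that a row with exactly 1000 sessions is dropped by GREATER_THAN and kept by GREATER_THAN_OR_EQUAL.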

Example 5: Exclude a value by wrapping a stringFilter in not_expression. Here, we filter the sessionDefaultChannelGroup dimension so that only rows whose value is NOT equal to ‘Paid Social’ are returned.

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Filter,
    FilterExpression,
    Metric,
    RunReportRequest,
)


def run_report_with_filter(property_id="YOUR-GA4-PROPERTY-ID"):
    client = BetaAnalyticsDataClient()

    request = RunReportRequest(
        property=f"properties/{property_id}",
        dimensions=[
            Dimension(name="sessionDefaultChannelGroup"),
            Dimension(name="sessionSourceMedium"),
            Dimension(name="sessionCampaignName"),
            Dimension(name="date"),
        ],
        metrics=[Metric(name="eventCount"), Metric(name="sessions")],
        date_ranges=[DateRange(start_date="yesterday", end_date="today")],
        # not_expression inverts the wrapped filter: drop "Paid Social" rows.
        dimension_filter=FilterExpression(
            not_expression=FilterExpression(
                filter=Filter(
                    field_name="sessionDefaultChannelGroup",
                    string_filter=Filter.StringFilter(value="Paid Social"),
                )
            )
        ),
    )
    response = client.run_report(request)
    # print_run_report_response is a helper assumed to be defined elsewhere.
    print_run_report_response(response)


if __name__ == "__main__":
    run_report_with_filter()
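To see how not_expression composes with the other expression types, the sketch below evaluates a REST-style filter expression against a few sample rows locally. It is an illustration of the logic only, under stated assumptions: the evaluate function is hypothetical, string filters are treated as exact matches, only stringFilter and inListFilter are handled, and the rows are made up.

```python
def evaluate(expr, row):
    """Evaluate a REST-style FilterExpression dict against one row (sketch)."""
    if "filter" in expr:
        flt = expr["filter"]
        actual = row[flt["fieldName"]]
        if "stringFilter" in flt:
            # Assumption: treat the string filter as an exact match.
            return actual == flt["stringFilter"]["value"]
        if "inListFilter" in flt:
            return actual in flt["inListFilter"]["values"]
        raise ValueError("Filter type not handled in this sketch")
    if "andGroup" in expr:
        return all(evaluate(e, row) for e in expr["andGroup"]["expressions"])
    if "orGroup" in expr:
        return any(evaluate(e, row) for e in expr["orGroup"]["expressions"])
    if "notExpression" in expr:
        return not evaluate(expr["notExpression"], row)
    raise ValueError("Unknown expression type")


not_paid_social = {
    "notExpression": {
        "filter": {
            "fieldName": "sessionDefaultChannelGroup",
            "stringFilter": {"value": "Paid Social"},
        }
    }
}

sample_rows = [
    {"sessionDefaultChannelGroup": "Paid Social"},
    {"sessionDefaultChannelGroup": "Organic Search"},
    {"sessionDefaultChannelGroup": "Direct"},
]

kept = [row for row in sample_rows if evaluate(not_paid_social, row)]
print(kept)
```

Because not_expression wraps a full FilterExpression, it can also wrap an andGroup or orGroup to negate a whole combination of conditions.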

I hope this guide was helpful. Please let me know if you have any questions or comments, and have a nice day!

Reference: https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/FilterExpression#FilterExpressionList
