Using BigQuery and Looker Studio with Google Analytics 4
The most difficult part of migrating from Universal Analytics to Google Analytics 4 is the reporting. GA4 does not have many of the same built-in reports that Marketers and Analysts have come to expect from previous versions of Google Analytics, and it can be difficult to replicate them.
However, with a little practice, you can create just about any data visualization you want by using a combination of Google Analytics 4 + BigQuery + Looker Studio. The goal of this post is to share tips and best practices for visualizing your GA4 data in BigQuery and Looker Studio, and give you a few samples to help get started.
Setting Up the Data Flows
Before we begin you will need to enable the BigQuery integration to export your raw data from Firebase Analytics (which is where Google Analytics 4 data is stored) into a BigQuery table daily. Once this is done you will be able to use BigQuery to run SQL queries on your raw data, or connect Looker Studio to BigQuery (Google’s free data visualization tool) to create reports and dashboards. Your data flow will work like this:
Getting Familiar with Google Analytics 4 Data in BigQuery
The GA4 data stored in BigQuery is not completely raw (some processing has been done on it), but it’s helpful to think of it as a raw and unsampled list of events. Google makes two key modifications to the data to protect user privacy before it lands in your BigQuery table:
- The IP address is removed, but geographic information has been added
- The user agent is removed, but device information is added
Each day Google generates a table with data from the prior day, and each row in the table represents an event. Now here's the part that confuses beginners: each event (or row) in your table is in a JSON format (this is true of Crashlytics data too). Here is a simplified example of a single screen_view event in BigQuery:
event_date | event_name | event_params.key | event_params.value.string_value |
---|---|---|---|
20200101 | screen_view | firebase_screen_class | product_detail_page_12345 |
firebase_previous_class | search_results_page |
You’ll notice that there is no “screen name” column, but there is a parameter called “firebase_screen_class” that stores the screen name. There’s also another parameter called “firebase_previous_class” that stores the prior screen name. These parameters and their values are stored in the columns “event_params.key” and “event_params.value.string_value”. Here’s what the data actually looks like when you export it from BigQuery.
We have multiple values for the “event_params…” columns because the data is actually an array. If this concept is new to you then you will definitely need to read my post on How to Flatten a BigQuery Table with UNNEST.
Getting Familiar with Google Analytics 4 Data in Looker Studio
It was simple and intuitive to build dashboards in Looker Studio with the old Google Analytics API because you had a pre-built list of 513 metrics and dimensions to choose from.
With Google Analytics 4 you have two options:
- Use the Google Analytics connector that is built into Looker Studio (which relies on the new GA4 reporting API)
- Or connect directly to your exported data in BigQuery.
If you choose option 1 (the GA4 connector) your reports are easier to pull, but you are limited to the 146 fields available in the API.
If you choose option 2 (BigQuery), you'll notice that your data is more raw, and you’ll have to transition to using events, parameter names and parameter values. This has a bit of a learning curve, but when you get use to it will give you unlimited flexibility on how to pull your data.
As an example, here is how you would pull a list of pageviews by URL from your web stream data in BigQuery:
- Dimension: “Event Param Value (string)”
- Metric: “Event Count”
- Filter: Event Name = “page_view”
- Filter: Event Param Name = “page_location”
The resulting chart will look like the image below:
Beware: Data in BigQuery Does Not Match the User Interface
Don't be alarmed if you see different values for metrics when pulled from BigQuery because this is by design. BigQuery is representing the data you collected, and is not applying a variety of modifications that are included when using the user interface. Google has a great article about why HERE, but the primary differences are:
- The UI can apply Google Signals to identify users, but this is not available in BigQuery.
- The UI can apply behavioral and conversion modeling to fill gaps in the data caused by cookie restrictions or declined user consent. This will impact a whole host of metrics, including traffic attribution and metrics such as "active users".
- The UI uses an algorithm called HyperLogLog++ to estimate metrics with low processing requirements, and this is not necessary in BigQuery where the processing costs are covered by you and not Google (READ MORE).
- Data collection delays (mainly in Firebase and when yo uare using Measurement Protocol) can cause data to be updated up to 72 hours after it is originally collected (READ MORE).
- Cardinality limits can cause reports in the UI to create the "other" label that is not applied in BigQuery (READ MORE).
Resource Library
Looker Studio Samples
I've compiled a series of reporting examples in a public Looker Studio report that anyone can view and copy.
It can be difficult to explore the events that are being recorded and their parameters in the GA4 User Interface. This may change with future updates, but for now the best way to do this is with Looker Studio.
If you are new to using Looker Studio with Firebase data, start with The Event & Parameter Explorer.
Any time I build a Looker Studio dashboard using GA4, I start by copying this report to help me drill down into each event to view the parameters and their values. This helps you visualize how your data is organized, and also make sure that you spell event and parameter names correctly in your filters.
BigQuery + Looker Studio Posts
I've compiled a list of posts below for reporting on GA4 data in BigQuery and Looker Studio. This list will grow over time.
Like always, please contact me if you find any problems with these resources or if you would like to contribute new items for me to add.
- Cross-Stream First-Touch Attribution with Google Analytics 4 and BigQuery
- How the WITH Clause Can Simplify Your SQL Queries
- BigQuery + GA4: How To Get the First or Most Recent Event for a User
- How to Use the Over clause to Create a Moving Average in BigQuery
- Looker Studio + BigQuery + GA4: Viewing a Parameter Across Multiple Events
- Looker Studio + BigQuery + GA4: How to View Multiple Parameters on the Same Event
- How to Flatten a BigQuery Table with UNNEST