How the WITH Clause Can Simplify Your SQL Queries

This is a quick post to highlight how you can simplify your SQL using the WITH clause. I’ve used this technique a lot lately to prepare queries for clients that are easy to understand and work with (the queries, not necessarily the clients).

Benefits of using the WITH Clause

The WITH clause allows you to reduce joins and logically sequence your code. It does this by creating temporary tables (technically they are views) that are usable by your main SQL statement so that you can break your code into smaller and easier to understand snippets.

Syntax of the WITH Clause

NOTE: As always I am working with Google Analytics 4 data in BigQuery for this post, but the following will be applicable to many other database systems.

The WITH clause is a prefix for your main query. You can use it to create one or more SELECT statements, give each a name, and then reference them in your main query. As an example, here is how you might pull the unique ID’s of users who visited your site yesterday and then came back again today:

# Step 1: Get a list of users who visited the site yesterday
WITH yesterday AS (
    SELECT 
        user_pseudo_id AS yesterday_user_pseudo_id
    FROM `<project name>.<dataset name>.events_08012021`
)

# Step 2: Get a list of users who also visited the site today
SELECT 
    user_pseudo_id
FROM `<project name>.<dataset name>.events_09012021`,yesterday
WHERE 
    yesterday.yesterday_user_pseudo_id = user_pseudo_id

In the example above I’m defining a temporary table with the name “yesterday”. You could also put a comma at the end to add other temporary tables if you choose, like this:

WITH yesterday AS (
    SELECT 
        user_pseudo_id AS yesterday_user_pseudo_id
    FROM `<project name>.<dataset name>.events_08012021`
), last_week AS (
    SELECT 
        user_pseudo_id AS yesterday_user_pseudo_id
    FROM `<project name>.<dataset name>.events_02012021`
)

Performance

Typically the WITH clause is not executed until it is referenced, and the results are cached in case the reference is used multiple times within the query. However, I have not confirmed whether or not BigQuery operates this way (I’ll update this post when I do).

Examples and Use Cases

As I publish samples that utilize the WITH clause I will link them here. Additionally, feel free to add any other links to the comments and I will move them here as well:

  • Coming soon…

Other Resources for BigQuery and GA4

Here are a few other posts that I’ve created for using BigQuery with data from Google Analytics 4:

Leave a Reply

Your email address will not be published.