Understanding the User ID in Google Analytics 4
At first glance the User ID seems simple enough, but it can get a bit complex when you start pulling reports. The purpose of this post is to explain how the User ID is used to generate reports. If you need help installing the User ID and making sure that it works properly, take a look at my post on Properly Setting the User ID.
User ID in Summary
To explain how the User ID works in Google Analytics 4, I’m going to break it down into two categories: data collection and reporting identity.
On the client side, data is persisted differently in a mobile app than it is on a website. On a website you must explicitly set the User ID with every event that fires, but in a mobile app the User ID persists automatically after it is set once. In this way, mobile apps treat the User ID like a user-scoped custom dimension, which continues to be sent with every event after it has been set.
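To make the difference concrete, here is a small toy model in Python. It does not use any real Google SDK; `WebTracker`, `AppTracker`, and the User ID "u-1" are hypothetical names invented for this sketch, which only mimics the persistence behavior described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WebTracker:
    """Toy web client: an event carries a User ID only if the caller passes one."""
    sent: list = field(default_factory=list)

    def log_event(self, name: str, user_id: Optional[str] = None) -> None:
        self.sent.append({"name": name, "user_id": user_id})


@dataclass
class AppTracker:
    """Toy app client: remembers the User ID after it has been set once."""
    user_id: Optional[str] = None
    sent: list = field(default_factory=list)

    def set_user_id(self, user_id: Optional[str]) -> None:
        self.user_id = user_id

    def log_event(self, name: str) -> None:
        # The stored User ID is attached automatically to every event.
        self.sent.append({"name": name, "user_id": self.user_id})


web = WebTracker()
web.log_event("page_view", user_id="u-1")
web.log_event("page_view")        # User ID omitted, so this event has none

app = AppTracker()
app.log_event("screen_view")      # before login: no User ID yet
app.set_user_id("u-1")
app.log_event("screen_view")
app.log_event("screen_view")      # still carries the persisted User ID

web_ids = [e["user_id"] for e in web.sent]
app_ids = [e["user_id"] for e in app.sent]
print(web_ids)
print(app_ids)
```

In this model the second web event loses the User ID because nothing re-sent it, while the app events keep it once it has been set.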
In Google Analytics 4, "Reporting Identity" refers to the user identifier(s) that are used to generate reports. You have an option to configure this under your property settings (I've written more about what this means HERE), but for this article I am assuming that you have NOT selected "By device only".
All events that are generated for a user (with a User ID or not) will include a "user pseudo ID" (sometimes this is displayed as the "Device ID" or "App-Instance ID" in reports). On the web, this is supplied by a first-party cookie, and was known as the "client ID" in previous versions of Google Analytics. For Android and iOS apps, this is set to the App-Instance ID.
If the user lands on a website unauthenticated, and then authenticates and begins passing a User ID in a later event, Google Analytics 4 will use the “User Pseudo ID” to attribute the User ID to previous events where the User ID was not set.
In the example above, the User Explorer report will display only one “App-instance ID” (which is the User Pseudo ID), set to “12345”, for both of these events. However, if later events arrive from the same User Pseudo ID without the User ID set, they will not be attributed to the User ID.
In this example, the User Explorer report will show you two App-instance IDs: “12345”, and “abc”. This next part can be counterintuitive: The first two events can be attributed to both of these App-instance IDs. If you drill down into “12345” you will see two events, but if you drill down into “abc” you will see all 3 events.
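The attribution rule described in this section can be sketched in a few lines of Python. This is my own simplified model, not Google's implementation: it assumes that events sent before a User Pseudo ID first carries a User ID are back-filled with that User ID, and that later events without a User ID stay unattributed. The values "device-A" and "user-1" are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def attribute_user_ids(events: List[dict]) -> List[Optional[str]]:
    """Return the User ID each event would be attributed to, in event order."""
    # For each pseudo ID, remember where a User ID first appeared.
    first_seen: Dict[str, Tuple[int, str]] = {}
    for pos, event in enumerate(events):
        if event["user_id"] is not None:
            first_seen.setdefault(event["pseudo_id"], (pos, event["user_id"]))

    attributed: List[Optional[str]] = []
    for pos, event in enumerate(events):
        seen = first_seen.get(event["pseudo_id"])
        if event["user_id"] is not None:
            attributed.append(event["user_id"])
        elif seen is not None and pos < seen[0]:
            attributed.append(seen[1])   # earlier event: back-filled
        else:
            attributed.append(None)      # later event without User ID
    return attributed


events = [
    {"pseudo_id": "device-A", "user_id": None},      # before login
    {"pseudo_id": "device-A", "user_id": "user-1"},  # login
    {"pseudo_id": "device-A", "user_id": None},      # User ID no longer sent
]

attributed = attribute_user_ids(events)
events_for_user = sum(1 for a in attributed if a == "user-1")
events_for_device = sum(1 for e in events if e["pseudo_id"] == "device-A")
print(attributed, events_for_user, events_for_device)
```

Under these assumptions the User ID view contains two events while the device view contains all three, which matches the drill-down behavior described above.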
Using the User ID Across Sessions & Platforms
Let’s walk through a more detailed example. Say that a user follows these steps to create 4 events over 3 unique sessions:
1. The user opens your iOS app and views the home screen without being authenticated.
2. The user authenticates and views a second screen (User ID is set to “ItsMyNewDevice”).
3. The user closes the app and returns 2 hours later without authenticating again (no User ID is set).
4. Finally, the same user opens your website where she is already authenticated and views the homepage (User ID is set again).
Here’s an approximation of what these events will look like in BigQuery:
| Event Number | Session Number | Event Name | Platform | User_ID | User Pseudo ID |
There are two things to notice in this table:
The User Pseudo ID is different on each device. Again, you can learn more about the various identifiers in my post on Setting the User ID.
The User ID will persist across sessions on a mobile app automatically. As you recall, the User ID was not set in event #3, but it still appears in the data because it persisted on the mobile device. My recommended best practice is still to set the User ID once per authenticated session, but as long as the app data is not deleted, the User ID will persist in this way.
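If it helps to see the scenario as data, here is a rough Python representation of the four events. The event names, platform labels, and User Pseudo ID values ("ios-instance-1", "web-client-1") are placeholders I made up for illustration; only the User ID "ItsMyNewDevice" and the session layout come from the walkthrough above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Event:
    number: int
    session: int
    name: str                 # assumed event names
    platform: str
    user_id: Optional[str]    # as sent with the event
    user_pseudo_id: str       # hypothetical placeholder values


events = [
    Event(1, 1, "screen_view", "IOS", None, "ios-instance-1"),
    Event(2, 1, "screen_view", "IOS", "ItsMyNewDevice", "ios-instance-1"),
    # No User ID was set explicitly here; it persisted on the device.
    Event(3, 2, "screen_view", "IOS", "ItsMyNewDevice", "ios-instance-1"),
    Event(4, 3, "page_view", "WEB", "ItsMyNewDevice", "web-client-1"),
]

# Observation 1: each device has its own User Pseudo ID.
pseudo_ids_by_platform: Dict[str, Set[str]] = {}
for e in events:
    pseudo_ids_by_platform.setdefault(e.platform, set()).add(e.user_pseudo_id)

# Observation 2: event #3 still carries the User ID.
event_3 = events[2]
sessions = {e.session for e in events}
print(pseudo_ids_by_platform, event_3.user_id, len(sessions))
```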
So How Many Users will the Example Scenario Create in my Reports?
The standard reports will all show a single user. You can verify this by creating a test property and firing only those example events for a single date.
If you open a "User Explorer" report, you will see a single App-instance ID listed.
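As a sanity check on that number, the counting logic can be approximated in Python. This is a simplified sketch of my own, not how Google Analytics 4 is actually implemented: it prefers the User ID, back-fills it onto earlier events from the same User Pseudo ID, and otherwise falls back to the User Pseudo ID. The pseudo ID values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each event is (User Pseudo ID, User ID or None).
Event = Tuple[str, Optional[str]]


def count_users(events: List[Event], by_device_only: bool = False) -> int:
    if by_device_only:
        return len({pseudo_id for pseudo_id, _ in events})

    # Position and value of the first User ID seen for each pseudo ID.
    first_user: Dict[str, Tuple[int, str]] = {}
    for pos, (pseudo_id, user_id) in enumerate(events):
        if user_id is not None:
            first_user.setdefault(pseudo_id, (pos, user_id))

    identities = set()
    for pos, (pseudo_id, user_id) in enumerate(events):
        seen = first_user.get(pseudo_id)
        if user_id is None and seen is not None and pos < seen[0]:
            user_id = seen[1]  # back-fill the earlier, unauthenticated event
        if user_id is not None:
            identities.add(("user", user_id))
        else:
            identities.add(("device", pseudo_id))
    return len(identities)


scenario = [
    ("ios-instance-1", None),
    ("ios-instance-1", "ItsMyNewDevice"),
    ("ios-instance-1", "ItsMyNewDevice"),
    ("web-client-1", "ItsMyNewDevice"),
]

users = count_users(scenario)
devices = count_users(scenario, by_device_only=True)
print(users, devices)
```

With the User ID taken into account the scenario collapses to one user, whereas counting by device alone would report two.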
I always recommend copying the user_id to a user-scoped custom dimension called "uid" when it is available so that you can use it as a dimension in the exploration reports. If you do this and then click on the App-instance ID, you will see a list of events generated by this user. The "uid" User Property will tell you whether the user_id was set with each event. You can see in the screenshot below that it was not set on my initial events.
So, this confirms that Google Analytics 4 was smart enough to attribute the unauthenticated events to the correct user_id, despite the fact that the user_id was not set with the first few events.
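For readers who send data server-side, the "uid" copy recommended above might look roughly like the following. This is only a sketch: it builds a JSON body shaped like a GA4 Measurement Protocol request (`client_id`, `user_id`, `user_properties`, `events`) rather than the client-side setup used for the tests in this post, and the helper `with_uid_property` and the client ID value are hypothetical.

```python
import copy
import json
from typing import Any, Dict


def with_uid_property(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload with user_id mirrored into a 'uid' user property."""
    result = copy.deepcopy(payload)
    user_id = result.get("user_id")
    if user_id:  # only copy when a User ID is actually available
        result.setdefault("user_properties", {})["uid"] = {"value": user_id}
    return result


anonymous = {
    "client_id": "web-client-1",  # hypothetical User Pseudo ID
    "events": [{"name": "page_view", "params": {}}],
}
signed_in = {**anonymous, "user_id": "ItsMyNewDevice"}

tagged = with_uid_property(signed_in)
untouched = with_uid_property(anonymous)
print(json.dumps(tagged, indent=2))
```

Events sent before authentication carry no "uid" value, which is why the property is empty on the initial events in the User Explorer drill-down.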
You can also apply "uid" as a dimension to the User Explorer view. In the screenshot below you can see that during my test I sent 6 events without the user_id, and then 3 with the user_id in a single session. The "App-Instance ID" was correctly applied retroactively to the first 6 events as we saw before, but you can also see that only 1 session was counted.
If you replicate this test you might be tempted to create a filter on your user_id to find yourself, but this will not work as expected. The filter would remove all events where the User ID was null, so they would not appear in the detailed User Activity report.
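A tiny Python illustration of why the filter hides events, using made-up rows in which "uid" holds the value sent with each event (the row layout is an assumption for this sketch, not an export format):

```python
# Two events sent before authentication, one sent after.
events = [
    {"name": "session_start", "uid": None},
    {"name": "page_view", "uid": None},
    {"name": "login", "uid": "ItsMyNewDevice"},
]

# Filtering on the uid keeps only rows where it was actually sent.
filtered = [e for e in events if e["uid"] == "ItsMyNewDevice"]
dropped = [e["name"] for e in events if e not in filtered]
print(len(filtered), dropped)
```

The unauthenticated events belong to the same user, but because their "uid" is empty they never match the filter.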