Google Signals and Privacy in Google Analytics 4
Digital analytics platforms have been around for about 15 years now, and in early 2020 they all basically share the same functionality and features. However, Google has been building an ace up its sleeve since 2004 that the others cannot replicate: 1.8 billion Gmail accounts.
With the integration between Google Signals and Google Analytics 4, all of the reports in your GA4 property can leverage Google’s ability to identify users who visit your website or app multiple times from different devices (as long as they have enabled ad personalization)… even if they are not logged in.
Google Signals has been around since 2018, but this announcement is a very big deal for two reasons:
- Analysts can leverage this functionality in all reports, whereas it previously only applied to a few pre-built reports.
- This announcement reveals Google’s plan to build an Analytics product that respects the privacy of users who choose it, while using machine learning to infer how those users are likely behaving by observing the behavior of users who opt in to personalization.
In this post I will explain what Google Signals is, how it integrates with Google Analytics 4, and what this means for privacy.
What is Google Signals?
Unless a friend printed this blog post and faxed it to you, there is a good chance that you’re logged in to Google right now. Gmail and Google Chrome are outrageously popular. To give you an idea, Chrome makes up 66% of the web browser market (source) and about a quarter of humans have a Gmail account (source).
As a result, Google is very good at recognizing you across all of your devices, and you benefit from this in a variety of ways. If you’re like me, you might use your Google account to log in to a variety of websites. You probably also benefit from a seamless experience across mobile and desktop devices when you use Google products. And of course, Google uses this to target ads that are personalized to your interests and demographics (which is arguably a benefit).
Google Signals launched in July 2018, and it allows marketers to recognize users on their website who are not logged in by leveraging Google’s identity software. This does not provide the marketer with any personally identifiable information, but it allows them to do three things:
- Understand if (and how frequently) users access their website or app from different devices
- Learn some demographic and interest information that Google has aggregated about their users
- Create lists of users who they would like to advertise to on Google’s network (such as everyone who added an item to the cart but did not purchase)
NOTE: Google Signals is not enabled by default (more on why below). To enable it in GA4, go to Data Settings > Data Collection in the property settings section of your Admin menu.
The integration between Google Analytics and Google Signals was originally limited to a few pre-built reports, and very few of the Analysts I know found this useful for anything other than creating remarketing lists. What’s new in Google Analytics 4 is that Google Signals can now be used as your Reporting Identity.
What is Reporting Identity in Google Analytics 4?
The “Reporting Identity” in GA4 refers to the method you would like to use to identify a user. There are three IDs that GA4 might use:
The Device ID
On the web, this is known as the client ID. It is a random integer stored in a first-party cookie on the user’s first visit and set to persist for 2 years. In the classic version of Google Analytics (pre-2013) this was the only option. In a mobile app, this is set to the App-Instance ID.
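To make the mechanics concrete, here is a minimal sketch of how a device ID could be minted on the first visit and reused afterwards. It is not Google's implementation: the cookie name `client_id`, the size of the random integer, and the choice to refresh the expiry on every visit are all assumptions made for illustration.

```python
import secrets
from datetime import datetime, timedelta, timezone

# The 2-year persistence described above.
COOKIE_LIFETIME = timedelta(days=730)


def get_or_create_client_id(cookies, now):
    """Return (client_id, expires_at) for this visit.

    `cookies` is a plain dict standing in for the browser's first-party
    cookie jar. The key "client_id" is a hypothetical name.
    """
    client_id = cookies.get("client_id")
    if client_id is None:
        # First visit: mint a random integer and store it.
        client_id = str(secrets.randbelow(2**31))
        cookies["client_id"] = client_id
    # Assumption: the expiry is pushed out again on every visit.
    return client_id, now + COOKIE_LIFETIME


jar = {}
first_visit = datetime(2020, 10, 1, tzinfo=timezone.utc)
cid, expires = get_or_create_client_id(jar, first_visit)
same_cid, _ = get_or_create_client_id(jar, first_visit + timedelta(days=30))
print(cid == same_cid)  # the returning browser keeps its ID
```

The point of the sketch is that the ID lives entirely in the browser: clear the cookie, or switch devices, and the same person becomes a new "user".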
The User ID
This was released with Universal Analytics, and allows you to set a value that Google should use to recognize a user when a user is authenticated (read more). With the user ID, companies were able to view a logged-in user’s activity across mobile and desktop devices for the first time.
Google Signals
This is the new option that uses the Google account, and it is only available for users who have enabled Ads Personalization.
When you create a property in Google Analytics 4, you can set the Reporting Identity from the property settings menu. The selection you make here will apply to all of your reports in GA4.
By Device Only
This method will only use the Device ID described above, which is less reliable than it was in the past now that browsers are rolling out various limits on cookies.
As an example, since Safari (or any browser on an iOS 14 device) will limit the duration of cookies to 7 days, a user who returns more than 7 days after their last session will appear as a new user in your Google Analytics reports using this method.
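The effect of a cookie cap on user counts can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any specific browser: it assumes the cookie's expiry is refreshed on each visit, and the function name and visit dates are made up for the example.

```python
from datetime import date


def count_device_users(visit_dates, cookie_max_age_days):
    """Count how many device IDs one real person generates.

    A new ID is minted whenever the gap since the previous visit exceeds
    the cookie lifetime, because the old cookie has already expired.
    """
    users = 0
    last_visit = None
    for visit in sorted(visit_dates):
        if last_visit is None or (visit - last_visit).days > cookie_max_age_days:
            users += 1
        last_visit = visit
    return users


# One person, three visits; the last one comes 15 days after the second.
visits = [date(2020, 10, 1), date(2020, 10, 5), date(2020, 10, 20)]
print(count_device_users(visits, cookie_max_age_days=7))    # 2
print(count_device_users(visits, cookie_max_age_days=730))  # 1
```

With a 7-day cap the same person is reported as two users, while a 2-year cookie reports them as one.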
By User-ID, Google Signals, then Device
When you select this option, you are telling GA4 to use the best identifier available. If your user has authenticated and a user ID exists, then the user ID will be used because it is the most accurate. Otherwise, if Google Signals data is available, Google Signals will be used. Finally, if there is no other option, the last resort will be the device ID (again, this is the client ID for web).
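The fallback order described above can be sketched as a simple chain of checks. This only illustrates the priority logic; GA4 resolves identity internally, and Google Signals never exposes an identifier to the marketer, so the `signals_id` field, the `Hit` class, and the mode names are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hit:
    device_id: str                    # always present (client ID or App-Instance ID)
    user_id: Optional[str] = None     # set only when the user is authenticated
    signals_id: Optional[str] = None  # hypothetical stand-in for a Google Signals match


def reporting_identity(hit: Hit, mode: str = "blended") -> Tuple[str, str]:
    """Return (method, identifier) following the priority order above."""
    if mode == "device_only":
        return ("device", hit.device_id)
    if hit.user_id:
        return ("user_id", hit.user_id)
    if hit.signals_id:
        return ("google_signals", hit.signals_id)
    return ("device", hit.device_id)


print(reporting_identity(Hit("d-1", user_id="u-42", signals_id="g-7")))
print(reporting_identity(Hit("d-1", signals_id="g-7")))
print(reporting_identity(Hit("d-1")))
print(reporting_identity(Hit("d-1", user_id="u-42"), mode="device_only"))
```

Note that "By Device Only" ignores the better identifiers even when they exist, which is exactly why it is the more conservative choice.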
You can find Google’s documentation on reporting identity HERE.
At this point you’re probably wondering why anyone would choose “By device only”? The answer is privacy, so let’s discuss that next.
The State of Privacy
Browsers and legislatures have been working hard in recent years to protect your online privacy. Since there is no single entity that has the power to solve this problem, the leading browsers are each working on solutions with no shared consensus about what the Internet should look like in the future.
To the website or mobile app Analyst, the impact of this fragmented approach on her daily work is extremely complex and nuanced, because she needs to keep a list of known data gaps and approximate how those gaps might be impacting her analysis. For example, the Analyst needs to explain why the cost per conversion will differ significantly between Android users on Chrome vs iOS users on Chrome vs desktop users running Firefox, etc.
Privacy in Google Analytics 4
As I write this, Google has not launched any new features to directly address this problem. However, in August of 2020, Philip McDonnel (Director of Product Management at Google) posted an article suggesting that machine learning could be used to approximate the data gaps created by users who opt out of ads personalization, by modeling the behavior of users who have opted in.
“Conversion modeling refers to the use of machine learning to quantify the impact of marketing efforts when a subset of conversions can’t be observed.” - Philip McDonnel
Under this approach, the Analyst no longer needs to caveat the data with a list of known gaps, because the algorithm is responsible for identifying and filling these gaps. Additionally, it is no longer problematic to respect a user’s privacy request, because the data gaps that will be created as a result are expected and accounted for, as long as a sample of users who have complete data remains (now you can see why Google Signals is so important).
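To show the idea in its simplest possible form, the sketch below fills the gap by applying the conversion rate of observed (opted-in) sessions to unobserved sessions. This is a deliberately naive stand-in and not Google's model, which has not been published in this post's sources; the function name and the figures are invented, and the key assumption is that opted-out users convert at the same rate as opted-in users.

```python
def estimate_total_conversions(observed_sessions, observed_conversions,
                               unobserved_sessions):
    """Estimate total conversions when some sessions cannot be observed.

    Assumption: unobserved sessions convert at the same rate as observed
    ones. A real model would adjust for differences between the groups.
    """
    if observed_sessions <= 0:
        raise ValueError("need at least one observed session to model from")
    # Scale observed conversions up in proportion to the unobserved traffic.
    modeled = observed_conversions * unobserved_sessions / observed_sessions
    return observed_conversions + modeled


# Hypothetical figures: 8,000 observed sessions with 400 conversions,
# plus 2,000 sessions where conversions could not be observed.
print(estimate_total_conversions(8000, 400, 2000))  # 500.0
```

The estimate is only as good as the observed sample, which is why the sample of users with complete data matters so much.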
As I mentioned above, conversion modeling is not currently an available feature within Google Analytics. The expectation that it will be released in the near future, however, would certainly help explain why Google Analytics 4 has rolled out with the following broad range of user-privacy features (originally outlined by Senior Product Manager Dan Stone in July) without any apparent concern that these may create additional data gaps for Analysts to deal with:
- You can opt-out of Google Signals
- You have the option to accept or reject the Data Processing Terms
- Enable IP anonymization
- Partially or completely disable data collection on the client side
- Users may install the Analytics opt-out Add-on
- Select how long user-level and event-level data is stored
- Select what data can be shared with the Google support team
- Control ads personalization for your entire Analytics property, for specific countries, for specific events and user properties, or for an individual event or session
- Submit a request to delete data for a time period, for an individual user, or for an entire Analytics property
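As an illustration of what the IP anonymization item in the list above means in practice, here is a minimal sketch that truncates an address before it is stored. The prefix lengths (zeroing the last octet of an IPv4 address and the last 80 bits of an IPv6 address) follow the truncation commonly described for Google Analytics, but treat them as an assumption here; the function name is hypothetical.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero the low-order bits of an IP address.

    Assumed prefixes: keep 24 bits for IPv4 (drop the last octet) and
    48 bits for IPv6 (drop the last 80 bits).
    """
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.87"))                          # 203.0.113.0
print(anonymize_ip("2001:db8:85a3:8d3:1319:8a2e:370:7348"))  # 2001:db8:85a3::
```

The truncated address can still support coarse geography, but it no longer points at a single household or device.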
Privacy shifts in recent years have caused Analysts and marketers to live with uncertainty about the future. Although some of the features discussed in this post are not live yet, it is comforting to see that Google is moving in a direction that embraces privacy without degrading the value of the work that we do as Analysts.
In the same spirit, my recommendation to clients is also to fully embrace user privacy where possible. Request permission, collect the minimum amount of data necessary, purge it when you’re done, and clearly communicate what you’re doing with it.