Handling Sessions and Engagement Correctly in Google Analytics 4 (GA4) Data
Use GA4 raw data in BigQuery. By Sanu Maharjan
When you use GA4 to track your website or app, looking at sessions and engagement can be very insightful. Let's first look at how sessions and engagement are defined in GA4, and then at how we can extract them in BigQuery.
A session is triggered when one of the following conditions is met:
User opens your website or app
User views a page or screen (given that no other sessions are active)
Note: By default, a session ends after 30 minutes of inactivity. That means when a user visits your website, becomes inactive, and comes back 31 minutes later, a new session starts.
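To make the timeout rule concrete, here is a minimal Python sketch that counts sessions for a single user from a list of hit timestamps. The function name count_sessions and the sample timestamps are made up for illustration, and the 30-minute value assumes the default timeout has not been changed. GA4 does this bookkeeping for you; the sketch only mirrors the rule.

```python
from datetime import datetime, timedelta

# Assumption: the default GA4 session timeout of 30 minutes is in effect.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(hit_times):
    """Count sessions in the hit timestamps of one user (hypothetical helper)."""
    sessions = 0
    previous = None
    for ts in sorted(hit_times):
        # A new session starts on the first hit, or when the gap since the
        # previous hit is longer than the timeout.
        if previous is None or ts - previous > SESSION_TIMEOUT:
            sessions += 1
        previous = ts
    return sessions


hits = [
    datetime(2022, 1, 1, 10, 0),
    datetime(2022, 1, 1, 10, 20),  # 20-minute gap: same session
    datetime(2022, 1, 1, 10, 51),  # 31-minute gap: new session
]
print(count_sessions(hits))  # 2
```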
Sessions are useful for seeing how many visits your platform receives, but it is more interesting to see how many of those visits involve some sort of 'engagement'. So let's look at how Google defines engagement.
In GA4, a session is considered engaged if at least one of the following is true:
There are 2 or more page or screen views
There are 1 or more conversion events
The session lasts longer than 10 seconds
Note: If you think 10 seconds is too short for an engaged session, you can manually raise the threshold to up to 60 seconds in your GA4 settings.
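The three conditions above can be written as a single check. This is only a sketch of the rule as described here: the function is_engaged and its parameter names are hypothetical, and GA4 evaluates the rule itself rather than exposing a function like this.

```python
def is_engaged(page_views, conversions, duration_seconds, threshold_seconds=10):
    """Return True if a session meets any of the engagement conditions.

    threshold_seconds stands for the configurable timer (10 seconds by
    default, adjustable up to 60 according to the note above).
    """
    return (
        page_views >= 2
        or conversions >= 1
        or duration_seconds > threshold_seconds
    )


print(is_engaged(page_views=1, conversions=0, duration_seconds=5))   # False
print(is_engaged(page_views=1, conversions=0, duration_seconds=11))  # True
# With the threshold raised to 60 seconds, the same session is no longer engaged.
print(is_engaged(1, 0, 11, threshold_seconds=60))                    # False
```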
Now the question remains: how do we extract the data for sessions and engaged sessions? If you look at the raw GA4 data, there is no direct answer. The trick is to combine ga_session_id with user_pseudo_id and count the distinct combinations in a query.
A short explanation of what happens behind the scenes: First, I create a subquery named prep (as in preparation). In the prep subquery, I select the parsed event date and user_pseudo_id.
Next, I UNNEST the event_params array and select only the value whose key = 'ga_session_id'. This value is stored under 'int_value', meaning it has the INT64 data type. Here, ga_session_id is the timestamp at which the user entered your platform. We can't simply COUNT ga_session_id and call the result the number of sessions, because two users who enter your platform at the same time will both have the same ga_session_id.
Then, for session_id, I again UNNEST event_params, extract the integer value with key = 'ga_session_id', and concatenate user_pseudo_id with ga_session_id. That way, even when multiple users enter the platform at the same time, each of them gets a different session_id.
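The effect of the concatenation is easy to see on a tiny example. The sketch below uses plain Python on three hand-made events instead of BigQuery; the user_pseudo_id and ga_session_id values are invented for illustration.

```python
# Two different users start a session in the same second, so they share
# the same ga_session_id. The third event belongs to the first user's session.
events = [
    {"user_pseudo_id": "1111.2222", "ga_session_id": 1641031200},
    {"user_pseudo_id": "3333.4444", "ga_session_id": 1641031200},
    {"user_pseudo_id": "1111.2222", "ga_session_id": 1641031200},
]

# Counting ga_session_id alone merges the two users into one session.
naive = len({e["ga_session_id"] for e in events})

# Concatenating user_pseudo_id and ga_session_id keeps them apart.
session_ids = {e["user_pseudo_id"] + str(e["ga_session_id"]) for e in events}
combined = len(session_ids)

print(naive, combined)  # 1 2
```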
Similarly, for engaged_session and engagement_time_msec, I UNNEST the event_params column and select only the keys 'session_engaged' and 'engagement_time_msec' respectively. Also note that the values of session_engaged are strings, whereas engagement_time_msec is an integer measured in milliseconds.
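If the shape of event_params is unfamiliar, it helps to picture it as a list of key/value records in which each value has several typed fields. The following Python sketch imitates the lookup that the UNNEST step performs; the helper get_param and the sample values are hypothetical, and the real export contains more fields than shown.

```python
def get_param(event_params, key, value_field):
    """Return the typed value stored under `key`, or None if it is absent."""
    for param in event_params:
        if param["key"] == key:
            return param["value"].get(value_field)
    return None


# Hand-made stand-in for the event_params array of a single event.
event_params = [
    {"key": "ga_session_id", "value": {"int_value": 1641031200}},
    {"key": "session_engaged", "value": {"string_value": "1"}},
    {"key": "engagement_time_msec", "value": {"int_value": 4200}},
]

print(get_param(event_params, "ga_session_id", "int_value"))         # 1641031200
print(get_param(event_params, "session_engaged", "string_value"))    # 1 (a string)
print(get_param(event_params, "engagement_time_msec", "int_value"))  # 4200 (milliseconds)
```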
In the end, I group by event_date, COUNT the DISTINCT session_id values to get the number of sessions, COUNT the session_id values again for engaged sessions only, and take the average engagement time in milliseconds.
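The final grouping step can be sketched in Python on rows shaped like the output of the prep subquery. The function summarize and the sample rows are made up for illustration. How the average is computed is an assumption on my side: here it is the total engagement time of engaged events divided by the number of engaged sessions, which is one reasonable reading of the description above.

```python
from collections import defaultdict


def summarize(prep_rows):
    """Aggregate prep-style rows per event_date (hypothetical helper)."""
    by_date = defaultdict(
        lambda: {"sessions": set(), "engaged": set(), "engaged_msec": 0}
    )
    for row in prep_rows:
        day = by_date[row["event_date"]]
        day["sessions"].add(row["session_id"])
        # session_engaged is a string, so compare against "1", not 1.
        if row["session_engaged"] == "1":
            day["engaged"].add(row["session_id"])
            day["engaged_msec"] += row["engagement_time_msec"] or 0
    return {
        date: {
            "sessions": len(d["sessions"]),
            "engaged_sessions": len(d["engaged"]),
            # Assumption: average = total engaged time / engaged sessions.
            "avg_engagement_msec": (
                d["engaged_msec"] / len(d["engaged"]) if d["engaged"] else None
            ),
        }
        for date, d in by_date.items()
    }


prep_rows = [
    {"event_date": "2022-01-01", "session_id": "A1", "session_engaged": "1", "engagement_time_msec": 3000},
    {"event_date": "2022-01-01", "session_id": "A1", "session_engaged": "1", "engagement_time_msec": 5000},
    {"event_date": "2022-01-01", "session_id": "B1", "session_engaged": None, "engagement_time_msec": None},
    {"event_date": "2022-01-02", "session_id": "A2", "session_engaged": "1", "engagement_time_msec": 12000},
]
result = summarize(prep_rows)
print(result["2022-01-01"])
```

On 2022-01-01 the sample contains two sessions, of which one is engaged with 8000 milliseconds of engagement time in total.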
With such a query, you can find out how many sessions were triggered, how many of them were engaged, and what the average engagement time was. In an upcoming post, I'll share more queries for extracting data from raw GA4.
This post is part of a series on getting insights from your GA4 raw data in BigQuery.