In the trail of exploring more business values through data analysis, the integration of Google Analytics into BigQuery is quite challenging. We will fathom the challenges while using Google BigQuery with GA4 data in this blog and also straightforward solutions for it. Before all else, we will read through some main changes made in GA4 to overhaul Universal Analytics.
Google brought in GA4 as a replacement for Universal Web Analytics in the fall of 2020 making the businesses analyze useful metrics and KPIs not only with Web Analytics but also with App analytics. Even Though Google announces a new property type as App+Web in 2019, it is designed for any organization, even those without a native iOS or Android mobile application. To know more about the evolution of GA4, please read Google Analytics Evolution and also check out the difference between UA and GA4.
Until July 1, 2023, Google announces normal data processing in Universal Analytics properties. Though after July 1, 2023, users will be able to access only previously processed data in Universal Analytics property for at least six months. Google recommends the users who created Google Analytics properties before October 14, 2020, should change to GA4 Properties compulsorily. For more information, please visit UA goes away and to make a switch to GA4.
The main difference between Universal Analytics and GA4 is Universal Analytics session records are based on different types of hits such as page hits, event hits, eCommerce hits, and social interaction hits, but GA4 data is event-based where all user interactions are captured as an event.
For more information on events, check out the Events reference page.
I assume that the reader already knows about Google BigQuery and possesses basic knowledge of SQL. BigQuery plays a vital role in turning the big memory of data into useful information for business insights. Google enables Cloud platform users to import the GA4 analytics raw data in Google BigQuery with no charges (if exceeds fixed sandbox limits, then charges incur).
Please read this document on how to link BigQuery to Google Analytics 4
In the wake of importing GA4 into BigQuery, your BigQuery project stores the imported GA4 analytics raw data in a dataset named ‘analytics_<property_id>’ which contains multiple tables. Each table in the dataset represents the collection of data for each day. The dataset always contains two data tables,
In the above figure, the number after analytics_ is the property ID of GA4 having the dataset consisting of imported GA4 raw data and the events_ consists of day-by-day analytics. The first events_ (509) contains all sessions of the last 509 days, which means more than a year of data is present in the table.
Our first challenge is to deal with consolidating all events of analytics to analyze and report weekly, monthly, and even yearly performance.
The above query retrieves overall events (days) in GA4, which eventually takes a lot of time and space. To query sessions of specific days or periods, we use WILDCARD Queries.
The above query retrieves all sessions between the date range 01-01-2022 and 28-02-2022. In GA4 analytics raw data, each session is named by the respective date in the format of a string which is called ‘Static Date’.
With the help of dynamic dates in a BigQuery, we can retrieve the information for a specific period such as the last 30 days, yesterday, and so on.
In the above query, DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) takes the last 30 days interval from the current date (today) and for DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) the query gives yesterday’s analytics as the interval is 1 from the current date.
BigQuery also enables the use of both the static and dynamic date range.
We can also select the preferred date with the year, month, and date selection option in the BigQuery UI as shown in the below screenshot.
Data filter option in BQ
Now we will dive into the structure of GA4 events, where the column contains not only single data but also multiple records or arrays. For instance, in the below picture, the field name such as event_params, privacy_info, user_properties, geo, app_info, user_ltv, trafficSource, device, and so on has a drop-down list.
GA4 structures of events
These dropdowns are the array of various subfields in each of the columns in the table which makes the GA4 analytics data table in JSON format. The JSON table consists of multiple records of value inside each field and subfield. Here comes the special tips to handle nested fields in the GA4 dataset.
GA4 analytics data table in JSON format
Overview of the structure of the GA4 data table
The solution is nothing but, flattening the nested fields (also said like un-nesting the nested field) which is in the form of an array. In the above picture, you could see hierarchical arrangements of the nested field or repeated record-like events_param to its subfield page. Hence, the flattened nested field will be represented as (<columnname.fieldname.subfieldname>). In our example, it is events_param.value.string_value, but if you use this type of pointer addressing directly in the SQL query for repeated records, the BigQuery prompts an error.
Hence, we use UNNEST or CROSS JOIN UNNEST in the way that BigQuery understands and fetches the required information from the flattened nested field or repeated records. For example, if you want to get analytics information specifically conditioned for the file_download event with extension ‘.pdf’, then use ‘UNNEST (events_param) AS ep’ where ep is used in place of events_param as below.
The above query fetches event and user information such as ‘file_download’ (list of downloaded files by which user_id and using which platform). The UNNEST makes it possible, by creating a new column events_param.value.string_value as key and its single bit of data as value.
GA4 table Repeated records in BQ
For simple subfields like category in device as in the picture above, the field names can be addressed directly by giving their column name device. For example, in the below query, device.category is being used directly without the help of UNNEST as <fieldname.subfieldname> to call that particular subfield value country.
To use more than one repeated record simultaneously in the query, we can join two UNNEST in one query easily by chaining nested fields together.
In the above example, we want to unnest two different nested fields event_params and user_properties, but BigQuery prompts an error if you use both the UNNEST in the same query.
Hence, we use WITH clause to unnest multiple repeated records in BigQuery. We have created two tables for each UNNEST (event_params) and UNNEST (user_properties) in the above example query. This query retrieves a list of all downloaded files by which user in what device category, platform, and country. As mentioned earlier, the query used two nested fields by chaining them together in one query. The same WITH clause can be used to handle more than two nested fields in BigQuery.
The output of an above query
As given below, using PARSE_DATE in the query, the string type date field is converted into proper date format. Using the UNNEST name, the nested field can be used with SELECT statement in a query mainly wherever the array is referenced and if it is not referenced to an array otherwise STRUCT, then use a simple naming convention for nested fields as <columnname.fieldname.subfieldname>.
Output for the above query
With these basic handling techniques, you can easily access GA4 analytic data in BigQuery. For more information, please read wildcard queries.
Sample output of queries
GA4 properties can capture additional information such as Coupons, discounts (new), Promotions, and Refunds. Please refer to GA4 Exploration to know more about GA4 features.
Since Universal Analytics made our work effortless and serene all over the years, now it could be a little over our head in the commencing phase to migrate to new Google Analytics 4 properties. On the brighter side, GA4 has enormous essential features and intriguing events tracing than UA, and also hoping for more new GA4 features from Google.
While on the other hand, to make the migration process easier, Google recommends users handle a parallel tracking approach. Which is nothing but, the users already having existing UA property can create a new GA4 property that collects data in parallel to UA. For more information, please visit Parallel tracking and Migrate E-commerce data from UA to GA4.