BigQuery is a Data Warehouse tool to store and transforms big amounts of data. One of the biggest differences to an operational database is, that you can have intentionally duplicate information in different tables and no needed relations between the tables.
Google added a full feature set of Key constraints to BigQuery now. You can set keys during the table creation process, view the current constraints in a dataset, and many more.
Everybody who is interested in the new BigQuery feature has to read this post.
Primary Keys and Foreign Keys are usually one of the first things you learn when you start with SQL or databases.
Primary Keys:
Foreign Keys:
To create a table with a Primary Key you have to do the following:
CREATE TABLE key_constraint_tests.planet (
sun_id INT64,
position INT64,
name STRING,
number_of_mons INT64,
PRIMARY KEY (sun_id, position) NOT ENFORCED
);
So due to no enforcement, it is not a problem to add the following rows to the table:
INSERT key_constraint_tests.planet (sun_id, position, name, number_of_mons)
VALUES(1, 1, 'mercury', 0),
(1, 1, 'venus', 0),
(1, 3, 'earth', 1),
(1, 4, 'mars', 2)
In the data is an error and Mercury and Venus get both position 1. So we have the same combination twice inside of a Primary Key.
Creating a table with foreign Keys works kind of the same as for the Primary Key:
CREATE TABLE key_constraint_tests.atmospheres (
sun_id INT64,
position INT64,
element_1 STRING,
element_2 STRING,
element_3 STRING,
FOREIGN KEY(sun_id, position) REFERENCES key_constraint_tests.planet (sun_id, position) NOT ENFORCED
);
Then we can also add the following lines to the table:
INSERT key_constraint_tests.atmospheres (sun_id, position, element_1, element_2, element_3)
VALUES
(1, 1, 'Hydrogen', 'Oxid', 'Helium'),
(1, 2, 'Carbon dioxide', 'Nitrogen', 'Sulfur dioxide'),
(1, 5, 'Hydrogen', 'Helium', 'Methane')
The 3rd insert is working too. There is no 5th planet in the original table, but due to the no enforcement setting, there is no error.
You can also easily add Primary or Foreign keys to tables later.
With the ALTER TABLE statement, you can add a Primary Key:
ALTER TABLE key_constraint_tests.planet_without_pk
ADD PRIMARY KEY(sun_id, position_from_sun) NOT ENFORCED;
Or you add a Foreign Key:
ALTER TABLE key_constraint_tests.atmospheres_without_fk
ADD FOREIGN KEY(sun_id, position_from_sun) REFERENCES key_constraint_tests.planet (sun_id, position_from_sun) NOT ENFORCED
When you have your first tables with the constraints in a dataset, you can get an overview of the current situation with a new metadata view.
The view is the table_constraints view. You can look at the results with the following command:
SELECT *
FROM key_constraint_tests.INFORMATION_SCHEMA.TABLE_CONSTRAINTS
You just have to change the dataset name in the query above and you will get a valid result.
The information you get is quite basic. One row is one constraint.
A full list you can find here.
Google added a functionality to BigQuery, which is already available in operational databases like MySQL, PostgreSQL, … . Primary and Foreign Keys are super important in these systems.
The current status in BigQuery is just a first version. A lot of features are still missing to give a comprehensive set of features to the analysts and engineers. The most important missing features are:
In a summary, it is a good and easy starting point, but to use it in projects, a lot of features are still missing. Currently, it is more for showing the references between tables.
This post is part of the new features in the Google cloud series from datadice and gives you detailed insights about a new feature in BigQuery or Data Studio.
Check out our LinkedIn account, to get insights into our daily working life and get important updates about BigQuery, Data Studio and marketing analytics
We also started with our own YouTube channel. We talk about important DWH, BigQuery, Data Studio and many more topics. Check out the channel here.
If you want to learn more about how to use Google Data Studio and take it to the next level in combination with BigQuery, check our Udemy course here.
If you are looking for help to set up a modern and cost-efficient data warehouse or analytical dashboards, send us an email to hello@datadice.io and we will schedule a call.