By clicking “Accept”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.
Apr 27, 2023
More

How comprehensive are the new Primary Keys and Foreign Keys in BigQuery?

Google is starting to add features from systems like MySQL. By Alexander Junke

BigQuery is a Data Warehouse tool to store and transforms big amounts of data. One of the biggest differences to an operational database is, that you can have intentionally duplicate information in different tables and no needed relations between the tables.

Google added a full feature set of Key constraints to BigQuery now. You can set keys during the table creation process, view the current constraints in a dataset, and many more.

Everybody who is interested in the new BigQuery feature has to read this post. 

What are Primary Keys and Foreign Keys?

Primary Keys and Foreign Keys are usually one of the first things you learn when you start with SQL or databases. 

Primary Keys:

  • Each table can have 1 or more columns flagged as Primary Keys
  • The combination of the primary key values in a row must be unique in the table

Foreign Keys:

  • Each table can have 1 or more columns flagged as Foreign Keys
  • The Foreign key column needs to be in another table in the same format
  • The combination of the Foreign Key columns does not need to be unique in the table

Creating tables with Primary Keys

To create a table with a Primary Key you have to do the following:

CREATE TABLE key_constraint_tests.planet (

sun_id INT64,

position INT64,

name STRING,

number_of_mons INT64,

PRIMARY KEY (sun_id, position) NOT ENFORCED

);

  • Start with “PRIMARY KEY”
  • in brackets which column values are forming the primary key
  • In the end, choose if the Primary Key is enforced or not
  • Currently, (22.03.2023) enforcement of the Primary Key is not possible

So due to no enforcement, it is not a problem to add the following rows to the table:

INSERT key_constraint_tests.planet (sun_id, position, name, number_of_mons)

VALUES(1, 1, 'mercury', 0),

     (1, 1, 'venus', 0),

     (1, 3, 'earth', 1),

     (1, 4, 'mars', 2)

In the data is an error and Mercury and Venus get both position 1. So we have the same combination twice inside of a Primary Key.

Creating tables with Foreign Keys

Creating a table with foreign Keys works kind of the same as for the Primary Key:

 CREATE TABLE key_constraint_tests.atmospheres (

 sun_id INT64,

 position INT64,

 element_1 STRING,

 element_2 STRING,

 element_3 STRING,

 FOREIGN KEY(sun_id, position) REFERENCES key_constraint_tests.planet (sun_id, position) NOT ENFORCED

);

  • Start with “FOREIGN KEY”
  • which columns of this table are part of the key
  • Continue with REFERENCES 
  • Dataset and table name of the referencing tables
  • which columns of the referencing table build the connection
  • Choose between enforcement or not (Again just no enforcement is possible for now)

Then we can also add the following lines to the table:

INSERT key_constraint_tests.atmospheres (sun_id, position, element_1, element_2, element_3)

VALUES

     (1, 1, 'Hydrogen', 'Oxid', 'Helium'),

     (1, 2, 'Carbon dioxide', 'Nitrogen', 'Sulfur dioxide'),

     (1, 5, 'Hydrogen', 'Helium', 'Methane')

The 3rd insert is working too. There is no 5th planet in the original table, but due to the no enforcement setting, there is no error.

Change Keys in an already existing table

You can also easily add Primary or Foreign keys to tables later. 

With the ALTER TABLE statement, you can add a Primary Key: 

ALTER TABLE key_constraint_tests.planet_without_pk

ADD PRIMARY KEY(sun_id, position_from_sun) NOT ENFORCED;

Or you add a Foreign Key:

ALTER TABLE key_constraint_tests.atmospheres_without_fk

ADD FOREIGN KEY(sun_id, position_from_sun) REFERENCES key_constraint_tests.planet (sun_id, position_from_sun) NOT ENFORCED

Using the Table_Constraints view

When you have your first tables with the constraints in a dataset, you can get an overview of the current situation with a new metadata view.

The view is the table_constraints view. You can look at the results with the following command:

SELECT *

FROM key_constraint_tests.INFORMATION_SCHEMA.TABLE_CONSTRAINTS

You just have to change the dataset name in the query above and you will get a valid result. 

The information you get is quite basic. One row is one constraint.

  • Constraint_name: The name of the constraint, if it is the default name which contains the name of the table, then .pk or .fk and then for the Foreign Key the number of constraints (e.g.  planet.pk$, atmospheres.fk$1)
  • constraint_type: is it a Primary Key or Foreign Key
  • Is_deferrable, initially_deferred, enforced: Currently always NO
Part of the metadata table

A full list you can find here.

Conclusion

Google added a functionality to BigQuery, which is already available in operational databases like MySQL, PostgreSQL, … . Primary and Foreign Keys are super important in these systems. 

The current status in BigQuery is just a first version. A lot of features are still missing to give a comprehensive set of features to the analysts and engineers. The most important missing features are:

  • Check for no doubled primary keys in a table
  • Check that the Foreign Keys are existing in the referenced table
  • Better overview of the current constraints in a dataset or in a table (e.g. in the table detail tab)

In a summary, it is a good and easy starting point, but to use it in projects, a lot of features are still missing. Currently, it is more for showing the references between tables.

Further Links

This post is part of the new features in the Google cloud series from datadice and gives you detailed insights about a new feature in BigQuery or Data Studio.

Check out our LinkedIn account, to get insights into our daily working life and get important updates about BigQuery, Data Studio and marketing analytics

We also started with our own YouTube channel. We talk about important DWH, BigQuery, Data Studio and many more topics. Check out the channel here.

If you want to learn more about how to use Google Data Studio and take it to the next level in combination with BigQuery, check our Udemy course here.

If you are looking for help to set up a modern and cost-efficient data warehouse or analytical dashboards, send us an email to hello@datadice.io and we will schedule a call.