Drag-and-drop Data Modeling in Y42 (Part 2)
Use function nodes to transform your data. By Deborah Menezes
In the last blog article from the Y42 series, we covered how to build data models using Y42. We discussed the steps involved in building pipelines using the Y42 UI model and how field nodes can be manipulated. Let's get right into how function nodes can further define our F1 data!
Sanu has already explained how you can create the model and perform table manipulations. Once our data model is ready we can start transforming our data by using various drag and drop nodes that contain different functions pertaining to the desired end result. Here I will be explaining some of the important SQL functions for manipulating columns.
First, we enter the modeling layer. Then, we select the “Column Manipulations”. Here we can see all the functions that can be used on the columns.
Functions node 101
There are many cases where advanced functions and logical statements might be required in your models, and this needs a little bit of coding knowledge. We also have a course on Bigquery on Udemy, if you are interested in learning more about coding. For this reason, Y42 introduced the Functions node.
Using the Functions node, you can perform:
Number Functions: These are mathematical/ arithmetical calculations in which you can add, subtract, divide, or multiply numerical fields. Furthermore, you can also calculate the Ceil, Floor, and Round, or the absolute value of a specific field.
Text Functions: These are the functions that can be used to manipulate text or string fields. A few examples are trimming text, changing it to upper or lower cases, replacing text, splitting, and so on.
Boolean Functions: These are the functions that follow certain binary conditions and return either as true or false. A few examples of boolean functions or operators are AND, OR, NOT, GREATER_THAN, LESS_THAN, CONTAINS, etc.
Date + Time functions: These functions allow you to subtract dates, truncate date values, and cast types to convert it the data to a DateTime or timestamp.
Adding a Functions node to your model
When a functions node is selected, it can be dragged and dropped into the canvas.It has to be connected to a valid input node in the canvas to be used, i.e the model should already have an imported data source in Y42, which we have covered in the previous post. Please check that out for a refresher.
Once a specific field is entered that needs to be transformed, it automatically shows the data type. It’s always a good practice to name your function node to understand which transformation is taking place as you can have multiple nodes throughout your model and it can get quite confusing.
Setting up your Functions node
Click on the Functions node to apply the settings. On the right side, you have access to the documentation for each available function
Here you can transform the columns as you wish. You can either create a new one or edit the same one.
For example, let’s say, you need to upper case the names of employees, then you can create a new column ‘upper_name’. For this enter the column name “upper_name” in Target Column and in the formula box type UCASE(@name), and the results will look like the following:
Or if you simply need only all uppercased names then in the Target Column you can select the name column and it will be replaced with edited values.
In Y42, whenever you choose the column that you want to use, it always has to start with the @ symbol.
You can choose the function or a combination of functions you want to apply to your data and once you added it, you can check the result of the transformation by clicking on Output. If you are ready, then click on Save.
Remember to save your model after performing transformations by clicking on Commit Model. To have more insights in the version history, add a small comment to the commit.
You can perform multiple transformations within a single functions node, but just one transformation per column per specific function.
Using the Functions node in our model
As mentioned before, we are very fond of the F1 races. Every year during the F1 race, we send out a form to our team members who try to predict who would win the race. Along with that we even collect the top 5 players and the best team. The data is collected one day before the race and the points are calculated. So we have a track record of every race and every participant with points, this data has been integrated into Y42 and the previous blog shows you how we did that. The next step is understanding how to manipulate the data using data modeling which is also explained here.
We use the CONCAT function, we join two columns together by using a hyphen in between them as seen in the figure below. The targeted column name is entered, here being season_round.
Then we use an IF condition statement to display the winner if the driver is in the 1st position by taking the result_position which contains the contestant’s rank.
To summarize, we explored the Y42’s UI modeling layer once more, focusing on the Functions node and explaining how to use it. We then implemented it in our own model about F1 races by concatenating the season and the round number to identify which driver won the race.
I will end this session here for now but in the upcoming posts, Sanu and I will dive deeper into other nodes and their functionalities which will help in transforming our data just like we want it. In the end, we will build our very own F1 2022 datadice dashboard and see who the winners are!
This post is part of the Y42 series from datadice and the series gives you detailed information on different features in Y42.
If you think that Y42 could be an attractive solution to your data problems, be sure to get in touch with them.
Check out our LinkedIn account, to get insights into our daily working life and get important updates about BigQuery, Data Studio, and marketing analytics
We also started with our own YouTube channel. We talk about important DWH, BigQuery, Data Studio, and many more topics. Check out the channel here.
If you want to learn more about how to use Google Data Studio and take it to the next level in combination with BigQuery, check our Udemy course here.
Have trouble setting up an ETL Pipeline on Y42 or if you are looking for help to set up a modern and cost-efficient data warehouse or analytical dashboards in general, send us an email at firstname.lastname@example.org and we will schedule a call.