Apr 27, 2023

Big Data vs. Data Warehouse vs. Data Lake vs. OLTP vs. OLAP

Unlocking the power of data buzzwords. By Sanu Maharjan

Businesses in nearly every industry now use data to inform decisions, improve operations, and gain a competitive edge. As data collection and analysis grow more important, it helps to understand the core vocabulary of the field. This post explains the major terms and concepts behind data analysis, whether you are new to them or just want to brush up.

Big Data

Big Data refers to large datasets, or the technology used to handle them. It is characterized by three key attributes: volume, velocity, and variety. 

  • Volume refers to the sheer size of the data, which can range from megabytes to petabytes or even zettabytes.
  • Velocity refers to the speed of data acquisition, which can range from ad-hoc (processed on-demand) to batched (processed in regular intervals) to real-time.
  • Variety refers to the different types of data, which can include key/value, tabular, images, audio, video, and unstructured data.

We can picture a 3D graph to see where your data lies: the x-axis is volume, the y-axis is velocity, and the z-axis is variety. The further your data stretches along these axes, the "bigger" it is.
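As a rough illustration of the three axes, a dataset can be described as a point in that space. The sketch below is only one possible encoding: the scale lists, the `DatasetProfile` class, and the two example datasets are hypothetical, and the numeric positions are an assumption made for illustration rather than an established metric.

```python
from dataclasses import dataclass

# Hypothetical ordinal scales. The ordering follows the text above;
# the numeric positions are an assumption made for illustration.
VOLUME_SCALE = ["megabytes", "gigabytes", "terabytes",
                "petabytes", "exabytes", "zettabytes"]
VELOCITY_SCALE = ["ad-hoc", "batched", "real-time"]


@dataclass(frozen=True)
class DatasetProfile:
    """Describes a dataset along the three axes: volume, velocity, variety."""
    name: str
    volume: str            # one of VOLUME_SCALE
    velocity: str          # one of VELOCITY_SCALE
    data_types: frozenset  # e.g. {"tabular", "images"}

    def coordinates(self):
        """Return (x, y, z) = (volume, velocity, variety) as integers."""
        x = VOLUME_SCALE.index(self.volume)
        y = VELOCITY_SCALE.index(self.velocity)
        z = len(self.data_types)  # variety = number of distinct data types
        return (x, y, z)


# Two made-up datasets for comparison.
crm_export = DatasetProfile(
    "crm_export", "megabytes", "batched", frozenset({"tabular"}))
clickstream = DatasetProfile(
    "clickstream", "petabytes", "real-time",
    frozenset({"key/value", "images", "video"}))

print(crm_export.coordinates())
print(clickstream.coordinates())
```

The dataset whose point lies further from the origin on all three axes is the one that calls for Big Data tooling.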

Data Warehouses & Data Lakes

Data lakes hold raw and/or unstructured datasets, usually stored in their native format. The data is available for analysis right away but may require more advanced tools. Data lakes are also more flexible, as new types of data can be added at any time. Google Cloud Storage, Amazon S3, and Microsoft Azure Data Lake Storage are widely used storage services for data lakes. To move data onward, an ETL process typically extracts it from the data lake, transforms it into a format optimized for analysis, and loads it into the data warehouse.

Data warehouses hold processed datasets that are organized and stored in a structured way. The data is ready to be consumed for a defined purpose, and the structure is typically rigid: it is harder to change than in a data lake. Popular warehouse tools include Google BigQuery, Amazon Redshift, Microsoft Azure Synapse Analytics, and many more.
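The ETL flow from lake to warehouse can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pipeline: the raw JSON lines stand in for objects in a data lake, an in-memory SQLite database stands in for the warehouse, and the `orders` schema and all function names are hypothetical.

```python
import json
import sqlite3

# Stand-in for raw objects in a data lake, kept in their native format
# (JSON lines). The field names are assumptions made for illustration.
RAW_LAKE_OBJECTS = [
    '{"order_id": 1, "amount": "19.90", "country": "de"}',
    '{"order_id": 2, "amount": "5.00", "country": "US"}',
    '{"order_id": 3, "amount": null, "country": "fr"}',
]


def extract(raw_lines):
    """Extract: parse the raw lake objects into Python dictionaries."""
    return [json.loads(line) for line in raw_lines]


def transform(records):
    """Transform: drop incomplete records and normalize types and casing."""
    rows = []
    for record in records:
        if record["amount"] is None:
            continue  # skip records that cannot be analyzed
        rows.append((record["order_id"],
                     float(record["amount"]),
                     record["country"].upper()))
    return rows


def load(conn, rows):
    """Load: write the cleaned rows into a table with a fixed schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, "
        "amount REAL NOT NULL, "
        "country TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(raw_lines):
    """Run all three steps against an in-memory stand-in warehouse."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw_lines)))
    return conn


warehouse = run_etl(RAW_LAKE_OBJECTS)
print(warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

The fixed `CREATE TABLE` schema mirrors the rigidity of a warehouse: the lake accepts the record with a missing amount as it is, while the warehouse only receives rows that fit the defined structure.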

OLTP vs OLAP 

Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) are both types of data processing systems built on online databases, hence the "Online" in their names. The difference between them lies in how they are used, that is, in the kind of queries they run against the database.

OLTP is a technique for processing transactions in real time using an online database. It is frequently used by businesses like banks, hotels, and restaurants because it lets staff and customers complete many transactions simultaneously while keeping the data accurate. With OLTP, the system automatically updates account balances and logs crucial information, such as the date and time, as transactions happen.

OLAP is an approach for analyzing large volumes of data. For instance, a business can use OLAP to filter and analyze data by each dimension of its advertising efforts, including consumer exposure, ad length, product sales, and advertising expenses. Businesses frequently use OLAP for complex analytical calculations, data extraction, financial analysis, budgeting, and trend forecasting.

To put it in very simple terms, OLTP is mostly used to modify the database with many small writes, while OLAP is mostly used to query it with large, read-heavy analyses.
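The contrast can be made concrete with a small sketch. It assumes a toy bank schema in an in-memory SQLite database; the table names, the `transfer` and `outgoing_totals` functions, and the amounts are all hypothetical. Real OLTP and OLAP workloads usually run on separate, specialized systems; a single SQLite database is used here only to keep the example self-contained.

```python
import sqlite3


def setup():
    """Create a toy bank database with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
    conn.execute(
        "CREATE TABLE transfers ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "src INTEGER, dst INTEGER, amount REAL, "
        "created_at TEXT DEFAULT CURRENT_TIMESTAMP)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [(1, 100.0), (2, 50.0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """OLTP-style operation: a small, atomic write touching a few rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        balance = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()[0]
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        # The timestamp is logged automatically via the column default.
        conn.execute(
            "INSERT INTO transfers (src, dst, amount) VALUES (?, ?, ?)",
            (src, dst, amount))


def outgoing_totals(conn):
    """OLAP-style operation: a read-only aggregate over many rows."""
    return conn.execute(
        "SELECT src, COUNT(*), SUM(amount) "
        "FROM transfers GROUP BY src ORDER BY src").fetchall()


conn = setup()
transfer(conn, 1, 2, 30.0)
transfer(conn, 1, 2, 20.0)
print(outgoing_totals(conn))
```

`transfer` changes balances and logs the transaction as it happens, while `outgoing_totals` never modifies anything and only summarizes what has been recorded.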

Batch and Streaming Data 

Data can be collected in two ways: batch and streaming. Batch collection gathers data within a defined window of time and then loads it into the system in one go. It is best suited for large volumes of data, such as loading data from legacy systems. Streaming continuously collects data into the system as it happens and is best suited for near real-time analytics. Streaming systems typically group the incoming data into windows or micro-batches for processing.
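The difference can be sketched with a handful of timestamped events. This is a simplified illustration: the event list, the 60-second window size, and both function names are assumptions, and the streaming function assumes events arrive in timestamp order, which real streaming systems cannot always rely on.

```python
# Hypothetical events as (timestamp_in_seconds, value) pairs.
EVENTS = [(0, 5), (12, 3), (59, 2), (61, 7), (130, 1)]


def batch_total(events):
    """Batch: wait until the whole set has been gathered, then process it."""
    return sum(value for _, value in events)


def tumbling_window_totals(event_stream, window_seconds=60):
    """Streaming: emit (window_start, total) each time a window closes.

    Assumes events arrive in timestamp order.
    """
    current_start = None
    total = 0
    for timestamp, value in event_stream:
        start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The event belongs to a later window: close the current one.
            yield (current_start, total)
            current_start, total = start, 0
        total += value
    if current_start is not None:
        yield (current_start, total)  # flush the last open window


print(batch_total(EVENTS))
for window in tumbling_window_totals(iter(EVENTS)):
    print(window)
```

The batch function produces a single result once all data is available, while the generator produces a result per window as events flow through, which is what makes near real-time analytics possible.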

Conclusion

In conclusion, big data, data warehouses, and data lakes are all essential concepts for any data-driven organization. OLTP and OLAP are two distinct ways of working with a database, while batch and streaming are two ways to collect data. Understanding these terms and their distinctions is key to successful data management and analysis, and it helps organizations make better use of the data they possess to make informed decisions and improve their operations.

Further links

Check out our LinkedIn account to get insights into our daily working life and important updates about BigQuery, Data Studio, and marketing analytics.

We have also started our own YouTube channel, where we talk about DWH, BigQuery, Data Studio, and many more topics. Check out the channel here.

If you want to learn more about how to use Google Data Studio and take it to the next level in combination with BigQuery, check out our Udemy course here.

If you have trouble setting up an ETL pipeline on Y42, or if you are looking for help setting up a modern and cost-efficient data warehouse or analytical dashboards in general, send us an email at hello@datadice.io and we will schedule a call.