- BigQuery, Google Cloud Platform
- Big Query, Google Cloud
- 4 min reading time
Tobias Vogler
BigQuery Data Canvas (BQDC) is a service supported by Gemini AI on the Google Cloud Platform (GCP), designed to make finding, exploring, and analyzing data within Google BigQuery, the central analytics service of GCP, more intuitive. Learn more about what BQDC offers and the added value it can bring to different use cases in this article.
Table of contents
- What exactly is BigQuery Data Canvas?
- What other features does BQDC offer?
- What resources can be found through the search function?
- What added value do you get if you already have a very good knowledge of SQL?
- How much does it cost to use BQDC?
- What are the limitations of BQDC?
- What is BQDC NOT?
- Know more?
What exactly is BigQuery Data Canvas?
BQDC is a tool for developers and data analysts that is directly embedded in the familiar web interface of BigQuery Studio (BQS). Typically, in BQS, a SQL query is written and executed in a tab.
As you can imagine, when exploring data sources (especially tables and views), this quickly leads to many tabs being opened in parallel, between which you not only have to switch visually, but also mentally, which means you have to engage in efficiency-sapping context switching. This is naturally exacerbated by the number of data sources to be examined and the scope or depth of your analysis.
BigQuery Data Canvas aims to simplify and optimize this process. With BQDC, all analyses can be performed on a single “infinite canvas,” eliminating the need to switch between different tabs. This allows for a better overview and control of the analyses performed, making it easier to identify connections and patterns in the data.
What other features does BQDC offer?
In addition to the clearer presentation of the analyses, BQDC simplifies the finding of data through an extensive search function. As already mentioned at the beginning, the work in BQDC is supported by Gemini AI, so that some development activities can be carried out entirely or at least supported by writing prompts in natural language.
A found data source can also be queried via prompt in natural language. The prompt is translated into an SQL query, which can then be adjusted and executed if necessary.
However, the results of these queries are not stand-alone in BQDC, but can in turn be used as input for new queries. This then creates a directed acyclic graph, or DAG for short. This defines the execution order of the individual queries and can also be executed as a holistic data flow/pipeline. As a result, the technical definition and execution of data processing also better matches the mental model of data as a continuous flow.
Furthermore, it is also possible to generate meaningful visualizations from the data using prompts. This integrates analyses and visualizations on one interface, which overall can lead to a significantly faster gain of intuition for the data. Which charts can be generated can be viewed here.
In addition, the canvas can also be saved and shared with other people. There is also the option of converting the canvas or the DAG defined there into a Python or Jupyter Notebook and using this as a starting point for further work. Both aspects enable easier collaboration between several people with different roles.
What resources can be found through the search function?
First, a brief explanation of how the search works from the user's perspective: You can either find data sources in the project's data inventory using natural language, or, if you already have a clearer idea of your data sources, you can use a comprehensive keyword search with its own expressive search syntax. This considers not only the names of data sources, but also their attributes and other available descriptions.
Now to the actual question: The search function naturally does not traverse the entire GCP for suitable data sources, but uses the metadata managed under Google Dataplex for this purpose. Dataplex partly collects the metadata of your project independently, but data sources can also be explicitly added here. So, if you manage the resources of your Data Mesh, Data Lakes or Data Vaults via Dataplex, these assets will also be findable through the search in BQDC. In addition, the BigQuery Public Datasets provided by Google can also be searched.
What added value do you get if you already have a very good knowledge of SQL?
A large part of every SQL query is always the same, which is why generating a query can greatly reduce development time to a meaningful result, even if the generated query misses the originally intended result. Colloquially, one could say that you can have a large part of the "boilerplate code" generated.
Furthermore, many SQL processings are easy to define but are extremely verbose and repetitive. Consider, for example, a query that calculates the total number of records and the number of null values for each attribute. This can be a good indicator of the data source quality. Such a query is not complicated in itself, but it requires a disproportionate amount of time, which would be eliminated by generating the query.
Furthermore, the integrated visualization functionality makes it much easier to develop a holistic view or comprehensive intuition regarding the data and its quality.
How much does it cost to use BQDC?
The use of the service itself does not incur any costs. However, the execution of the defined DAGs or data pipelines in the background uses other GCP services. These will then incur the corresponding fees. The main cost factor will generally be the fees for compute and storage in BigQuery. However, these would also be incurred if the requests were sent directly from BQS instead of via BQDC.
What are the limitations of BQDC?
Click here for the general limitations of BQDC. This mainly concerns resources that can be used by BigQuery, but are not native to the technology. Furthermore, the generation of queries on metadata, i.e. the data dictionary or information schema, is difficult. Problems can also arise for queries created by prompts due to nested data structures. This mainly concerns the BigQuery structures ARRAY and STRUCT, but also attributes of the JSON data type.
What is BQDC NOT?
It's often helpful to know what a particular technology isn't suitable for, or which use cases are better solved with other technologies, to avoid raising false hopes.
BQDC is not a replacement for developers such as Data Engineers or Data Analysts who possess extensive expertise in SQL and data modeling in general, as well as physical database design for performance optimization. However, BQDC can also help these people make their work easier and more efficient.
BQDC is also not a low-code or no-code tool that helps users with little experience build complex logic, as the results generated should still be checked for accuracy and consistency.
Furthermore, it does not replace data pipelines developed by subject matter experts, which have been developed using sophisticated technologies and tested with care. A possible technology that can define and execute an end-to-end pipeline via SQL would be dbt (data building tool). We have already held a webinar on this topic in the past, the recording of which can be found here.
Know more?
Would you like to delve deeper into this topic? Then we would be happy to talk to you personally about the possibilities of the Google Cloud Platform (GCP).
Published by:
Tobias Vogler
Tobias Vogler
How did you like the article?
How helpful was this post?
Click on a star to rate!
Average rating 3 / 5.
Number of ratings: 2
No votes so far! Be the first person to rate this post!







