Building 3D Trusted Data Pipelines With Dagster, dbt, and DuckDB
Part of the All Things Data! specialist track
This talk will demonstrate how to build trusted data pipelines with the "3D" packages: Dagster, dbt, and DuckDB. First, it will explain why trusted data pipelines matter and how testing data quality is what makes them trustworthy. It will then discuss what needs to be tested, from high-level properties (such as tables, relations, …) down to low-level ones (such as rows and columns). After that, it will show how to implement these tests with dbt packages such as dbt-utils and dbt-expectations.
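dbt's generic tests are configured in YAML properties files rather than written in Python, so the sketch below does not use dbt at all. It is a minimal plain-Python illustration of the idea behind those tests, assuming a hypothetical `orders`/`customers` schema: each check is a SQL query that returns the offending rows, and the check passes when it returns none. The helper names (`not_null`, `unique`, `accepted_range`, `relationship`, `row_count_between`) are hypothetical; they only loosely mirror dbt's built-in `not_null`, `unique`, and `relationships` tests and the range and row-count tests offered by dbt-utils and dbt-expectations. Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra installs.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory dataset with deliberate quality problems."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO orders VALUES
            (10, 1, 25.0),
            (11, 2, 40.0),
            (11, 3, -5.0),
            (NULL, 1, 12.5);
        """
    )
    return con


def failing_rows(con: sqlite3.Connection, sql: str) -> int:
    """Run a check query; every row it returns counts as one failure."""
    return len(con.execute(sql).fetchall())


# Identifiers are interpolated directly, which is acceptable only for
# trusted, hard-coded table and column names like the ones used here.

def not_null(table: str, column: str) -> str:
    # Column level: no missing values.
    return f"SELECT * FROM {table} WHERE {column} IS NULL"


def unique(table: str, column: str) -> str:
    # Column level: each non-null value appears at most once.
    return (f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
            f"GROUP BY {column} HAVING COUNT(*) > 1")


def accepted_range(table: str, column: str, low: float, high: float) -> str:
    # Column level: values must fall inside [low, high].
    return f"SELECT * FROM {table} WHERE {column} < {low} OR {column} > {high}"


def relationship(child: str, fk: str, parent: str, pk: str) -> str:
    # Relation level: every non-null foreign key must exist in the parent table.
    return (f"SELECT c.* FROM {child} AS c LEFT JOIN {parent} AS p "
            f"ON c.{fk} = p.{pk} WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL")


def row_count_between(table: str, low: int, high: int) -> str:
    # Table level: returns one row only when the row count is out of bounds.
    return (f"SELECT n FROM (SELECT COUNT(*) AS n FROM {table}) AS counts "
            f"WHERE n < {low} OR n > {high}")


def run_checks(con: sqlite3.Connection) -> dict:
    """Run every check and report the number of failing rows per check."""
    checks = {
        "orders.order_id not_null": not_null("orders", "order_id"),
        "orders.order_id unique": unique("orders", "order_id"),
        "orders.amount in [0, 10000]": accepted_range("orders", "amount", 0, 10_000),
        "orders -> customers": relationship("orders", "customer_id", "customers", "customer_id"),
        "orders row count in [1, 1000]": row_count_between("orders", 1, 1000),
    }
    return {name: failing_rows(con, sql) for name, sql in checks.items()}
```

Calling `run_checks(build_sample_db())` flags the null `order_id`, the duplicated `order_id` 11, the negative amount, and the order pointing at a non-existent customer, while the table-level row-count check passes. Keeping every check as "a query whose result should be empty" is what lets the same pattern cover both table-level and column-level rules.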
Finally, the talk will present a demo of a complete ELT workflow. In this demo, Dagster orchestrates the pipeline, and dbt handles data transformation together with its testing packages. The pipelines run on top of DuckDB, which acts as a lightweight data warehouse. The demo is published in a GitHub repository, so developers can clone it and run it themselves.
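As a rough, tool-free sketch of that ELT flow, the steps an orchestrator would run can be written as plain Python functions: extract, load into a warehouse, transform, then test before anything is handed downstream. This is not Dagster or dbt code. The table names, sample rows, and checks are hypothetical, and Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra installs.

```python
import sqlite3

# Hypothetical raw records standing in for an extract step (e.g. an API or CSV).
RAW_ORDERS = [
    (1, "2024-01-05", 120.0),
    (2, "2024-01-06", 80.5),
    (3, "2024-01-06", 42.0),
]


def extract() -> list:
    return list(RAW_ORDERS)


def load(con: sqlite3.Connection, rows: list) -> None:
    # "L" in ELT: land the raw data in the warehouse untouched.
    con.execute("CREATE TABLE raw_orders (order_id INTEGER, order_date TEXT, amount REAL)")
    con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(con: sqlite3.Connection) -> None:
    # "T" in ELT: analogous to a dbt model, a SELECT materialised as a table.
    con.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date, SUM(amount) AS revenue, COUNT(*) AS n_orders
        FROM raw_orders
        GROUP BY order_date
        """
    )


def run_quality_checks(con: sqlite3.Connection) -> list:
    # Return a description of every failed check (empty list means all passed).
    failures = []
    if con.execute("SELECT COUNT(*) FROM daily_revenue WHERE revenue < 0").fetchone()[0]:
        failures.append("revenue must be non-negative")
    if con.execute("SELECT COUNT(*) FROM daily_revenue").fetchone()[0] == 0:
        failures.append("daily_revenue must not be empty")
    return failures


def run_pipeline(rows=None) -> list:
    con = sqlite3.connect(":memory:")
    load(con, extract() if rows is None else rows)
    transform(con)
    failures = run_quality_checks(con)
    if failures:
        # Stop here so untrusted data never reaches downstream consumers.
        raise RuntimeError("Data quality checks failed: " + "; ".join(failures))
    return con.execute(
        "SELECT order_date, revenue, n_orders FROM daily_revenue ORDER BY order_date"
    ).fetchall()
```

With the sample rows, `run_pipeline()` returns one revenue row per day, while feeding it a negative amount raises before any result is returned. In a Dagster and dbt setup these steps would typically be separate, dependency-aware units rather than one function; the point of the sketch is only the ordering, where tests sit between transformation and downstream use.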
The talk will help data and analytics engineers write more robust tests for their data pipelines. Trusted pipelines strengthen data quality checks and validation, reducing the risk that issues such as data drift reach downstream consumers.
See this talk and many more by getting your ticket to PyCon AU now!
Danh is keen on applying data engineering and machine learning tools to solve real-world business problems. He is a Microsoft Certified Data Engineer and an open-source contributor.
Danh has actively contributed to popular open-source projects in data engineering and probabilistic machine learning on GitHub.
Please visit his webpage at https://danhphan.net for further information.