
ETL vs ELT: A Practical Guide with Real Examples
In this article, let us look at what ETL and ELT actually mean in practice — not the textbook definitions you have already read a hundred times, but what happens when you sit down to build a pipeli...

In this article, let us look at the Medallion Architecture — a data design pattern that organises your data lake into three layers: Bronze, Silver, and Gold. If you have heard terms like “bronze ta...

When I started building data pipelines on Databricks, the orchestration piece was always the awkward part. You would write your notebooks, get the transformations working, and then you had to figur...

When I first started working with GCP, I kept hearing about Dataflow but wasn’t quite sure where it fit. We had Cloud Functions for light event-driven work, Composer (managed Airflow) for orchestra...

If you have worked with BigQuery for any reasonable amount of time, you have probably stared at a query that scanned a few hundred gigabytes when it really only needed a few megabytes. That is usua...

In a previous article we looked at Terraform for managing data infrastructure on GCP. Terraform is great, but if your team works mostly in AWS and writes Python or TypeScript all day, you might fin...

In this article, let us look at AWS Step Functions and how you can use it to orchestrate your data pipelines. If you have been building data pipelines for a while, you have probably used tools like...

Infrastructure as code has moved from being a niche platform-engineering skill to something every data engineer bumps into. A few years ago, my job stopped at making the pipeline run — someone else...

In this article, let us look at how you can use GitHub Actions to set up a simple CI/CD pipeline for your data workflows. If you are building dbt models, Airflow DAGs, or Python-based ETL scripts an...

In this article, we will build a simple but real ETL pipeline using Apache Airflow. If you have been writing cron jobs or one-off scripts to move data around and are looking for something more mana...