dbt Basics for Analytics Engineering Teams: A Practical Guide

In this article, let us look at dbt (data build tool): what it actually is, why analytics engineering teams reach for it, and how to get a basic project running. I will walk through setting up mode...

Jul 22, 2025 Data Engineering, Analytics

Querying Your S3 Data Lake with Amazon Athena — A Practical Guide

If you have data sitting in S3 and someone asks you a question that needs SQL to answer, you basically have two paths: spin up something that loads the data somewhere first, or query it where it li...

Jul 15, 2025 Data Engineering, AWS

S3 Data Lake Folder Design – Best Practices from the Trenches

You have probably heard the phrase “S3 is schema on read” a hundred times. What people say less often is that your folder structure becomes your schema — whether you planned it that way or not. In...

Jul 8, 2025 AWS, Data Engineering, Data Lake

Batch vs Streaming: A Practical Guide for Beginner Data Engineers

When I started building data pipelines, I kept hearing “batch” and “streaming” thrown around like they were completely different worlds. It took me a while to realise they are not as far apart as t...

Jul 1, 2025 Data Engineering, Beginners

ETL vs ELT: A Practical Guide with Real Examples

In this article, let us look at what ETL and ELT actually mean in practice — not the textbook definitions you have already read a hundred times, but what happens when you sit down to build a pipeli...

Jun 24, 2025 Data Engineering, Data Pipelines

Medallion Architecture Explained: A Practical Guide to Bronze, Silver, and Gold Layers

In this article, let us look at the Medallion Architecture — a data design pattern that organises your data lake into three layers: Bronze, Silver, and Gold. If you have heard terms like “bronze ta...

Jun 17, 2025 Data Engineering, Data Architecture

Databricks Workflows for Scheduled Jobs: A Practical Guide

When I started building data pipelines on Databricks, the orchestration piece was always the awkward part. You would write your notebooks, get the transformations working, and then you had to figur...

Jun 10, 2025 Data Engineering, Databricks

Getting Started with GCP Dataflow for Batch Pipelines: A Practical Guide

When I first started working with GCP, I kept hearing about Dataflow but wasn’t quite sure where it fit. We had Cloud Functions for light event-driven work, Composer (managed Airflow) for orchestra...

Jun 3, 2025 Data Engineering, GCP, Cloud

BigQuery Partitioning and Clustering: A Practical Guide for Data Engineers

If you have worked with BigQuery for any reasonable amount of time, you have probably stared at a query that scanned a few hundred gigabytes when it really only needed a few megabytes. That is usua...

May 27, 2025 BigQuery, Data Engineering, GCP

AWS CDK for Data Platform Teams: A Practical Guide

In a previous article we looked at Terraform for managing data infrastructure on GCP. Terraform is great, but if your team works mostly in AWS and writes Python or TypeScript all day, you might fin...

May 20, 2025 Data Engineering, AWS, Infrastructure as Code