
Getting started with AWS Glue crawlers
In this article, let us see how to get started with AWS Glue crawlers, what problem they solve, and why you might want to use them in a simple data lake setup. If you are keeping files in S3 and wa...

In this article, let us understand the basics of Unity Catalog in Databricks, why someone would use it, and how to think about it when you are just getting started. If you have been working with Da...

In this article, let us go through how to use Databricks Delta tables in analytics pipelines and why this approach is useful when you want something more reliable than plain Parquet files. If you ar...

In this article, let us go through some practical basics for reducing BigQuery cost before it turns into a painful surprise in your monthly bill. If you are using BigQuery for analytics, reporting, ...

In this article, let us see some cost optimization basics for AWS data pipelines, why they matter early, and what simple changes usually reduce the bill without making the platform too complicated. ...

In this article, let us see a few data quality checks that every beginner team should add early in their pipeline. This approach is useful because most data problems are not fancy platform problems...

In this article, let us see how to handle schema evolution in a data pipeline without breaking all the jobs that depend on it. This becomes important when your source system adds a column, renames ...

In this article, let us see how to build reliable backfills in data pipelines, why we need them, and what things usually break when we run them in a hurry. Backfills sound simple at first. We missed...

In this article, let us see the basics of dbt and why analytics engineering teams use it so often. If your team is writing SQL transformations directly in the warehouse, copying queries between not...

In this article, let us see how to use Amazon Athena to query files from a data lake without building a separate database server or Spark job. This approach is useful when we already have data land...