Pinned · Published in Data Engineer Things · How to Set Up CI/CD with Databricks Asset Bundles and GitHub Actions · From VS Code to Databricks: a step-by-step CI/CD workflow with GitHub Actions and Asset Bundles. · Apr 16
Pinned · Published in Data Engineer Things · Data Quality Checks with Databricks DQX: A Step-by-Step Guide · Automate data quality checks, detect errors, and optimize your data pipeline using the Databricks DQX framework. · Apr 29
Pinned · Published in Data Engineer Things · Data Quality With Airflow SQL Check Operators: A Step-by-Step Guide · Automate data quality checks, detect errors, and stop bad data downstream using Airflow’s SQL Check Operators. · Jun 1
Published in Data Engineer Things · How I Validate Data Schema with GreatExpectations (GX): A Step-by-Step Guide · Automate schema validation, detect structural issues early, and keep your data pipelines consistent with Great Expectations (GX). · Nov 8
Published in Data Engineer Things · How I Validate Data Freshness with GreatExpectations (GX): A Step-by-Step Guide · Automate data freshness checks, catch stale or outdated data, and ensure reliable pipelines with Great Expectations (GX). · Sep 30
Published in Data Engineer Things · Data Pipeline Alerting With Airflow 3.0 Notifiers and Slack: A Step-by-Step Guide · Automate pipeline monitoring, catch failures early, and stay informed with Slack alerts from Airflow 3.0. · Aug 27
Published in Data Engineer Things · Data Quality with Airflow Circuit Breakers: A Step-by-Step Guide · Detect and prevent bad data from causing downstream failures using Airflow’s ShortCircuitOperator. · Jun 18
Published in Data Engineer Things · How To Implement Column Masking in Databricks To Protect Sensitive Data · Give Data Analysts the Data They Need, Without Exposing What They Shouldn’t See · May 18
Published in Data Engineer Things · Build a Streaming Deduplication Pipeline with Kafka, GlassFlow and ClickHouse · Use Glassgen to simulate noisy data, Kafka to stream it, and GlassFlow to deduplicate and clean it before storage · May 7
Published in Data Engineer Things · How Spark Used Lazy Evaluation for Optimization (Spark series) · Why Spark’s Laziness is a Good Thing · Mar 11