Several years ago, I wrote the post Friends Don’t Let Friends Use JSON (in their…
In my last post, I developed a data pipeline to aggregate CloudTrail log files. When…
As I’ve written in the past, large numbers of small files make for an inefficient…
Following my post about the Chariot Data Engineering interview, I received some comments along…
Hiring good candidates is difficult. After nearly 40 years in this business, and…
My last few posts have focused on Redshift and Athena, two specialized tools for managing and querying Big Data…
Execution plans are one of the primary tools to optimize your database queries, but they…
I first experienced unbalanced data in a data warehouse thirty years ago. I was working for a mutual fund…
Amazon Athena is a service that lets you run SQL queries against structured data files stored in S3. It takes…