Pinned | Vishnu Rao | [part1] DuckDb vs Spark on Iceberg 1 Billion NYC taxi rides | I ran a benchmark comparing Spark (local mode) on Iceberg and DuckDB over 10 years of NYC yellow taxi rides. | May 12
Vishnu Rao | What is a Data Angiogram ? How can it help with Data pipeline sanity & Data Quality | I recently had the honour of presenting at Smartdata 2024 on an idea I came up with, related to data quality. | Sep 10
Vishnu Rao | What is a Data-Angiogram ? How can we assess data pipeline quality with it — Talk Announcement. | Hi guys, | Aug 19
Vishnu Rao | Hash of column: Apache Spark Hash vs Google Guava hash | Recently we had to develop an API to fetch records for a specified user. Our user base can range from 100k to 5 million. Our data pipelines… | Jul 10
Vishnu Rao | [part2] DuckDb vs Spark on Iceberg 1 Billion NYC taxi rides (trying duckDb on iceberg, polars) | I had performed a benchmark on my Mac earlier in part 1, which was basically Spark on Iceberg (on local filesystem) vs DuckDB (on local… | May 22
Vishnu Rao | Making GitHub Workflows Joyful | Color definitely adds a bit of pizazz to your life or workflow! | May 14
Vishnu Rao | When should u schedule your monthly maintenance ops in production ? | Do you have an automated, scheduled MONTHLY maintenance operation in production? | May 12
Vishnu Rao | Is Serverless is Server-less ? Is there a Serverless Database around ? | When you hear the word ‘Serverless’, you think of AWS Lambda or its counterparts in other clouds. | Apr 27
Vishnu Rao | Docker image tag — Linking Jar with your image | Typically your Docker images have a versioning system like v.x.z.z (v1.1.2); this makes it a little hard to figure out which version of your… | Apr 21
Vishnu Rao | Decade of Layoffs | It started after Covid was winding down. Elon Musk woke up and said to himself: | Feb 22