Amit Joshi
Data Ingestion from S3 to Snowflake using snow pipe (Mar 14, 2023)
Getting the latest data is essential for further business decisions. For that, the data ingestion process should be continuous. One of the…

Understanding display() & show() in PySpark DataFrames (Apr 16)
When working with PySpark, you often need to inspect and display the contents of DataFrames for debugging, data exploration, or to monitor…

Navigating the Medallion Architecture (Dec 23, 2023)
In today’s data-driven world, organizations increasingly adopt the Medallion Architecture as a powerful approach to managing and processing…

Query pyspark dataframe without creating a temp view (Dec 13, 2023)
Yes, you heard it right: you can now query a PySpark DataFrame without the need to create a temporary view. In many scenarios, after…

Repartition & Coalesce in Apache Spark (Dec 12, 2023)
In the world of distributed computing, challenges like data spill and data skew often loom large. Data spill occurs when the volume of…

Exploring Basic Terms in Apache Airflow (Dec 5, 2023)
If you’re taking your first steps into the world of Apache Airflow, you’ve entered a realm of powerful tools for developing, scheduling…

Kusto Query Language (KQL) — A Beginner’s Guide (Nov 7, 2023)
Data is the lifeblood of modern decision-making and problem-solving. Whether you’re a data analyst, data engineer, or simply someone…

A Beginner’s Guide to Prompt Engineering: Maximizing AI Tool’s Potential (Sep 3, 2023)
Artificial intelligence (AI) tools have become indispensable in our lives, assisting us in various tasks. To harness their full potential…

How to Read Excel files using pyspark in Databricks? (Sep 2, 2023)
In one of my recent requirements, I encountered the need to read Excel files using PySpark in Databricks. While reading CSV files is…

Data Partitioning: Organizing Your Data Effectively (Aug 22, 2023)
Partitioning involves the practice of breaking down a dataset into smaller, more manageable sections known as partitions. Each partition…