The most insightful stories about Spark

Spark

Topic

2.7K Followers

9K Stories

Recommended stories

Raj Patel
Load data from GCS to BigTable with GCP Dataproc Serverless
Recently, I needed to transfer data from Google Cloud Storage (GCS) into Bigtable using Dataproc Serverless Spark. The…
3d ago
Eduard Popa
in
Data Engineer Things
A Practitioner’s Guide to Developing Data Engineering Solutions with Databricks
Development Approaches, Environments, CI/CD and Testing with Databricks
Jul 26
7
Yingjun Wu
Kafka Has Reached a Turning Point
Is Kafka still relevant in today’s evolving tech landscape? And where is Kafka headed in the future?
Sep 23
8
Christopher Shehu
Unlocking Big Data Potential: How PySpark Surpasses Classic SQL
Note: This guide will cover both PySpark (Python-based API) and Spark SQL to showcase examples of both approaches and their respective…
4d ago
George Zefkilis
in
Data Engineer Things
Building a Local Data Lake from scratch with MinIO, Iceberg, Spark, StarRocks, Mage, and Docker
Hello again, fellow technology enthusiasts! I am a software/data engineer who transitioned from data science. The learning curve in this…
Jul 13
6

Naveen Kumar

Tuning Spark Optimization: A Guide to Efficiently Processing 1 TB Data

The aim of this article is to provide a practical guide on how to tune Spark for optimal performance, focusing on partitioning strategy…

Oct 3

Rindhuja Treesa Johnson
in
Towards Data Science

Apache Hadoop and Apache Spark for Big Data Analysis

A complete guide to big data analysis using Apache Hadoop (HDFS) and PySpark library in Python on game reviews on the Steam gaming…

May 8

Yingjun Wu

Apache Iceberg: Built for Big Data, Ready for Small?

Originally built for massive data lakes, Apache Iceberg is catching the attention of small teams. But can it really fit?

Sep 26
