Mike Staszel

6 Followers

May 23

Save Money on Spark Jobs with Karpenter

This is part 1 of a multi-part series on running a modern Spark stack in 2023. A modern Spark stack most likely runs its jobs on a Kubernetes cluster, especially under heavy usage. …

Spark

5 min read

May 1

Resetting the Blog

Hi there, I’m Mike. 🔭 I’m currently working on big data engineering with Spark on k8s on AWS at iSpot.tv. 🌱 I’m focusing on mentoring and coaching my team to improve their skills and release awesome products. 🌎 I occasionally write blog posts about software engineering and other topics. Management and Software Engineering I…

Software Engineering

2 min read

Mar 14

S3A on Spark 3.3 in 2023

Updating my post from almost 3 years ago! The world has moved on to Spark 3.3, and so have the necessary JARs you will need to access S3 from Spark. Run these commands to download JARs for Spark 3.3.2:

    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.426/aws-java-sdk-bundle-1.12.426.jar -P $SPARK_HOME/jars/
    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.2/hadoop-aws-3.3.2.jar -P $SPARK_HOME/jars/
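The two download URLs above follow the standard Maven Central layout, so they can be rebuilt from the group, artifact, and version. Here is a minimal sketch of that; the helper names `maven_jar_url` and `s3a_jar_urls` are made up for illustration, and the versions are simply the ones quoted in this post.

```python
# Sketch only: rebuilds the Maven Central URLs for the S3A JARs named in the post.
# maven_jar_url and s3a_jar_urls are hypothetical helper names, not part of any library.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"

# (group_id, artifact_id, version) exactly as quoted in the post for Spark 3.3.2
S3A_JARS = [
    ("com.amazonaws", "aws-java-sdk-bundle", "1.12.426"),
    ("org.apache.hadoop", "hadoop-aws", "3.3.2"),
]


def maven_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build the download URL for a JAR from its Maven coordinates."""
    group_path = group_id.replace(".", "/")  # dots in the group become path segments
    return (
        f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.jar"
    )


def s3a_jar_urls() -> list:
    """Return the URLs for every JAR listed in S3A_JARS."""
    return [maven_jar_url(*coords) for coords in S3A_JARS]


if __name__ == "__main__":
    for url in s3a_jar_urls():
        print(url)
```

Keeping the coordinates in one list should make it easier to bump versions later, though which versions pair well with a given Spark release is something to check rather than assume.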

1 min read

Feb 3

Spark to Google Cloud Storage

This is the last post in my series on how to connect Spark to various data sources. Here is how to connect to Google Cloud Storage using Spark 3.x. First, create a service account in GCP, then download the JSON key file. Save it somewhere secure (e.g. import it into Vault or Secrets Manager). Grant that service account permission to read and write to the bucket. Grab the Hadoop 3.x JAR (gcs-connector-hadoop3-latest.jar) from Google.
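Once the JAR and key file are in place, Spark still has to be told about both. The sketch below is not taken from the original post: it assumes the property names used by the 2.x line of the GCS connector (newer connector releases may expect different auth settings), and `gcs_spark_conf` and the key file path are hypothetical.

```python
# Sketch only: Spark settings for the GCS connector with a service account key file.
# Assumption: property names as used by gcs-connector 2.x; verify against the
# connector version you actually download. gcs_spark_conf is a hypothetical helper.


def gcs_spark_conf(keyfile_path: str) -> dict:
    """Return Spark config entries pointing the gs:// scheme at the connector."""
    return {
        # Register the connector's filesystem classes for gs:// paths
        "spark.hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "spark.hadoop.fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
        # Authenticate with the downloaded service account JSON key
        "spark.hadoop.google.cloud.auth.service.account.enable": "true",
        "spark.hadoop.google.cloud.auth.service.account.json.keyfile": keyfile_path,
    }


if __name__ == "__main__":
    # Hypothetical path; in practice, fetch the key from Vault or Secrets Manager
    for key, value in gcs_spark_conf("/secrets/gcs-key.json").items():
        print(f"{key}={value}")
```

Each entry can be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`, after which a read such as `spark.read.parquet("gs://some-bucket/folder/")` should resolve through the connector.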

1 min read

Jan 29

Spark to Azure Data Lake Storage Gen1

This is another quick post on how to connect Spark to various platforms. I used Azure Data Lake Storage on a project in the past and had a tough time figuring out what to do (there are huge differences between Azure Blob Storage, Azure Data Lake Gen1, and Azure Data Lake Gen2). This guide assumes that you have a client_id, tenant_id, and client_secret from Azure.

Code Example

    # Acquire these JARs from Maven:
    #   azure-data-lake-store-sdk-2.3.10.jar
    #   hadoop-azure-datalake-3.2.3.jar
    #   wildfly-openssl-1.0.7.Final.jar
    # and place them in $SPARK_HOME/jars/
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Credentials from Azure (placeholders)
    tenant_id = "some-identifier-here"
    client_id = "some-identifier-here"
    client_secret = "super-top-secret-here"

    # Configure OAuth for the adl:// filesystem
    spark.conf.set("fs.adl.account.auth.type", "OAuth")
    spark.conf.set("fs.adl.oauth2.refresh.url", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
    spark.conf.set("fs.adl.oauth2.client.id", client_id)
    spark.conf.set("fs.adl.oauth2.credential", client_secret)

    # That's all there is to it:
    df = spark.read.parquet("adl://something.azuredatalakestore.net/folder/")

1 min read

Dec 15, 2022

Why I Got a Master’s Degree

I recently graduated from Georgia Tech’s Online Master’s in Computer Science program! I’m now taking some time to reflect on my experience. What was the program like? Georgia Tech’s program is a 10-course (30 credit hour) Master’s degree in Computer Science. It’s fully online and follows a traditional academic structure with lectures, office hours, homework…

Computer Science

3 min read

Aug 19, 2022

CloudFormation Wishlist

This post originally started as a post about Terraform, but I decided to break that out into a separate post. It turned out that I had a wishlist for improvements I’d like to see in CloudFormation. I’ve been using CloudFormation for years and have been pushing teams I work with…

3 min read

Aug 13, 2022

Cross-Posting to Medium

I love seeing when something I wrote helps someone. Someone sent me an email years ago thanking me for writing about some obscure bug or problem I solved and blogged about, and I remember it to this day! Medium is a popular blogging platform that a lot of software engineers…

2 min read

Aug 7, 2022

Resetting the Blog

I’m giving myself a new personal challenge: to write at least one blog post per week for the rest of 2022. These posts might be poor quality, but they’ll be better than not posting at all. The motivation behind the challenge is to write more. I don’t have any topics or posts ready to go or pre-written. I’ll have to think of something. I’m not sure if this post counts as one of the first weekly posts.

1 min read

Apr 21, 2021

M1 Mac + Logitech Mouse + Logi Options

This will be a quick post because I’m sure Logitech will fix this eventually (or maybe I’m the only person with this problem). I have a Logitech M705 mouse I use extensively with my M1 Mac Mini. I configured one of the side buttons to launch Mission Control (to see…

1 min read

Engineering manager, builder, writer.
