Data Science Collective

Advice, insights, and ideas from the Medium data science community


Getting Started with Apache Spark: Easy Installation on Windows and Mac

6 min read · Apr 27, 2025


Image by author.

As data scientists, we are increasingly expected to stay familiar with powerful tools and software that make data processing seamless.

Data science and the technology space as a whole evolve continuously. New developments, methods, models, and software are being introduced to increase the proficiency of data science professionals and guide data science beginners.

MIT Sloan Management Review reports that approximately 80% of data and technology leaders are using or considering data products that integrate analytics and AI capabilities.

With the rise of big data, the need for powerful software and tools cannot be overemphasized.

Last year, while interning at a fintech company, I was assigned a project to analyze transaction data from thousands of customers to detect unusual spending patterns — essentially a fraud detection prototype.

I started using Python with Pandas, but everything slowed as the data ballooned to millions of rows from multiple sources. Scripts crashed, memory overflowed, and simple aggregations took forever.

That’s when I turned to Apache Spark.


Published in Data Science Collective


Written by Benjamin Nweke

I value your time. Data scientist, positivity therapist, and mental health advocate. Works in: The Startup, Mindcafe, Towards data science, and others.