01#Spark BigData — Introduction

Andre Vianna
My Data Science Journey
Nov 24, 2021

Spark Introduction

Unified engine for large-scale data analytics

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley’s AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.
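To make "implicit data parallelism" a little more concrete, here is a minimal sketch of the underlying idea: split a dataset into partitions, process each partition independently, then combine the partial results. This is not Spark code and does not use the Spark API. It relies only on Python's standard library, with threads standing in for cluster workers. The function names (`split_into_partitions`, `process_partition`, `parallel_sum_of_squares`) are hypothetical and chosen for illustration, and the fault-tolerance side of Spark is not modeled at all.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal chunks, one per hypothetical worker."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_partition(partition):
    """Work done independently on each partition: a sum of squares."""
    return sum(x * x for x in partition)


def parallel_sum_of_squares(data, num_partitions=4):
    """Map over partitions in parallel, then reduce the partial results."""
    partitions = split_into_partitions(data, num_partitions)
    # Threads are only a stand-in here for the executors of a real cluster.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(process_partition, partitions))
    return reduce(lambda a, b: a + b, partial_results, 0)


total = parallel_sum_of_squares(list(range(1, 11)))
print(total)  # 385
```

The caller only describes what to compute per partition and how to combine results. How the data is split and scheduled stays hidden, which is roughly the sense in which an engine like Spark makes the parallelism implicit.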

Big Data Concept

Apache Ecosystem

Big Data Architecture

PySpark
