Machine Learning in Spark-2: Transformations and Actions on RDDs

Link to part-1 : Understanding Spark and RDDs


Once RDDs are created, two types of operations can be performed on them:

  1. Transformations (similar to map)
  2. Actions (similar to reduce)
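What are transformations?

A transformation applies a function to the elements of an RDD and returns a new RDD; the original RDD is left unchanged.

Some transformations apply a function to every element (map), while others keep only the elements that match a certain condition (filter).

e.g. a function that squares each value in the data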

Map
Filter
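
As a rough illustration of the two transformations named above, the sketch below mimics them on a plain Python list instead of an RDD, so it runs without Spark. The variable names are made up for this example; on a real RDD the corresponding calls would be `nums.map(...)` and `nums.filter(...)`.

```python
# Plain-Python stand-in for an RDD (assumption: a list is used here so the
# example runs without a Spark installation).
nums = [1, 2, 3, 4]

# map-style transformation: apply a function to every element.
squared = list(map(lambda num: num ** 2, nums))

# filter-style transformation: keep only the elements that match a condition.
evens = list(filter(lambda num: num % 2 == 0, nums))

print(squared)  # [1, 4, 9, 16]
print(evens)    # [2, 4]
```

In both cases the input list is untouched and a new collection comes back, which is the same pattern an RDD transformation follows.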
What are actions?

An action computes a result from the elements of an RDD and returns it to the driver program, rather than producing another RDD.

e.g. return the first element of the RDD
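
To give a feel for what an action hands back, here is a small sketch that again uses a plain Python list as a stand-in for an RDD (an assumption made so it runs without Spark). The comments name the PySpark action each line is meant to resemble; the variable names are illustrative only.

```python
from functools import reduce

# Plain list standing in for an RDD (assumption, not a real RDD).
data = [1, 2, 3, 4]

first_element = data[0]                    # resembles rdd.first()
total = reduce(lambda a, b: a + b, data)   # resembles rdd.reduce(lambda a, b: a + b)
how_many = len(data)                       # resembles rdd.count()

print(first_element, total, how_many)  # 1 10 4
```

Each result is an ordinary Python value rather than another collection to transform further, which is the key difference from a transformation.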

Transformations and actions on RDDs
Python code showing transformations and actions

Example 1:

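# sc is the SparkContext that the pyspark shell creates on startup
lines = sc.textFile("/home/suvir/Documents/SparkFiles/learn-spark-python/data/python_wiki.html")
# Action: returns the first line of the file
lines.first()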

Example 2:

nums = sc.parallelize([1, 2, 3, 4])
squared = nums.map(lambda num: num**2)  # transformation: square each element
squared.collect()  # action: returns [1, 4, 9, 16]

Example 3:

lines = sc.parallelize(["hello world", "hi"])
words = lines.map(lambda line: line.split(" "))  # one list of words per line
words.collect()  # returns [['hello', 'world'], ['hi']]

Example 4:

lines = sc.parallelize(["hello world", "hi"])
words = lines.flatMap(lambda line: line.split(" "))  # per-line lists flattened into one
words.collect()  # returns ['hello', 'world', 'hi']
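
The difference between Example 3 and Example 4 can be reproduced without Spark. The following is only a sketch on plain Python lists (an assumption, so it runs anywhere), treating flatMap as "map, then flatten one level"; `mapped` and `flat_mapped` are names chosen for this illustration.

```python
from itertools import chain

lines = ["hello world", "hi"]

# map-style: one output element (a list of words) per input line.
mapped = [line.split(" ") for line in lines]

# flatMap-style: the same per-line lists, flattened by one level.
flat_mapped = list(chain.from_iterable(mapped))

print(mapped)       # [['hello', 'world'], ['hi']]
print(flat_mapped)  # ['hello', 'world', 'hi']
```

Use map when the per-line grouping matters, and flatMap when a single flat collection of words is what the next step needs.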
References