Machine Learning in Spark-2: Transformations and Actions on RDDs

Link to Part 1: Understanding Spark and RDDs

Once RDDs are created, two types of operations can be performed on them:

  1. Transformations (similar to map)
  2. Actions (similar to reduce)
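
The "similar to map / similar to reduce" analogy can be seen with Python's own built-ins, outside Spark. This is plain Python, not PySpark: `map` builds a new sequence element by element, much as a transformation builds a new RDD, and `functools.reduce` collapses a sequence into a single value, much as an action returns a result.

```python
from functools import reduce

data = [1, 2, 3, 4]

# Like a transformation: apply a function to every element, get a new sequence
squares = list(map(lambda x: x ** 2, data))

# Like an action: combine all elements into one result
total = reduce(lambda a, b: a + b, squares)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```

In PySpark the corresponding calls are `rdd.map(...)` and `rdd.reduce(...)`.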

What are transformations?

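A transformation applies a function to the elements of an RDD and returns a new RDD; the original RDD is left unchanged. Transformations are lazy: Spark only records them and computes nothing until an action is called.

Some transformations, such as filter(), keep only the elements that match a certain condition.

e.g. map() with a function that squares each value of the data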
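Below is a plain-Python sketch (not PySpark) of chaining two transformations: keep the even numbers, then square them. Each step builds a new list and leaves its input untouched, mirroring how each RDD transformation returns a new RDD. The sample numbers are made up for illustration; the PySpark version would be `nums.filter(lambda n: n % 2 == 0).map(lambda n: n ** 2)`.

```python
nums = [1, 2, 3, 4, 5, 6]

# filter-style transformation: keep elements matching a condition
evens = list(filter(lambda n: n % 2 == 0, nums))

# map-style transformation: square each remaining element
squared_evens = list(map(lambda n: n ** 2, evens))

print(evens)          # [2, 4, 6]
print(squared_evens)  # [4, 16, 36]
print(nums)           # input is unchanged: [1, 2, 3, 4, 5, 6]
```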

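What are actions?

An action computes a result from an RDD and returns it to the driver program (or writes it to external storage). Calling an action is what makes Spark actually execute the transformations recorded so far.

e.g. first() returns the first element of the RDD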
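The toy class below (a hypothetical `ToyRDD`, a single-machine model written for illustration, not Spark's implementation or API) sketches the division of labor: `map` and `filter` only record a step and return a new object, while `collect`, `first` and `count` actually run the recorded steps. Logging every call to `square` makes the laziness visible.

```python
class ToyRDD:
    """Hypothetical single-machine model of lazy transformations (not Spark's code)."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)  # recorded transformations, not yet run

    # Transformations: record the step and return a new ToyRDD
    def map(self, f):
        return ToyRDD(self._data, self._steps + (("map", f),))

    def filter(self, pred):
        return ToyRDD(self._data, self._steps + (("filter", pred),))

    # Actions: replay the recorded steps and return a result
    def collect(self):
        out = self._data
        for kind, fn in self._steps:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

    def first(self):
        # toy version computes everything; Spark's first() may scan less
        return self.collect()[0]

    def count(self):
        return len(self.collect())


calls = []

def square(x):
    calls.append(x)  # log every real call so we can see when work happens
    return x * x

rdd = ToyRDD([1, 2, 3, 4])
squared = rdd.map(square)  # transformation: nothing computed yet
print(calls)               # []
print(squared.first())     # 1 (this action triggers the work)
print(squared.count())     # 4
```

The second action reruns `square` on every element: the toy, like Spark when an RDD is not persisted with `cache()` or `persist()`, recomputes the recorded transformations for each action. Spark's real `first()` can also stop after scanning only part of the data, which this toy does not attempt.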

Transformations and actions on RDDs
Python code showing transformations and actions

Example 1:

# sc is the SparkContext the pyspark shell creates; textFile() returns an RDD of lines
lines = sc.textFile("/home/suvir/Documents/SparkFiles/learn-spark-python/data/python_wiki.html")
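For intuition, the plain-Python snippet below builds what the `lines` RDD conceptually holds: one string per line of the file, with line endings removed. It writes a small made-up file to a temporary directory instead of using the path above. In PySpark, the filter and count steps would be `lines.filter(lambda line: "Python" in line).count()`.

```python
import os
import tempfile

# Made-up sample content standing in for python_wiki.html
sample = "<html>\n<p>Python is a language</p>\n<p>Spark has a Python API</p>\n</html>\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # One element per line, newline characters dropped (like sc.textFile)
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

# Transformation-like step: keep lines mentioning "Python"
python_lines = [line for line in lines if "Python" in line]

# Action-like step: count them
print(len(lines))         # 4
print(len(python_lines))  # 2
```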

Example 2:

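nums = sc.parallelize([1, 2, 3, 4])
squared = nums.map(lambda num: num**2)  # transformation: returns a new RDD
print(squared.collect())                # action: [1, 4, 9, 16]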

Example 3:

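lines = sc.parallelize(["hello world", "hi"])
words = lines.map(lambda line: line.split(" "))  # one list of words per line
print(words.collect())                           # [['hello', 'world'], ['hi']]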

Example 4:

lines = sc.parallelize(["hello world", "hi"])
words = lines.flatMap(lambda line: line.split(" "))  # per-line lists are flattened
print(words.collect())                               # ['hello', 'world', 'hi']
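Examples 3 and 4 differ only in map versus flatMap. A plain-Python sketch of that difference: map keeps one output per input line (a list of words per line), while flatMap concatenates those lists into a single flat sequence of words. `itertools.chain.from_iterable` plays the flattening role here; it is an analogy, not what Spark calls internally.

```python
from itertools import chain

lines = ["hello world", "hi"]

def split_words(line):
    return line.split(" ")

# map: one result per input element, so we get a list of lists
mapped = [split_words(line) for line in lines]

# flatMap: same function, but the per-element results are flattened
flat_mapped = list(chain.from_iterable(split_words(line) for line in lines))

print(mapped)       # [['hello', 'world'], ['hi']]
print(flat_mapped)  # ['hello', 'world', 'hi']
```

As a consequence, `words.count()` would return 2 after the map in Example 3 (two lists) but 3 after the flatMap in Example 4 (three words).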