Actions and Transformations:

An Apache Spark transformation is a function that produces a new RDD from one or more existing RDDs. When we want to work with the actual dataset, we perform an action. Unlike a transformation, an action does not form a new RDD. It returns a result to the driver program or writes it to storage.

Transformations are lazy: they are not executed immediately, but only when an action is called. Two of the most basic transformations are map() and filter().
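Lazy evaluation can be modelled in a few lines of plain Python. This is a hypothetical toy, not PySpark: the `LazyDataset` class and its `log` attribute are invented for illustration. `map()` and `filter()` only record a step and return a new object, and `collect()`, standing in for an action, is the first point at which any user function runs. Loosely like Spark pipelining narrow transformations, the recorded steps are applied element by element in a single pass.

```python
from typing import Any, Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Hypothetical toy model of lazy evaluation; NOT PySpark's RDD class."""

    def __init__(self, source, steps: Optional[List[Step]] = None,
                 log: Optional[List[str]] = None):
        self._source = list(source)
        self._steps: List[Step] = steps if steps is not None else []
        # Shared record of every user-function call, to show when work happens.
        self.log: List[str] = log if log is not None else []

    def _add(self, kind: str, fn) -> "LazyDataset":
        # A "transformation": remember the step and return a NEW dataset.
        # Nothing is computed and the parent object is not modified.
        return LazyDataset(self._source, self._steps + [(kind, fn)], self.log)

    def map(self, fn) -> "LazyDataset":
        return self._add("map", fn)

    def filter(self, pred) -> "LazyDataset":
        return self._add("filter", pred)

    def collect(self) -> list:
        # An "action": only now is the recorded pipeline executed.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                self.log.append(kind)
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


nums = LazyDataset([1, 2, 3, 4])
pipeline = nums.map(lambda x: x * x).filter(lambda x: x > 4)
print(len(nums.log))       # 0: defining transformations ran nothing
print(pipeline.collect())  # [9, 16]
print(len(nums.log))       # 8: map + filter for each of the 4 items
```

Note that `nums` itself is unchanged by the chain: calling `nums.collect()` still returns the original four numbers.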

Because RDDs are immutable, a transformation always produces a new RDD rather than modifying its parent. The new RDD can be smaller (e.g., filter, distinct, sample), bigger (e.g., flatMap, union, cartesian), or the same size (e.g., map).
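These size relationships can be checked without a cluster by mimicking each transformation on a plain Python list. The list expressions below are stand-ins, not calls into Spark. The comments name the corresponding RDD methods (filter, distinct, sample(withReplacement, fraction), flatMap, union, cartesian, map), and `rdd` / `other_rdd` in those comments are hypothetical names.

```python
import random

data = [1, 2, 2, 3, 4]
other = [5, 6]
rng = random.Random(42)  # seeded so the sample is repeatable

# Smaller (or equal size): at most the input elements survive.
filtered = [x for x in data if x % 2 == 0]         # ~ rdd.filter(lambda x: x % 2 == 0)
distinct = sorted(set(data))                       # ~ rdd.distinct() (Spark gives no order)
sampled = [x for x in data if rng.random() < 0.5]  # ~ rdd.sample(False, 0.5)

# Bigger: more outputs than inputs are possible.
flat = [y for x in data for y in (x, x)]           # ~ rdd.flatMap(lambda x: [x, x])
unioned = data + other                             # ~ rdd.union(other_rdd), keeps duplicates
cartesian = [(x, y) for x in data for y in other]  # ~ rdd.cartesian(other_rdd)

# Same size: exactly one output per input.
mapped = [x * 10 for x in data]                    # ~ rdd.map(lambda x: x * 10)

for name, result in [("filter", filtered), ("distinct", distinct),
                     ("sample", sampled), ("flatMap", flat),
                     ("union", unioned), ("cartesian", cartesian),
                     ("map", mapped)]:
    print(f"{name:<9} {len(data)} -> {len(result)}")
```

flatMap is listed as "bigger" because it may emit several outputs per input, but it can also shrink a dataset when the function returns an empty list for some elements.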

There are two types of transformations:

· Narrow transformation: all the elements required to compute the records in a single partition live in a single partition of the parent RDD, so only a limited subset of partitions is needed to calculate the result. map() and filter() are narrow transformations.

· Wide transformation: the elements required to compute the records in a single partition may live in many partitions of the parent RDD, so data has to be shuffled across partitions. groupByKey() and reduceByKey() are wide transformations.
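The difference between the two kinds can be sketched by modelling an RDD as a plain list of partitions. The helper names `narrow_map` and `wide_reduce_by_key` are hypothetical. The wide version routes every record to an output partition chosen by the hash of its key, loosely like Spark's hash partitioning. It also simplifies one detail: real reduceByKey first combines values inside each partition before the shuffle, which this sketch skips.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, int]

# Hypothetical pair dataset modelled as a list of three partitions.
partitions: List[List[Pair]] = [
    [("a", 1), ("b", 1)],
    [("a", 1), ("c", 1)],
    [("b", 1), ("a", 1)],
]


def narrow_map(parts: List[List[Pair]], fn: Callable[[Pair], Pair]) -> List[List[Pair]]:
    # Narrow: output partition i is computed from input partition i alone,
    # so no data has to move between partitions.
    return [[fn(rec) for rec in part] for part in parts]


def wide_reduce_by_key(parts: List[List[Pair]], fn: Callable[[int, int], int],
                       num_out: int) -> List[List[Pair]]:
    # Wide: the values for one key may sit in many input partitions, so every
    # record is first shuffled to the output partition chosen by its key hash.
    shuffled: List[List[Pair]] = [[] for _ in range(num_out)]
    for part in parts:
        for key, value in part:
            shuffled[hash(key) % num_out].append((key, value))
    # After the shuffle, each output partition can be reduced independently.
    result: List[List[Pair]] = []
    for bucket in shuffled:
        acc: Dict[str, int] = {}
        for key, value in bucket:
            acc[key] = fn(acc[key], value) if key in acc else value
        result.append(sorted(acc.items()))
    return result


doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
counts = wide_reduce_by_key(partitions, lambda a, b: a + b, num_out=2)
print(doubled)
print(sorted(kv for part in counts for kv in part))  # [('a', 3), ('b', 2), ('c', 1)]
```

Python randomizes string hashes per process, so which output partition holds a given key can change between runs. Within one run, however, all records for a key always land in the same partition, which is the property the shuffle needs.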

Actions: operations that return a final value to the driver program or write data to an external storage system. Because an action has to produce actual output, calling it forces evaluation of all the transformations needed to compute the RDD it was called on.
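A rough plain-Python model of how an action such as reduce() or count() produces its final value: each partition is processed where it lives, and only the small per-partition results travel back to the driver to be combined. The function names and the partition layout are hypothetical. Because partial results can be combined in any grouping and order, Spark's documentation asks for reduce functions that are commutative and associative.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Hypothetical dataset split into three partitions (held on executors in Spark).
partitions: List[List[int]] = [[1, 2, 3], [4, 5], [6]]


def action_reduce(parts: List[List[T]], fn: Callable[[T, T], T]) -> T:
    # Step 1 (on each "executor"): reduce every non-empty partition locally.
    partials = [reduce(fn, part) for part in parts if part]
    # Step 2 (on the "driver"): combine the per-partition partial results.
    if not partials:
        raise ValueError("cannot reduce an empty dataset")
    return reduce(fn, partials)


def action_count(parts: List[List[T]]) -> int:
    # Count each partition locally, then sum the counts on the driver.
    return sum(len(part) for part in parts)


print(action_reduce(partitions, lambda a, b: a + b))  # 21
print(action_count(partitions))                       # 6
```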

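Examples (typed into the pyspark shell, which predefines the SparkContext sc). Here map(), flatMap() and filter() are transformations, and collect() is the action that triggers the computation and returns the results to the driver:

>>> baby_names = sc.textFile("baby_names.csv")
>>> rows = baby_names.map(lambda line: line.split(","))

>>> sc.parallelize([2, 3, 4]).flatMap(lambda x: [x, x, x]).collect()

Output: [2, 2, 2, 3, 3, 3, 4, 4, 4]

>>> rows.filter(lambda line: "MICHAEL" in line).collect()

Output: [[u'2013', u'MICHAEL', u'QUEENS', u'M', u'155'],
 [u'2013', u'MICHAEL', u'KINGS', u'M', u'146'],
 [u'2013', u'MICHAEL', u'SUFFOLK', u'M', u'142'],
 ...]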


Written by Aishwarya.