Left Semi Join on a Spark Dataset in Java

Arun Kumar Gupta
May 23, 2020

A left semi join returns all rows from the first (left) dataset that have a match in the second (right) dataset.
It is like an inner join in which only the left dataset's columns and values are selected, except that each left row appears at most once even if it matches several right rows.

A video walkthrough is also available: https://youtu.be/g1brXIJL3Cw

Example with code:

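import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

/* Create the session (assumed setup: the app name and local master are illustrative) */
SparkSession sparkSession = SparkSession.builder()
        .appName("LeftSemiJoinExample")
        .master("local[*]")
        .getOrCreate();

/* Read data from Employee.csv */
Dataset<Row> employee = sparkSession.read().option("header", "true")
        .csv("C:\\Users\\Desktop\\Spark\\Employee.csv");
employee.show();

/* Read data from Employee1.csv */
Dataset<Row> employee1 = sparkSession.read().option("header", "true")
        .csv("C:\\Users\\Desktop\\Spark\\Employee1.csv");
employee1.show();

/* Apply left semi join: keep Employee rows whose name also appears in Employee1 */
Dataset<Row> leftSemiJoin = employee.join(employee1,
        employee.col("name").equalTo(employee1.col("name")), "leftsemi");

leftSemiJoin.show();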

Output:

1) Employee dataset
+-------+--------+-------+
|   name| address| salary|
+-------+--------+-------+
|   Arun|  Indore|    500|
|Shubham|  Indore|   1000|
| Mukesh|Hariyana|  10000|
|  Kanha|  Bhopal| 100000|
| Nandan|Jabalpur|1000000|
|   Raju|  Rohtak|1000000|
+-------+--------+-------+

2) Employee1 dataset
+-------+--------+------+
|   name| address|salary|
+-------+--------+------+
|   Arun|  Indore|   600|
|Shubham|  Indore|  2000|
| Mukesh|Hariyana| 40000|
+-------+--------+------+

3) Result after applying the left semi join
+-------+--------+------+
|   name| address|salary|
+-------+--------+------+
|   Arun|  Indore|   500|
|Shubham|  Indore|  1000|
| Mukesh|Hariyana| 10000|
+-------+--------+------+
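
The result keeps the salary values from Employee (500, 1000, 10000), not from Employee1, because a left semi join only uses the right side to decide which left rows survive. The plain-Java sketch below (no Spark involved) illustrates that rule on a few of the rows above and compares it with an inner join projected to the left columns. The `LeftSemiJoinSketch` class, its `Employee` type, and the second "Arun" row on the right side are assumptions made up for illustration; names are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration of left semi join semantics; this is not Spark code.
class LeftSemiJoinSketch {

    // Hypothetical row type mirroring the name/address/salary columns.
    static final class Employee {
        final String name;
        final String address;
        final int salary;

        Employee(String name, String address, int salary) {
            this.name = name;
            this.address = address;
            this.salary = salary;
        }
    }

    // Left semi join on name: keep each left row whose name occurs on the right.
    static List<Employee> leftSemiJoin(List<Employee> left, List<Employee> right) {
        Set<String> rightNames = new HashSet<>();
        for (Employee e : right) {
            rightNames.add(e.name);
        }
        List<Employee> result = new ArrayList<>();
        for (Employee e : left) {
            if (rightNames.contains(e.name)) {
                result.add(e); // only left-side values are emitted, once per left row
            }
        }
        return result;
    }

    // Inner join on name, keeping only the left row, for comparison:
    // one output row per matching (left, right) pair.
    static List<Employee> innerJoinLeftColumns(List<Employee> left, List<Employee> right) {
        List<Employee> result = new ArrayList<>();
        for (Employee l : left) {
            for (Employee r : right) {
                if (l.name.equals(r.name)) {
                    result.add(l);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Employee> employee = Arrays.asList(
                new Employee("Arun", "Indore", 500),
                new Employee("Shubham", "Indore", 1000),
                new Employee("Kanha", "Bhopal", 100000));
        // The second "Arun" row is invented here to show a duplicate match.
        List<Employee> employee1 = Arrays.asList(
                new Employee("Arun", "Indore", 600),
                new Employee("Arun", "Bhopal", 700),
                new Employee("Shubham", "Indore", 2000));

        System.out.println(leftSemiJoin(employee, employee1).size());         // 2
        System.out.println(innerJoinLeftColumns(employee, employee1).size()); // 3
    }
}
```

With the duplicate "Arun" on the right, the semi join still returns Arun once, while the inner join returns him twice. That is the practical reason to prefer "leftsemi" when the second dataset is only used as a filter.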

Arun Kumar Gupta

Senior Software Engineer (Big Data | Spark | Java) Developer