Left Semi Join on a Spark Dataset in Java
A left semi join returns all rows from the first (left) dataset that have a match in the second (right) dataset.
It resembles an inner join, except that only the columns and values of the left dataset are selected, and each left row appears at most once even if it matches several rows on the right.
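To make the definition above concrete, here is a minimal sketch of the same semantics in plain Java collections, without Spark. The class name LeftSemiJoinSketch, the Employee holder and the duplicate "Arun" row on the right are hypothetical and exist only for illustration; this is not how Spark implements the join internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LeftSemiJoinSketch {

    /** Hypothetical stand-in for one row of employee data. */
    public static class Employee {
        public final String name;
        public final String address;
        public final int salary;

        public Employee(String name, String address, int salary) {
            this.name = name;
            this.address = address;
            this.salary = salary;
        }
    }

    /**
     * Keeps each left row whose name occurs at least once on the right.
     * Right-side columns never reach the result, and a left row is emitted
     * at most once no matter how many right rows share its name.
     */
    public static List<Employee> leftSemiJoin(List<Employee> left, List<Employee> right) {
        // Collect the join keys of the right side; duplicates collapse in the set.
        Set<String> rightNames = new HashSet<>();
        for (Employee e : right) {
            rightNames.add(e.name);
        }
        // Filter the left side by key membership, preserving left order.
        List<Employee> result = new ArrayList<>();
        for (Employee e : left) {
            if (rightNames.contains(e.name)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Employee> left = Arrays.asList(
                new Employee("Arun", "Indore", 500),
                new Employee("Kanha", "Bhopal", 100000));
        // The second "Arun" row is invented to show that right-side
        // duplicates do not multiply left rows.
        List<Employee> right = Arrays.asList(
                new Employee("Arun", "Indore", 600),
                new Employee("Arun", "Indore", 700));
        for (Employee e : leftSemiJoin(left, right)) {
            System.out.println(e.name + " " + e.salary);
        }
    }
}
```

An inner join on the same inputs would produce two "Arun" rows, one per matching right row, and would expose the right-side columns as well. The semi join only uses the right side as a membership test.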
A video walkthrough is also available: https://youtu.be/g1brXIJL3Cw
Example with code:
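import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

/* Assumption: a local session; the app name and master are placeholders, adjust them for your environment */
SparkSession sparkSession = SparkSession.builder()
        .appName("LeftSemiJoinExample").master("local[*]").getOrCreate();

/* Read data from Employee.csv, treating the first line as the header */
Dataset<Row> employee = sparkSession.read().option("header", "true")
        .csv("C:\\Users\\Desktop\\Spark\\Employee.csv");
employee.show();
/* Read data from Employee1.csv, treating the first line as the header */
Dataset<Row> employee1 = sparkSession.read().option("header", "true")
        .csv("C:\\Users\\Desktop\\Spark\\Employee1.csv");
employee1.show();
/* Apply the left semi join on the name column; only columns of employee are returned */
Dataset<Row> leftSemiJoin = employee.join(employee1,
        employee.col("name").equalTo(employee1.col("name")), "leftsemi");
leftSemiJoin.show();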
Output:
1) Employee dataset
+-------+--------+-------+
|   name| address| salary|
+-------+--------+-------+
|   Arun|  Indore|    500|
|Shubham|  Indore|   1000|
| Mukesh|Hariyana|  10000|
|  Kanha|  Bhopal| 100000|
| Nandan|Jabalpur|1000000|
|   Raju|  Rohtak|1000000|
+-------+--------+-------+
2) Employee1 dataset
+-------+--------+------+
|   name| address|salary|
+-------+--------+------+
|   Arun|  Indore|   600|
|Shubham|  Indore|  2000|
| Mukesh|Hariyana| 40000|
+-------+--------+------+
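3) Result after applying the left semi join
+-------+--------+------+
|   name| address|salary|
+-------+--------+------+
|   Arun|  Indore|   500|
|Shubham|  Indore|  1000|
| Mukesh|Hariyana| 10000|
+-------+--------+------+
Kanha, Nandan and Raju are dropped because their names do not appear in Employee1, and the salaries shown are the ones from Employee, not from Employee1.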