datamindedbe
Making Data Delightful

Organize your data lake using Lighthouse

Lighthouse is an open source library (using Apache Spark and Scala) that we developed at…


Joining Spark Datasets

Ever wanted to do better than joins on Apache Spark DataFrames? Now you can!

The Dataset API brings a new approach to joins. Unlike a DataFrame join, a Dataset join returns a tuple of the two classes from the left and right Datasets. The function is defined as
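
Assuming the function referred to here is Spark's `Dataset.joinWith`, a typed join might look roughly like the sketch below. The case classes `Person` and `City`, the sample rows, and the local session settings are all hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case classes, invented for this illustration.
case class Person(id: Int, name: String)
case class City(personId: Int, city: String)

object JoinWithSketch {
  // Builds two small Datasets and joins them with joinWith,
  // so each result element is a (Person, City) tuple rather than a flat Row.
  def joined(spark: SparkSession): Dataset[(Person, City)] = {
    import spark.implicits._ // encoders for the case classes

    val people = Seq(Person(1, "Ann"), Person(2, "Bob")).toDS()
    val cities = Seq(City(1, "Leuven")).toDS()

    // Inner join: only people with a matching city are kept.
    people.joinWith(cities, people("id") === cities("personId"), "inner")
  }

  def main(args: Array[String]): Unit = {
    // Local session, assumed here just to make the sketch runnable.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("joinWith-sketch")
      .getOrCreate()
    try {
      joined(spark).collect().foreach { case (p, c) =>
        println(s"${p.name} lives in ${c.city}")
      }
    } finally {
      spark.stop()
    }
  }
}
```

Because the result is a `Dataset[(Person, City)]`, both sides stay typed, and fields can be read as `p.name` or `c.city` without looking up columns by name.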