Stacking Spark DataFrames
While transitioning from NumPy and pandas DataFrames to Spark DataFrames, one function I could not find an analogue for was
np.stack. I was filtering my dataset and assigning labels of +1 or -1 depending on the filter. After the labels were applied, I wanted to stack the data vertically so I had one dataset containing both the positive and negative samples.
from pyspark.sql import functions as F

positive = df.filter('cost > 0').withColumn('target', F.lit('1'))
negative = df.filter('cost = 0').withColumn('target', F.lit('-1'))
The solution was the union function, which concatenates two DataFrames row-wise, like SQL's UNION ALL. Note that union matches columns by position, not by name; if the column order might differ, unionByName is the safer choice.
dataset = positive.union(negative)
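For reference, here is what the same filter-label-stack pattern looks like on the NumPy/pandas side, which is where I was coming from. This is a small sketch with made-up cost data; pd.concat plays the role that union plays in Spark:

```python
import pandas as pd

# Hypothetical cost data standing in for the real dataset.
df = pd.DataFrame({'cost': [5.0, 0.0, 2.5, 0.0]})

# Label rows the same way the Spark filters do.
positive = df[df['cost'] > 0].assign(target=1)
negative = df[df['cost'] == 0].assign(target=-1)

# pd.concat stacks the frames vertically, like Spark's union.
dataset = pd.concat([positive, negative], ignore_index=True)
print(dataset['target'].tolist())  # → [1, 1, -1, -1]
```

Like Spark's union, pd.concat simply appends the rows of one frame after the other, so the positive samples come first and the negatives follow.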