
Stacking Spark DataFrames

While transitioning from NumPy arrays and pandas DataFrames to Spark DataFrames, one function I could not find a counterpart for was np.stack (strictly speaking np.vstack, since I wanted to stack rows). I was filtering my dataset and assigning a label of +1 or -1 depending on the filter. After applying the labels, I wanted to stack the data vertically so that I had a single dataset containing both the positive and negative samples.

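from pyspark.sql import functions as F

# Label rows with a positive cost +1 and rows with zero cost -1
positive = df.filter('cost > 0').withColumn('target', F.lit(1))
negative = df.filter('cost = 0').withColumn('target', F.lit(-1))
# Stack?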
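For comparison, here is a minimal sketch of the same step in NumPy, the behavior I was looking for in Spark. The toy `cost` array is hypothetical and stands in for a real dataset. Note that vertical stacking of rows is np.vstack; np.stack with its default axis=0 would instead add a new leading axis and require both arrays to have the same shape.

```python
import numpy as np

# Hypothetical toy data: a single "cost" feature per row.
cost = np.array([3.0, 0.0, 5.0, 0.0])

pos_rows = cost[cost > 0]
neg_rows = cost[cost == 0]

# Attach a label column to each subset: +1 for positive rows, -1 for the rest.
positive = np.column_stack([pos_rows, np.ones(len(pos_rows))])
negative = np.column_stack([neg_rows, -np.ones(len(neg_rows))])

# Stack vertically: the rows of `negative` are appended below those of `positive`.
dataset = np.vstack([positive, negative])
print(dataset)
```

The result has four rows and two columns (cost, label), with the two positive samples first.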

Union

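The solution was the DataFrame union function, which appends the rows of one DataFrame to another. It behaves like SQL UNION ALL (duplicate rows are kept) and matches columns by position rather than by name, so both DataFrames should have the same columns in the same order.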

dataset = positive.union(negative)