Reading Apache Beam Programming Guide — 4. Transforms (Part 2)

Chengzhi Zhao
Data Engineering Space
5 min readAug 5, 2019

--

We have discussed Transforms Part 1 in the previous blog post,

To continue our discussion about Core Beam Transforms, we are going to focus these three transforms:Combine, Flatten, Partition this time.

Core Beam transforms

We are going to continue to use the Marvel dataset to get stream data.

4. Combine

Combine is a Beam transform for combining collections of elements or values in your data.

The use of combine is to perform “reduce” like functionality. There are numeric combination operations such as sum, min, and max already provide by Beam, if you need to write some complex logic, you would need to extend the classCombineFn .

Let’s try a simple example with Combine. If we want to sum the average players’ SkillRate per fight, we can do something very straightforward.

SumInts

static class SumDoubles implements SerializableFunction<Iterable<Double>, Double> {
@Override
public Double apply(Iterable<Double> fights) {
double sum = 0.0;
for (Double fightPoint: fights){
sum += fightPoint;
}
return sum;
}
}

You can apply it by calling the following. Since we need to write out using custom windowing, since this is a non-global windowing function, we need to call .withoutDefaults() explicitly.

--

--

Chengzhi Zhao
Data Engineering Space

Data Engineer | Data Content Creator | Contributor of Airflow, Flink | Blog chengzhizhao.com