One of the biggest problems when developing Big Data applications is figuring out whether or not your components will interact nicely with each other.
Integration testing usually requires setting up a staging environment in the cloud, possibly duplicating the production environment. This is not only expensive, but also cumbersome for development: I’d much rather have everything set up locally, use my favorite editor, and not bother my DevOps team with setting stuff up, granting access, and all that jazz.
The solution seems obvious at first: build a Docker environment that you can use to run tests locally!
This, however, presents several…
Here at Jampp we process and analyze large amounts of data. One of the tools we employ to do so is PrestoDB, which is a “Distributed SQL Query Engine for Big Data”. Presto comes with many native functions, which are usually enough for most use cases. Nevertheless, sometimes you need to implement your own function for a very specific use.
Enter User Defined Functions (UDFs for short). Writing one for the first time is not as straightforward as it may appear, mainly because the information you need is scattered around the web (and across many Presto versions).
In this blog post, we present our JSON_SUM function, how we wrote it, and some of the lessons we learned along the way. …