At LiveRamp, most of our heavy data processing is done by MapReduce jobs on our Hadoop YARN cluster. Since these jobs are critical to our data processing workflows, one of our top priorities is making sure they run quickly and reliably.
Yesterday, I noticed that one of our systems was using a Lock where a plain old synchronized block would suffice, and I asked myself: does this matter? Since the Lock was already fulfilling the same role, the only real question was performance.
tl;dr: When implementing a service or API, if you get a request you don’t quite understand, the kindest thing you can do is to return a noisy error.
Let’s consider an API like:
GET /mySum?num=3&num=42
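As a rough illustration of the "noisy error" idea, here is a minimal sketch in Python of a strict handler for the /mySum example, assuming it receives the raw query string. The names `handle_my_sum` and `ALLOWED_PARAMS` are hypothetical and not taken from the original post. Anything the handler does not understand (a malformed pair, an unknown parameter such as a typo, a non-integer value) produces a 400 instead of being silently ignored.

```python
from urllib.parse import parse_qsl

# Hypothetical: the only parameter this /mySum sketch understands.
ALLOWED_PARAMS = {"num"}


def handle_my_sum(query: str) -> tuple[int, str]:
    """Sum every `num` value in the query string; reject anything unexpected.

    Returns an (HTTP status, body) pair.
    """
    try:
        # strict_parsing=True makes malformed pairs (e.g. "junk" with no "=") raise ValueError.
        # keep_blank_values=True keeps "num=" so it can be rejected below instead of dropped.
        pairs = parse_qsl(query, keep_blank_values=True, strict_parsing=True)
    except ValueError as exc:
        return 400, f"malformed query string: {exc}"

    # Unknown keys are an error, not something to skip: "nmu=42" is probably a typo.
    unknown = sorted({key for key, _ in pairs} - ALLOWED_PARAMS)
    if unknown:
        return 400, f"unknown parameter(s): {', '.join(unknown)}"

    total = 0
    for _, value in pairs:
        try:
            total += int(value)
        except ValueError:
            return 400, f"num must be an integer, got {value!r}"
    return 200, str(total)


print(handle_my_sum("num=3&num=42"))
print(handle_my_sum("num=3&nmu=42"))
```

A more lenient handler could quietly drop `nmu=42` and return 3, which looks like a correct answer to the caller; failing loudly surfaces the mistake at the point where it is cheapest to fix.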
The LiveRamp Identity Data Science team is excited to share some of our PySpark testing infrastructure in the new open source library mockrdd. This contains the class MockRDD, which mirrors the behavior of PySpark…
Here at LiveRamp, we make heavy use of Apache Thrift. In some cases, we have Thrift clients in long-running processes. A variety of issues can cause these clients to disconnect, including:
When a developer thinks about monitoring and observability of their production application, two things generally come to mind: metrics and logs. While those are really useful for debugging and monitoring purposes, there is still a critical monitoring element that…
These were the top 10 stories published by LiveRamp Engineering; you can also dive into yearly archives: 2007, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, and 2019.