Task Stats at Last

Reed Allman
Iron.io Technical Blog
2 min read · Mar 7, 2017

For a long time, our users have run into the ominous OOM (out of memory) from time to time and have wanted a chart of the memory usage inside the task. Ask and you shall receive: we are proud to [finally] announce task stats for every task, for every user. In addition to memory usage, we have charts of CPU usage, network usage, and disk I/O. When you open the HUD and go to the Worker tab for a given code package, the page now shows a CPU and RAM overview chart for each task:

And when you click into a particular task, you can see more stats, such as disk and network, along with some additional detail for each:

Currently, this feature is the exact output of running ‘docker stats’ for a task container, but we store a histogram of per-second points for each task. If a task runs longer than 4 minutes, its metrics are averaged to downsample them to a longer interval and save space (fun fact: across all users, the average task time is 30s). This is one of the perks of using the Docker HTTP API. We had issues getting this feature working by just parsing the CLI output, which was very hard on Docker; now we can simply stream the results over HTTP. That is enabled by our new Go runner, which is now rolled out to almost every customer.
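To make the downsampling idea concrete, here is a minimal sketch in Go of averaging per-second samples into longer intervals. It is not our runner's actual code: the `Sample` type, the `Downsample` and `FactorFor` helpers, and the 240-point cap (4 minutes at one point per second) are all assumptions made up for illustration.

```go
package main

import "fmt"

// Sample is one resource reading. The field names are hypothetical,
// chosen for this sketch rather than taken from the runner.
type Sample struct {
	CPUPercent float64
	MemBytes   float64
}

// Downsample averages each consecutive group of `factor` samples into one.
// A trailing partial group is averaged over the samples it actually has.
func Downsample(in []Sample, factor int) []Sample {
	if factor <= 1 || len(in) == 0 {
		return in
	}
	out := make([]Sample, 0, (len(in)+factor-1)/factor)
	for start := 0; start < len(in); start += factor {
		end := start + factor
		if end > len(in) {
			end = len(in)
		}
		var sum Sample
		for _, s := range in[start:end] {
			sum.CPUPercent += s.CPUPercent
			sum.MemBytes += s.MemBytes
		}
		n := float64(end - start)
		out = append(out, Sample{
			CPUPercent: sum.CPUPercent / n,
			MemBytes:   sum.MemBytes / n,
		})
	}
	return out
}

// FactorFor returns the smallest factor that keeps n samples within
// maxPoints after downsampling (1 means no downsampling is needed).
func FactorFor(n, maxPoints int) int {
	if maxPoints <= 0 || n <= maxPoints {
		return 1
	}
	return (n + maxPoints - 1) / maxPoints
}

func main() {
	// An 8 minute task sampled once per second: 480 points.
	samples := make([]Sample, 480)
	for i := range samples {
		samples[i] = Sample{CPUPercent: float64(i%2) * 100, MemBytes: 1 << 20}
	}

	// Assumed cap of 240 points, i.e. 4 minutes of per-second data.
	factor := FactorFor(len(samples), 240)
	out := Downsample(samples, factor)
	fmt.Println(factor, len(out), out[0].CPUPercent) // prints "2 240 50"
}
```

Plain averaging keeps storage bounded, but it also flattens short spikes. A variant that keeps the per-bucket maximum alongside the mean would preserve memory peaks, which is what you want when chasing an OOM.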

This should help users observe their tasks' resource usage to see whether they are CPU, network, or disk bound, and it lets us help customers get onto infrastructure that is optimized for their workloads. Best of all, those pesky OOMs should no longer be a mystery. We’re hoping to offer some kind of callback API for these stats as well, so that users can get them into their own stats infrastructure. If you are super interested in this feature, please email us at support[at]iron.io so that we can try to support your environment first (statsd?)!
