Checking Rollbar service with beyondcode/laravel-self-diagnosis
It happens from time to time that your Laravel app seems to be doing better than ever. You can tell because you see no exceptions arriving at Rollbar at all.
Well, unless your application has zero users, or you are something close to a demigod, maybe you can call it a day. Otherwise, if you are a solo developer just like me, you should get suspicious.
TL;DR, Jump to gist
Earlier this week I realized I had been losing reported exceptions at Rollbar during a regular business day at Fimedi NET, after a customer called me for support and I needed to go read some logs.
I had not even reached the info I was looking for when I met an apocalyptic horde of zombies yelling “Unable to send messages to Rollbaaaaaar API!!! aaahhhh!!!”
Ok, it was more like this:
Dec 9 16:35:25 XXXXXXXX prd.XXXXXXXX[XXXXXXXX]: [2018-12-09 16:35:19] production.ERROR: Unable to send messages to Rollbar API. Produced response:
So I wondered how long this problem had been standing there, only to find out the terrible truth:
# grep "Unable to send messages to Rollbar" 2018-* | head -n1
2018-10-20-awesomeness.log:Oct 20 20:12:12 XXXXXXXX prd.XXXXXXXX[XXXXXXXX]: [2018-10-20 20:12:37] production.ERROR: Unable to send messages to Rollbar API. Produced response:
# grep "Unable to send messages to Rollbar" 2018-1* -oc
... Let me save you some tears ...
There it was: I had been missing exceptions on Rollbar since 20-Oct, well on its way to becoming a single malt. At first I didn’t suspect that much because, indeed, I had been busy solving standing issues precisely to reduce the amount of exceptions thrown.
That was easy!
Finding the root cause
So before taking any action, I needed to find out why this had happened in the first place; then I could think of (1) a fix and (2) a preventive countermeasure.
After some tinkering around I found this Rollbar-PHP-Laravel issue on GitHub, where I realized it was apparently related to a package/API breakage. So I checked my git log and found that on that very 19-Oct I had also upgraded to Laravel 5.7, deploying the next day. Everything started to make sense.
Fixing the broken
The fix was actually quick: I just upgraded to 3.x and everything got back to normal. It took me a while to realize that there was a 3.x available (and I know there is a 4.x as well, but let’s take baby steps for now). UPDATE: the upgrade to 4.0.1 went great.
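For reference, the upgrade itself is just a Composer constraint bump. Assuming the package is named rollbar/rollbar-laravel in your composer.json (check yours, as the name has varied across major versions), something like:

```shell
composer require rollbar/rollbar-laravel:^4.0
```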
Adding some automated checks
So I wondered if I could have something like unit tests to make sure that the service was up and running with an established connection so that I could send a ping log message or something. Since unit tests are not meant for that, I remembered this really nice package by beyondcode called laravel-self-diagnosis, and thought I could make a custom check to be run after each deploy (or whenever you feel it’s right).
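Getting the package in place is a single Composer command, per its README:

```shell
composer require beyondcode/laravel-self-diagnosis
```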
// Resolve the logger through the container so it picks up the rollbar
// channel configuration, then send a test message and check the API response.
$rollbar = app()->make(\Rollbar\RollbarLogger::class);
$response = $rollbar->log('info', 'SELF-DIAGNOSIS PING');
$success = $response->wasSuccessful();
After getting it installed, building the check took me just a few minutes:
- Create a Check called CanReportToRollbar.
- Make sure that I had a rollbar channel in the logging config.
- Add my custom check in the self-diagnosis config.
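Putting the snippet above into a check class might look like the following sketch. The namespace, check name, and failure message are my own choices; the Check interface with name/check/message methods comes from the package, but double-check the signatures against your installed version:

```php
<?php

namespace App\SelfDiagnosis;

use BeyondCode\SelfDiagnosis\Checks\Check;
use Rollbar\RollbarLogger;

class CanReportToRollbar implements Check
{
    public function name(array $config): string
    {
        return 'Rollbar can receive log messages';
    }

    public function check(array $config): bool
    {
        // Resolve the logger through the container so it uses the
        // configured access token and channel settings.
        $rollbar = app()->make(RollbarLogger::class);

        $response = $rollbar->log('info', 'SELF-DIAGNOSIS PING');

        return $response->wasSuccessful();
    }

    public function message(array $config): string
    {
        return 'The ping message could not be delivered to the Rollbar API. '
            . 'Check your access token and the rollbar logging channel config.';
    }
}
```

Then the class just needs to be listed in the checks array of the published self-diagnosis config.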
However, I spent a great deal of time figuring out why the check would pass locally and fail on production, even when everything was already fixed and running. It turned out that the cached config was messing with the RollbarLogger instantiation, and additional care was required to make sure the config values were ready during the task. This was caused by this bug and got solved by upgrading to 4.x.
I have also created a small unit test to make sure that RollbarLogger is instantiable, which will save a bunch of debugging time in case something is wrong with the configuration and breaks instantiation.
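That unit test can be as small as the sketch below. It assumes a standard Laravel PHPUnit setup with the usual Tests\TestCase base class; the class and method names are my own:

```php
<?php

namespace Tests\Unit;

use Rollbar\RollbarLogger;
use Tests\TestCase;

class RollbarLoggerTest extends TestCase
{
    public function test_rollbar_logger_can_be_instantiated(): void
    {
        // If the rollbar config is broken (e.g. a missing access token),
        // this resolution is where it should blow up.
        $logger = $this->app->make(RollbarLogger::class);

        $this->assertInstanceOf(RollbarLogger::class, $logger);
    }
}
```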
Meeting the awesome
Now I have a simple, automated way to make sure this service is up and running any time the odds are that something got broken.
php artisan self-diagnosis
You should see a ping at Rollbar’s dashboard:
I will leave you the honor of finding out how the view looks when the check does not pass.
Note that many things can go wrong and end in service unavailability. Fortunately, this way we can catch them in time.
I hope this helps prevent the loss of reported exceptions due to changes in config, API version, package version, or occasional loss of connectivity.