Postmortem with a Twix
At Holberton School, we have a project where we write a postmortem on one of our web debugging projects or on another issue we encountered. To add a little humor, I’ve taken the one I wrote and set it in a scene:
The servers are down and zombies are trying to eat me.
Current Time: 1440 Zulu (UTC), 25 AUG 17 ….
I am alive. I’ve made it back to home base. Captain Sylvain is grim but determined to read about what happened on 24 AUG 17 from 1247 to 1402 UTC. After some rest and recovery, I now present this report:
Incident Report #524
Issue Summary:
The servers are currently down and I, the only known surviving member of Batch 2, am forced to attempt to restart them from a panic room on level 4. Why? Well, the cameras are down and zombies are trying to eat me, that’s why. The servers have failed to restart properly, and I can hear my zombified batchmates banging on the surrounding doors. Some just keep repeating “the checker… the checker”. *shudders*
My only comforts are Bobby’s dogs, Hex and Linux. What happened to Bobby, you ask? Well, to get to the point, he’s now “zombie” Bobby. It was either him or the dogs… I had to make a choice.
Anyways, what am I doing talking about the fallen? My batchmates are trying to find me and turn me into one of them. I need to get the servers back up so I can track them on the cameras and make it back to home base. Lisa, one of the fallen team leaders, has been bitten… I can hear her screaming about beignets as I frantically search for the issue.
Timeline (UTC Military Time):
Let’s cut to the chase. To be honest, I could be eating the homemade Nutella General Guillaume made, but he said I had to write this report first. So here is the timeline of the day I want to forget but can’t:
- 1247: Configuration changes are made and pushed to add a new user, Nginx.
- 1250: The outage begins when the lead engineer, Carrie, notices that the servers have failed to restart.
- 1255: Engineer Carrie stops web-01 and rolls back to the backup configuration file, nginx.conf.bak.
- 1258: The doors are ripped open and Engineer Carrie is bitten before she can check the errors.
- 1304: I see what happened to her and make a mad dash to the panic room on level 4.
- 1315: I make it to the room and remember that Carrie was working on the configuration file before she became… zombie Carrie. D:
- 1320: I run the nginx syntax check to see whether the changes made to the configuration file caused the issue.
- 1325: The check reports that the configuration file’s syntax is incorrect.
- 1330: I reapply the changes to the default file under sites-enabled.
- 1335: I correct the syntax and run the nginx syntax check again. No errors are returned.
- 1345: I reapply the changes and decide there is no need to escalate the issue… not that I could, since everyone has been zombified or is dead. Great. Just… GREAT.
- 1356: web-01 restarts successfully, and I reapply the changes to server web-02.
- 1402: Both servers are back up and 100% of traffic is back online (… which at this point is only me…).
Corrective and Preventative Measures
- After local testing, the configuration files themselves should be tested before the changes are deployed via script.
- When changes are made to the nginx.conf file, the engineer should run nginx’s syntax check (nginx -t -c /etc/nginx/nginx.conf) to catch any syntax errors before restarting the server.
- Have a backup server for the cameras, or else sacrifices must be made… i.e., Bobby.
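The second measure can be automated. Below is a minimal Python sketch that assumes nginx is on the PATH and that the script has enough privileges to read the configuration and signal the nginx master process. The function names config_is_valid and deploy_config are hypothetical. The only nginx commands it uses are nginx -t -c <file>, which tests a configuration, and nginx -s reload, which reloads it. Like the rollback in the timeline, it keeps a .bak copy and reloads nginx only when the syntax check passes.

```python
import shutil
import subprocess
from pathlib import Path


def config_is_valid(conf_path, run=subprocess.run):
    """Return True if `nginx -t -c conf_path` reports a valid configuration."""
    result = run(["nginx", "-t", "-c", str(conf_path)],
                 capture_output=True, text=True)
    return result.returncode == 0


def deploy_config(new_conf, live_conf, run=subprocess.run):
    """Install new_conf over live_conf, reloading nginx only if the test passes.

    Returns True if the new config was applied, False if it was rolled back.
    The `run` parameter exists so the nginx calls can be faked in tests.
    """
    live_conf = Path(live_conf)
    backup = live_conf.with_name(live_conf.name + ".bak")

    # Keep a known-good copy, like nginx.conf.bak in the timeline.
    shutil.copy2(live_conf, backup)
    # Put the candidate in place. The running nginx only rereads its
    # configuration on reload or restart, so this step alone changes nothing.
    shutil.copy2(new_conf, live_conf)

    if not config_is_valid(live_conf, run):
        # Syntax check failed: restore the backup and leave nginx as it was.
        shutil.copy2(backup, live_conf)
        return False

    # Check passed: ask the running nginx master process to reload.
    run(["nginx", "-s", "reload"], check=True)
    return True
```

For example, deploy_config("candidate.conf", "/etc/nginx/nginx.conf") returns False and restores the previous file if the candidate fails the syntax check. The filename candidate.conf is only illustrative. A check like this could catch a syntax error before a restart takes the servers down. If someone restarts nginx during the brief window when the candidate sits in the live path, that restart would pick up the untested file.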
The report is finished. I hope this suffices for Captain Sylvain. As I turn it in, I overhear that no one has heard from General Julien in over 48 hours. You see, due to low supplies of Twix bars, he made an executive decision to journey alone to the Twix bunker. I fear he has been bitten…
Will General Julien make it back alive? Has he become zombified? Stay tuned for incident report #525.
