Why your logs are important
It’s late and an operator calls you for help with a problem in your app in production. The system was released two days ago, and we all know that putting things in prod takes time and that calls like this can happen.
The next step is to log in to the servers and try to figure out what is happening. First problem: we have a lot of servers! So you need to log in and open a “thousand” terminals (I can use cssh for this).
After that, you try to find where the logs live… Find… find… found them!!! Go to the directory and run a tail. Job done. Now it’s time to read the stream.
This is the company’s main project and it receives 2k+ requests/sec. The tail will not help. It’s like the operator’s screen in The Matrix. You need superpowers to read at the necessary speed.
Time to use grep: “What do I grep for? How do I isolate all the requests from a specific user?”
You look at the clock. Wow... One hour has passed since the guy called you, and you have nothing. No clue what the problem is. The manager calls you. He asks what is happening and how much time you need to fix it… At this point, time is money, and your job.
You give up and try to reproduce the problem in your local env, using debug tools to catch the bug.
12 hours after the alarm, you find the error, and it’s in another API. That API needs a specific header on this request, and without a certain param it did not return a response. Fix in prod, everything is OK again. Let’s get some sleep.
This is a true story that happened to me. I believe it has already happened to almost everyone reading this post.
The first thing I want to highlight here is that if I had had good log files, I could have found the problem much more quickly.
Because of this, I learned how important good log files, log messages, etc. are. If you plan to use microservices, logs are even more important.
Good, reliable prod systems have good logs. This is a serious matter, and few people pay attention to it.
Before you put your app in prod, look at your logs and ask yourself whether you can extract relevant info from them. Logs need to tell the story of how things go inside.
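To make that concrete, here is a minimal sketch of what “logs that tell a story” could look like, using only Python’s standard `logging` module. The logger name `checkout` and the `user_id` and `request_id` fields are my own assumptions for illustration, not something from the system in the story. The idea is simply that every line carries enough context to answer the grep question from that night: “how do I isolate all the requests from a specific user?”

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical context fields, supplied by the caller via `extra=`.
            "user_id": getattr(record, "user_id", None),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(entry)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def lines_for_user(log_text, user_id):
    """Return the parsed entries that belong to one user."""
    entries = (json.loads(line) for line in log_text.splitlines() if line)
    return [e for e in entries if e["user_id"] == user_id]


# An in-memory stream stands in for the log file on the server.
stream = io.StringIO()
log = build_logger(stream)
log.info("request received", extra={"user_id": "u-42", "request_id": "r-1"})
log.info("request received", extra={"user_id": "u-7", "request_id": "r-2"})
log.error("other API returned no response",
          extra={"user_id": "u-42", "request_id": "r-1"})

for entry in lines_for_user(stream.getvalue(), "u-42"):
    print(entry["request_id"], entry["level"], entry["message"])
```

With lines like these, isolating one user or one request is a simple filter instead of an hour of guessing what to grep for.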
Also consider using a platform to aggregate all the messages and query them (Logstash + Elasticsearch + Kibana, for example).
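To show why aggregation helps, here is a toy sketch in plain Python. It is not how Logstash, Elasticsearch, or Kibana work internally. It only illustrates the idea of merging logs from many servers into one time-ordered stream you can query. The two server logs, the field names, and the `trace` helper are all made up for the example.

```python
import heapq
import json

# Hypothetical per-server logs: one JSON object per line, already in time order.
server_a = [
    '{"ts": "2024-01-01T22:00:01Z", "host": "a", "request_id": "r-1", "message": "request received"}',
    '{"ts": "2024-01-01T22:00:04Z", "host": "a", "request_id": "r-1", "message": "no response from other API"}',
]
server_b = [
    '{"ts": "2024-01-01T22:00:02Z", "host": "b", "request_id": "r-2", "message": "request received"}',
    '{"ts": "2024-01-01T22:00:03Z", "host": "b", "request_id": "r-1", "message": "calling other API"}',
]


def parse(lines):
    """Lazily parse JSON lines into dictionaries."""
    return (json.loads(line) for line in lines)


def merged(*sources):
    """Merge several time-ordered logs into one time-ordered stream.

    Assumes every timestamp uses the same ISO 8601 format and timezone,
    so comparing them as strings matches chronological order.
    """
    return heapq.merge(*(parse(s) for s in sources), key=lambda e: e["ts"])


def trace(request_id, *sources):
    """Follow a single request across all servers."""
    return [e for e in merged(*sources) if e["request_id"] == request_id]


for entry in trace("r-1", server_a, server_b):
    print(entry["ts"], entry["host"], entry["message"])
```

One query over one stream replaces the “thousand” terminals, and a real platform adds storage, indexing, and dashboards on top of the same idea.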
Leave your comments below. I want to read your opinion about this and some of your stories about going to production.