From time to time, I see systems writing their application logs into their SQL database. Usually, the rationale why people are doing this is the fact that a database table can provide a single view, regardless how many application servers are being deployed; the logs table may be fairly easy to query — especially, if there are indices created for the relevant fields; and there is a straight forward option to structure the log format, for example, by having the userId, loglevel, timestamp and some other relevant fields separated from the message itself.
However, is this really a good idea? What is the cost of such a solution?
Let’s have a broader look on this, what are the implications of such a solution on the database performance. Here I assume that the system is under at least a moderate use, and there is a decent amount of logs being written during normal operation.
In a typical scenario, the database usage of an application is read heavy; meaning that there is much more reads compared to writes. Furthermore, most reads are usually hitting a small set of tables, not all of them. Database engines are good at optimizing for such a behavior by keeping the most frequently accessed data in the cache, so they can reduce the necessary disk operations and boost performance. Because of this, it is a good practice to put as much memory into the database server as our operational database, including the index data, so reads can be served from cache, as well as the writes can be cached to optimize the disk access.
Now let’s add db logging into the scene. Most requests hitting your application server will write one or many log entries into the database. This probably means that your log table will become the most accessed data table, and the one where most writes will go into. Even your whole DB may become write heavy, meaning writes will be more frequent that reads. This will eat up a big chunk from the DB cache and OS pagecache too. By having more data cached for writes, the system will have to drop significant amount from the read cache, forcing the DB engine to read it from disk again when it needs it next time. This will increase the amount of disk reads, while writing the log entries will naturally imply increased disk writes. So you pay double costs.
Your logs table will slowly outgrowth the rest of your operational data in your DB, and the bigger half of your memory cache will be lost on the logs table. Also, the logs indices needs to be maintained. Disk I/O will jump 2–3 times higher. Oh, and we did not speak about threads and locks. Logging will very likely happen in parallel DB connections. Let’s say your Java app server runs on 100 threads.. that can easily mean 100 parallel DB connections. Most of them will write the logs table. But parallel writes will not scale well, they need to happen sequentially, so they will try to get the same db lock constantly. This means that the heavy usage of locking will also kill DB performance, and the parallel threads will also introduce delays — imagine that your “normal” select query will need to compete for resources and scheduling with the many parallel DB threads.
The bottom line is that DB logging will slowly kill your operational DB performance; your DB response times will grow, and may become unreliable since they will have to wait for resources — most frequently, disk I/O. Mixing your read heavy operational DB with a write heavy logging is not a good idea. So what could be done?
Keep your operational database for real, operational data. Do not put logs there. There are dedicated solutions for log collection, aggregation and indexing; use those, they will do a much better job for log operations. Pick ELK, or something similar — they will do log collection, aggregation, indexing; will provide search, dashboards and much more you will need to work with your logs.