MySql is one of the most pervasive databases which is both relational and highly performant. It is one of the major offering by Amazon Web Services under its RDS managed databases suite. Its open source nature have created a huge fan base and promoted community support evolving the product over its long standing years of existence.
In any enterprise one of the most crucial capability is its ability to track changes in the different critical data on which they might running on. This serves as a very handy tool in case of issues arising due change in certain data which otherwise can be a gargantuan task to debug. This is where Mysql’s general logs stored under the table mysql.general_log can do wonders in providing insights about access patterns and change history to the critical data stored in the database instance.
However, for any moderate size business the number of transactions can be considerably high and the ability to retain all the changes done to the data can be a challenge inside the MySql instance itself for an extended time period where it can be used for debugging/investigation. Meaning that the table mysql.general_log can get quickly bloated and based on retention policy, which may not be long, MySql will purge any old data. As such we need an alternate way to store this data so that it can be efficiently accessed of a longer time epoch. In this post I am presenting a simple way to do this and access it leveraging Amazon’s Athena service.
The steps required to achieve can summarized broadly below:
1. Read periodically from the mysql.general_log table and store the records in S3 as a CSV or some structured format understood by AWS Athena.
2. This data is structured. Create an Athena table over this S3 bucket.
3. Run Athena queries on top this table to view access patterns and change histories of the tables and data in the MySql instance.
S3 storage is cheaper compared to the storage overhead on the MySql instance and as such the data can be retained for longer periods without any issues.
The necessary code and project is open sourced and licensed under GNU-GPL3 and can be found under Github below. Please follow the instructions in the README.md to set it up and preserve your database’s access patterns with a longer retention in a query-able form.