From time to time, I see systems that write their application logs into their SQL database. The usual rationale is threefold: a database table provides a single view, regardless of how many application servers are deployed; the logs table is fairly easy to query, especially if there are indices on the relevant fields; and there is a straightforward way to structure the log format, for example by keeping the userId, loglevel, timestamp and some other relevant fields separate from the message itself.
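To make the setup concrete, here is a minimal sketch of such a logs table, using Python's built-in sqlite3 module as a stand-in for whatever SQL database the system actually uses. The table name `app_log`, the column names, the `server` column and the helper functions are all my own assumptions for illustration, not taken from any particular system.

```python
import sqlite3
from datetime import datetime, timezone


def create_log_store(conn):
    """Create a hypothetical structured log table plus indices for common lookups."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS app_log (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            logged_at TEXT NOT NULL,   -- ISO-8601 UTC timestamp
            log_level TEXT NOT NULL,   -- e.g. INFO, WARN, ERROR
            user_id   TEXT,            -- may be NULL for system events
            server    TEXT NOT NULL,   -- which application server wrote the row
            message   TEXT NOT NULL
        )
        """
    )
    # Indices on the fields people typically filter by.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_app_log_user ON app_log (user_id, logged_at)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_app_log_level ON app_log (log_level, logged_at)"
    )


def write_log(conn, server, level, user_id, message):
    """Insert one log entry; every call is a database write and a commit."""
    conn.execute(
        "INSERT INTO app_log (logged_at, log_level, user_id, server, message) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), level, user_id, server, message),
    )
    conn.commit()


def errors_for_user(conn, user_id):
    """Return (server, message) pairs for a user's ERROR entries, oldest first."""
    return conn.execute(
        "SELECT server, message FROM app_log "
        "WHERE user_id = ? AND log_level = 'ERROR' ORDER BY id",
        (user_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_log_store(conn)
    # Two different application servers log into the same table.
    write_log(conn, "app-1", "ERROR", "user-42", "payment failed")
    write_log(conn, "app-2", "INFO", "user-42", "login ok")
    print(errors_for_user(conn, "user-42"))
```

Note that every `write_log` call here is a separate insert and commit against the database, which is worth keeping in mind for the questions that follow.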
However, is this really a good idea? What is the cost of such a solution? …
After spending quite a lot of time working on GDPR compliance, I thought it would be useful to write a short post about what “Personal Data” means for engineers.
Before looking into the definitions, I would stress 2 things:
Let’s see the law:
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural…
Many blog posts and articles exist today on these topics, but I feel most of them are too complicated, not straight to the point and, well, in some cases inaccurate and misleading. So I decided to write my own.
Long story short: it says you can have at most two out of Consistency (C), Availability (A) and Partition Tolerance (P) in a distributed environment. The word “Theorem” is, by the way, quite misleading, as it has actually been proven since it was first published a decade ago. I think the three main terms C, A and P also require some clarification: