Finding Understanding in Compliance: Journey Towards the 100%
Vulnerability is something we all have to come to terms with, though it can be the hardest thing to face. Whether it is a form of protection, a trusty truck, or a daily routine, we expect reliability. And when these things fail to keep our lives moving, we often feel betrayed.
How can something that has worked every time fail today, and why does it take so long to get things back in order? This frustration has been with us since the early days of computing. Hours, days, even years spent producing something, only to have it fail or be erased in seconds. Just like our data, trust in a product or process is destroyed. But how did it come to this?
Understanding the reality of compliance can be a hard pill to swallow. At first glance, compliance looks as simple as running software on a device as soon as it becomes available or backing up data daily. A checklist is created of all audited devices, and they are marked compliant one by one. As routine as it seems, many factors prevent that 100% compliance report full of green check marks and devoid of any of those pesky yellow yield signs or the dreaded red X.
A proactive culture is not simply a maintenance task that repeats on an interval. The culture needs to go deeper, toward a full understanding of risk by both the IT management team and the client. With any maintenance operation, there is always a risk of failure on the part of the software, or of non-compliance by an end user who fails to adhere to a risk-lowering policy. These two factors need to be realistically understood when formulating any proactive measure toward compliance.
If a user does not reboot a computer or leave it on, an alert email might be sent to a supervisor or even the end user. If a patching process fails, an alert can be sent to the internal IT team to resolve the matter. In either case, the device is now non-compliant and poses an elevated risk. So, should we turn the device off until the problem is fixed, or perhaps allow a technician to work on the device during production hours, in between backup cycles, while new data is still being written?
The answer in most cases is a resounding no. The risk has to be accepted, but so do the consequences of that risk. The latter is always the hardest to face when a failure leads to a loss of productivity. There are many times in IT when we accept risk because it gives us the greatest immediate return in value. Just as in life, there is no true right or wrong, only consequences of risk, many of them unforeseen or misunderstood.
It is the job of the IT professional not just to maintain systems and mitigate risk through automation but to communicate risk in various ways and be creative when risk appears.
In all, 100% compliance at all times is not always feasible due to underlying factors, and even when achieved, it is normally momentary, like a snapshot. It can only truly be obtained by freezing the standard for compliance, skewing data by lowering thresholds, or averaging over a long enough period to approach 100%. Given this, why even strive for 100% compliance, or set it as a goal, when it is impossible to sustain?
Going forward, our goal in maintenance is just that: to understand the futility of that level of compliance and to find another route for judging it. Just as a teacher is graded on his or her students' improvement in scores to judge overall professional standing, we must move to the same model. We can still set a standard of 100% compliance while understanding that, if it is ever reached, the standard was most likely altered to make it reachable. Compliance should be gauged as an overall percentile, measuring how consistently multiple thresholds are met over various lengths of time against a snapshot metric of current standing. But it should also be seen as a progressive increase.
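To make the contrast concrete, here is a minimal sketch of a snapshot score versus a rolling average over several audit cycles. The device names, audit history, and window size are invented for illustration, not taken from any particular product's report.

```python
# Hypothetical sketch: compare a snapshot compliance score (latest cycle
# only) against a rolling average over the last several audit cycles.
# True means the device passed that cycle's audit; lists are oldest-first.

def snapshot_compliance(results):
    """Fraction of devices compliant in the most recent audit cycle."""
    latest = [cycles[-1] for cycles in results.values()]
    return sum(latest) / len(latest)

def rolling_compliance(results, window=4):
    """Mean per-cycle compliance rate over the last `window` cycles."""
    per_cycle = zip(*(cycles[-window:] for cycles in results.values()))
    rates = [sum(cycle) / len(cycle) for cycle in per_cycle]
    return sum(rates) / len(rates)

audit_history = {
    "laptop-01": [True, True, True, True],
    "laptop-02": [True, False, True, True],
    "server-01": [False, False, True, True],
}

print(f"snapshot: {snapshot_compliance(audit_history):.0%}")  # 100%
print(f"rolling:  {rolling_compliance(audit_history):.0%}")   # 75%
```

The snapshot reports a perfect 100%, while the rolling view shows 75%: the same fleet looks very different depending on whether we grade the moment or the journey.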
Understanding why devices become non-compliant most often starts with understanding why they are not improving in compliance. When I look at a list of progressive data, it is much easier to spot devices that have been broken for an extended period than those where the maintenance agent is currently broken but has missed only one or two cycles. Severity can also be measured by the number of maintenance cycles missed: each missed cycle widens the window of non-compliance and raises risk. That risk is then communicated to the client in the same terms.
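That severity measure can be sketched in a few lines: rank devices by how many consecutive cycles they have missed, worst first. Again, the device names and audit data are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical sketch: severity as the number of consecutive maintenance
# cycles a device has missed, ending at the most recent cycle.
# True means the device passed that cycle's audit; lists are oldest-first.

def missed_streak(cycles):
    """Count consecutive missed cycles ending at the most recent one."""
    streak = 0
    for passed in reversed(cycles):
        if passed:
            break
        streak += 1
    return streak

audit_history = {
    "laptop-01": [True, True, True, False],    # missed once: likely transient
    "laptop-02": [True, False, False, False],  # missed three: agent broken?
    "server-01": [True, True, True, True],     # fully compliant
}

# Devices sorted by severity, widest non-compliance window first.
by_severity = sorted(audit_history,
                     key=lambda d: missed_streak(audit_history[d]),
                     reverse=True)
for device in by_severity:
    print(device, missed_streak(audit_history[device]))
```

A device one cycle behind probably had a transient failure; a device three cycles behind likely has a broken agent and deserves a technician's attention first.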
Overall, a standard of compliance based on vendor patch distribution, or on product uptime absent maintenance failure, must exist in order to enforce some metric of productivity versus dysfunction. However, we must always understand that compliance only exists in relation to the overall risk of failure for an IT infrastructure.
In the end, the journey to the 100% becomes more important than obtaining the highest marks on a vendor-created audit report. The journey of each device in its attempt to pull down the newly created files prescribed by a vendor becomes the true goal of compliance.