Patch management might seem like an unnecessary process until the first patch deployment goes wrong. Downtime in your production environment hurts the whole company. It hurts even more if you are an MSP and your clients are affected, because your SLA obligations and your reputation are at stake. This is why you should create a thorough patch management program, and the usual starting point is documenting a step-by-step workflow.
In this guide, we will review the risks of poorly documented patch management and the key points to keep in mind to create a system that works.
Risks of Poor Documentation of Patch Management
As we've already said, poor documentation of patch management can lead to serious process hiccups, and even to downtime for your production environment. Here are three main dangers:
- Downtime due to lack of tests. Installing patches downloaded from untrusted sources is obviously not a secure practice. And yet, even official patches can crash or even brick your systems. Thus, you should create a sandbox environment that replicates each of your essential systems, and test your patches there before rolling them out.
- Downtime due to bad timing. Patching the production environment is sensitive and often time-consuming, even if the patches have been properly tested. At the least, you will often need to reboot the system in question. Rebooting a production server during working hours is rarely a good idea, even though a reboot often completes without issues. Plan the timing of your patching carefully.
- Missed critical patches. Some patches are more critical than others, and a single unpatched system can disrupt even a company as big as Maersk (as we should all remember from the devastating NotPetya ransomware attack in 2017). Thus, you should carefully monitor all patches and filter out those that are critical.
Patch Management Program: Pillars to Document
Here are the key points that you need to document in order to create a solid patch management process:
- Inventory. First of all, you should create a list of all the devices that you need to patch. Here you should also categorize them from the most critical to the least, to know in which order you should apply the patches. If you know that the patching of any of your devices will lead to downtime or disruptions for others, note that as well.
- Patch classification. Here, you should classify your patches according to how critical they are for the given piece of infrastructure. The typical classification is based on two levels of urgency: urgent (patches that can significantly improve security or performance of your systems) and normal (routine patching releases).
- Patch testing process. This is the biggest and most important process in your patch management documentation. You should carefully describe the sandbox environment for your patch rollout tests, the checks that confirm the tests were successful, and the actions to take if they fail. Make sure that you document each step of the procedure, and provide the location of the testing environment and, if credentials are needed, a note on where they are stored.
- Patch deployment process. This part should outline the actual patching process for your production environment. As in the previous step, you should document each part of the process and also name the member of your team who will be responsible for reviewing the results once the patching is finished.
- Patch deployment timeline. As we mentioned earlier, patching might cause downtime for your systems. This is why you should choose time frames for the process that don’t affect any of your critical workloads.
- Patch installation report. Once you've finished, it is advisable to generate a patch installation report summarizing the results of the process. You should note any vulnerabilities that were fixed, any new ones that appeared, and any peculiarities of the system you noticed during the patching process. This might help you to patch the system faster next time.
- Backup and recovery plans. Lastly, if your patching process goes wrong, you might brick the device in question. If that device is a production server, you should recover it as fast as possible to the exact state it was in right before you started patching. So you need to make sure that your backups are done regularly, and that your recovery procedures are tested.
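The inventory, classification, and timeline points above can also be kept in a machine-readable form next to the written documentation. The following is only a minimal sketch under stated assumptions: the device names, the three criticality tiers, the `Device` and `Patch` classes, and the helper functions are all hypothetical and not part of any particular patch management tool. It also assumes that the most critical devices are patched first; reverse the order if you prefer to pilot patches on less critical systems.

```python
from dataclasses import dataclass
from datetime import datetime, time
from enum import IntEnum


class Criticality(IntEnum):
    # Lower value sorts first. The three tiers are an assumption.
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class Device:
    name: str
    criticality: Criticality
    # Names of other devices that are disrupted while this one is patched.
    disrupts: tuple = ()


@dataclass
class Patch:
    patch_id: str
    urgent: bool  # True = urgent, False = normal (the two levels above)


def patch_order(devices):
    """Return devices from most to least critical, ties broken by name."""
    return sorted(devices, key=lambda d: (d.criticality, d.name))


def urgent_first(patches):
    """Return urgent patches before normal ones, keeping the input order."""
    return sorted(patches, key=lambda p: not p.urgent)


def in_window(moment, start, end):
    """Check whether a datetime falls inside a daily maintenance window.

    Handles windows that cross midnight, such as 22:00 to 04:00.
    """
    t = moment.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


# Hypothetical inventory and patch queue, for illustration only.
inventory = [
    Device("file-server", Criticality.MEDIUM),
    Device("db-primary", Criticality.HIGH, disrupts=("web-frontend",)),
    Device("print-server", Criticality.LOW),
]
queue = [Patch("P-001", urgent=False), Patch("P-002", urgent=True)]

for device in patch_order(inventory):
    print(device.name, device.criticality.name, device.disrupts)
print([p.patch_id for p in urgent_first(queue)])
print(in_window(datetime(2024, 1, 1, 23, 0), time(22, 0), time(4, 0)))
```

Keeping the `disrupts` field on each device records the downtime dependencies mentioned in the inventory point, so the person scheduling a patch can see which other systems need the same maintenance window.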
Further reading: Patch Management Best Practices and Essentials
How Can Patch Documentation Enhance Your SLA?
A service level agreement is a document between you and your client that sets the right expectations of how you provide your services. In some cases, it might include KPIs such as uptime targets and recovery time and recovery point objectives (RTO and RPO). If those objectives are not met, your client might try to sue you. And, even if they don’t, constant failures will lead to loss of reputation or lost clientele.
As we've already mentioned, bad documentation of patching will eventually lead to downtime. This is why you want to build and maintain thorough processes. And the key to being thorough in something that is complex is to document every single step you are going to take.
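To make the recovery objectives concrete, here is a small sketch of how a failed patch could be checked against them. It assumes the common definitions: the recovery point objective limits how much data you can lose (the time between the last good backup and the failure), and the recovery time objective limits how long the system can stay down. The function name, the timestamps, and the objective values are all hypothetical.

```python
from datetime import datetime, timedelta


def recovery_check(last_backup, failure, restored, rpo, rto):
    """Compare one incident against recovery point and time objectives."""
    data_loss = failure - last_backup  # work done since the last backup
    downtime = restored - failure      # time the system was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: a patch bricks a server at 02:30.
result = recovery_check(
    last_backup=datetime(2024, 3, 1, 1, 0),
    failure=datetime(2024, 3, 1, 2, 30),
    restored=datetime(2024, 3, 1, 5, 0),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=2),
)
print(result)
```

In this made-up incident the backup was recent enough to satisfy the recovery point objective, but the restore took longer than the recovery time objective allowed. That kind of gap is what tested recovery procedures are meant to reveal before a real failure does.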
To put it in a nutshell, your clients’ trust and your reputation are based on how well you stick to your SLA. And it’s almost impossible to do this if your processes are not well organized.