Healthcare Lessons From Recent Issues at Amazon and GitLab
“This was originally published on the Dash SDK Blog, a blog featuring the latest in healthcare, security, and development practices. Read more at https://blog.dashsdk.com”
GitLab provides a SaaS platform, as well as a self-hosted product, for managing and hosting git repositories.
In the process of running a few commands, a developer on the GitLab team accidentally deleted a production database directory, resulting in a 300GB loss of user data from its primary database server on January 31st.
Several initial repair attempts failed. Of the 5 backup/replication techniques deployed, none were working reliably, and some had not been set up in the first place. Eventually GitLab managed to restore services, but not without some data loss.
GitLab communicated quickly with its users and was transparent about the issues it faced. You can read the timeline and postmortem communications from GitLab here.
Why Should Healthcare Organizations Care?
Human error can occur at any organization, and data loss is always a possibility. It is therefore important that healthcare organizations prepare, test, and perform backup and recovery operations.
Amazon S3 is the object storage service of Amazon Web Services (AWS). It is used by a large portion of companies working with AWS and is known for its scalability and flexibility.
On February 28th, one region of S3 (US-EAST-1) became unavailable for approximately 3 hours. Amazon S3 is said to be used by 148,213 websites, so the outage caused widespread problems across the web.
Many websites and services went down, and major issues occurred with sites including Imgur, GitHub, Asana, Docker Hub, and more. You can read Amazon’s summary of the event here.
Why Should Healthcare Organizations Care?
Data availability is a must for healthcare organizations and healthcare vendors. Unexpected downtime can be detrimental to overall patient care. Under HIPAA, covered entities are required to create and maintain a Disaster Recovery Plan.
Is This The Death Of Cloud Services?
Simply put, no. Reverting to on-premises hardware or co-located servers does not instantly make companies more secure or less in need of backup, recovery, and failover processes. In the case of the Amazon S3 outage, creating redundancies over several regions would help mitigate outage risks. Dash can help organizations manage secure environments with public cloud platforms.
That said, both cloud platforms and SaaS companies will have to continue to prove themselves going forward. Clients will want to know what service guarantees, safeguards, and monetary assurances are in place in case of an outage or data loss. SaaS companies risk losing customers and reputation when client data is lost or unavailable.
Tech and Infrastructure Considerations
Incidents at Amazon and GitLab are a reminder that organizations must plan for worst-case scenarios. Here are a few safeguards to remember:
Test backup and recovery processes: Loss of protected health information (PHI) is a serious issue and can disrupt the normal operations of healthcare organizations. It is important that teams create backup processes and test disaster recovery before an incident occurs.
Engineer for failover and availability: Services can become unavailable. Being unable to send an email campaign stinks; being unable to access patient data is a real problem. Configuring virtual machines (VMs) across different regions and creating multiple nodes/replica sets for databases helps ensure uptime and availability.
Review SaaS services: Review how SaaS companies interact with your core services and fit into overall availability and emergency procedures.
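As a rough illustration of the first safeguard, a scheduled check can flag a backup that is missing, suspiciously small, or stale before anyone needs it. The sketch below is a minimal example using only the Python standard library; the function name check_backup, the file path, and the thresholds are hypothetical and would need to match your own environment. It does not replace an actual restore test, which is the only way to confirm a backup is usable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Optional


@dataclass
class BackupCheck:
    ok: bool
    problems: List[str]


def check_backup(
    path: Path,
    max_age: timedelta,
    min_bytes: int,
    now: Optional[datetime] = None,
) -> BackupCheck:
    """Flag a backup file that is missing, too small, or too old.

    All thresholds are assumptions chosen by the caller, not recommendations.
    """
    now = now or datetime.now(timezone.utc)
    if not path.is_file():
        return BackupCheck(False, [f"{path} does not exist"])

    problems: List[str] = []
    stat = path.stat()

    # A tiny file often means the dump job failed without raising an alert.
    if stat.st_size < min_bytes:
        problems.append(f"{path} is only {stat.st_size} bytes")

    # A stale file means the backup job has stopped running.
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    if now - modified > max_age:
        problems.append(f"{path} was last written at {modified.isoformat()}")

    return BackupCheck(not problems, problems)


if __name__ == "__main__":
    # Hypothetical path and thresholds, for illustration only.
    result = check_backup(
        Path("/var/backups/db/latest.dump"),
        max_age=timedelta(hours=24),
        min_bytes=1_000_000,
    )
    print("backup looks healthy" if result.ok else result.problems)
```

A check like this is most useful when it runs on a schedule and alerts a person on failure, alongside periodic full restores into a non-production environment.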
Organizational Policy Considerations
Technical problems often come from failures within the organization. In the case of GitLab, one wrong command crippled core services. Here are some preventative measures organizations should consider:
Limit access to production environments: Healthcare organizations and vendors should keep a current inventory of which staff have access to production environments. Production access should be limited to only those who need it, and testing and deployment processes should be created and reviewed.
Provide continuous staff training: Staff should be trained in performing backup and recovery operations, as well as in managing and interacting with development and production environments.
Not only are these policies HIPAA requirements, but they are also sound security practices for any organization.
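Taking inventory of production access can be partly automated. The sketch below is one possible approach, assuming you can export the accounts granted access to each environment and maintain an approved list; the function name audit_production_access and all account and environment names are hypothetical.

```python
from typing import Dict, Set


def audit_production_access(
    granted: Dict[str, Set[str]],
    approved: Set[str],
) -> Dict[str, Set[str]]:
    """Return, per environment, the accounts with access that are not approved."""
    findings: Dict[str, Set[str]] = {}
    for environment, accounts in granted.items():
        unapproved = accounts - approved
        if unapproved:
            findings[environment] = unapproved
    return findings


if __name__ == "__main__":
    # Hypothetical export of who currently has access to each environment.
    granted = {
        "prod-db": {"alice", "bob", "contractor1"},
        "prod-web": {"alice"},
    }
    approved = {"alice", "bob"}
    print(audit_production_access(granted, approved))
```

Running a comparison like this on a regular schedule gives reviewers a short list of accounts to question, rather than relying on memory of who was granted access and when.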
Technical disasters happen. What matters is that organizations prepare for them and create and test procedures for graceful recovery.