An Incident Response Plan for Startups

At some point your company will have to deal with a security incident. Let’s write an incident response plan now that will help organize the chaos when it happens.

There are 9 sections to write. A boilerplate is here, in example form, if you’d like to see it all at once.

Once it’s written, put it on your company wiki, where someone who is suddenly responding to an incident can find it.

1. Escalation

Assume that any employee may discover an incident and may need to sound an alarm.


Have a very simple and familiar location for your employees to surface an issue. An email address, Slack channel, or phone number is fine. It just needs to be extremely reliable and well socialized.

Email to panic@company.com or a message to #panic should be used to notify the security team of run-of-the-mill issues. Be a good witness. Behave as if you were reporting a crime, and include lots of specific details about what you have discovered.

Also, provide some options for direct escalation. If an employee believes that an executive is stealing from the company, would you want them to escalate to a wide group? In cases like this, it may make more sense for the employee to reach out directly to a specific contact rather than a highly visible escalation point. We’ll sort out a table of contacts later.

2. Severity

Your team must be well calibrated on what a small or large incident is. This decides how far up the leadership chain an incident should be communicated, and when external resources should be brought in to assist.

It may also be important to identify specific systems or processes that require certain actions when they are involved, for instance user data, passwords, or credit cards.

Low and Medium Severity
This covers most unconfirmed issues that simply require some investigation. They are important to investigate, but they may not justify waking up a group of people or paging someone.

Issues meeting this severity are simply suspicions or odd behaviors. They are not verified and require further investigation. There is no clear indicator that systems are at tangible risk, and they do not require emergency response. This includes suspicious emails, outages, and strange activity on a laptop.
For these issues, please ping someone on #security.

High Severity
Do not treat something as High (or Critical) severity until it is absolutely confirmed as a legitimate issue.

High issues should wake incident responders up and pull them off other commitments. This severity applies when malicious activity is found in an area of risk, or when a serious vulnerability is discovered that could easily be exploited.

High severity issues relate to problems where an adversary or active exploitation hasn’t been proven yet, and may not have happened, but is likely to happen. This may include vulnerabilities with direct risk of exploitation, threats with risk of adversarial persistence on our systems (e.g., backdoors, malware), malicious access of business data (e.g., passwords, vulnerability data, payments information), or threats that put any individual at risk of physical harm.
High severity issues should include an email to panic@company.com with “Urgent” in the subject line, or a message to #security with “@channel incident” in the message to alert incident responders.

This severity level should not include actual damages, which we’ll reserve for the next category. Keeping damages out of High avoids contacting outside parties or external resources before harm is confirmed.

In terms of urgency, “High” and “Critical” are equal.

Critical Severity
The bad guys were successful: they achieved an objective, something was lost, and observable harm was realized.

At this point, executive leadership needs to be involved.

Critical issues relate to actively exploited risks and involve a malicious actor. Identifying active exploitation is essential to assigning this severity.
Critical severity issues should involve an “@all” or “@channel” message in #security, as well as messages to the CEO, CTO, COO, and PR. Continue escalation until you receive acknowledgement. We highly recommend involving a crisis lead for public relations and a lawyer familiar with breach notification, and giving our consultant response partners a “heads up.”
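The severity rules above can be condensed into a small decision function, for example inside a chat bot that helps triage reports. This is a minimal Python sketch, not part of the plan itself: `Report`, `classify`, and the `NOTIFY` table are hypothetical names, and treating either proven exploitation or realized harm as Critical is one reading of the text (High explicitly excludes both).

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW_MEDIUM = "low/medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class Report:
    confirmed: bool            # verified as legitimate, not just a suspicion
    active_exploitation: bool  # a malicious actor is proven to be exploiting it
    harm_realized: bool        # something was lost; observable harm occurred


def classify(report: Report) -> Severity:
    """Map a report onto the plan's severity levels (illustrative only)."""
    if not report.confirmed:
        # Suspicions and odd behavior stay Low/Medium until confirmed.
        return Severity.LOW_MEDIUM
    if report.active_exploitation or report.harm_realized:
        # High excludes proven exploitation and actual damages,
        # so either one moves a confirmed issue to Critical.
        return Severity.CRITICAL
    # Confirmed and likely to be exploited, but no proven exploitation or harm.
    return Severity.HIGH


# Notifications per level, paraphrased from the plan; the wording is illustrative.
NOTIFY = {
    Severity.LOW_MEDIUM: ["ping someone on #security"],
    Severity.HIGH: [
        'email panic@company.com with "Urgent" in the subject',
        'post "@channel incident" in #security',
    ],
    Severity.CRITICAL: [
        "@all or @channel message in #security",
        "message CEO",
        "message CTO",
        "message COO",
        "message PR",
        # ...and keep escalating until someone acknowledges.
    ],
}

level = classify(Report(confirmed=True, active_exploitation=False, harm_realized=False))
print(level.value)  # prints "high"
for step in NOTIFY[level]:
    print("-", step)
```

Defaulting every unconfirmed report to Low/Medium mirrors the plan’s instruction not to treat anything as High until it is confirmed.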

3. Internal Issues

Depending on how you’ll want to handle internal whistleblowing, you may want to have instructions for an employee to approach their manager, CEO, or HR.

Issues where the malicious actor is an internal employee, contractor, vendor, or partner require sensitive handling. Please contact the CEO and CTO directly and do not discuss them with other employees. These are critical issues, and you must keep pushing until someone follows up.

4. Compromised Communications

If at any point the team considers their phone, laptop, email, SMS, or other means of communication to be compromised, they should use good judgement about what is transmitted and make a best effort to communicate “out of band” of any adversarial eavesdropping. Wickr is encrypted end-to-end and designed around ephemeral messaging, making it a good tool to standardize on during incident response.

If there are IT communication risks, the San Francisco team will announce an out-of-band solution within the office and relay directions to managers over their cell phones.
Incident responders must have Wickr messaging arranged before listing themselves as incident members on the wiki.

5. Response Steps

This section would largely be pulled from Security Breach 101, a meeting format that centers an incident response on a constantly updated source of truth that responders can pivot around.[1]

To expand it further, you can pre-assign “Directly Responsible Individuals” to common incident areas. For instance, the “Breach Blog Post DRI”, or the “Evidence Collection DRI”. This prevents a single individual from sidetracking an entire response meeting with their own sense of urgency and needs.

For critical issues, the response team will follow an iterative response process designed to investigate, contain exploitation, remediate our vulnerability, and document a post-mortem with the lessons learned from the incident. Each iteration covers the following:

  • Breach Timeline: updated with all known temporal data related to the incident.
  • Indicators of Compromise: all known indicators are updated and shared among breach responders.
  • Investigative Q&A: the group adds new knowns and unknowns.
  • Emergency Mitigations: a list of tactical mitigations is updated.
  • Long Term Mitigations: a list of long-term, post-breach mitigations is updated.
  • Everything Else: once items related to the response are covered, technical responders may leave the meeting, and meta-topics related to the breach (communications, legal issues, blog posts) are discussed with leadership.
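If your team keeps the breach document in a tool that supports structured data, its sections map naturally onto a simple record. This is a minimal Python sketch under that assumption: `BreachDoc`, `add_event`, and `open_questions` are hypothetical names, the section names come from the list above, and the sample events are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class BreachDoc:
    """Hypothetical model of the living breach document; section names follow the plan."""
    timeline: list[tuple[datetime, str]] = field(default_factory=list)
    indicators_of_compromise: set[str] = field(default_factory=set)
    # Question -> answer; None marks an open unknown.
    investigative_qa: dict[str, Optional[str]] = field(default_factory=dict)
    emergency_mitigations: list[str] = field(default_factory=list)
    long_term_mitigations: list[str] = field(default_factory=list)
    everything_else: list[str] = field(default_factory=list)

    def add_event(self, when: datetime, what: str) -> None:
        # Facts often arrive out of order; keep the Breach Timeline chronological.
        self.timeline.append((when, what))
        self.timeline.sort(key=lambda event: event[0])

    def open_questions(self) -> list[str]:
        # Unknowns to revisit in the next iteration of the meeting.
        return [q for q, a in self.investigative_qa.items() if a is None]


doc = BreachDoc()
doc.add_event(datetime(2024, 3, 2, 9, 30), "Responder confirms unauthorized login")
doc.add_event(datetime(2024, 3, 1, 22, 15), "First suspicious login from unknown IP")
doc.investigative_qa["How were the credentials obtained?"] = None
print([what for _, what in doc.timeline])
print(doc.open_questions())
```

Tracking unknowns explicitly (answer set to `None`) gives each iteration of the meeting a concrete starting agenda.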

6. Team Members

This is a table of name, email, phone number, Wickr, and PGP details for everyone in the escalation path. The important characteristics of a team member:

  • They are pre-calibrated on risk. They all know what a big or little incident is. They have a high tolerance for panic.
  • They are listening. They are included in all communication channels where an incident may be escalated.
  • They have secure communications. Assume your email and Slack will be breached, and you want to avoid the fiasco of signing people up for another secure messaging platform in the middle of an incident.
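The secure-communications requirement is easy to check automatically if the team table lives somewhere structured. This is a minimal Python sketch, assuming the table is exported as records: `TeamMember`, `missing_secure_channels`, and the field names are hypothetical, storing a PGP fingerprint is an assumption, and the check enforces the earlier rule that responders set up Wickr before listing themselves.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TeamMember:
    # Columns from the team table; the secure channels may be missing.
    name: str
    email: str
    phone: str
    wickr: Optional[str] = None
    pgp_fingerprint: Optional[str] = None  # assumption: PGP tracked by fingerprint


def missing_secure_channels(roster: list[TeamMember]) -> list[str]:
    """Return names of members who lack a Wickr handle.

    PGP is treated as optional here; that is an assumption, not a plan rule.
    """
    return [m.name for m in roster if not m.wickr]


roster = [
    TeamMember("Ana", "ana@company.com", "555-0100", wickr="ana_ir"),
    TeamMember("Bo", "bo@company.com", "555-0101"),
]
print(missing_secure_channels(roster))
```

Running a check like this whenever the wiki table changes catches gaps before an incident rather than during one.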

7. External Resources

This is similarly a table of external resources that you’ll have to include during escalation. Beyond the standard forensic contacts, you’ll want law enforcement, legal, breach notification, and crisis communication contacts.

These are the last things you want to be caught asking around for during a breach. It’s not much work to have them ready ahead of time.

8. Runbooks

This is where you’d create and reference “manuals” for specific mitigation steps that are likely to come up. Perhaps it’s changing the corporate Twitter account password, investigating fraud in a piece of financial software, or reviewing recent deploy activity by an engineer… the list goes on and on, and depends entirely on your own risks.

Runbooks are where incident response plans at mature companies really shine, but they are less important than the overall coordination of all responders.

9. Required Retrospective

Choose a minimum severity level, and require the team to hold a retrospective for every incident at or above it, so they can learn from each one and avoid suffering the same incident again.

All critical issues require a “post mortem” conversation including the CEO and CTO, along with all response actors. The response lead is accountable for organizing all available information before this conversation.

Conclusion

Your employees need a place to direct their panic. They need to be encouraged to provide good information until a larger group can take the lead. After that, the incident responders need to know what resources are available to them, and to move quickly to help their co-workers and customers. Finally, they need to enforce a feedback loop they can improve from.


@magoo

I’m a security guy, former Facebook, Coinbase, and currently an advisor and consultant for a handful of startups. Incident Response and security team building is generally my thing, but I’m mostly all over the place.