An incident response playbook is a predefined set of actions to address a specific security incident such as malware infection, violation of security policies, DDoS attack, etc. Its main goal is to enable a large enterprise security team to respond to cyberattacks in a timely and effective manner. Such playbooks help optimize the SOC processes, and are a major step forward to SOC maturity, but can be challenging for a company to develop. In this article, I want to share some insights on how to create the (almost) perfect playbook.
Imagine your company is under a phishing attack, the most common attack type. How many and what exact actions should the incident response team take to stop the attack? The first steps would be to find out whether an adversary is present and how the infrastructure was penetrated (whether through an infected attachment or an account compromised using a fake website). Next, we want to investigate what is going on within the incident (whether the adversary persists using scheduled tasks or startup scripts) and execute containment measures to mitigate risks and reduce the damage caused by the attack. All of this has to be done in a prompt, calculated and precise manner, with the precision of a chess grandmaster, because the stakes are high when it comes to technological interruptions, data leaks, and reputational or financial losses.
Why defining your workflow is a vital prestage of playbook development
Depending on the organization, the incident response process will comprise different phases. I will consider one of the most widespread schemes, the NIST incident response life cycle, which is relevant for most large industries, from oil and gas to the automotive sector.
The scheme includes four phases:
- preparation,
- detection and analysis,
- containment, eradication, and recovery,
- post-incident activity.
All the NIST cycles (or any other incident response workflows) can be broken down into “action blocks”. In turn, the latter can be combined depending on the specific attack for a timely and efficient response. Every “action” is a simple instruction that an analyst or an automated script must follow in case of an attack. At Kaspersky, we describe an action as a building block of the form: <a subject> does <an action> on <an object> using <a tool>. This building block describes how a response team or an analyst (<a subject>) will perform a specific action (<an action>) on a file, account, IP address, hash, registry key, etc. (<an object>) using systems with the functionality to perform that action (<a tool>).
Defining these actions at each phase of the company’s workflow helps to achieve consistency and create scalable and flexible scenarios, which can be promptly modified to accommodate changes in the infrastructure or any of the conditions.
An example of a common response action
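As a rough illustration, the building block could be modeled as a small record with one field per slot. This is only a sketch: the class name `Action`, the field names, and the example values are assumptions made for illustration, not part of any Kaspersky tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One building block: <subject> does <action> on <object> using <tool>."""
    subject: str  # who acts: an analyst, the response team, an automated script
    action: str   # what is done, e.g. "isolates"
    obj: str      # what it is done to: file, account, IP address, hash, registry key
    tool: str     # the system that provides the functionality

    def describe(self) -> str:
        # Render the block in the same word order as the template.
        return f"{self.subject} {self.action} {self.obj} using {self.tool}"


# Hypothetical example values, not taken from a real playbook.
isolate_host = Action("SOC analyst", "isolates", "host WS-042", "the EDR console")
print(isolate_host.describe())
```

Keeping the four slots as separate fields rather than one free-text sentence is what makes the blocks easy to recombine later, for example by swapping only the tool when the infrastructure changes.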
1. Be prepared to process incidents
The first phase of any incident response playbook is devoted to the Preparation phase of the NIST incident response life cycle. Usually the preparation phase includes many different steps such as incident prevention, vulnerability management, user awareness, malware prevention, etc. I will focus on the step involving playbooks and incident response. Within this phase it is vital to define the alert field set and its visual representation. For the response team’s convenience, it is a good idea to prepare different field sets for each incident type.
A good practice before starting is to define the roles specific to the type of incident, as well as the escalation scenarios, and to designate the communication tools that will be used to contact the stakeholders (email, phone, instant messenger, SMS, etc.). Additionally, the response team has to be provided with appropriate access to security and IT systems, analysis software and resources. For a timely response and to avoid human factor errors, automations and integrations that can be launched by the security orchestration, automation and response (SOAR) system need to be developed and implemented.
2. Create a comfortable track for investigation
The next important phase is Detection, which involves collecting data from IT systems, security tools, public information, and people inside and outside the organization, and identifying the precursors and indicators. The main thing to be done during this phase is configuring a monitoring system to detect specific incident types.
In the Analysis phase, I would like to highlight several blocks: documentation, triage, investigation, and notification. Documentation helps the team to define the fields for analysis and how to fill them once an incident is detected and registered in the incident management system. That done, the response team moves on to triage to perform incident prioritization, categorization, false positive checks, and searches for related incidents. The analyst must be sure that the collected incident data comply with the rules configured for detection of specific suspicious behavior. If the incident data and rule/policy logic mismatch, the incident may be tagged as a false positive.
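The false positive check during triage can be sketched as comparing the collected incident fields against the conditions of the detection rule. The field names, the threshold, and the two result labels below are hypothetical; a real rule would come from the monitoring system's own configuration.

```python
from typing import Any, Callable, Dict

Rule = Dict[str, Callable[[Any], bool]]


def matches_rule(incident: Dict[str, Any], rule: Rule) -> bool:
    """True when every rule condition holds for the collected incident data.

    A field that is missing from the incident counts as a mismatch.
    """
    return all(
        field in incident and check(incident[field])
        for field, check in rule.items()
    )


def triage(incident: Dict[str, Any], rule: Rule) -> str:
    # A mismatch only makes the incident a candidate; the analyst decides.
    if matches_rule(incident, rule):
        return "needs_investigation"
    return "false_positive_candidate"


# Hypothetical brute-force rule: many failed logons from an external zone.
brute_force_rule: Rule = {
    "failed_logons": lambda n: n >= 10,
    "source_zone": lambda zone: zone == "external",
}

print(triage({"failed_logons": 25, "source_zone": "external"}, brute_force_rule))
print(triage({"failed_logons": 3, "source_zone": "external"}, brute_force_rule))
```

The second call returns the false positive label because the logon count does not satisfy the rule logic, which is the mismatch case described above.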
The main part of the analysis phase is investigation, which comprises logging, asset and artifact enrichment, and incident scope forming. When in research mode, the analyst should be able to collect all the data about the incident to identify patient zero and the entry point, that is, to learn how unauthorized access was obtained and which host or account was compromised first. This is important because it helps to properly contain the cyberattack and prevent similar ones in the future. By collecting incident data one gets information about specific objects (assets and artifacts such as hostname, IP address, file hash, URL, and so on) relating to the incident, so one can extend the incident scope by them.
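One way to picture scope extension is as a walk over logged events: starting from the objects already known to be involved, every object seen in the same event as a scoped object joins the scope, and the walk repeats until nothing new turns up. The sketch below assumes each event has already been reduced to a set of object identifiers. The identifiers are made up, and in practice an analyst would vet each new object rather than accept it automatically.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, Set


def extend_scope(seed: Iterable[str], events: List[FrozenSet[str]]) -> Set[str]:
    """Collect every object linked to the seed through shared events."""
    scope: Set[str] = set(seed)
    queue = deque(scope)
    while queue:
        current = queue.popleft()
        for event in events:
            if current in event:
                # Objects seen together with a scoped object join the scope.
                for obj in event - scope:
                    scope.add(obj)
                    queue.append(obj)
    return scope


# Hypothetical events: each one lists the objects observed together.
events = [
    frozenset({"host:WS-042", "hash:abc123"}),
    frozenset({"hash:abc123", "host:WS-077"}),
    frozenset({"host:WS-900", "url:example.test"}),
]

print(sorted(extend_scope({"host:WS-042"}, events)))
```

Here the file hash found on the first host pulls a second host into the scope, while the unrelated third event stays out.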
Once the incident scope is extended, the analyst can enrich assets and artifacts using the data from Threat Intelligence resources or a local system featuring inventory information, such as Active Directory, IDM, or CMDB. Based on the information on the affected assets, the response team can measure the risk to make the right choice of further actions. Everything depends on how many hosts, users, systems, business processes, or customers have been affected, and there are several ways to escalate the incident. For a medium risk, only the SOC manager and certain administrators must be notified to contain the incident and resolve the issue. In a critical risk case, however, the crisis team, HR department, or the regulatory authority must be notified by the response team.
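The escalation logic can be sketched as a risk measure followed by a lookup of who to notify. The numeric threshold, the two inputs, and the decision to keep the SOC manager and administrators on the critical list are assumptions for illustration. Each organization would set its own criteria.

```python
from typing import Dict, List

# Hypothetical notification lists per risk level.
NOTIFY: Dict[str, List[str]] = {
    "medium": ["SOC manager", "system administrators"],
    "critical": [
        "SOC manager",
        "system administrators",
        "crisis team",
        "HR department",
        "regulatory authority",
    ],
}


def measure_risk(affected_hosts: int, business_process_hit: bool) -> str:
    """Very rough risk measure; the threshold of 50 hosts is an assumption."""
    if business_process_hit or affected_hosts >= 50:
        return "critical"
    return "medium"


def stakeholders_to_notify(affected_hosts: int, business_process_hit: bool) -> List[str]:
    return NOTIFY[measure_risk(affected_hosts, business_process_hit)]


print(stakeholders_to_notify(affected_hosts=3, business_process_hit=False))
print(stakeholders_to_notify(affected_hosts=3, business_process_hit=True))
```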
The last component of the analysis phase is notification, meaning that every stakeholder must be notified of the incident in a timely manner, so the system owner can step in with effective containment and recovery measures.
Detection and Analysis phase actions to analyze the incident
3. Containment is one of the most important phases to minimize incident consequences
The following big part consists of the Containment, Eradication and Recovery phases. The main goal of containment is to keep the situation under control after an incident has occurred. Based on incident severity and the possible damage caused, the response team should know the proper set of containment measures.
Following the prestage where workflows were defined, we now have a list of different object types and possible actions that can be completed using our tool stack. So, with a list of actions in hand, we just want to choose proper measures based on impact. This stage mostly defines the final damage: the smoother and more precise the actions the playbook suggests for this phase, the faster our response will be to block the malicious activity and minimize the consequences. During the containment process, the analyst performs a number of different actions: deletes malicious files, prevents their execution, performs network host isolation, disables accounts, scans disks with the help of security software, and more.
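Choosing measures by impact could look like the sketch below: a table of containment actions per object type, ordered from least to most disruptive, with severity deciding how far down each list to go. The ordering, the two severity labels, and the object names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Containment measures per object type, least disruptive first (assumed order).
CONTAINMENT: Dict[str, List[str]] = {
    "file": ["prevent execution", "delete file"],
    "host": ["scan disks", "isolate from network"],
    "account": ["disable account"],
}


def plan_containment(objects: List[Tuple[str, str]], severity: str) -> List[str]:
    """Return the measures to run for each (object type, name) pair.

    Low severity takes only the first measure per object; high severity takes all.
    """
    plan: List[str] = []
    for obj_type, name in objects:
        measures = CONTAINMENT.get(obj_type, [])
        chosen = measures if severity == "high" else measures[:1]
        plan.extend(f"{measure}: {name}" for measure in chosen)
    return plan


# Hypothetical objects from the incident scope.
scope = [("host", "WS-042"), ("account", "j.doe")]
print(plan_containment(scope, "low"))
print(plan_containment(scope, "high"))
```

Object types without a known measure are skipped here. A real playbook would more likely flag them for manual handling.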
The eradication and recovery phases are similar and consist of procedures meant to put the system back into operation. The eradication procedures include cleaning up all traces of the attack, such as malicious files and created scheduled tasks and services, and depend on what traces were left following the intrusion. During the recovery process, the response team should simply adopt a ‘business as usual’ stance. Just like eradication, the recovery phase is optional, because not every incident impacts the infrastructure. Within this phase we perform certain health check procedures and revoke changes that were made during the attack.
Incident containment steps and recovery measures
4. Lessons learned, or required post-incident actions
The last playbook phase is Post-incident activity, or Lessons learned. The phase is focused on how to improve the process. To simplify this task, we can define a set of questions to be answered by the incident response team. For example:
- How well did the incident response team manage the incident?
- What information was the first to be required?
- Could the team have done a better job sharing the information with other organizations/departments?
- What could the team do differently next time if the same incident occurred?
- What additional tools or resources are needed to help prevent or mitigate similar incidents?
- Were there any wrong actions that caused damage or inhibited recovery?
Answering these questions will enable the response team to update the knowledge base, improve the detection and prevention mechanism, and adjust the next response plan.
Summary: components of a good playbook
To develop a cybersecurity incident response playbook, we need to figure out the incident management process with a focus on phases. As we go deeper into the details, we look for tools/systems to help us with the detection, investigation, containment, eradication, and recovery phases. Once we know our set of tools, we can define the actions that can be performed:
- logging;
- enriching the inventory information or telemetry of affected assets or reputation of external resources;
- incident containment through host isolation, preventing malicious file execution, URL blocking, termination of active sessions, or disabling of accounts;
- cleaning up the traces of intrusion by deleting remote files, suspicious services, or scheduled tasks;
- recovering the system’s operational state by revoking changes;
- formalizing lessons learned by creating a new article in the local knowledge base for later reference.
Additionally, we want to define responsibilities within the response team, for each team member must know what his or her mission-critical role is. Once the preparation is done, we can start developing the procedures that will form the playbook. As a common rule, every procedure or playbook building block looks like “<a subject> does <an action> on <an object> using <a tool>”, and now that all subjects, actions, objects, and tools have been defined, it is pretty easy to combine them to create procedures and design the playbook. And of course, keep in mind and stick to your response plan and its phases.
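To make the final assembly step concrete, here is a minimal sketch that groups procedures under the four NIST phases listed earlier. The function name `build_playbook`, the tuple layout, and the example procedures are hypothetical. The point is only that once subjects, actions, objects, and tools are defined, the playbook is largely a matter of arranging them by phase.

```python
from typing import Dict, Iterable, List, Tuple

PHASES = [
    "preparation",
    "detection and analysis",
    "containment, eradication, and recovery",
    "post-incident activity",
]

# (phase, subject, action, object, tool)
Procedure = Tuple[str, str, str, str, str]


def build_playbook(procedures: Iterable[Procedure]) -> Dict[str, List[str]]:
    """Group procedures by phase, keeping the phase order of the response plan."""
    playbook: Dict[str, List[str]] = {phase: [] for phase in PHASES}
    for phase, subject, action, obj, tool in procedures:
        if phase not in playbook:
            raise ValueError(f"unknown phase: {phase}")
        playbook[phase].append(f"{subject} {action} {obj} using {tool}")
    return playbook


# Hypothetical procedures for a phishing playbook.
phishing = build_playbook([
    ("detection and analysis", "analyst", "checks reputation of", "the sender URL", "a threat intelligence portal"),
    ("containment, eradication, and recovery", "analyst", "disables", "the compromised account", "the directory service"),
    ("post-incident activity", "response team", "records lessons learned in", "a new article", "the knowledge base"),
])

for phase, steps in phishing.items():
    print(phase, steps)
```

Rejecting unknown phase names keeps every procedure tied to the response plan, in line with the advice to stick to its phases.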