The security operations center (SOC) plays a critical role in protecting an organization’s resources and reputation by identifying, analyzing, and responding to cyberthreats in a timely and effective manner. SOCs also help improve the overall security posture by providing add-on services such as vulnerability identification, inventory tracking, threat intelligence, threat hunting, and log management. With all these services running under the SOC umbrella, the SOC bears much of the burden of making the organization resilient against cyberattacks, so it is essential for organizations to evaluate how effective their security operations center is. An effective and successful SOC should be able to justify and demonstrate the value of its existence to stakeholders.
Principles of success
Apart from revenue and profits, there are two key principles that drive business success:
- Maintaining business operations to achieve the desired outcomes
- Continually improving by bringing in new ideas or initiatives that support the overall goals of the business
The same principles apply to any organization or entity that runs a SOC, acts as a CERT, or provides managed security services to customers. So how do we ensure the services provided by security operations centers are meeting expectations? How do we know continuous improvement is being incorporated into daily operations? The answer lies in measuring the SOC’s internal processes and services. By measuring the effectiveness of processes and services, organizations can assess the value of their efforts, identify underlying issues that affect service outcomes, and give SOC leadership the opportunity to make informed decisions about how to improve performance.
Measuring routine operations
Let’s take a closer look at how we can make sure routine security operations deliver services within normal parameters to the business or to subscribed customers. This is where metrics, service-level indicators (SLIs), and key performance indicators (KPIs) come into play. Metrics provide a quantitative value for measuring something. KPIs set an acceptable value for a key metric in order to evaluate the performance of a particular internal process. SLIs measure service outcomes that are directly linked to service-level agreements (SLAs). With regard to KPIs, if the metric value falls within the range specified by the KPI, the process is deemed to be working normally; otherwise, the value indicates reduced performance or possibly a problem.
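The KPI rule described above can be sketched in a few lines of Python. This is a minimal illustration that assumes each metric carries an optional acceptable range. The `Metric` class, the `evaluate` function and the status labels are hypothetical names and do not come from any specific SOC platform. A metric without a range is treated as purely informational.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Metric:
    name: str
    kind: str  # e.g. "KPI", "SLI" or "monitoring" (labels assumed for this sketch)
    value: float
    # Acceptable (low, high) range; None for metrics tracked for information only
    acceptable_range: Optional[Tuple[float, float]] = None


def evaluate(metric: Metric) -> str:
    """Return 'normal', 'degraded' or 'informational' for a metric reading."""
    if metric.acceptable_range is None:
        # Monitoring metrics have no fixed target; they are only reported
        return "informational"
    low, high = metric.acceptable_range
    if low <= metric.value <= high:
        return "normal"
    # Outside the acceptable range: reduced performance or a possible problem
    return "degraded"


print(evaluate(Metric("Wrong verdict", "KPI", 3.5, (0.0, 5.0))))   # normal
print(evaluate(Metric("Wrong verdict", "KPI", 12.0, (0.0, 5.0))))  # degraded
print(evaluate(Metric("Alert triage queue", "monitoring", 42)))    # informational
```

Keeping the range on the metric itself lets the same function serve KPIs and SLIs. Monitoring metrics can simply omit the range.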
The following figure clarifies the metric types often used in SOCs and their objectives:
One important thing to understand here is that not all metrics need a KPI value. Some metrics, such as monitoring metrics, are measured for informational purposes only. This is because they help track functional parts of SOC operations, and their main objective is to help SOC teams forecast problems that could degrade operational performance.
The following section provides some practical examples to reinforce understanding:
Example 1: Measuring analysts’ wrong verdicts
Process | Metric Name | Type | Metric Description | Target |
Security monitoring process | Wrong verdict | KPI (internal) | % of alerts wrongly triaged by the SOC analyst | 5% |
This example evaluates a specific aspect of the security monitoring process, namely the accuracy of SOC analyst verdicts. Measuring this metric can help identify critical areas that may affect the outcome of the security monitoring process. Note that this metric is an internal KPI, and the SOC manager has set a target of 5% (the target value is often set based on the existing level of maturity). If the percentage exceeds the established target, it suggests that the SOC analysts’ triage skills may need improvement, which gives the SOC manager valuable insight.
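Calculating this KPI is simple arithmetic once you know which verdicts were wrong. The sketch below assumes that a wrong verdict is found by comparing the analyst’s triage decision with the decision reached in a later quality review. The article does not prescribe this method, and the verdict labels and function names are hypothetical.

```python
from typing import Iterable, Tuple

# Target from the Example 1 table: at most 5% of alerts wrongly triaged
WRONG_VERDICT_TARGET_PCT = 5.0


def wrong_verdict_rate(verdicts: Iterable[Tuple[str, str]]) -> float:
    """Percentage of alerts whose analyst verdict differs from the reviewed verdict.

    Each item is a hypothetical (analyst_verdict, reviewed_verdict) pair, e.g. the
    analyst's triage decision and the decision reached during a QA review.
    """
    pairs = list(verdicts)
    if not pairs:
        return 0.0
    wrong = sum(1 for analyst, reviewed in pairs if analyst != reviewed)
    return 100.0 * wrong / len(pairs)


def kpi_met(rate_pct: float, target_pct: float = WRONG_VERDICT_TARGET_PCT) -> bool:
    """The KPI is met while the wrong-verdict rate stays at or below the target."""
    return rate_pct <= target_pct


# 20 reviewed alerts, one of which the analyst closed as a false positive by mistake
sample = [("true_positive", "true_positive")] * 19 + [("false_positive", "true_positive")]
rate = wrong_verdict_rate(sample)
print(rate, kpi_met(rate))  # 5.0 True
```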
Example 2: Measuring the alert triage queue
Process | Metric Name | Type | Metric Description | Target |
Security monitoring process | Alert triage queue | Monitoring metric | Number of alerts waiting to be triaged | Dynamic |
This example tracks a different element of the security monitoring process: the alert triage queue. Evaluating this metric can provide insights into the workload of SOC analysts. Note that this is a monitoring metric, so there is no prescribed target value; instead, it is classified as dynamic. If the queue of incoming alerts grows, the analysts’ workload is increasing, and SOC management can use this information to make the necessary arrangements.
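A monitoring metric has no fixed target, so one way to act on it is to watch the trend rather than a threshold. The sketch below counts alerts in a hypothetical `awaiting_triage` status. It flags growth when the last few queue-length snapshots increase strictly. The status name, the snapshot interval and the window size are all assumptions for illustration.

```python
from typing import Iterable, Sequence


def queue_length(alert_statuses: Iterable[str]) -> int:
    """Count alerts still waiting for triage ('awaiting_triage' is an assumed status)."""
    return sum(1 for status in alert_statuses if status == "awaiting_triage")


def queue_is_growing(samples: Sequence[int], window: int = 3) -> bool:
    """True if the last `window` queue-length samples are strictly increasing."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    return all(earlier < later for earlier, later in zip(recent, recent[1:]))


# Hourly snapshots of the queue (illustrative numbers, not real data)
hourly_queue = [8, 11, 15, 22]
print(queue_is_growing(hourly_queue))  # True
```

A strictly increasing window is a deliberately simple trend test. A team might prefer comparing the latest value with a rolling average instead.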
Example 3: Measuring time to detect incidents
Service | Metric Name | Type | Metric Description | Target |
Security monitoring service | Time to detect | SLI | Time required to detect a critical incident | 30 minutes |
In this example, the effectiveness of the security monitoring service is evaluated by assessing the time required to detect a critical incident. Measuring this metric can provide insights into the efficiency of the security monitoring service for both internal and external stakeholders. Note that this metric is categorized as a service-level indicator (SLI), and the target value is set at 30 minutes. This target value represents a service-level agreement (SLA) established with the service consumer. If the time required for detection exceeds the target value, it signifies an SLA breach.
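The SLA check can be sketched with Python’s standard `datetime` module. The sketch assumes that time to detect runs from the first related alert to the moment the incident is confirmed. Real SLAs may define the start and end points differently, and the function names are hypothetical. Following the text, a breach occurs only when detection takes longer than 30 minutes.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# SLA target from the Example 3 table
SLA_TIME_TO_DETECT = timedelta(minutes=30)


def time_to_detect(first_alert_at: datetime, confirmed_at: datetime) -> timedelta:
    """Assumed definition: from the first related alert to incident confirmation."""
    return confirmed_at - first_alert_at


def sla_breached(first_alert_at: datetime, confirmed_at: datetime,
                 sla: timedelta = SLA_TIME_TO_DETECT) -> bool:
    """A breach occurs only when detection takes longer than the SLA target."""
    return time_to_detect(first_alert_at, confirmed_at) > sla


def breach_rate(incidents: List[Tuple[datetime, datetime]],
                sla: timedelta = SLA_TIME_TO_DETECT) -> float:
    """Percentage of critical incidents whose detection breached the SLA."""
    if not incidents:
        return 0.0
    breaches = sum(1 for start, end in incidents if sla_breached(start, end, sla))
    return 100.0 * breaches / len(incidents)


start = datetime(2023, 5, 1, 9, 0)  # illustrative timestamps, not real incidents
incidents = [(start, start + timedelta(minutes=18)),
             (start, start + timedelta(minutes=42))]
print(breach_rate(incidents))  # 50.0
```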
Evaluating the everyday operations of a SOC can be challenging because data may be unavailable or inadequate, and gathering metrics can also be time-consuming. It is therefore essential to select suitable metrics (discussed later in the article) and to have the right tools and technologies in place for collecting, automating, visualizing, and reporting metric data.
Measuring improvement
The other essential element in the overall success of security operations is ‘continuous improvement’. SOC leadership should devise a program in which management and SOC employees have an opportunity to create and pitch ideas for improvement. Once ideas are collected from different units of security operations, management and team leads typically evaluate them to determine their feasibility and potential impact on SOC goals. The selected ideas are then converted into initiatives, along with the respective metrics and desired state. Finally, their progress is tracked and evaluated to measure results over time. The goal of creating initiatives through ideas management is to encourage employee engagement and continuously improve SOC processes and operations. Typically, the SOC manager and team leads undertake initiatives to fix technical and performance-related issues within the SOC.
A high-level overview of this process is depicted in the figure below:
Whether for routine security operations or ongoing improvement efforts, metrics remain a common parameter for measuring performance and tracking progress.
Three common problems that we often observe in real-world scenarios are:
- Resistance to change: in the IT world, the principle of “if it’s not broken, don’t fix it” is well known, and this mentality extends to operational units as well. Many SOCs prioritize current operations and only implement changes in response to issues rather than adopting a continuous improvement approach. This reluctance to change is a barrier to continuous improvement.
- Absence of a structured process for gathering ideas: without one, only a fraction of potential improvements are presented to SOC management, and thus only a fraction are implemented.
- Absence of progress tracking for improvements: it is not sufficient to simply generate and discuss ideas. Implementing ideas requires diligent monitoring of their progress and measurement of their actual impact.
Example: Initiative to improve analyst triage verdicts
Revisiting ‘Example 1’ from the ‘Measuring routine operations’ section, let us assume that the percentage of wrong verdicts over the past month was 12%, indicating an issue that requires attention. Management has opted to provide additional training to the analysts, with the goal of reducing this percentage to 5%. The effectiveness of this initiative must then be monitored over a specified period to determine whether the target value has been reached. Note that the metric, ‘Wrong verdict’, remains unchanged, but its current value is now used to evaluate progress toward the desired value of 5%. Once significant improvements have been made, the target value can be adjusted to further enhance the analysts’ triage skills.
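Tracking an initiative like this comes down to recording the unchanged metric over time and comparing it with both the starting value and the desired value. The sketch below uses the numbers from this example (12% baseline, 5% target). Progress is expressed as the share of the baseline-to-target gap closed so far. The `Initiative` class and this progress formula are assumptions for illustration, and the code assumes that lower values are better.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Initiative:
    name: str
    metric: str
    baseline: float  # metric value when the initiative started
    target: float    # desired value; this sketch assumes lower is better
    readings: List[float] = field(default_factory=list)

    def record(self, value: float) -> None:
        """Store the latest measured value of the unchanged metric."""
        self.readings.append(value)

    def progress(self) -> float:
        """Fraction (0.0 to 1.0) of the baseline-to-target gap closed so far."""
        if not self.readings:
            return 0.0
        gap = self.baseline - self.target
        if gap <= 0:
            return 1.0  # nothing to close
        closed = self.baseline - self.readings[-1]
        return max(0.0, min(1.0, closed / gap))

    def achieved(self) -> bool:
        return bool(self.readings) and self.readings[-1] <= self.target


training = Initiative("Analyst triage training", "Wrong verdict", baseline=12.0, target=5.0)
training.record(8.5)
print(training.progress(), training.achieved())  # 0.5 False
```

When the target is later tightened, a new `Initiative` can start from the current value as its baseline.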
Metric identification and prioritization
SOCs often measure their routine operations and improvements using ‘metrics’. However, they often struggle to tell whether these metrics support the decision-making process or show any value to stakeholders. Finding meaningful metrics is a daunting task. In our SOC consulting services, the common approach we have followed to derive meaningful metrics is to understand the specific goals and operational objectives of security operations. Another proven approach is GQM (Goal-Question-Metric), a systematic, top-down methodology for creating metrics that are aligned with an organization’s goals. The GQM approach starts with specific, measurable goals and works backwards to identify the questions and metrics needed to measure progress toward those goals. This ensures that the resulting metrics are directly relevant to the SOC’s objectives.
Let’s illustrate our approach with an example. A SOC acting as a financial CERT is likely to focus on responding to incidents related to the financial industry, tracking and recording financial threats, providing informational services, and so on. Once the principal goals of the CERT have been identified, the next step is to identify metrics that directly influence the outcomes of the CERT’s services.
Example: Metric identification
Goal | Question | Metric |
Ensure participating financial institutions are informed about the latest threats | How long does the CERT take to notify other financial institutions? | Time it takes to notify participant banks after threat discovery |
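The GQM hierarchy in the table above maps naturally onto a small tree of goals, questions and metrics. The sketch below encodes the table’s single row so that every metric can be traced back to the question and goal it serves. The `Goal` and `Question` classes are hypothetical helpers, not part of any GQM tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    text: str
    metrics: List[str] = field(default_factory=list)


@dataclass
class Goal:
    text: str
    questions: List[Question] = field(default_factory=list)

    def metrics(self) -> List[str]:
        """Flatten the tree: every metric answers a question that serves the goal."""
        return [m for q in self.questions for m in q.metrics]


notify_goal = Goal(
    "Ensure participating financial institutions are informed about the latest threats",
    [Question(
        "How long does the CERT take to notify other financial institutions?",
        ["Time to notify participant banks after threat discovery"],
    )],
)
print(notify_goal.metrics())
```

Working top-down in this way means a metric that is not attached to any question signals a gap in the GQM model.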
Similarly, for operational objectives, metrics are identified to track and measure the processes that support financial CERT operations. This also raises the issue of prioritizing metrics, as not all metrics are equally important. When selecting metrics, it is crucial to prioritize quality over quantity, so it is recommended to limit the number of metrics to sharpen focus and increase efficiency. As for priority, metrics that directly support CERT goals take precedence over metrics supporting operational objectives, because ultimately it is consumers and stakeholders who evaluate the services rendered.
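The prioritization rule above has two parts: goal-supporting metrics come first, and the total number of metrics is capped. The sketch below implements both. The per-metric `importance` score is a hypothetical addition for ordering metrics within each group, and the metric names come from the earlier examples for illustration only.

```python
from dataclasses import dataclass
from typing import List

# Goal-supporting metrics rank ahead of operational ones
GROUP_RANK = {"goal": 0, "operational": 1}


@dataclass
class CandidateMetric:
    name: str
    supports: str    # "goal" or "operational" (labels assumed for this sketch)
    importance: int  # hypothetical score assigned during review; higher first


def prioritize(candidates: List[CandidateMetric], limit: int) -> List[CandidateMetric]:
    """Order by group, then importance, and keep only the top `limit` metrics."""
    ordered = sorted(candidates, key=lambda m: (GROUP_RANK[m.supports], -m.importance))
    return ordered[:limit]


candidates = [
    CandidateMetric("Alert triage queue", "operational", 5),
    CandidateMetric("Time to notify participant banks", "goal", 3),
    CandidateMetric("Wrong verdict", "operational", 8),
]
print([m.name for m in prioritize(candidates, limit=2)])
# ['Time to notify participant banks', 'Wrong verdict']
```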
To determine the right metrics, several factors should be taken into account:
- Metrics must be aligned with the primary goals and operational objectives
- Metrics should assist in the decision-making process
- Metrics must demonstrate their purpose and value to both internal operations and external stakeholders.
- Metrics should be realistically achievable in terms of data collection, data accuracy, and reporting.
- Metrics must also meet the criteria of the SMART (Specific, Measurable, Actionable, Realistic, Time-based) model.
- Ideally, metric collection should be streamlined so that current values can be gathered, interpreted, and visualized as quickly as possible.