SOC ROI

Security Operations Building with ROI

While blogs can be read as independent articles, this blog is written to take the audience on a journey, starting with an enumeration of the issues that SOC leadership faces and continuing through the struggles of a difficult and complex incident response event. The examples and stories provided serve as lessons learned, with guidance in later articles on how to align people, processes, and technology.

Dustin Nowak 12/16/24

CTI-Driven Investigation (Phishing Dead End)

Cyber Threat Intelligence (CTI) can be used not only for prevention and remediation; it can also guide an investigation by revealing the attacker's tactics and techniques. Ten years ago I was investigating an APT that had had access to the enterprise for an extended period of time, but I was unable to ascertain the exact date of initial compromise. It wasn’t until investigating one of the Security Engineers that I found a phishing email that had been part of an investigation several years earlier. Through threat intelligence I was able to attribute the APT actions to that phishing email.

What does this mean for threat intelligence and incident investigation? It was probably an avoidable situation, but how? By understanding not only what we have collected as part of the investigation, but also the context of the attack. We will use a basic phishing example to explain this approach.

*Initial Access*

Phishing Technique - T1566

Phishing attacks are part of the Initial Access tactic. Confirming whether initial access was successful cannot be limited to a quick alert validation from the endpoint. The investigation needs to scope the incident across the enterprise, which should include: 1) a search of all mailboxes for the email, 2) identification of the attachment on any endpoints or file stores, and 3) a search for any other associated communications from the source address.
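The three scoping steps can be sketched as simple filters over exported telemetry. This is only an illustration under assumptions: the `MailRecord` and `FileRecord` layouts, their field names, and the `scope_phish` helper are hypothetical stand-ins for whatever the mail gateway and EDR actually export.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MailRecord:  # hypothetical export format from the mail platform
    mailbox: str
    sender: str
    subject: str
    attachment_sha256: str = ""  # empty when the message has no attachment


@dataclass(frozen=True)
class FileRecord:  # hypothetical export format from EDR or a file-store scan
    host: str
    path: str
    sha256: str


def scope_phish(mail_log, file_inventory, sender, subject, attachment_sha256):
    """Scope one phishing email across mailboxes, endpoints, and related mail."""

    def is_the_phish(m):
        # Match on the attachment hash, or on the sender/subject pair.
        return m.attachment_sha256 == attachment_sha256 or (
            m.sender == sender and m.subject == subject
        )

    # 1) every mailbox that received the email
    mailboxes = sorted({m.mailbox for m in mail_log if is_the_phish(m)})
    # 2) every endpoint or file store holding a copy of the attachment
    hosts = sorted({f.host for f in file_inventory if f.sha256 == attachment_sha256})
    # 3) any other communications from the same source address
    related = [m for m in mail_log if m.sender == sender and not is_the_phish(m)]
    return {"mailboxes": mailboxes, "hosts": hosts, "related_messages": related}
```

In practice each filter would be a query against the mail platform or EDR; the point is that all three results are collected before the alert is closed.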

Pivoting from that Initial Access, the investigation should look for evidence of tactics that occurred before and after it. For instance, the investigation should include Reconnaissance activities such as network traffic from the domain or IPs, or other email traffic from the domain.

*Defense Evasion*

Impair Defenses - T1562

The initial reaction when investigating a phishing email is to look for Execution, either as an endpoint alert or as a user action on the file or link included in the email. Execution is not always caught, however, because of control gaps or defense evasion measures. The investigation should also consider other readily available telemetry on tactics such as Defense Evasion (Impair Defenses: T1562), Discovery (File and Directory Discovery: T1083), and Credential Access (Unsecured Credentials: T1552). These can all be correlated in the SIEM or manually investigated by the SOC analysts.
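As a rough sketch of that correlation, the snippet below groups telemetry by host and keeps hosts that show more than one distinct technique after the email was delivered. The event keys (`host`, `technique`, `time`), the 24-hour window, and the two-technique threshold are assumptions for illustration, not a recommended tuning.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hosts_with_followon_activity(events, delivered_at,
                                 window=timedelta(hours=24), min_techniques=2):
    """Return {host: [techniques]} for hosts showing several techniques after delivery."""
    seen = defaultdict(set)
    for ev in events:
        # Only count telemetry that falls inside the window after delivery.
        if delivered_at <= ev["time"] <= delivered_at + window:
            seen[ev["host"]].add(ev["technique"])
    return {host: sorted(techs) for host, techs in seen.items()
            if len(techs) >= min_techniques}
```

A host that surfaces here without any Execution alert is exactly the case a quick alert validation would miss.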

Being thorough in the investigation is critical, because in some cases gaps or incompleteness can lead to a compromise that lasts several years. Security operations analysts should be encouraged to correlate and contextualize to understand the lifecycle of the possible attack. In ideal scenarios, the planning and playbook development would work to not only collect information about a given event or alert, but also the context in the attack lifecycle using threat intelligence.

Dustin Nowak 12/9/24

Use CTI to Inform Response Actions

To set the stage, a few years back I supported a response to a ransomware attack. The corporation in question had invested significantly in a security program, largely prompted by regulatory requirements. The security operations team was staffed by solid security engineers who handled it as an additional duty. The organization did have a cyber threat intelligence shop supported by a few dedicated individuals, but the CTI was not integrated into the security operations platform. The ransomware was detected when employees could not access files on the file share.

The security operations team reached out to my company to support the incident response two days after the employee notified the security team. Unfortunately, the security operations team was not familiar with CTI related to ransomware attackers. Rather than writing an indictment of their response, let's view it through the lens of how CTI on ransomware actors could inform the response activities. Here is an example flow from the Cybersecurity and Infrastructure Security Agency (CISA) on the Conti ransomware attack.

For the purposes of the article, let's break the flow down into some simple steps. In this case a simple phishing campaign delivered an Excel file with malicious macros that checked for particular AV/EDRs running, then scanned the mounted file shares, and executed ransomware on those available file shares.

Setting aside the preventative and detection gaps on this attack, let’s focus on the response. Each of the attacker’s actions in this path can provide insight into the response actions:

Phishing Campaign with Excel Macros:

Investigate: Search the enterprise for this email by sender, attachment, or subject
Containment: Block/quarantine any suspicious messages

Attachment and AV Check:

Investigate: Capture the attachment and quickly open it in a sandbox for more intelligence
Investigate: Once the AV/EDR check is identified, scan the enterprise for any endpoints with disabled or missing AV/EDR
Investigate: Scan the enterprise endpoints for that file
Containment: Contain/isolate any systems identified in those searches
Containment: Establish blocks (EDR and Email) for that file

File Share Enumeration:

Investigate: Search the file share and endpoint records for any other enumerations of the file share system
Containment: Possibly suspend “write” on the affected file shares
Containment: Possibly disable any accounts seen enumerating file shares

PowerShell and WMI:

Investigate: Search the enterprise endpoint PowerShell records
Investigate: Identify WMI queries made across the network via RPC
Containment: Temporarily disable WMI RPC calls

Encryption:

Investigate: Search the file shares for any files with the encryption extension
Containment: Remove access to any encrypted files or folders
Containment: Stop backup process for those file shares in order not to overwrite good backups

Extraneous or Harmful actions:

Containment: Collect all laptops that reported attempted access to encrypted files
Containment: Turn off access to all file shares
Containment: Network isolate the entire file share backup process
Containment: Take down or isolate externally facing applications that have access to unaffected file stores

The desire to respond creates a sense of urgency during an incident. The decision to initiate response actions (containment and eradication) should always take two factors into account: 1) what is known about the attack and what impact the response will have on the attacker (if that cannot be articulated, the investigation is not mature enough), AND 2) what the business impact of the response actions will be (the risk of verschlimmbessern, making something worse by trying to improve it).
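One way to make those two factors explicit is a small approval gate that every proposed action must pass. The sketch below is hypothetical: the `ResponseAction` fields, the three-level impact scale, and the `approve` helper are assumptions for illustration, and real impact ratings would come from the business owners.

```python
from dataclasses import dataclass
from typing import Optional

IMPACT_ORDER = {"low": 0, "medium": 1, "high": 2}  # assumed scale


@dataclass(frozen=True)
class ResponseAction:
    name: str
    answers_step: Optional[str]  # observed attacker step this action disrupts, if any
    business_impact: str         # "low", "medium", or "high"


def approve(action, observed_steps, max_impact="medium"):
    """Approve an action only when both factors can be articulated."""
    # Factor 1: the action maps to something we actually know the attacker did.
    disrupts_attacker = action.answers_step in observed_steps
    # Factor 2: the business impact stays within what leadership accepted.
    acceptable = IMPACT_ORDER[action.business_impact] <= IMPACT_ORDER[max_impact]
    return disrupts_attacker and acceptable
```

An action with no attacker step behind it, or with an impact above the agreed ceiling, goes back for more investigation or an explicit leadership decision rather than straight to execution.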

Dustin Nowak 12/2/24

Deep Dive on Detection Covering the Prevention Gap

As we discussed earlier in the blog series, preventative and detective controls are incentivized to reduce false positives and provide high-fidelity responses. More specifically, these tools utilize confusion matrices to focus on accuracy and precision. Optimizing for accuracy does not eliminate misclassifications. “Anton on Security” discusses many of the issues in detection engineering in his blog series.
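A small worked example shows why high accuracy can coexist with missed detections. The counts below are invented for illustration (10,000 events, 15 of them truly malicious); only the formulas are standard.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical day of telemetry: 5 caught, 5 false alarms, 10 missed.
metrics = detection_metrics(tp=5, fp=5, tn=9980, fn=10)
```

With these assumed counts the tool is 99.85% accurate, yet only half of its alerts are real and it misses two out of three malicious events. Because benign traffic dominates the total, accuracy alone says little about the misses.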

There are many factors that impact the accuracy of a detection (or prevention response), including:

Behavior patterns, such as: normal IT functions or application operating as intended
Contextual patterns: normal network traffic for a DMZ but not for an internal system
Data or input validation: a data query allowed but not intended
The “People Effect”: users find unintended ways to use tools or applications
Unknown/Unknown: The application or security tool enters a state that was not identified by the developer

In all of these cases there is some uncertainty or lack of data for a security tool to act with a high level of accuracy and precision. We can look to the scientific world for insight on how to address these issues.

Multi-Classification Confusion Matrix: The first approach is to utilize a multi-class classification matrix for the detection problem. The mathematics of a multi-class solution is explained well in the V7Labs confusion matrix guide. What does this mean for security detections? It raises the question of whether we classified the problem with enough labels or inputs. Maybe we have oversimplified the detection and need to consider more inputs or states.
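To make the multi-class idea concrete, here is a minimal sketch that counts (actual, predicted) label pairs and derives per-class precision and recall. The three labels (`benign`, `admin`, `malicious`) and the sample data are assumptions chosen to mirror the LotL problem, where legitimate administration sits between benign and malicious.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs; together they form the confusion matrix."""
    return Counter(zip(actual, predicted))


def per_class_precision_recall(actual, predicted, label):
    """Precision and recall for one label in a multi-class problem."""
    counts = confusion_counts(actual, predicted)
    tp = counts[(label, label)]
    predicted_as = sum(n for (a, p), n in counts.items() if p == label)
    actually = sum(n for (a, p), n in counts.items() if a == label)
    precision = tp / predicted_as if predicted_as else 0.0
    recall = tp / actually if actually else 0.0
    return precision, recall


# Invented sample: one admin action flagged as malicious, one attack seen as admin.
actual = ["benign", "benign", "admin", "admin", "malicious", "malicious", "benign", "admin"]
predicted = ["benign", "benign", "admin", "malicious", "admin", "malicious", "benign", "admin"]
```

In this sample every confusion is between `admin` and `malicious`, which a two-label matrix would hide inside a single "not malicious" bucket.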

Addition of Contextualization: If more states or inputs are needed we will likely need more data. Several SIEM vendors have appeared on the market promising to provide more contextualization. The approach here is to stitch together a series of events or related events from disparate telemetry sources. For example: An odd API query and the authentication token of the user that submitted that request.

Behavioral Analysis: In some cases the observed activity may or may not be part of a pattern of behavior. That means that a temporal element needs to be added as an input to the state. Many UEBA products carve out solutions in this space: Has user X ever logged into application Y at time T from location L?
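The question "has user X ever logged into application Y at time T from location L?" can be approximated with a simple first-seen check. This is a sketch, not how any UEBA product works: bucketing time by hour of day and treating any unseen combination as new are assumptions, and the helper names are hypothetical.

```python
from datetime import datetime


def build_baseline(history):
    """history: iterable of (user, app, datetime, location) tuples from past logins."""
    return {(user, app, ts.hour, loc) for user, app, ts, loc in history}


def is_new_behavior(baseline, user, app, ts, location):
    """True when this user/app/hour/location combination has never been seen."""
    return (user, app, ts.hour, location) not in baseline
```

A new combination is not an alert on its own; it is an input that adds the temporal element to the state being classified.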

Input Validation: Application inputs are regularly tested to verify that expected inputs produce the expected behavior. It is much harder to validate an input, that is, to confirm it is something the application was intended to receive and respond to. A simple example of this is the query language escape “Robert'); DROP TABLES; --”. With the abundance of application integrations and the growing presence of APIs, this problem is exploding.
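The difference between an input that is accepted and one that is intended can be shown with Python's standard `sqlite3` module. The table name, helper functions, and payload below are adapted for illustration (the payload is changed so it is valid SQL against this toy table); the contrast is between building the statement from a string and binding the value as a parameter.

```python
import sqlite3

# Illustrative payload, adapted to target the toy table created below.
PAYLOAD = "Robert'); DROP TABLE students;--"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    conn.execute("INSERT INTO students (name) VALUES ('Alice')")
    return conn


def table_exists(conn):
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='students'"
    ).fetchone()
    return row is not None


def add_student_unsafe(conn, name):
    # The input becomes part of the SQL text, so it can end the statement
    # and append its own. executescript() runs every statement it is given.
    conn.executescript(f"INSERT INTO students (name) VALUES ('{name}');")


def add_student_safe(conn, name):
    # The input is bound as data and is never parsed as SQL.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
```

With the safe version the payload is stored as an odd-looking name; with the unsafe version the table is dropped. Both calls "succeed" from the application's point of view, which is why this class of problem is hard to detect from the application's own telemetry.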

Dead-end State: In some cases, applications or security tools receive input and enter a state that is unexpected. In many cases that is where error handling can play a role. Windows Error Reporting (WER) is an example, where the WER event can provide context on, or serve as an input for detecting, an unexpected threat.

Moving from theory to application, we need ways to identify these misclassifications: where are the security controls or tools working with high precision and accuracy, and where is there a chance for misclassification? Again we go back to threat modeling. By using MITRE ATT&CK and MITRE D3FEND, we can identify attack scenarios, enumerate the security control responses, and look for places where uncertainty, lack of input, or expanded states (in multi-class confusion matrices) exist. Here are some issues to look for in this process: high-volume irregular traffic or events, many “use cases” in the event, limited information provided in the telemetry, or overlap with approved behavior (“Live-off-the-Land”). As part of its development work, detection engineering should focus on the gaps that cause misclassification by the security tools. By doing so we can avoid duplicating preventions and detections already covered by the existing tooling.

Dustin Nowak 11/25/24

Threat Modeling Applied to Sec Ops

We discussed a process for applying threat intelligence to Security Operations in the previous blog. Here we are going to examine a use case to provide the technical details of the process. For the purposes of this deep dive we will look at parts of a ransomware attack, specifically Ragnar Locker (MITRE ATT&CK Flow). This Threat Actor (TA) utilizes many “Live-off-the-Land” (LotL) and Defense Evasion techniques: RDP, COM Object Hijacking, PowerShell, and GPO installations.

The first action in the attack model is the exploitation of externally available RDP services. It is possible that the RDP servers running EDR or Authentication services would identify a Brute Force (BF) attempt. However, without MFA or a centralized IAM security tool, attackers utilizing stolen credentials would be difficult to detect.

RDP can be utilized by administrators for maintenance purposes (although it should be protected by a VPN), so the use of valid remote session credentials alone cannot be used for detection. Contextualization through Network Traffic Analysis (NTA) or Account Authentication Event Thresholding can be used instead.

Existing security telemetry can be utilized to support analysis and anomaly detection around these detection techniques, in particular netflow data or authentication logs.
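Account authentication event thresholding can be sketched as a sliding-window count of failed logins per account. The limit of five failures in ten minutes is an arbitrary assumption, and the `(account, timestamp)` input format is hypothetical; real thresholds should come from the environment's own baseline.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def accounts_over_threshold(failed_logins, limit=5, window=timedelta(minutes=10)):
    """failed_logins: iterable of (account, datetime) pairs, in any order.

    Returns the accounts with at least `limit` failures inside any window.
    """
    recent = defaultdict(deque)
    flagged = set()
    for account, ts in sorted(failed_logins, key=lambda e: e[1]):
        q = recent[account]
        q.append(ts)
        # Drop failures that fell out of the window ending at this event.
        while ts - q[0] > window:
            q.popleft()
        if len(q) >= limit:
            flagged.add(account)
    return flagged
```

This catches brute-force bursts but, as noted above, not an attacker who logs in once with stolen credentials; that case needs the location and behavior context.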

If we assume for this exercise that sign-ins to the RDP service are recorded in AzureAD sign-in logs, the AzureAD SigninLogs table can be queried directly by location using the IPAddress or Location fields, for example:

// Count successful (ResultType 0) and failed sign-ins for each location
SigninLogs
| summarize Successful=countif(ResultType==0), Failed=countif(ResultType!=0) by Location

When those logs are sent to a SIEM, SIEM-native query languages or Detection-as-Code can further refine the results using specific location or uniqueness filters.

Security Operations Analysts viewing these events in either table or alert format will require further information and contextualization to conduct the investigation. It is important to know if there is known travel, normal login location or IP, and the frequency and time of login.
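As an illustration of that contextualization, the sketch below attaches the three answers an analyst needs to a raw sign-in event. Everything about the inputs is assumed: the event keys, the per-user lookup tables, and the idea that known travel is available from an HR or travel system.

```python
from datetime import date, datetime


def contextualize(signin, usual_locations, usual_hours, travel):
    """Enrich a sign-in event with location, travel, and time-of-day context.

    signin: dict with hypothetical keys user, location, time
    usual_locations: {user: set of locations}
    usual_hours: {user: (start_hour, end_hour)}
    travel: {user: [(location, start_date, end_date), ...]}
    """
    user, loc, ts = signin["user"], signin["location"], signin["time"]
    start, end = usual_hours.get(user, (0, 24))
    known_travel = any(
        loc == t_loc and t_start <= ts.date() <= t_end
        for t_loc, t_start, t_end in travel.get(user, [])
    )
    return {
        **signin,
        "usual_location": loc in usual_locations.get(user, set()),
        "known_travel": known_travel,
        "usual_hours": start <= ts.hour < end,
    }
```

An event from an unusual location with no known travel reads very differently from the same event during a booked trip, and the analyst should not have to look that up by hand.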

Here is the summary of this example using the process from the previous blog.

Dustin Nowak 11/18/24

Security Tools: Coverage vs Detection Engineering Gaps

Let’s assume that the SOC has built a SIEM with a core set of data sources to provide security alerts. The SOC is now likely seeing a plethora of alerts from EDR and NGFW stating that the security controls identified and blocked malicious or suspicious behavior. The SOC processes dozens of alerts like this a day, with similar outcomes: closed and resolved by the reporting security tool.

Security tool product managers are incentivized to minimize False Positives (FP) and False Negatives (FN). This creates a reality for the SOC where, if a tool alerts on something with a high enough confidence level then the tool can take a response action. What is the next step from here? We have to look at “telemetry,” which we define here as: logs, metrics, or events from IT and security tools that provide insight into the operations of a system. This is where we find visibility into the coverage gaps left by security tool alerting.

As we discussed in the previous blog, many attackers are utilizing defense evasion techniques and “Live-off-the-Land” (LotL) tactics to maximize their effectiveness and minimize the chance of being detected. Many sophisticated MDR services have learned how to identify these techniques:

CrowdStrike noted 75% of their detections were malware-free, meaning the attacker used LotL techniques (CrowdStrike 2024 Global Threat Report)

For Security Operations Centers without a team of event and intelligence analysts, it is possible to provide some of the same insight and detections as the well-resourced MDRs.

The process is straightforward, but it does take some experience and understanding of Cyber Threat Intelligence (CTI). The MITRE ATT&CK Flow tool can provide an effective starting point. Here we will provide an overview of the approach, and in the next blog we will do a deep dive on a specific use case from threat modeling to detection engineering. The steps are as follows:

Use Case: Identify a Use Case (Business Email Compromise - BEC) or Threat Attack to model
Infrastructure: Understand the infrastructure that would be affected (For a BEC use case the focus would be email infrastructure, e.g., M365 or Gmail)
MITRE ATT&CK Flow: Utilize the MITRE ATT&CK Flow or MITRE ATT&CK Navigator tool to model a BEC attack (Example of Star Blizzard)
MITRE D3FEND: Translate the enumerated techniques and tactics to security tools (MITRE D3FEND)
Identify Detections: Identify security tool detections
Telemetry: Identify useful security telemetry (logs, events, and data)
Utilization of Telemetry: Consider ways of isolating the relevant telemetry

Correlation: a series of events
Contextualization: events on specific applications or systems
Behavior: Unique or unusual conditions in the event

SIEM Implementation: Develop and codify those events in the SIEM platform
Playbook Information: Develop playbooks for monitoring and reviewing those events; in some cases human review will be required.
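The middle of this process (modeled techniques, existing tool detections, available telemetry) reduces to a set comparison. The sketch below is illustrative only: the three technique IDs are real ATT&CK identifiers but are not taken from the Star Blizzard model, and the tool coverage and telemetry mappings are invented.

```python
def coverage_gaps(modeled_techniques, tool_detections, telemetry_sources):
    """Split modeled techniques into covered, engineerable, and blind spots."""
    covered = modeled_techniques & tool_detections
    uncovered = modeled_techniques - tool_detections
    # Uncovered techniques with usable telemetry are detection engineering work.
    engineer = {t for t in uncovered if telemetry_sources.get(t)}
    return {
        "covered": covered,
        "needs_detection_engineering": engineer,
        "no_telemetry": uncovered - engineer,
    }


# Hypothetical BEC model: Phishing, Valid Accounts, Email Forwarding Rule.
modeled = {"T1566", "T1078", "T1114.003"}
tool_detections = {"T1566"}  # assumed: email security already alerts here
telemetry = {"T1078": ["sign-in logs"], "T1114.003": []}
gaps = coverage_gaps(modeled, tool_detections, telemetry)
```

The `needs_detection_engineering` set is where the correlation, contextualization, and behavior work belongs; the `no_telemetry` set is a data-onboarding problem rather than a detection problem.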

This is an esoteric process to improve the alerting and detection for LotL techniques and for techniques specifically designed to evade detection. In the next blog we will dive deep into a scenario and take it through detection engineering for implementation.

Dustin Nowak 11/11/24

Millions Spent on SIEM/SOAR and Incidents are Still a Problem

In the last four years, average cybersecurity spending has increased from 13% to 21% of the IT budget. However, nearly 50% of companies reported an attack within the last year (73% experiencing multiple attacks) and only 9% said they were prepared adequately to defend with little to no business impact. Why isn't the effectiveness of cybersecurity programs meeting current challenges despite this significant increase in resourcing?

One of the biggest costs is the Security Operations Platform architecture, which includes the SIEM/SOAR as well as all the data sources for security telemetry (EDR, NGFW, Email Security, VM, and IAM providers). These architectures have become homogeneous and widely deployed. But are the SIEM/SOAR platforms adding value commensurate with the investment?

There are likely two issues impacting the effectiveness of SIEM/SOAR solutions:

“On average 75% of default out-of-the box (OOB) rules provided by SIEM vendors are disabled, due to the difficulty of adapting generic rules to each organization’s unique infrastructure, log sources, naming conventions, and more.”
Attackers are increasingly aware of the security posture of their targets. In 2023, 40% of malware utilized defense evasion techniques.

What evasion techniques are attackers utilizing? Here are some statistics:

68% of attacks involve a human element or error (from the 2024 Verizon Data Breach Investigations Report)
11% of attacks utilized compromised credentials for initial access (Mandiant M-Trends 2024)
Nearly 30% of attackers utilized techniques that included creating or modifying existing system processes (Mandiant M-Trends 2024)
CrowdStrike noted 75% of their detections were malware-free, meaning the attacker used “Live-off-the-Land” (LotL) techniques (CrowdStrike 2024 Global Threat Report)
IBM noted that 16% of their investigations identified stolen or compromised credentials, with an average of 292 days to identify the compromised account (IBM 2024 Cost of a Data Breach Report)
Mandiant notes that ~25% of the attacks they investigated included process injections (running in memory) (Mandiant M-Trends 2024)

What does all of this mean for the Security Operations team and how a SIEM/SOAR should be employed?

First, the Security Operations team needs to prioritize differently and focus:

“Despite 81% saying their security operations centers (SOCs) are adequately staffed, respondents admit their security personnel are spending far more time (60%) on routine IT operations tasks versus security-related functions (27%).”

To be discussed in future blogs…

Second, we need to expand the use of the SIEM beyond simply serving as a single pane of glass for all of the security tools. The SIEM must be supported by an analytics and detections team that can identify security control gaps and monitor for suspicious LotL behavior. LotL detections are not binary alerts, as many of the behaviors are also part of the IT team's operations and maintenance approach. Here are some statistics or metrics to consider in SIEM detection engineering for LotL techniques:

User and Entity Behavior Analytics (UEBA): Although tools can be bought to provide this, simple approaches based on log-in times, locations, source systems, or user-agent strings can be quickly implemented.
Compromised credentials can be hard to identify, but two approaches that can help are dark web monitoring and the UEBA analytics above.
Some Endpoint Detection and Response (EDR) tools provide mapping of execution paths, but many fall short in monitoring process memory space. IT operations tools can provide some insight into significant process memory utilization.
Threat Intelligence and a good Managed Detection and Response (MDR) service can also provide significant insight into LotL techniques such as use of RDP applications, suspicious registry key values, and app data folders.
Outbound data flows and destinations can also be a suspicious indicator.

None of the above techniques are binary detections or alerts. This is where a seasoned team that understands the company's environment provides insight beyond the standard tool alerting.

Dustin Nowak 10/31/24

Is a SIEM Like a Boat?

On the first day as a CISO, the question always gets asked: “Do we have a SIEM?” The answers vary: “it's run by our MSSP,” “somewhere, but we haven’t looked at it since the last incident,” “yes, but our SOC complains about it every week.” For any information security program, the SIEM represents the epicenter of alerting operations. For most CISOs, however, it’s more like a boat: the best days are the day they buy it and the day they get rid of it (outsource it, put it into the cloud, build a new one…).

The SIEM/SOAR market represents ~$5.7 billion worldwide. On average companies pay $18M annually for the core platforms of the Security Operations Center (SOC). Then there are the resources required to deploy it, integrate it, operate it, automate it… A SIEM represents the biggest security investment that a CISO will likely make.

With all this investment, 87% of security leadership surveyed note that their SIEM needs improvement. Chief amongst their concerns are:

Over-Collection: “Study…suggests that 95% of SIEM incidents are generated by just 15% of rules”
Under-Collection: “Significantly diminished with an average of only 16% coverage across MITRE's ATT&CK framework”
Lack of Detection Engineering: “SIEM engineering capabilities are often overlooked”

The frustration with this cornerstone of security operations is so common that organizations are outsourcing their security operations to MSSPs and MDRs. The Security Operations services market is expected to grow by a factor of 2.3x over the next five years. There are scores of technologies and service providers trying to address the various problems driving these frustrations:

How do I get better at collecting logs?
What is the optimal retention and access strategy?
How can I utilize all the data I collected for analytics? … Or with AI?
How do I correlate all the disparate data?
What can I do to make the platform easier to use? Automation? APIs?

Like a boat, there is always another tool to add or another upgrade to fund.

Just as every aspiring sailor needs a boat, CISOs know they need a SIEM. But when you ask them why they need it or what their return on investment is, they struggle to answer. Is it a regulatory requirement OR an investment in real-time alerting? Does it provide visibility into the enterprise OR support long-term discovery? In order to properly appreciate and articulate a SIEM’s value, the CISO needs to understand, codify, translate, and set measurable objectives for the Security Operations team and platform… more to come.