Whether you’re a seasoned Detection Engineer or just starting to build out your SIEM, there comes a point where you need to ask yourself: “Is this Detection valuable?”
If you’re a seasoned vet, you could probably do a quick audit of the detection logic and a few alerts and come back with a definitive yes or no within minutes.
But if you’re not, it can be difficult to answer with conviction. How does this detection fit into my overall SIEM coverage? Is it tuned correctly? Am I sure this is valuable?
So, what exactly is it in a detection that makes you say “yup, this is in a good place” or “nope, this needs improvement”?
From a high level, there are quite a few things to consider:
Is the coverage scoped so that it’s neither too broad nor too narrow?
Is the logic tailored correctly to my environment?
Is the detection architecture resilient over time?
Are there false positives? How many? What is the cause?
What is the labor cost associated with each alert that comes in?
Is the alert actionable?
That’s a lot, and even these questions are just the tip of the iceberg.
When it comes down to it, whether a detection is “good” or not is both subjective and objective.
High-level analytics can give you some basic information at a glance, but a detection audit truly requires a human look.
Even AI would struggle to tell you whether a detection is “good” or “bad”, since it would need a plethora of historical alerts, analyst comments, trends, a backlog of improvements made over the detection’s lifecycle, and likely even more!
So what are the pieces I look at that are quantifiable? This is my mental checklist I run through when auditing a detection.
Visibility
Does this provide accurate visibility of the system?
It’s essential to make sure the detection logic captures the right activity for the use case. This requires the detection engineer to understand the logs deeply enough to separate noise from value.
Am I looking for the right activity?
Piggybacking on the previous question, a deep understanding of logs means knowing the ins and outs of the different log types. Within a system, there can be events with similar names but subtle differences. It’s important to understand what each one represents, which can require some manual testing and actually reading the documentation.
Actionable
Can I take action on this alert?
It’s important to provide all the information a SOC analyst needs to investigate an alert at a glance. Make important artifacts as easy as possible to identify. Your future self will thank you.
Is it easy to understand how to triage this alert?
As your detection suite grows, you’ll likely forget how to triage some of the less frequent alerts. Make the next steps easy to infer, or link to a runbook.
Is it easy to understand what to investigate?
Along with a runbook, make it easy to understand exactly what artifacts are in question. Extract these artifacts from the alert log and accurately label them so they are clear and obvious.
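Here’s a minimal Python sketch of that idea, assuming your SIEM hands you each alert’s raw log as a dictionary. The runbook URL, the `build_alert_summary` helper, and the field names are hypothetical; the point is simply mapping raw keys to clear, human-readable labels and attaching a runbook link so the analyst doesn’t have to dig.

```python
from dataclasses import dataclass, field

# Hypothetical runbook location; replace with wherever your team keeps runbooks.
RUNBOOK_BASE_URL = "https://wiki.example.com/runbooks"


@dataclass
class AlertSummary:
    """What an analyst sees first: a title, labeled artifacts, and a runbook link."""
    title: str
    runbook: str
    artifacts: dict[str, str] = field(default_factory=dict)


def build_alert_summary(detection_id: str, title: str, raw_log: dict,
                        artifact_fields: dict[str, str]) -> AlertSummary:
    """Pull the relevant fields out of a raw log and give them readable labels.

    artifact_fields maps a human-readable label (e.g. "Source IP") to the raw
    log key that holds it (e.g. "src_ip"). Missing keys are marked explicitly.
    """
    artifacts = {
        label: str(raw_log.get(key, "<missing>"))
        for label, key in artifact_fields.items()
    }
    return AlertSummary(
        title=title,
        runbook=f"{RUNBOOK_BASE_URL}/{detection_id}",
        artifacts=artifacts,
    )
```

Marking absent fields as `<missing>` instead of silently dropping them makes it obvious when a log source changes shape.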
Impact
Is the threat worth detecting?
Just because we can write a detection for something doesn’t mean we should. Expansive visibility over everything is great at first, but alert fatigue will inevitably set in. Instead, challenge yourself to spell out the value of a detection before creating it. If you can’t clearly state the “why,” it’s probably not worth detecting.
Is there risk to false positives?
False positives are a given. But does a false positive pose any risk to the team or to the artifacts involved in the alert? This takes some critical thinking and weighing of different potential scenarios.
Is there risk to false negatives?
False negatives can have a more severe impact than false positives. A false negative means a gap in visibility and, depending on the detection, could mean missing something vital. Make sure to cover your bases and to actively monitor freshly implemented detections. This can mean threat hunting for missed activity during the initial implementation phase.
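One way to check for false negatives during that initial phase is to compare what a manual hunt found against what the detection actually fired on. This is a minimal sketch assuming both sides can be reduced to sets of event IDs; `review_new_detection` is a hypothetical helper, and the recall number is only as good as the hunt behind it.

```python
def review_new_detection(hunt_findings: set[str], alerted_event_ids: set[str]) -> dict:
    """Compare a threat hunt's findings against what the detection alerted on.

    hunt_findings: IDs of events a manual hunt judged malicious.
    alerted_event_ids: IDs of events the new detection fired on.
    """
    caught = hunt_findings & alerted_event_ids
    missed = hunt_findings - alerted_event_ids  # candidate false negatives
    # Recall: share of known-bad activity the detection actually caught.
    recall = len(caught) / len(hunt_findings) if hunt_findings else 1.0
    return {"missed": sorted(missed), "recall": recall}
```

Every ID in `missed` is a lead worth investigating: either the logic has a gap, or the hunt flagged something the detection was never meant to cover.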
Robustness
Are severity and threshold counts tuned correctly?
Severity and threshold counts highlight what is important while ensuring your detection doesn’t create too much noise. For example, there is no reason to fire an alert on every incorrect password attempt, but five consecutive failures could indicate a brute force attack. Use your understanding of the logs and the threat you are trying to detect to tune these accordingly.
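To make the brute force example concrete, here’s a minimal Python sketch of a threshold detection. It assumes login events arrive as simple records with a user, timestamp, and success flag; the threshold of 5 and the 10-minute window are illustrative values, not recommendations. A successful login resets the user’s streak, and failures older than the window drop off.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative tuning values; adjust to your environment and log source.
FAILED_LOGIN_THRESHOLD = 5
WINDOW = timedelta(minutes=10)


@dataclass
class LoginEvent:
    user: str
    timestamp: datetime
    success: bool


def detect_brute_force(events: list[LoginEvent]) -> list[str]:
    """Return users with FAILED_LOGIN_THRESHOLD consecutive failures within WINDOW."""
    streaks: dict[str, list[datetime]] = {}
    flagged: list[str] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.success:
            streaks.pop(event.user, None)  # a success resets the streak
            continue
        streak = streaks.setdefault(event.user, [])
        streak.append(event.timestamp)
        # Drop failures that have aged out of the time window.
        while streak and event.timestamp - streak[0] > WINDOW:
            streak.pop(0)
        if len(streak) >= FAILED_LOGIN_THRESHOLD and event.user not in flagged:
            flagged.append(event.user)
    return flagged
```

Keeping the threshold and window as named constants at the top makes them easy to find and adjust when you tune.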
Are false positives decreasing over time?
While false positives are inevitable, it’s important to identify them and find ways to filter them out over time. Eventually, your detection should reach a place where you’re only seeing the activity you truly want to see: the true positives.
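If your case management tool records a verdict for each alert, you can track this trend directly. Below is a rough sketch, assuming dispositions are available as (date, verdict) pairs; the verdict labels and both helper functions are hypothetical. Comparing the first and last week is a crude check, so treat it as a starting point rather than a real trend analysis.

```python
from collections import defaultdict
from datetime import date


def weekly_false_positive_rate(dispositions: list[tuple[date, str]]) -> dict[tuple[int, int], float]:
    """Group triaged alerts by ISO (year, week) and return the false positive share.

    verdict is assumed to be "true_positive" or "false_positive" (hypothetical labels).
    """
    totals: dict[tuple[int, int], int] = defaultdict(int)
    false_positives: dict[tuple[int, int], int] = defaultdict(int)
    for alert_date, verdict in dispositions:
        year, week, _ = alert_date.isocalendar()
        totals[(year, week)] += 1
        if verdict == "false_positive":
            false_positives[(year, week)] += 1
    return {wk: false_positives[wk] / totals[wk] for wk in sorted(totals)}


def is_improving(rates: dict[tuple[int, int], float]) -> bool:
    """Crude check: is the latest week's FP rate lower than the earliest week's?"""
    ordered = [rates[wk] for wk in sorted(rates)]
    return len(ordered) >= 2 and ordered[-1] < ordered[0]
```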
Is the detection logic written with quality?
Simpler is better. Don’t over-engineer your detection logic. Focus on what you want to see, and retrieve it in the most straightforward way possible.
Is the detection correctly documented and labeled?
The easiest way to answer this question is to read your detection logic as if it were plain English. Can you read your logic like a sentence? It’s also critical to name variables properly and add comments where needed (think of IDs that are random strings).
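As an illustration, here’s a small Python sketch of a detection split into named predicates so the final condition reads almost like a sentence. The event fields, action name, and service account ID are all hypothetical; note the comment explaining the otherwise meaningless ID.

```python
# Hypothetical action name from an identity provider's audit log.
ADMIN_ROLE_GRANTED = "role.admin.granted"
# Opaque ID of the (hypothetical) provisioning automation account.
# Without this comment, "svc-7f3a9c" would mean nothing to the next reader.
PROVISIONING_AUTOMATION_ID = "svc-7f3a9c"


def is_admin_grant(event: dict) -> bool:
    return event.get("action") == ADMIN_ROLE_GRANTED


def performed_by_automation(event: dict) -> bool:
    return event.get("actor_id") == PROVISIONING_AUTOMATION_ID


def should_alert(event: dict) -> bool:
    # Reads as: "alert on an admin grant that was not performed by automation."
    return is_admin_grant(event) and not performed_by_automation(event)
```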
Can this scale?
While this question can be ambiguous, it’s important to consider how to design your detection so that it scales. This means abstracting out components that will likely be reused. If you want to learn the best way to do this, I’d suggest reading my Exceptions as Code article.
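As a generic illustration of that abstraction (not necessarily how the Exceptions as Code article approaches it), here’s a minimal Python sketch that keeps exceptions as shared, documented data instead of hard-coding them into each detection. The `Suppression` type, the field names, and the values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suppression:
    """A reusable, documented exception: which field, which value, and why."""
    field: str
    value: str
    reason: str


# Hypothetical shared exceptions that several detections can import.
SHARED_SUPPRESSIONS = [
    Suppression("src_ip", "10.0.0.10", "Internal vulnerability scanner"),
    Suppression("user", "svc-backup", "Nightly backup job"),
]


def is_suppressed(event: dict, suppressions: list[Suppression]) -> bool:
    """True if any suppression matches the event."""
    return any(event.get(s.field) == s.value for s in suppressions)


def apply_suppressions(events: list[dict], suppressions: list[Suppression]) -> list[dict]:
    """Drop events covered by a shared exception before detection logic runs."""
    return [e for e in events if not is_suppressed(e, suppressions)]
```

Because each exception carries a `reason`, the list doubles as documentation when someone later asks why an IP or account is excluded.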
Cost
Volume
How often are alerts firing? Now how many were actually valuable? Only you can answer this. If this isn’t an easy question to answer, then you need to shift your focus to creating a Data Driven Detection Lifecycle.
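If you do have triage verdicts, answering this can be as simple as a couple of ratios. Here’s a minimal sketch, assuming one verdict string per alert over a fixed review period; the labels are hypothetical, and “precision” here just means the share of alerts that turned out to be true positives.

```python
def alert_volume_report(verdicts: list[str], days: int) -> dict[str, float]:
    """Summarize how noisy a detection was over a review period.

    verdicts: one triage outcome per alert, e.g. "true_positive" or
    "false_positive" (hypothetical labels; use what your case tool records).
    """
    total = len(verdicts)
    valuable = verdicts.count("true_positive")
    return {
        "alerts_per_day": total / days if days else 0.0,
        "precision": valuable / total if total else 0.0,
    }
```

For example, 20 alerts over 10 days with 3 true positives works out to 2 alerts per day at 15% precision.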
Is operational/labor cost for triage minimal?
When deciding what to include in an alert, consider the effort needed to determine whether it warrants further investigation. This ties into many of the principles from the previous points.
Does the value justify the log storage cost?
Storing logs is expensive nowadays. Not only that, you may even have to pay twice to ingest the logs into your SIEM from your storage solution. So, is the threat worth the daily volume of logs ingested?
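A back-of-the-envelope estimate helps frame that question. This sketch assumes you pay a per-GB ingestion fee into the SIEM plus a per-GB-month storage fee, and that storage at steady state holds roughly your retention period’s worth of data. All rates are placeholders, so plug in your own. Dividing by the number of true positives gives a rough log cost per valuable alert.

```python
def monthly_log_cost(gb_per_day: float, ingest_per_gb: float,
                     storage_per_gb_month: float, retention_days: int,
                     days_in_month: int = 30) -> float:
    """Rough monthly cost of one log source: SIEM ingestion plus retained storage."""
    # Ingestion is paid on every GB that flows into the SIEM this month.
    ingest_cost = gb_per_day * days_in_month * ingest_per_gb
    # At steady state, storage holds roughly retention_days worth of data.
    storage_cost = gb_per_day * retention_days * storage_per_gb_month
    return ingest_cost + storage_cost


def cost_per_true_positive(monthly_cost: float, true_positives_per_month: int) -> float:
    """Log spend per genuinely useful alert (infinite if nothing useful fired)."""
    if true_positives_per_month == 0:
        return float("inf")
    return monthly_cost / true_positives_per_month
```

For example, with placeholder rates of $0.10/GB ingest and $0.02/GB-month storage, a 50 GB/day source with 90-day retention comes to about $240 a month, or $60 per true positive if it produces four in that month.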
The Bottom Line
That was a lot, I know.
So, what makes a “good” detection? As you can see, that’s really not an easy question to answer.
But these points help shine some light on ways to determine if you have a quality detection in front of you.
Did I miss anything? Let me know below, or follow up with me on Twitter!
Securely Yours,
The Cybersec Cafe
Just a heads up, The Cybersec Cafe's got a pretty cool weekly cadence.
Every week, expect to dive into the hacker’s mindset in our Methodology Walkthroughs or explore Deep Dive articles on various cybersecurity topics.
. . .
Oh, and if you want even more content and updates, hop over to Ryan G. Cox on Twitter/X or my Website. Can't wait to keep sharing and learning together!