My SIEM-Agnostic Creative Process for Detection Engineering
Picture this: You’ve just started ingesting a new log source to your SIEM with no prebuilt detections. The log source is business-critical, and you have two business days to get basic monitoring in place.
How do you go from zero detections to a suite of rules providing valuable, real-time monitoring?
This is a situation every detection engineer will face at some point in their career. And when it comes, the difference between a junior and senior engineer often comes down to having a proven, repeatable process for building out a reliable detection suite.
But first, let’s level-set. What is detection engineering?
It’s the practice of creating threat detection rules that define specific patterns, behaviors, and Indicators of Compromise (IoCs) that may signal malicious activity. These rules are deployed in your SIEM’s detection engine, giving you real-time visibility into what’s happening across your environment.
In short: Detections tell you when malicious activity may be happening, as it’s happening.
The process I’m going to outline in this article for building out your detection suite is SIEM-agnostic - it’ll work in any environment.
Let’s dive in.
Phase I: Research
The Research Phase is all about gaining an understanding of the log source - what it captures, how it behaves, and what meaningful activities it records.
To start, I like to ask myself a few key questions:
What would I want to know?
What would an attacker try to do in this system?
What am I really interested in having real-time visibility over?
Whether you like it or not, the best way to answer these questions is by diving into the documentation. Nearly every SaaS platform today offers documentation, and while it’s not the most thrilling read, it’s by far the most valuable resource in this phase.
Start with the event types - these outline the specific activities being logged, giving you insight into what’s actually being recorded.
Next, communicate with key stakeholders - specifically administrators and day-to-day users of the platform. Their firsthand experience offers unfiltered, practical insight into how the system is actually used. Some questions worth asking them are:
What actions do you perform regularly?
Is there any sensitive data being stored?
Are there specific events or behaviors you’d want visibility into?
Once you have a baseline understanding, shift the conversation toward security-specific concerns:
Is there anything you’d be worried about from a security perspective?
What events would you find valuable to have real-time visibility over?
If you were going to perform malicious activities in this system, where would you start?
Whenever possible, target technical stakeholders like IT, Infrastructure, Product, and AppDev - they’ll provide a clearer picture of potential attack surfaces, data flows, and areas worth monitoring.
Phase II: Detection Brainstorming
Once you’ve gathered enough information, it’s time to start brainstorming detections.
During this phase, I like to write down my ideas to give myself a checklist that I can systematically work through.
I typically focus on actions that can be performed legitimately, but could also indicate malicious activity when performed in a certain context. Here are some common behaviors worth monitoring:
CRUD (Create, Read, Update, Delete) operations, although you can generally forgo Read actions
Administrative/Privileged Actions
Actions Involving API Keys or Secrets
Configuration Changes
Data Exfiltration-like Behaviors
Bot-Like Actions
Failures (High attempt counts or failed sensitive actions)
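To make the brainstorming phase concrete, here’s a minimal sketch of how those ideas could be captured as a working checklist. This is purely illustrative - the categories come from the list above, but every event name is a hypothetical placeholder you’d swap for the real event types found in the vendor’s documentation:

```python
# Hypothetical brainstorm checklist for a new log source.
# Every event name below is a placeholder, not from any real platform.
detection_ideas = {
    "admin_actions": ["user.role_changed", "admin.login"],
    "secrets": ["api_key.created", "api_key.deleted"],
    "config_changes": ["settings.updated", "sso.disabled"],
    "exfil_like": ["report.exported", "bulk.download"],
    "failures": ["login.failed", "mfa.denied"],
}

# Track which ideas you've validated against real logs in Phase III.
done = set()

def mark_done(category: str) -> None:
    if category in detection_ideas:
        done.add(category)

def remaining() -> list:
    """Categories still waiting on validation."""
    return sorted(set(detection_ideas) - done)
```

Even a lightweight structure like this makes it harder to lose track of ideas as you move into the log-diving phase.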
Remember, as detection engineers, we constantly walk a fine line between value and noise.
We want meaningful alerts that catch suspicious behavior, but we don’t want a flood of false positives drowning the SOC team in noise.
That’s why it’s important to think critically from an attacker’s perspective when designing your suite of rules.
If you hit a creative wall, open-source detection repositories are a great place to get inspired.
Panther and Splunk both have open-source detection libraries.
While it may take some effort to decipher the detection logic, they’re fantastic starting points.
Phase III: Diving into the Logs
Now comes the fun part.
With your detection ideas in hand, it’s time to start hunting for some events.
For the best results, you’ll want at least a week of logs ingested - otherwise the sample size will be too limited to find events or spot patterns reliably.
The goal here is to validate what we identified in Phases I and II with one key precursor - we need to establish a baseline for what’s normal in the environment.
This takes some experience, so don’t get discouraged if it’s tricky at first. Start by making note of common accounts, actions, and activities.
One of my go-to techniques for this phase is what I call Count By Queries. Here’s an example:
SELECT action, COUNT(*)
FROM logs
GROUP BY action
ORDER BY COUNT(*) DESC;
Counts are a great indicator of what is regular and what is not. It’s simple - high counts are regular, low counts are not.
Now, this isn’t a strict rule - volume doesn’t necessarily tell the entire story. Use this technique as a guide, but still rely on your security engineering intuition when reviewing the results.
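The same count-by idea works outside the SIEM too. Here’s a small Python sketch that runs over exported logs, assuming each log is a dict with an "action" field (the sample events are made up):

```python
from collections import Counter

def count_by(logs, field="action"):
    """Count occurrences of each value of `field`, most common first."""
    return Counter(log.get(field, "<missing>") for log in logs).most_common()

# Toy sample: three routine logins, one rare privileged action.
logs = [
    {"action": "user.login"},
    {"action": "user.login"},
    {"action": "user.login"},
    {"action": "admin.role_changed"},
]

print(count_by(logs))
# The rare, low-count action sinks to the bottom of the list -
# exactly the kind of outlier worth a closer look.
```

Running this against a week of exported logs gives you a quick baseline of routine versus rare activity before you ever write a rule.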
Phase IV: Detection Organization
After Phase III, you should have a thorough understanding of the logs.
At this point, it’s time to start planning your detections.
While it might be tempting to dive right into writing them, I’ve found it much quicker to plan them out first.
Why?
Because you’ll often realize you can group logic into a single rule, reducing redundancy and making your suite easier to manage.
Here’s what I mean.
Detection logic typically begins with event types, event names, and actions. In many cases, you can create broader, blanket detections by grouping different event names and actions that share the same event type.
An example would be grouping all administrative actions under the same detection, rather than splitting them out by service.
Additionally, many SIEMs now support dynamic severity ratings, which let you fine-tune alerts based on the specific contents of the event.
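Here’s a rough sketch of both ideas together, in the style of Python-based rule engines like Panther (a rule function plus a severity function). The event and action names are hypothetical placeholders, not from any specific platform:

```python
# One blanket detection covering all administrative actions,
# rather than a separate rule per service. Action names are hypothetical.
ADMIN_ACTIONS = {
    "user.role_changed",
    "admin.user_deleted",
    "sso.disabled",
    "api_key.created",
}

# The riskiest subset gets bumped to a higher severity.
HIGH_SEVERITY_ACTIONS = {"sso.disabled", "admin.user_deleted"}

def rule(event: dict) -> bool:
    # Fires on any administrative action in the grouped set.
    return event.get("action") in ADMIN_ACTIONS

def severity(event: dict) -> str:
    # Dynamic severity: tune the alert based on the event contents.
    return "HIGH" if event.get("action") in HIGH_SEVERITY_ACTIONS else "MEDIUM"
```

One rule, one set to maintain - and when a new admin action shows up in the logs, extending coverage is a one-line change.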
Then there are detection exceptions.
These are users, accounts, or machines that frequently perform actions as part of their regular activity. Exceptions prevent unnecessary noise by excluding known, benign behavior.
However, I generally wait to implement exceptions until I’ve verified that the detections are working properly at a basic level - otherwise you might accidentally filter out meaningful activity before you’ve validated accuracy.
But it doesn’t hurt to jot them down now!
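When you do get around to exceptions, the pattern is usually just an allowlist check early in the rule logic. A minimal sketch, with made-up service account names standing in for your own:

```python
# Known-benign actors jotted down during planning - hypothetical names.
EXCEPTIONS = {"svc-backup", "terraform-ci"}

def rule(event: dict) -> bool:
    # Suppress known, benign automation before any other logic runs.
    if event.get("actor") in EXCEPTIONS:
        return False
    # Hypothetical exfiltration-like behavior to detect.
    return event.get("action") == "bulk.download"
```

Keeping exceptions in a single named set also makes them easy to review and prune later.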
Phase V: Detection Creation
Now it’s time to create our detections.
Let’s start with logic.
Your detection logic needs to be clear, concise, and as simple as possible without sacrificing functionality. Detection languages vary by SIEM, but regardless of the platform, make your syntax as human-legible as possible.
This makes your detections easier to understand, maintain, and troubleshoot later on.
Next, you’ll want to define the type of detection you’re creating. Three of the most common types are:
True/False - If a specific key-value pair is present, it fires an alert.
Threshold - These trigger when a certain count of events occur over a defined time period.
Scheduled - These run on a fixed interval, scanning for activity based on rule criteria.
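The first two types are easy to sketch in Python. This is illustrative only - field names, actions, and the threshold are all hypothetical, and a real SIEM would handle the time window for you:

```python
from collections import defaultdict

# True/False: fire when a specific key-value pair is present.
def true_false_rule(event: dict) -> bool:
    return event.get("action") == "mfa.disabled"

# Threshold: fire when a count of events exceeds N per actor.
# Assumes `events` is already scoped to the rule's time window.
def threshold_rule(events, action="login.failed", threshold=5):
    counts = defaultdict(int)
    for event in events:
        if event.get("action") == action:
            counts[event.get("actor")] += 1
    # Return the set of actors who crossed the threshold.
    return {actor for actor, n in counts.items() if n >= threshold}
```

A scheduled detection would typically wrap logic like this in a saved query that runs on a fixed interval.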
Each detection type serves a unique purpose, and selecting the right one comes down to the engineer’s judgment and expertise.
If you want a deeper dive into the actual detection writing itself, I’d recommend my article, Writing a Detection Rule.
Phase VI: Validation and Testing
Before deploying to production, it’s crucial to validate and test your detections to ensure they function as expected.
If your SIEM has this functionality built-in, leverage it to simulate specific scenarios and confirm your logic.
If your SIEM does not support test cases, a simple but effective approach is to rewrite your detection logic into a query and run it against your logs. While not ideal, it still helps identify any accidental errors, such as incorrect field names or faulty logic.
When architecting my test cases, I generally aim to cover three key use cases:
Success - This is a log that should trigger an alert.
Fail - This is a log that should not trigger an alert.
Edge Case - For complex detections, test values that sit right on the boundaries of logic (think the >= operator).
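For a threshold-style detection, those three cases might look like this. A minimal sketch, assuming a hypothetical rule that fires at five or more failures:

```python
def rule(failed_count: int) -> bool:
    # Hypothetical threshold detection: alert at 5 or more failures.
    return failed_count >= 5

# Success - a log that should trigger an alert.
assert rule(10) is True
# Fail - a log that should not trigger an alert.
assert rule(1) is False
# Edge cases - values sitting right on the >= boundary.
assert rule(5) is True
assert rule(4) is False
```

Catching an off-by-one at the boundary here is far cheaper than catching it in production after a missed alert.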
Phase VII: Iteration
The key to building a robust detection suite is continuous iteration.
Detections are rarely perfect from the start.
As a matter of fact, you generally want to leave room for detections to be tweaked over time. This lets you gradually gain confidence in their accuracy as you filter out different behaviors.
After deployment, monitor activity over the course of a couple of weeks to a month.
As you iterate:
Tweak thresholds and exceptions to reduce noise
Stay skeptical when making changes and consider all potential use cases
Request reviews to double check logic changes
Keep digging into the logs and look for activities you may have missed
Slowly but surely, iteration will cause noise to decrease and value to increase.
Start to Finish
Now, this process isn’t the be-all and end-all, but it will give you a solid foundation for building out your detection suite.
When creating your initial set of detections, remember that it’s about coverage, not perfection.
Your first iteration isn’t about getting everything perfect, it’s about gaining visibility into what matters.
Refine and enhance your detections over time, and with each iteration, you’ll continue to build up your security posture.
Securely Yours,
Ryan G. Cox
Just a heads up, The Cybersec Cafe's got a pretty cool weekly cadence.
Every week, expect to dive into the hacker’s mindset in our Methodology Walkthroughs or explore Deep Dive articles on various cybersecurity topics.
. . .
Oh, and if you want even more content and updates, hop over to Ryan G. Cox on Twitter/X or my Website. Can't wait to keep sharing and learning together!