Starting is the easy part; scaling is the hard part.
Spinning up your Security Operations Center (SOC) is a big day for your business. It means you’ve grown to the point where you need a dedicated security team to monitor the organization, its tools, its software, and most importantly, its data.
Now there are real expectations that your company and its data will be secure, which brings a massive responsibility to your customers.
Here’s a step-by-step method to make sure you set up and scale your SOC the right way.
Step 1: Logging and Monitoring
The first step most organizations take when getting their SOC started is choosing a SIEM. SIEM stands for Security Information and Event Management, and it is the core software tool your SOC needs to do its job effectively. You’ll need tools in place for collecting and monitoring the audit logs generated throughout your environment, and aggregating all of that data into one place can be daunting, especially given the technical knowledge some products require.
The choice of software is critical, too, because it dictates the direction your SOC takes from the start, and switching tools later on can be extremely difficult. But once you’ve chosen, your team can start ingesting audit logs from machines, firewalls, SaaS products, and everything in between to monitor your environment.
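Aggregating logs into one place usually means normalizing each source into a common schema, so detections can be written once instead of per product. The sketch below assumes two made-up formats, a space-delimited firewall line and a JSON SaaS audit event; every field name here is hypothetical, and real products document their own formats.

```python
import json
from datetime import datetime, timezone


def normalize_firewall(line: str) -> dict:
    # Hypothetical format: "2024-01-01T09:00:00Z DENY 198.51.100.7 -> 10.0.0.5:22"
    ts, action, src, _arrow, dst = line.split()
    return {
        "timestamp": datetime.fromisoformat(ts.replace("Z", "+00:00")),
        "source": "firewall",
        "event_type": f"connection_{action.lower()}",
        "source_ip": src,
        "destination": dst,
        "user": None,
    }


def normalize_saas(raw: str) -> dict:
    # Hypothetical JSON audit event with epoch seconds, an action, an IP, and an actor.
    event = json.loads(raw)
    return {
        "timestamp": datetime.fromtimestamp(event["epoch"], tz=timezone.utc),
        "source": "saas",
        "event_type": event["action"],
        "source_ip": event["ip"],
        "destination": None,
        "user": event["actor"],
    }


# Usage: both records now share one schema that detections can query.
print(normalize_firewall("2024-01-01T09:00:00Z DENY 198.51.100.7 -> 10.0.0.5:22"))
print(normalize_saas('{"epoch": 1704099600, "action": "login_failure", "ip": "203.0.113.9", "actor": "alice"}'))
```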
But how do you monitor your environment?
Well, this comes in the form of detections. Detections are logic built on top of your logs that trigger security alerts when certain criteria are met. Some detections come out of the box with your SIEM, and you can also create your own tailored to your environment. Detections are the foundation of your SOC and power all of the monitoring that occurs, since sifting through logs by hand looking for Indicators of Compromise (IOCs) would be impossible.
Don’t worry, I’ve got you covered. Check out this blog post for a guide to building your first detection.
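A detection is ultimately just logic over log events. As a minimal sketch, assuming events have already been normalized into records with hypothetical `event_type`, `user`, and `timestamp` fields, here is a classic threshold detection in Python: alert when one user racks up five or more failed logins within ten minutes. Real SIEMs express this in their own query or rule language, and the threshold and window are illustrative assumptions, not recommendations.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    detection: str
    user: str
    count: int
    severity: str = "medium"


def detect_failed_login_burst(events, threshold=5, window=timedelta(minutes=10)):
    """Alert when one user has `threshold` or more failed logins within `window`."""
    failures = defaultdict(list)
    for event in events:
        if event["event_type"] == "login_failure":
            failures[event["user"]].append(event["timestamp"])

    alerts = []
    for user, times in failures.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Slide the window's left edge forward until it spans at most `window`.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                alerts.append(Alert("failed_login_burst", user, end - start + 1))
                break  # one alert per user is enough for this sketch
    return alerts


# Usage: five failures in five minutes for alice, only two for bob.
base = datetime(2024, 1, 1, 9, 0)
events = [{"event_type": "login_failure", "user": "alice", "timestamp": base + timedelta(minutes=i)} for i in range(5)]
events += [{"event_type": "login_failure", "user": "bob", "timestamp": base + timedelta(minutes=i)} for i in range(2)]
print(detect_failed_login_burst(events))
```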
Step 2: Use Data to Fight Alert Fatigue
After you’ve begun onboarding logs and creating detections, alerts will start coming in. The more logs you ingest and the more detections you create, the more alerts you will have. You’ll quickly find that working on all three of these processes at once becomes quite the juggling act once your SOC is up and running. You may even start to wonder: how do SOC teams handle this without growing the size of the team?
This feeling is called Alert Fatigue, and it’s a fight every SOC must endure. The first way SOC teams fight off Alert Fatigue is through an iterative process called Detection Engineering (want to deep dive on the topic? Check this out).
At this point, you’ve already completed the first stage of this process: creating the detections in Step 1. Step 2 brings us to the iterative review stage. This is where, as a team, it’s imperative to tune your detections. Some tweaks will be immediately apparent, but others may require some critical thinking.
For example, is there an alert that consistently fires even though you know it’s expected behavior from a service account? It may be time to review that detection and tweak the logic so it has an exception for that specific account. Or is there an alert that fires from your known office IPs? Maybe it’s time to downgrade alerts from those IPs to informational.
Great, now you’ve started tweaking detections ad hoc. But there must be a more efficient way to do this, right? Yes, and it comes down to collecting data. You can’t know what you don’t track. Tracking your alerts and collecting metrics is a great way to find the low-hanging fruit, as well as the detections whose problems are more complex. I’ve written an entire article on becoming data-driven, and I’d recommend reading it to implement this process in your own SOC.
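Both tweaks can be expressed as a small tuning step applied to each alert. This is a sketch under stated assumptions: the detection name `new_admin_login`, the service account `svc-backup`, and the office range `203.0.113.0/24` (a documentation-only range) are all hypothetical placeholders, and in practice you would usually encode these exceptions directly in your SIEM's rule logic.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Alert:
    detection: str
    user: str
    source_ip: str
    severity: str


# Hypothetical tuning config: placeholders for illustration only.
SERVICE_ACCOUNT_EXCEPTIONS = {"new_admin_login": {"svc-backup"}}
OFFICE_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def tune(alert: Alert) -> Optional[Alert]:
    """Return the tuned alert, or None if it should be suppressed."""
    # Exception: known, expected behavior from a specific service account.
    if alert.user in SERVICE_ACCOUNT_EXCEPTIONS.get(alert.detection, set()):
        return None
    # Downgrade: activity from known office IPs becomes informational.
    ip = ipaddress.ip_address(alert.source_ip)
    if any(ip in net for net in OFFICE_NETWORKS):
        return replace(alert, severity="informational")
    return alert
```

Suppression and downgrading are kept separate on purpose: a suppressed alert disappears, while a downgraded one is still recorded and can be reviewed later if something changes.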
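One simple metric that surfaces low-hanging fruit is each detection's false-positive rate, weighted by volume. The sketch below assumes each closed alert is recorded with hypothetical `detection` and `disposition` fields; the minimum of 10 alerts and the 90% threshold are arbitrary starting points you would adjust.

```python
from collections import Counter


def noisy_detections(alerts, min_alerts=10, fp_threshold=0.9):
    """List (detection, total, fp_rate) for detections that are mostly false positives."""
    totals = Counter(a["detection"] for a in alerts)
    false_pos = Counter(a["detection"] for a in alerts if a["disposition"] == "false_positive")
    report = []
    for detection, total in totals.items():
        if total < min_alerts:
            continue  # too little data to judge this detection yet
        fp_rate = false_pos[detection] / total
        if fp_rate >= fp_threshold:
            report.append((detection, total, round(fp_rate, 2)))
    # Highest volume first: fixing these removes the most noise.
    return sorted(report, key=lambda row: row[1], reverse=True)
```

Detections that miss the threshold are not necessarily healthy; they are simply candidates for the more careful, case-by-case review described above.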
Step 3: Automation
Now that you’ve matured your SIEM and have been triaging alerts for some time, you may notice that a lot of manual labor goes into confirming that an alert isn’t actually malicious activity, and that much of that work is repeatable.
Cue the SOAR (Security Orchestration, Automation, and Response), your automation tool designed to help triage alerts. The SOAR is a tool I’ve seen many SOC teams purchase but not use to its full potential. It is extremely powerful and can automate the initial triage steps of a security alert. This alone can nearly eliminate alert fatigue.
However, the technical nature of the SOAR scares some teams away. Nowadays, there are so many plug-and-play options that, as long as you know how to call APIs (which can definitely be learned if you don’t), you’ll be able to configure your SOAR. I even have you covered on how to create your very first SOAR workflow.
The SOAR requires some deep thinking about your detections. What does the typical triage process look like for this alert? What do I need to know in order to say it isn’t malicious? And which pieces of this can I reuse in other detections?
But once configured, the SOAR is extremely powerful and complements your SIEM perfectly.
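Those three questions map naturally onto a workflow built from small, reusable steps. The sketch below is not any SOAR product's API: the lookups are hypothetical in-memory stand-ins for the threat-intel and identity-provider API calls a real workflow would make. Each step enriches a shared context, and a final step auto-closes the alert only when every check says it is benign.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for real API lookups (threat intel, identity provider).
KNOWN_BAD_IPS = {"192.0.2.66"}
USERS_WITH_MFA = {"alice"}

Step = Callable[[Dict], Dict]


def enrich_ip_reputation(ctx: Dict) -> Dict:
    ctx["ip_is_bad"] = ctx["source_ip"] in KNOWN_BAD_IPS
    return ctx


def enrich_mfa_status(ctx: Dict) -> Dict:
    ctx["mfa_passed"] = ctx["user"] in USERS_WITH_MFA
    return ctx


def decide(ctx: Dict) -> Dict:
    # "What do I need to know to say this isn't malicious?" encoded as one rule.
    if not ctx["ip_is_bad"] and ctx["mfa_passed"]:
        ctx["verdict"] = "auto_closed"
    else:
        ctx["verdict"] = "escalate_to_analyst"
    return ctx


def run_workflow(alert: Dict, steps: List[Step]) -> Dict:
    ctx = dict(alert)  # copy so the original alert record is untouched
    for step in steps:
        ctx = step(ctx)
    return ctx


# The enrichment steps are reusable building blocks for other detections' workflows.
SUSPICIOUS_LOGIN_WORKFLOW = [enrich_ip_reputation, enrich_mfa_status, decide]

print(run_workflow({"user": "alice", "source_ip": "198.51.100.7"}, SUSPICIOUS_LOGIN_WORKFLOW)["verdict"])
```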
Step 4: Incident Response Processes
The next step is all about preparing for the worst. After all, detecting the “worst” is what you’ve been building your systems around in the first three steps. Having the proper processes and systems in place for when SHTF (if you know, you know) will help ensure your organization survives in one piece when this inevitably (but hopefully never) happens.
First things first, everyone’s favorite: Documentation.
It’s important to have robust playbooks in a central location for every detection you have built. I’ve detailed how to draft playbooks in a previous article. But in layman’s terms, it all boils down to: “When this thing happens, do this. If that thing happens, it’s okay. But if this other thing happens, sound the alarm.”
Playbooks will also help you onboard new members as your SOC team grows. Learning the triage process for custom alerts can be daunting at first, but having documentation in place will help new Analysts get up to speed quickly.
Once you have detailed playbooks in place, it’s time to identify weak spots. These will generally take the form of manual tasks baked into your Incident Response procedures. This is where our handy SOAR tool comes in again. Incident Response is stressful, so if you identify an area that can be automated, build out a workflow to do just that. Automate now, save time later (and possibly even save the company).
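Keeping playbooks in a structured form makes it easy to store them centrally and render them consistently. As a minimal sketch, the hypothetical `Playbook` type below mirrors the three parts of that rule of thumb; the field names and the example content are illustrative assumptions, not a standard format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Playbook:
    detection: str
    triage_steps: List[str]                                # "When this thing happens, do this."
    benign_if: List[str] = field(default_factory=list)     # "If that thing happens, it's okay."
    escalate_if: List[str] = field(default_factory=list)   # "...sound the alarm."

    def render(self) -> str:
        lines = [f"Playbook: {self.detection}", "When this fires, do this:"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.triage_steps, 1)]
        lines.append("It's okay if:")
        lines += [f"  - {cond}" for cond in self.benign_if]
        lines.append("Sound the alarm if:")
        lines += [f"  - {cond}" for cond in self.escalate_if]
        return "\n".join(lines)


pb = Playbook(
    detection="failed_login_burst",
    triage_steps=["Check the source IP's reputation", "Confirm with the user"],
    benign_if=["The user confirms a forgotten password"],
    escalate_if=["A successful login follows from the same IP"],
)
print(pb.render())
```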
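A common manual task worth automating is containment of a compromised account. The sketch below assumes hypothetical `disable_account` and `revoke_sessions` helpers standing in for your identity provider's real API calls; it defaults to a dry run and keeps a timestamped log of each action for the incident timeline.

```python
from datetime import datetime, timezone
from typing import List, Tuple


# Hypothetical stand-ins for identity-provider API calls; replace with real integrations.
def disable_account(user: str) -> str:
    return f"disabled account {user}"


def revoke_sessions(user: str) -> str:
    return f"revoked active sessions for {user}"


def contain_compromised_user(user: str, dry_run: bool = True) -> List[Tuple[str, str]]:
    """Run the playbook's containment steps and keep a timestamped action log."""
    log = []
    for action in (disable_account, revoke_sessions):
        if dry_run:
            result = f"[dry run] would call {action.__name__}({user!r})"
        else:
            result = action(user)
        log.append((datetime.now(timezone.utc).isoformat(), result))
    return log


for timestamp, entry in contain_compromised_user("alice"):
    print(timestamp, entry)
```

Defaulting to a dry run is a deliberate safety choice: an analyst can review exactly what the workflow would do before letting it touch production accounts.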
Step 5: Maturation
The final step is maturing your solutions and processes. At this point, you’ll likely have quite a few tools and custom pieces in front of you. When you build out all of these different systems, it’s hard to get them to work together seamlessly the first time, so it’s important to revisit these integrations and refine them.
Just as you learned back in Step 2, scaling your SOC is an iterative process. Sometimes opportunities to improve your detections or automation workflows only reveal themselves after enough time has passed and enough data has been collected. A regular system for iterating on and auditing your detections and workflows helps ensure nothing goes untouched for too long. Sure, anecdotal experience can take you to a certain height, but it’s data and systems that take you to the next level.
Maturation in your Incident Response process will likely only come after small incidents happen, since it’s very difficult to plan for. That’s why performing a Lessons Learned review after each incident can dramatically improve your IR processes. This is the time to be critical and point out your shortcomings, but also to pat yourself on the back where you deserve it. Every incident is different, so look with a sharp eye to identify opportunities for improvement early. This will help ensure your systems are in place if the worst comes.
The SOC is integral to SaaS companies, logistics companies, and any scaling organization with precious data to protect. That’s why it’s important to make sure your SOC scales effectively too. Taking the right steps early sets you up for success later, and these 5 steps are the main phases you need to build an industry-leading Security Operations Center.
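One lightweight way to make sure nothing goes untouched for too long is to record when each detection and workflow was last reviewed and flag anything past a review interval. The sketch assumes a hypothetical inventory with `name` and `last_reviewed` fields and an arbitrary 90-day interval.

```python
from datetime import date, timedelta


def stale_items(inventory, today, max_age=timedelta(days=90)):
    """Return names of detections/workflows not reviewed within max_age, oldest first."""
    stale = [item for item in inventory if today - item["last_reviewed"] > max_age]
    stale.sort(key=lambda item: item["last_reviewed"])
    return [item["name"] for item in stale]


inventory = [
    {"name": "failed_login_burst", "last_reviewed": date(2024, 1, 10)},
    {"name": "impossible_travel", "last_reviewed": date(2024, 5, 1)},
    {"name": "suspicious_login_workflow", "last_reviewed": date(2023, 11, 2)},
]
print(stale_items(inventory, today=date(2024, 6, 1)))
```

Sorting oldest first gives the team a natural review queue for its next audit cycle.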
Securely Yours,
The Cybersec Cafe
Just a heads up, The Cybersec Cafe's got a pretty cool weekly cadence.
Every week, expect to dive into the hacker’s mindset in our Methodology Walkthroughs or explore Deep Dive articles on various cybersecurity topics.
. . .
Oh, and if you want even more content and updates, hop over to Ryan G. Cox on Twitter/X or my Website. Can't wait to keep sharing and learning together!