Forget the movie scenes. Most days in cybersecurity aren’t about zero-days, red teaming, or duct-taped Python scripts written in the heat of an incident.
The real work often revolves around data.
Security professionals spend the bulk of their time collecting, interpreting, and responding to streams of telemetry across systems, endpoints, and networks.
Without quality data, robust systems, and intelligent people to interpret it and act on it, there is no security team.
You can’t write effective detection rules.
You can’t hunt for threats retroactively or proactively.
You can’t investigate, contain, or recover from incidents.
If there’s no visibility into your environment, you’re flying blind. Just as dangerous: having the data and not knowing how to read it.
That’s why data analytics and statistical knowledge aren’t just nice-to-haves. They’re critical.
In this field, if you don’t understand your environment, you can’t protect it.
Challenges
Even with the right tools and a skilled team, logging and monitoring isn’t as simple as flipping a switch.
There’s more to it than plugging platforms into the SIEM, waving a magic wand, and suddenly having valuable insights.
There are tradeoffs, tough choices, nuance, and plenty of considerations to be made along the way.
What do we collect?
Not all logs are created equal. You can’t collect everything - at least not realistically.
So a conscious decision must be made for every source.
At its simplest, you need to determine which log sources are valuable by taking the time to spell out why.
Start by asking:
What’s the actual value of this log source?
Is it needed for real-time detection?
Does it help with incident response?
Does it enrich other logs through context?
Is it required for compliance?
A shared understanding of what you’re collecting and why helps avoid wasted effort and bloated pipelines.
This is the foundation of a smart, sustainable strategy.
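One lightweight way to capture that shared understanding is a machine-readable inventory where every source carries its documented “why”. A minimal sketch in Python - the source names, fields, and rationales here are hypothetical, not a recommendation:

```python
# Hypothetical log source inventory: every source gets a documented "why".
LOG_SOURCES = {
    "okta_system_log": {
        "realtime_detection": True,
        "incident_response": True,
        "enrichment": False,
        "compliance": True,
        "rationale": "Primary identity telemetry; drives auth-based detections.",
    },
    "vpc_flow_logs": {
        "realtime_detection": False,
        "incident_response": True,
        "enrichment": True,
        "compliance": False,
        "rationale": "Network context for investigations; too noisy for real-time alerting.",
    },
}

def has_documented_value(source: dict) -> bool:
    """A source earns its ingestion cost only if at least one use case is true."""
    use_cases = ("realtime_detection", "incident_response", "enrichment", "compliance")
    return any(source[k] for k in use_cases)

for name, meta in LOG_SOURCES.items():
    print(f"{name}: keep={has_documented_value(meta)} ({meta['rationale']})")
```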
Where do we store it?
Storage is a constant balancing act between cost and capability. Budget is not infinite and log storage is expensive.
You’ll likely have two primary tiers:
High-cost storage (e.g. your SIEM) for logs that support real-time detection use cases and require fast access.
Low-cost storage (e.g. AWS S3) for logs that provide investigative context or are required for compliance retention.
There’s no one-size-fits-all solution. It’s neither realistic nor cost-effective to store all log sources in a single platform anymore.
As a team you’ll need to understand what you prioritize - speed, budget, a single-pane-of-glass…
If you have the budget to keep all logs in one place - consider yourself lucky!
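To make the tradeoff concrete, here’s a minimal sketch of tier-routing logic: anything needed for real-time detection goes to the SIEM’s hot tier, everything else lands in cheap object storage. The source names and tier labels are illustrative:

```python
# Illustrative routing: hot tier for real-time detection, cold tier for everything else.
SOURCES = {
    "okta_system_log": {"realtime_detection": True},
    "vpc_flow_logs": {"realtime_detection": False},
    "dns_query_logs": {"realtime_detection": False},
}

def storage_tier(source: dict) -> str:
    if source["realtime_detection"]:
        return "siem_hot"  # high-cost, fast access for detection
    return "s3_cold"       # low-cost object storage for context and compliance

for name, meta in SOURCES.items():
    print(f"{name} -> {storage_tier(meta)}")
```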
How long do we keep it?
It’s not always obvious what data you will need, or when you will need it.
The safest answer is often: “Keep everything, for as long as you can stomach it.”
But the reality is storage costs add up fast, especially for high-volume, high-cost platforms like SIEMs.
Many teams default to keeping logs for 12-15 months, which aligns with common compliance requirements.
But what happens if a threat has been lurking quietly beyond that window? What if a legal hold or regulatory inquiry suddenly requires access to old logs?
These are the kinds of scenarios that make retention strategy a critical part of your logging plan. The key is balancing cost, compliance, and risk - while also preparing for the unknown.
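For the low-cost tier, this kind of policy is usually expressed as an object lifecycle rule. Here’s a sketch of an AWS S3 lifecycle configuration using boto3 - the bucket name, prefix, and day counts are assumptions to illustrate the shape, not recommendations:

```python
import boto3

# Hypothetical bucket and retention windows -- tune to your compliance and risk needs.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-security-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-security-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                # After 90 days, move logs to cheaper archival storage.
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                # Expire at ~15 months (450 days), matching a common compliance window.
                "Expiration": {"Days": 450},
            }
        ]
    },
)
```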
How do we drive action from our data?
With so many sources, fields, and values flooding your SIEM every day, separating noise from real signals can feel impossible.
But at the end of the day, that’s the job. Turning raw data into meaningful insight is what makes a security program proactive instead of reactive. And that takes skill.
You’ll need to write queries, look for patterns, understand business context, and recognize anomalies. It’s not just an analyst’s job - it’s a core skill for anyone working in cybersecurity - whether you’re red team, blue team, or somewhere in between.
The good news? Once you learn how to work with data, that skill travels with you.
The hard part? Getting there. But once you’re on the other side, it’s one of the most valuable tools for your career.
Whether it’s Detection Engineering, Incident Response, or Threat Hunting - Security Operations is built on data. And as a Security Engineer, you need to make that data work for you. Selecty is a database-agnostic, sidecar query assistant built to do just that. Generate queries based on your table schemas, optimize them to your use case, iterate on them quickly, and debug faster than ever - all in one sleek interface. Check it out!
Architecture
The Traditional Approach
The go-to strategy for many cybersecurity teams has long been to send all logs to the SIEM.
The goal? A mythical “single pane of glass” - or one place to see everything. But in today’s landscape, is that even practical? Or smart?
Relying on a single platform can quickly lead to vendor lock-in. The more time and effort you invest in that one platform, the harder it becomes to leave.
Migrating your data, retraining your team, rebuilding your infrastructure, reconfiguring alerts - it’s a heavy lift.
And vendors know this. At that point, you’re at the mercy of their pricing, because they know you’re stuck. A couple of vendors are notorious for insanely high costs (but I won’t put them on blast here).
Then there’s the issue of siloed data. Along with security-specific data, security teams often ingest many of the same sources as other departments - leading to double ingestion costs and unnecessary complexity.
The truth is, the traditional model is showing its age. New players are entering the market with flexible, cost-effective approaches.
That “single pane” is cracking, and it might be time to rethink what centralized visibility should really look like.
Data is on the Move
Data lakes are rapidly becoming the backbone of modern security architectures.
Why? Because they’re not just cheaper, they’re smarter. A well-architected data lake allows you to store security-relevant data at scale, run advanced analytics, and break down silos between teams.
All while avoiding traditional vendor lock-in. You have the ability to:
Centralize and unify data across departments.
Lower storage and compute costs.
Scale effortlessly.
Support more complex detection and investigation workflows.
As this model continues to gain traction, SIEM vendors are being forced to adapt. They’re now figuring out how to work on top of your data lake - a major shift in power and flexibility.
The result? You take back ownership of your data. You control the architecture. And you can swap in and out tools as your needs evolve without feeling handcuffed to a single platform.
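To make that concrete, here’s a minimal sketch of querying Parquet logs sitting in S3 directly with DuckDB - no SIEM in the path. The bucket layout and column names are hypothetical, and it assumes AWS credentials are already configured in your environment:

```python
import duckdb

con = duckdb.connect()
# The httpfs extension lets DuckDB read directly from S3.
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# Hypothetical lake layout: auth logs stored as Parquet under s3://example-lake/auth/
rows = con.execute("""
    SELECT user_name, count(*) AS failures
    FROM read_parquet('s3://example-lake/auth/*.parquet')
    WHERE outcome = 'FAILURE'
    GROUP BY user_name
    ORDER BY failures DESC
    LIMIT 10
""").fetchall()

for user, failures in rows:
    print(user, failures)
```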
The Cybersec Café Discord is officially live! Join a growing community of cybersecurity professionals who are serious about leveling up. Connect, collaborate, and grow your skills with others on the same journey. From live events to real-world security discussions — this is where the next generation of defenders connects. Join for free below.
How Security Teams are Operationalizing Data
Statistics is the science of collecting, analyzing, interpreting, presenting, and organizing data.
The SIEM is a big data engine. It provides the tools to ingest, store, and visualize your security telemetry. But without the skills to analyze and operationalize the data, it’s like owning a library and not being able to read.
Security teams must develop strategies to act on their data at scale. Otherwise, detection engineering, triage, hunting, and incident response all break down.
Detections
Detections are the heartbeat of security operations.
Traditional detections often rely on black-and-white boolean logic to determine whether an event matches known bad behavior. But as threats grow more subtle and user behavior more dynamic, this approach starts to fall short.
That’s where statistical thinking steps in.
Behavioral detections, especially user-based ones, are notoriously tricky to get right. But by applying basic statistical analysis like mean and standard deviation to historical activity, you can begin to identify anomalies by searching for outliers.
These are specific activities that are statistically improbable.
This mindset shift allows you to go beyond simple pattern matching and to find signals that are truly anomalous.
Combine this with boolean logic, and you’ve got a powerful hybrid.
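As a minimal sketch of that hybrid, here’s a z-score check against a user’s historical activity. The sample data and the three-sigma threshold are illustrative - real baselines need far more history and tuning:

```python
import statistics

# Hypothetical baseline: daily login counts for one user over recent weeks.
history = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5, 7, 6]
today = 23

mean = statistics.mean(history)
stdev = statistics.stdev(history)

# z-score: how many standard deviations today sits from the baseline.
z = (today - mean) / stdev

# Three sigma is a common (but tunable) cutoff for "statistically improbable".
if abs(z) > 3:
    print(f"Anomaly: z={z:.1f}, today={today}, baseline mean={mean:.1f}")
```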
Alert Triage
Whether you’re manually triaging alerts or building automated SOAR workflows, statistical reasoning is a crucial skill.
Every alert is, in a sense, a question: “Is this worth our time to investigate further?”
To answer it, you need to think like both a security analyst and a data analyst - you need to sift through raw telemetry, identify the relevant pieces, and organize them into a coherent story about a user, system, or behavior.
The goal is to contextualize the signal and assess the likelihood that it represents real risk. Sounds straightforward - but the challenge lies in variety and business context.
Different log sources, enrichment layers, and detection types all introduce complexity. And in these moments, environmental knowledge becomes just as important as technical skill.
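In code, that triage ritual might look something like this: pull every event for the alerted entity within a time window and lay it out as a mini-timeline before a human (or playbook) judges it. Field names and events here are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical normalized events from several log sources.
EVENTS = [
    {"ts": datetime(2024, 5, 1, 9, 2), "user": "jdoe", "source": "vpn", "action": "login", "country": "US"},
    {"ts": datetime(2024, 5, 1, 9, 5), "user": "jdoe", "source": "okta", "action": "mfa_push_denied", "country": "US"},
    {"ts": datetime(2024, 5, 1, 9, 7), "user": "jdoe", "source": "okta", "action": "login", "country": "RO"},
]

def triage_context(alert_user: str, alert_ts: datetime, window: timedelta = timedelta(hours=1)):
    """Gather every event for the alerted user within +/- window, sorted into a mini-timeline."""
    related = [e for e in EVENTS
               if e["user"] == alert_user and abs(e["ts"] - alert_ts) <= window]
    return sorted(related, key=lambda e: e["ts"])

for event in triage_context("jdoe", datetime(2024, 5, 1, 9, 7)):
    print(event["ts"], event["source"], event["action"], event["country"])
```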
Performance
The numbers don’t lie.
When you’re dealing with massive volumes of data, gut feelings won’t cut it - you need your metrics to prove your security function is performing.
Start collecting performance data across your operations as soon as possible: detection, response, and SOC workflows. These metrics provide an honest snapshot of where you stand today and how you’re trending over time.
Track the fidelity of your detections, the mean time to triage, and how long it takes to resolve incidents.
This data will quickly become your compass - pointing the way to efficiency and continuous improvement.
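Those numbers fall out of your alert records with very little math. A quick sketch, using hypothetical records, that computes mean time to triage and detection fidelity:

```python
from datetime import datetime

# Hypothetical alert records with lifecycle timestamps and triage verdicts.
alerts = [
    {"created": datetime(2024, 5, 1, 9, 0), "triaged": datetime(2024, 5, 1, 9, 12), "verdict": "true_positive"},
    {"created": datetime(2024, 5, 1, 10, 0), "triaged": datetime(2024, 5, 1, 10, 45), "verdict": "false_positive"},
    {"created": datetime(2024, 5, 1, 11, 0), "triaged": datetime(2024, 5, 1, 11, 8), "verdict": "true_positive"},
]

# Mean time to triage, in minutes.
mttt = sum((a["triaged"] - a["created"]).total_seconds() for a in alerts) / len(alerts) / 60

# Detection fidelity: the share of alerts that turned out to be real.
fidelity = sum(a["verdict"] == "true_positive" for a in alerts) / len(alerts)

print(f"MTTT: {mttt:.1f} min, fidelity: {fidelity:.0%}")
```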
Threat Hunting
At its core, threat hunting is about finding what doesn’t belong.
It’s a manual process rooted in curiosity, intuition, and a methodical approach.
The best hunters don’t just stumble upon threats - they use structured techniques to interrogate data, spot anomalies, and test their hypotheses.
That means slicing through big datasets, surfacing patterns, and building a story based on evidence.
It takes a blend of technical skill and investigative mindset. The challenge? Knowing what to look for and how to get there without drowning in the noise.
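Stack counting (frequency analysis) is one of those structured techniques: count how often each value appears across the fleet, then hunt in the long tail. A sketch over hypothetical process telemetry:

```python
from collections import Counter

# Hypothetical process names observed across the fleet.
processes = (
    ["svchost.exe"] * 4200
    + ["chrome.exe"] * 3100
    + ["outlook.exe"] * 1900
    + ["mimikatz.exe"]       # the long tail is where hunts pay off
    + ["svch0st.exe"] * 2    # lookalike name hiding in the noise
)

counts = Counter(processes)

# Surface the rarest entries for human review.
for name, count in sorted(counts.items(), key=lambda kv: kv[1])[:5]:
    print(f"{count:>6}  {name}")
```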
Security Incident Response
Incident response thrives on precision, and your data is the foundation.
You’re not just collecting metrics to see how your team responds; you’re also building a full timeline of events from historical data.
Attacks often sprawl. Your job is to trace them: sift through logs, correlate data sources, and identify the start and spread of an incident.
That means narrowing scope, identifying what’s relevant, and cutting the rest.
If you can compare current activity against historical baselines, even better. You’ll move faster, make stronger decisions, and resolve incidents with confidence.
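As a tiny illustration of that baselining idea, “has this host ever done this before?” is often the fastest scoping question during an incident. The hosts, domains, and baseline window here are hypothetical:

```python
# Hypothetical historical baseline: domains each host has contacted in the last 90 days.
baseline = {
    "host-01": {"updates.example.com", "login.microsoftonline.com"},
    "host-02": {"updates.example.com"},
}

# Hypothetical connections observed during the incident window.
incident_activity = [
    ("host-01", "updates.example.com"),
    ("host-01", "c2-beacon.badsite.io"),
    ("host-02", "c2-beacon.badsite.io"),
]

# Anything never seen in the baseline is worth a closer look -- and helps scope the spread.
for host, domain in incident_activity:
    if domain not in baseline.get(host, set()):
        print(f"NEW for {host}: {domain}")
```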
💬 How else do you utilize data analytics and statistics concepts in your day-to-day as a security engineer? Let me know below!
The Narrative
By now, you’re probably noticing a theme: using data and statistical analysis to craft a narrative.
In cybersecurity, it’s not enough to just make sense of data - you need to translate it into something others can understand and act on.
That means making data actionable - the skill of filtering through massive amounts of telemetry, identifying what matters, and drawing conclusions that drive decisions.
Sure, if you’re communicating engineer to engineer, raw data might be enough.
But let’s be honest - that’s not how the real world works. Most of the time you’ll need to explain your findings to people who don’t live in the logs like you do.
Data is the evidence. The narrative is the conclusion.
This is exactly why statistical proficiency is so critical in cybersecurity. It’s the intersection of math and communication - taking something complex and making it understandable.
The professionals who can look at a wall of numbers and translate it into a compelling, security-relevant story are the ones who stand out. That skill of turning raw data into a clear and confident narrative is a superpower.
Cybersecurity is challenging for this exact reason. It’s not just one discipline - it’s many combined.
You need technical chops across a massive stack, data fluency, communication skills, and strategic thinking. All working in harmony.
But like anything else worth mastering, it takes practice. You won’t learn this overnight, but you will learn it if you show up, do the work, and build on the basics.
If you’re looking to improve this specific skillset, I’d highly recommend checking out these two articles next:
My Log Source-Agnostic Methodology to Understanding Big Data
Why Knowing How to Query is an Essential Cybersecurity Skill
Securely Yours,
Ryan G. Cox
P.S. The Cybersec Cafe follows a weekly cadence.
Each week, I deliver a Deep Dive on a cybersecurity topic designed to sharpen your perspective, strengthen your technical edge, and support your growth as a professional - straight to your inbox.
. . .
For more insights and updates between issues, you can always find me on Twitter/X or my Website. Let’s keep learning, sharing, and leveling up together.