Introduction
I’ve spent the majority of my career in the defensive research space, first at F-Secure as a Threat Hunter & Detection Engineer and later as a Senior Security Researcher at Sophos. Most of my time was focused on detection engineering, whether that was creating new rules for emerging threats, tuning and reviewing noisy detections, or validating and reviewing our detection capabilities.
As time went on, I found that most of this time wasn’t actually spent writing detection rules but rather generating telemetry and performing rule validation. Many detection engineers can relate to the dread of customers asking about coverage of certain TTPs and resorting to running grep commands against the rule base. We’ve all been there and we’ve all done that. To me, it felt like I was missing a tool: something designed for the blue team to help emulate and quickly validate rather than relying on guesswork.
Continuous Improvement In Detection Engineering
The difficult part about detection engineering isn’t actually the rule writing. Finding creative ways to detect complex behaviors does pose a challenge, but the hidden difficulty comes with the maintenance and management of your rule base.
As your detection capabilities expand (via new telemetry sources becoming available or new EDR features releasing) so does your operational overhead. With these new capabilities:
- Previously noisy rules can become high fidelity signals
- Previously infeasible rules can become practical
- Multiple similar and overlapping rules can be consolidated into a single detection
The challenge is identifying these opportunities.
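One low-effort way to start surfacing these opportunities is to index the rule base by technique rather than grepping it. The snippet below is a minimal sketch, assuming each rule can be reduced to a name plus a list of MITRE ATT&CK technique IDs. The rule names and the `techniques` field are hypothetical and stand in for whatever metadata your own rule format carries.

```python
from collections import defaultdict

# Hypothetical rule records. In practice these would be parsed from your own
# rule files; the "techniques" field holds MITRE ATT&CK technique IDs.
RULES = [
    {"name": "lsass_handle_open", "techniques": ["T1003.001"]},
    {"name": "procdump_lsass_cmdline", "techniques": ["T1003.001"]},
    {"name": "schtasks_create", "techniques": ["T1053.005"]},
]


def index_by_technique(rules):
    """Map each technique ID to the sorted names of the rules tagged with it."""
    index = defaultdict(list)
    for rule in rules:
        for technique in rule["techniques"]:
            index[technique].append(rule["name"])
    return {tid: sorted(names) for tid, names in index.items()}


def overlap_candidates(index):
    """Return only the techniques that are covered by more than one rule."""
    return {tid: names for tid, names in index.items() if len(names) > 1}


index = index_by_technique(RULES)
candidates = overlap_candidates(index)
```

A technique with several rules is only a candidate for consolidation, not proof of redundancy. A technique missing from the index answers the coverage question directly.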
In reality, detection capabilities continue to improve but the rule base doesn’t evolve at the same pace. This leaves you with legacy “good enough” rules that never end up getting improved, and over time these issues compound.
Emulating Activity As The Blue Team
Continuously improving and evolving your rule base is a challenge that requires constant proactive validation and testing, which in practice often means emulating activity associated with specific techniques.
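At its simplest, that validation is a comparison between the rules you expected to fire for an emulated technique and the alerts you actually saw. The following is a rough sketch of that check and not a description of any particular tool. The rule names and the shape of the inputs are assumptions made for illustration.

```python
def validate(expected, fired):
    """Compare the rules expected to fire per technique with the alerts seen.

    expected: dict of technique ID -> set of rule names that should fire
    fired:    set of rule names that actually alerted during the test
    Returns a dict of technique ID -> sorted list of rules that stayed silent.
    """
    gaps = {}
    for technique, rules in expected.items():
        missing = sorted(rules - fired)
        if missing:
            gaps[technique] = missing
    return gaps


# Hypothetical expectations and results for a single test run.
EXPECTED = {
    "T1003.001": {"lsass_handle_open", "procdump_lsass_cmdline"},
    "T1053.005": {"schtasks_create"},
}
FIRED = {"lsass_handle_open", "schtasks_create"}

gaps = validate(EXPECTED, FIRED)
```

An empty result means every expected rule fired. Anything else is a list of rules to investigate, whether the cause is a broken rule or telemetry that never arrived.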
There are many approaches to this type of testing, usually involving some combination of:
- Open source C2 frameworks
- PoCs from GitHub
- Atomic Red Team scripts
- MITRE Caldera
- Real malware samples detonated in sandboxes
While each of these provides value, all of them have drawbacks.
Open source C2 frameworks provide realistic execution and exploitation (especially since certain threat actors actually use them). However, for telemetry generation they add unnecessary complexity in the form of staging logic, obfuscation and operational guardrails. This can “muddy” telemetry, making it hard to differentiate which parts are associated with the malware technique and which parts are tool specific.
Proof of concept code from places like GitHub can provide realistic execution but often requires modification and an understanding of the programming language it’s written in, and it lacks standardised output.
Atomic Red Team is an open source library of small scripted tests for executing malware techniques, each mapped to MITRE ATT&CK. It’s relatively comprehensive and simple to use; however, it usually doesn’t represent real world execution and lacks debug output and customisation.
MITRE Caldera functions essentially as a C2 framework with more granular control; however, it requires a large amount of setup and maintenance, with significant prep work required to test certain activity.
Testing real malware samples in a sandbox sounds perfect in theory, but finding samples that perform the techniques you want to test is a challenge. A lot of the time the malware won’t execute the target functionality without a command from its C2, which adds extra complexity.
All of these approaches have their place but common problems emerge:
- The need to internally host and maintain infrastructure
- Poor visibility or lack of detailed execution logs
- Difficulty customising behavior
- Time consuming setup and execution
What’s missing is a tool built specifically for defenders. One that focuses on emulating individual behaviors in a controlled, repeatable, and observable way without the overhead or ambiguity introduced by red team focused tooling.
Combat Theater
Combat Theater was built specifically to address the challenges faced in the detection engineering and defensive research space. Rather than simulating full attack chains, it focuses on what defenders actually need: the ability to isolate, execute, and clearly observe individual behaviors in a safe and deterministic way.
You can think of it as a “malware technique execution framework” which provides a large library of techniques that can be quickly customised and executed.

Features
The framework provides a set of capabilities and features designed specifically for detection research workflows:
Script Based Architecture
Malware techniques are implemented as individual scripts but executed by our native C++ “technique engine”. This allows for easy development and implementation of new techniques while keeping the realism of executing as native compiled code.

Payload Flexibility
We provide a library of “run safe” payloads in a vast array of formats that perform benign operations. These payloads exist on disk and are read in at the point of execution. Each one is provided with a clear description so you know exactly what you’re about to execute. Should you want to add your own, it’s as simple as dropping them into a folder and Combat Theater will import them for you automatically.

Deep Observability
The “technique engine” automatically logs low level details on each function used as the technique executes, allowing defenders to easily correlate the executed activity with telemetry.
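To give a feel for what that correlation can look like, here is a minimal sketch that pairs logged function calls with telemetry events from the same process within a short time window. The record layout (`time`, `pid`, `function`, `event`) is an assumption made for this example and is not Combat Theater’s actual log format. The event names are illustrative too.

```python
from datetime import datetime, timedelta


def parse(timestamp):
    """Parse an ISO 8601 timestamp string into a datetime."""
    return datetime.fromisoformat(timestamp)


def correlate(engine_log, telemetry, window_seconds=2.0):
    """Pair each logged call with same-PID telemetry seen within the window."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for call in engine_log:
        call_time = parse(call["time"])
        related = [
            event["event"]
            for event in telemetry
            if event["pid"] == call["pid"]
            and abs(parse(event["time"]) - call_time) <= window
        ]
        matches.append((call["function"], related))
    return matches


# Hypothetical execution log entries and EDR telemetry events.
ENGINE_LOG = [
    {"time": "2024-01-01T10:00:00.100", "pid": 4242, "function": "OpenProcess"},
    {"time": "2024-01-01T10:00:05.000", "pid": 4242, "function": "CreateRemoteThread"},
]
TELEMETRY = [
    {"time": "2024-01-01T10:00:00.350", "pid": 4242, "event": "ProcessAccess"},
    {"time": "2024-01-01T10:00:00.400", "pid": 1111, "event": "ProcessAccess"},
    {"time": "2024-01-01T10:00:09.000", "pid": 4242, "event": "CreateRemoteThread"},
]

matches = correlate(ENGINE_LOG, TELEMETRY)
```

A call that comes back with no related events is worth a closer look: either the sensor missed it or the window is too tight for your pipeline’s latency.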

Customisation
Techniques can be configured and adapted easily with our no-code GUI “pick’n’mix” system.

Configurations can be saved, loaded and executed with just a few clicks.

Playbooks
Chain technique executions together with our playbook builder and runner. We provide a collection of repeatable scenarios for you to choose from, including our “Challenge Books”, which are designed to stress test your visibility and capabilities, progressing from easy-to-detect techniques to obscure and stealthy ones.

Reporting
After running your tests, easily export your logs into multiple formats to suit your existing workflow, documentation tools and reporting systems.

Closing Thoughts
Detection engineering, at its core, is about having confidence that your detections work as intended.
Without a reliable way to generate telemetry and validate behaviour, that confidence is difficult to achieve. Combat Theater was built to provide that capability by enabling defenders to move from assumption to verification.
While this is its primary focus, the ability to execute and observe techniques in a deterministic way has broader applications. The same approach can be used to support analyst training, evaluate security tooling and help teams better understand how systems respond to specific behaviours.
At Combat Theater we want to achieve one goal: to make behaviour driven testing accessible, reliable, and practical for defenders.
If you have any questions about Combat Theater or your use case for it, reach out to us using our contact form.