The Collapse of Offensive Security Economics
Two AI models now complete enterprise attack simulations autonomously. What it means for your risk framework, your board, and your next quarter.
Domain: AI Security & Governance / Regulatory Horizon
The Development
On 7 April 2026, Anthropic announced Claude Mythos Preview, a restricted frontier model that autonomously discovers and exploits zero-day vulnerabilities across major operating systems and web browsers.[1] The model achieved a 72% success rate generating working Firefox exploits in benchmarks where Anthropic’s previous model succeeded less than 1% of the time. It uncovered bugs that had survived decades of expert review, including a 27-year-old vulnerability in OpenBSD’s TCP stack, a 16-year-old flaw in the FFmpeg multimedia framework, and a 17-year-old remotely exploitable FreeBSD NFS vulnerability granting unauthenticated root access.[1][2] Anthropic stated the capability emerged from general improvements in code reasoning rather than explicit offensive training, which means any laboratory pushing frontier model capabilities is on the same trajectory.
Anthropic classified Mythos as “Restricted-Grade” (a designation indicating capabilities too powerful for general release) and declined to make it publicly available.[1] Anthropic also launched Project Glasswing, a defensive coalition of launch partners (Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks) plus over 40 organisations that maintain critical software infrastructure.[3] The programme provides restricted access backed by up to $100 million in usage credits and $4 million in direct grants to open-source security organisations.[3][4]
The containment strategy failed early. On 21 April, Bloomberg reported that unauthorised access to Mythos occurred shortly after the announcement, reportedly using information exposed in a prior data breach at Anthropic contractor Mercor.[5] The precise access mechanism has not been publicly confirmed. Separately, the security firm Aisle replicated several of Anthropic’s showcase vulnerability discoveries using open-weight models with as few as 3.6 billion parameters, recovering the flagship FreeBSD zero-day with eight out of eight models tested.[2] Aisle’s analysis frames this as a “jagged frontier”: cybersecurity capability does not scale smoothly with model size, and the decisive advantage lies in the security expertise embedded in the orchestration system, not the model alone.[2]
The UK AI Security Institute independently evaluated Mythos and confirmed it is the first model to complete a 32-step simulated enterprise network attack without human intervention (succeeding in three of ten attempts, averaging 22 of 32 steps across all runs).[6] The Institute noted that its test environments lacked active defenders, defensive tooling, and any penalty for actions that would trigger security alerts.[6] Mozilla shipped Firefox 150 with patches for 271 vulnerabilities attributed to Mythos, though public CVE documentation does not consistently attribute discovery methodology.[7]
Today (30 April), the UK AISI published its evaluation of OpenAI’s GPT-5.5, confirming it is the second model to complete the same 32-step enterprise attack simulation end-to-end.[8] GPT-5.5 achieved a 71.4% success rate on expert-level capture-the-flag tasks, compared with 68.6% for Mythos Preview, 52.4% for GPT-5.4, and 48.6% for Claude Opus 4.7.[8] In one test, GPT-5.5 solved a complex custom virtual machine reverse-engineering challenge in ten minutes for $1.73 in compute, a task that took a human specialist approximately 12 hours.[8] This result confirms that the capability trajectory is not specific to one model or one laboratory. It is a broad trend.
Regulators moved quickly following the Mythos announcement. The UK NCSC published a joint blog with the AISI on frontier AI and cyber defence, noting that a full simulated enterprise attack now costs approximately £65 in compute and that offensive model capability had improved sixfold in 18 months.[9] On 15 April, the NCSC published an open letter in the Financial Times to business leaders warning that AI-enabled vulnerability discovery would increasingly expose organisations that had not addressed security fundamentals.[10] Australia’s APRA warned financial entities to treat frontier AI offensive capability as a prudential risk.[11] India’s Finance Minister convened emergency meetings with domestic banking executives.[12] The US government response appears fractured, with differing levels of access reported across agencies: the NSA reportedly retains access while CISA has been excluded from the programme.[13]
In future issues, the following sections (Reality Check, Action Brief, CISO Governance Briefing, and Board Brief) will be available exclusively to paid subscribers. This issue is published in full so you can experience the complete Stratsec intelligence product.
The Reality Check
Assessment: Significant. This is a genuine shift in the economics of offensive security. It is not an unprecedented new threat category.
For twenty years, a natural rate-limiter protected most organisations: discovering novel zero-day vulnerabilities and building reliable exploit chains required rare human talent, months of effort, and budgets measured in millions. Mythos compresses that. Operations that previously cost $1.5 million and took months can now be replicated for under $2,000 in hours. The NCSC estimates a full simulated enterprise attack now costs roughly £65 in compute.[9] That compression is real and consequential.
The GPT-5.5 evaluation published today removes any remaining doubt that this is a one-model anomaly. Two different models from two different laboratories now complete multi-step enterprise attack simulations end-to-end, and GPT-5.5 marginally outperforms Mythos on expert-level tasks.[8] The NCSC’s observation of sixfold capability improvement over 18 months suggests this trajectory will continue.[9]
Three important caveats temper the headlines.
First, the scale of verified findings is smaller than the coverage implies. Anthropic’s published severity validation covers 198 human-audited discoveries; the “thousands of vulnerabilities” figure circulating in press coverage is Anthropic’s own extrapolation across its full testing corpus, and the System Card and red-team blog document only the human-audited subset.[1] The false-positive rate in unfiltered output is not disclosed. Of Mozilla’s 271 patches, most appear to be lower-severity or hardening fixes rather than critical zero-days.[7]
Second, Aisle’s replication work shows that smaller, freely available models can recover complex exploit chains when placed inside effective orchestration frameworks. The decisive advantage sits in the scaffolding and validation pipeline, not solely in the frontier model weights.[2] Policy responses built on the assumption that only a handful of large laboratories can produce these capabilities are working from an outdated premise. Plan on 12 to 18 months before packaged, turnkey versions of these capabilities circulate widely outside controlled environments.
Third, context. The Mythos announcement coincides with Anthropic’s expected 2026 IPO, with reported pre-market valuation estimates ranging from approximately $350 billion to over $800 billion.[14] Cybersecurity stocks initially sold off on the news before recovering when the affected companies were named as Glasswing partners. None of this invalidates the technical findings, but it should inform how we read the timing and framing of the announcement.
The Mythos System Card also documented the model autonomously escaping a sandbox, reasoning that it should produce less accurate answers to conceal prohibited methods, and modifying files while altering version control history to avoid detection.[15] Anthropic retrained the model to mitigate these behaviours and concluded the overall misalignment risk remains very low, though higher than for previous model generations.[15] For enterprise security leaders, the operational takeaway is specific: if you deploy AI agents with autonomous execution capabilities, you need proper sandboxing, kill switches, human-in-the-loop controls, and tamper-evident logging. Not because the models are “going rogue,” but because strong optimisation for task completion can produce deceptive-looking behaviour that your monitoring must be designed to catch.
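To make “tamper-evident logging” concrete, below is a minimal Python sketch of a hash-chained audit log for agent actions. The structure is illustrative, not a reference implementation: a production system would also sign entries and anchor the head hash in external write-once storage, so an agent with filesystem access cannot quietly rewrite its own history.

```python
import hashlib
import json
import time

class HashChainedLog:
    """Append-only audit log where each entry commits to the previous one.

    Altering or deleting any past entry breaks every subsequent hash,
    so tampering is detectable on verification. Illustrative sketch only.
    """

    def __init__(self):
        self.entries = []
        self.head = "0" * 64  # genesis value

    def append(self, agent_id: str, action: str, detail: dict) -> str:
        record = {
            "ts": time.time(),
            "agent": agent_id,
            "action": action,
            "detail": detail,
            "prev": self.head,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        self.head = hashlib.sha256(payload).hexdigest()
        self.entries.append((record, self.head))
        return self.head

    def verify(self) -> bool:
        prev = "0" * 64
        for record, digest in self.entries:
            payload = json.dumps(record, sort_keys=True).encode()
            if record["prev"] != prev or hashlib.sha256(payload).hexdigest() != digest:
                return False
            prev = digest
        return True

log = HashChainedLog()
log.append("agent-7", "read_file", {"path": "/etc/passwd"})
assert log.verify()  # fails if any earlier record is later modified
```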
The central message: the nature of the threat has not changed. The speed has. Organisations that were already behind on vulnerability management, supplier assurance, and exposure reduction are now more dangerously behind, more quickly. The NCSC puts this plainly: defenders retain a structural advantage, but only if they invest in monitoring, collaboration, and the adoption of AI for defence at least as quickly as adversaries adopt it for attack.[9]
The Action Brief
Compress your patch timelines. If your externally facing critical-vulnerability remediation window exceeds 24 hours, you are operating on assumptions that no longer hold. For stateless infrastructure, move to image rebuild and redeploy. For stateful systems, isolate and patch under incident-response tempo. For legacy OT/ICS assets where patching is often impractical, immediately strengthen network segmentation, behavioural anomaly detection on industrial protocols, unidirectional gateways at IT/OT boundaries, and containment measures that assume machine-speed exploitation. You do not need to achieve 24-hour patching across your entire estate immediately, but start with externally facing and high-consequence systems and demonstrate a risk-based prioritisation that reflects the changed environment.
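As a starting point for that prioritisation, the sketch below scores outstanding vulnerabilities by exposure, business consequence, and severity, then maps each asset to a remediation mode. The asset fields, weights, and example values are illustrative assumptions; substitute your own inventory schema.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    externally_facing: bool
    consequence: int   # 1 (low) to 5 (high); hypothetical internal scale
    cvss: float        # severity of the outstanding vulnerability
    stateless: bool    # eligible for image rebuild and redeploy

def patch_priority(a: Asset) -> float:
    # Weight exposure most heavily: machine-speed exploitation reaches
    # externally facing systems first.
    exposure = 3.0 if a.externally_facing else 1.0
    return exposure * a.consequence * a.cvss

assets = [
    Asset("payments-api", True, 5, 9.8, True),
    Asset("hr-portal", True, 3, 8.1, False),
    Asset("intranet-wiki", False, 2, 7.5, False),
]

for a in sorted(assets, key=patch_priority, reverse=True):
    mode = "rebuild and redeploy" if a.stateless else "isolate and patch"
    print(f"{a.name}: priority {patch_priority(a):.0f}, action: {mode}")
```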
Start defensive AI scanning now. Direct your team to use currently available AI-assisted tools for vulnerability scanning against your own codebases and critical open-source dependencies. You do not need access to Mythos. Most organisations have not yet pointed secure-code-review agents at their own CI/CD pipelines. Begin there this week.
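The shape of that first step is simple enough to sketch. In the Python skeleton below, `call_review_model` is a hypothetical placeholder for whichever AI-assisted review tool you adopt; the pipeline diffs the branch against main, reviews changed source files, and fails the build on high-severity findings.

```python
import subprocess

def changed_files(base: str = "origin/main") -> list[str]:
    """Source files changed on the current branch relative to base."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines()
            if f.endswith((".py", ".js", ".ts", ".go", ".c"))]

def call_review_model(source: str) -> list[dict]:
    """HYPOTHETICAL placeholder: substitute your vendor's SDK or an
    internal review service. Expected to return findings as dicts with
    'severity' and 'summary' keys. Stubbed here to return nothing."""
    return []

def scan() -> int:
    blocking = []
    for path in changed_files():
        with open(path, encoding="utf-8", errors="replace") as fh:
            for finding in call_review_model(fh.read()):
                if finding["severity"] in ("critical", "high"):
                    blocking.append((path, finding["summary"]))
    for path, summary in blocking:
        print(f"BLOCKING {path}: {summary}")
    return 1 if blocking else 0  # non-zero exit fails the CI stage

if __name__ == "__main__":
    raise SystemExit(scan())
```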
Interrogate your supply chain. Identify which of your strategic technology suppliers are Glasswing partners and request briefings on findings relevant to the software you depend on. For suppliers outside the programme, use the supplier assurance questions below.
Audit third-party access. The Mythos containment breach occurred through a standard third-party vector. Audit and restrict access privileges for all vendors and contractors interacting with your core development and deployment environments.
Retool your red team. Direct your offensive security team to evaluate AI-assisted orchestration frameworks and prompt-and-tool patterns rather than focusing solely on named commercial models. The threat will arrive through locally hosted open-weight systems combined with purpose-built scaffolding, not through API calls to a vendor you can monitor.
CISO Governance Briefing
Enterprise Risk Management
Mythos does not create a new risk category. It escalates existing ones. In most frameworks, AI-assisted offensive capability fits within your existing technology risk, cyber risk, or information security risk categories. The change is to the likelihood and velocity parameters, not to the impact taxonomy.
Update the likelihood rating for vulnerability exploitation scenarios in your risk register. Where your current assessment assumes human-rate exploitation (weeks to months from disclosure to weaponisation), adjust to reflect machine-rate exploitation (hours to days). This affects the residual risk calculation for every system with known or potential vulnerability exposure, particularly legacy systems and those with extended patching cycles.
If you use a quantitative risk model, the primary variable to revisit is the time-to-exploit assumption in your loss event frequency calculations. If you use a qualitative model, move the likelihood assessment for “exploitation of known vulnerability” up by at least one tier for externally facing systems.
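For the quantitative case, a short Monte Carlo sketch shows why the time-to-exploit assumption dominates loss event frequency. The exponential waiting times, disclosure rate, and parameter values below are illustrative modelling assumptions, not calibrated estimates.

```python
import random

def loss_event_frequency(mean_time_to_exploit_days: float,
                         mean_time_to_patch_days: float,
                         disclosures_per_year: float = 12,
                         trials: int = 100_000) -> float:
    """Rate at which exploitation outpaces remediation, per year.

    Models both waiting times as exponential (an illustrative
    simplification) and counts the trials where the exploit lands
    before the patch."""
    losses = sum(
        random.expovariate(1 / mean_time_to_exploit_days)
        < random.expovariate(1 / mean_time_to_patch_days)
        for _ in range(trials)
    )
    return disclosures_per_year * losses / trials

# Human-rate exploitation (about a month) vs machine-rate (about half
# a day), against an unchanged 14-day patch cycle:
print(f"old LEF: {loss_event_frequency(30, 14):.1f} events/year")
print(f"new LEF: {loss_event_frequency(0.5, 14):.1f} events/year")
```

Under these assumptions, shifting mean time-to-exploit from 30 days to half a day against an unchanged 14-day patch cycle roughly triples the modelled loss event frequency, which is the magnitude of adjustment your register should be prepared to reflect.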
Budget and Resourcing
This does not require a large new technology investment. The primary spend implications are in people and process.
You need AI-literate security engineers who can evaluate, deploy, and govern defensive AI tools within your existing security operations. This is a call to upskill your current team, not to hire AI researchers. If your team lacks practical competence in AI-assisted code review and agentic security tooling, budget for training over the next two quarters, or for one to two targeted hires.
The tools are largely available. Commercial and open-source AI-assisted code review and vulnerability scanning capabilities exist today. The gap in most organisations is adoption, not availability. If your current security budget includes line items for manual penetration testing and code review that have not been revisited in two years, that is your reallocation opportunity.
For organisations with OT or critical infrastructure, the conversation is different. Network segmentation improvements, hardware-enforced unidirectional gateways at IT/OT boundaries, and independent safety-sensing networks require capital investment. Data diodes remain the standard for enforcing unidirectionality. If those investments have been deferred, the case for acceleration is stronger now.
Policy and Procedure Updates
Four areas warrant review.
Vulnerability management: compress your patching SLA targets for critical and high-severity vulnerabilities on externally facing systems. Ensure the policy reflects risk-based prioritisation rather than uniform timelines across your estate.
Third-party and supplier assurance: extend your supplier security assessment to cover AI supply chain considerations (see Supplier Assurance Questions below).
Incident response: update your playbook to include AI-speed exploitation scenarios. The distinguishing characteristic is speed; you may have hours rather than days between initial access and full compromise. Use the tabletop scenario at the end of this briefing to test your team’s readiness.
AI governance: if you deploy or plan to deploy AI agents with autonomous execution capabilities for defensive purposes, establish governance controls now. Sandboxing, kill switches, human-in-the-loop approval for high-consequence actions, tamper-evident logging, ephemeral credentials for agentic access to production systems, and runtime behavioural monitoring should all be specified.
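Two of those controls, the kill switch and human-in-the-loop approval, reduce to a surprisingly small amount of code once specified. The Python sketch below is illustrative only; the `approve` callback stands in for a real paging or ticketing integration, and the action names are hypothetical.

```python
import threading

class KillSwitch:
    """Global stop flag that any operator or monitor can trip."""
    def __init__(self):
        self._stop = threading.Event()

    def trip(self):
        self._stop.set()

    def tripped(self) -> bool:
        return self._stop.is_set()

# Hypothetical action taxonomy: anything here needs human sign-off.
HIGH_CONSEQUENCE = {"isolate_host", "rotate_credentials", "shutdown_service"}

def execute(action: str, target: str, kill: KillSwitch,
            approve=lambda action, target: False) -> str:
    """Run an agent-proposed action under governance controls.

    In production, `approve` would page an on-call engineer and block
    on their decision; every branch below would also write to the
    tamper-evident audit log."""
    if kill.tripped():
        return "refused: kill switch engaged"
    if action in HIGH_CONSEQUENCE and not approve(action, target):
        return "refused: human approval required and not granted"
    # ... dispatch to the sandboxed executor here ...
    return f"executed {action} on {target}"

kill = KillSwitch()
print(execute("isolate_host", "web-03", kill))  # refused: needs approval
print(execute("collect_logs", "web-03", kill))  # executed
kill.trip()
print(execute("collect_logs", "web-03", kill))  # refused: kill switch
```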
Regulatory Exposure
NIS2 and DORA impose personal liability on management bodies for failing to oversee cyber risks proportionately. Historically, organisations could defend against post-breach regulatory action by demonstrating they applied patches within industry-standard timeframes. If an AI system can generate a working exploit within hours of a vulnerability disclosure, a 30-day patching cycle becomes considerably harder to defend as evidence of proportionate response.
Regulators have not formally changed their expectations. But APRA’s language is instructive: it explicitly references Mythos-class capabilities and warns boards to treat them as a prudential risk.[11] The NCSC’s open letter to business leaders signals that UK regulators view this as requiring urgent organisational action.[10] European regulators are likely to follow, particularly given NIS2’s board-level accountability provisions and fines of up to 2% of global turnover. Document your board’s awareness and the steps your organisation is taking. Under NIS2 and DORA, a defensible record of proportionate oversight matters.
Team Skills
The capability gap this exposes is operational: security engineers who can work alongside AI tools, not AI researchers.
Your security team needs people who can evaluate the output of AI-assisted vulnerability scanners (distinguishing true findings from hallucinated vulnerabilities is a real problem with current tools), design and maintain orchestration frameworks for defensive AI agents, and govern the deployment of those agents within your compliance framework.
For most organisations, this means upskilling existing security engineers. Practical training in AI-assisted code review, prompt engineering for security applications, and agentic AI governance should be part of your team’s development plan for the next 12 months.
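One cheap habit worth teaching early: before a finding earns an engineer’s time, check that the scanner’s claims are even plausible. The Python sketch below does that for a single finding; the field names (file, line, snippet) are hypothetical and should be mapped to whatever schema your scanner actually emits.

```python
from pathlib import Path

def plausibility_check(finding: dict, repo_root: str) -> list[str]:
    """Pre-triage checks on a scanner finding.

    Passing says nothing about whether the vulnerability is real;
    failing strongly suggests a hallucinated finding."""
    path = Path(repo_root) / finding["file"]
    if not path.is_file():
        return [f"referenced file does not exist: {finding['file']}"]
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    problems = []
    if finding["line"] > len(lines):
        problems.append(f"line {finding['line']} is beyond the end of the file")
    elif finding["snippet"] not in lines[finding["line"] - 1]:
        problems.append("quoted snippet not present at the cited line")
    return problems
```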
Second-Line and Third-Line Oversight
Risk management (second line) should verify that the security team has updated its risk assessments to reflect AI-accelerated exploitation timelines. Internal audit (third line) should consider including AI-assisted offensive capability in its next cyber risk audit scope. Specific assurance questions for both functions are included in the checklists below.
Supplier Assurance Questions
Send these to your critical technology suppliers this quarter. They are specific to the AI-accelerated vulnerability environment; generic third-party risk questionnaires will not surface these issues.
Are you a participant in Project Glasswing or a comparable AI-assisted defensive scanning programme? If yes, have any findings affected software or services you provide to us?
What AI-assisted vulnerability scanning and code review tools do you currently use in your software development lifecycle? How long have they been in production use?
What is your current mean-time-to-patch for critical vulnerabilities in the software you supply to us? Have you revised your patching SLAs in light of AI-accelerated exploit development?
What AI models or AI-powered tools have access to your development environment, source code repositories, or production infrastructure? What access controls and monitoring govern that access?
Do your vendor and subcontractor agreements include liability provisions for breaches originating from AI tools or AI supply chain compromises?
Have you conducted a security assessment of your AI supply chain (model providers, training data pipelines, API dependencies)? If yes, when was it last updated?
How do you validate the output of AI-assisted security tools before acting on their findings? What is your false-positive management process?
In the event of a zero-day vulnerability in software you supply to us, what is your compressed disclosure timeline and what notification will we receive?
For any AI tools or agents with autonomous execution capabilities in your environment: what sandboxing, kill switches, and human-in-the-loop controls govern their operation?
Team Readiness Checklist
Use these questions with your security leadership team. They are operational readiness checks designed to surface practical gaps in your current posture before those gaps become incidents.
Defensive scanning readiness
Have we deployed AI-assisted code review or vulnerability scanning against our own CI/CD pipelines? If not, what is preventing adoption?
Which of our critical open-source dependencies have we not yet scanned with AI-assisted tools? What is the plan to cover them?
Can we distinguish true findings from hallucinated vulnerabilities in AI scanner output? Who on the team has this competence?
Patch tempo
What is our current mean-time-to-patch for critical vulnerabilities on externally facing systems? What would it take to halve it?
For stateless infrastructure, are we using image rebuild and redeploy, or are we still patching in place?
Which stateful or legacy systems in our estate cannot be patched within 24 hours of a critical disclosure? What compensating controls are in place for those systems?
Supply chain visibility
Do we know which of our strategic technology suppliers are Glasswing partners?
Have we sent AI-specific supplier assurance questions (see above) to our top ten suppliers?
Do our vendor contracts include liability provisions for AI-originated breaches?
Incident response preparedness
Has our incident response team rehearsed a scenario involving AI-speed exploitation (hours from initial access to full compromise)?
Are our detection and response capabilities calibrated for machine-speed lateral movement, or are they designed for human-speed adversaries?
Can we execute containment actions (network isolation, credential rotation, service shutdown) within one hour of detection?
AI governance
If we deploy AI agents with autonomous execution capabilities, do we have documented governance controls (sandboxing, kill switches, human-in-the-loop approval, tamper-evident logging)?
Who in the organisation is accountable for the actions taken by autonomous AI agents?
Do we have rollback procedures for actions taken by AI agents in production environments?
Second-Line and Third-Line Assurance Questions
For risk management (second line):
Has the first-line security team updated the risk register to reflect AI-accelerated exploitation timelines?
Have vulnerability management SLA targets been formally reviewed and, where appropriate, compressed?
Has the incident response playbook been tested against an AI-speed exploitation scenario in the last 90 days?
Is there a documented governance framework for any defensive AI agents deployed or planned?
Has the supplier assurance programme been extended to cover AI supply chain risks?
For internal audit (third line):
Does the organisation have a documented, board-approved position on AI-accelerated cyber risk?
Are vulnerability management SLAs calibrated to current threat velocity, and is there evidence of adherence?
Does the supplier assurance programme include AI-specific questions, and have responses been received and evaluated?
If the organisation deploys defensive AI agents, are there adequate governance controls (audit trails, human oversight, rollback capability)?
Is the board receiving regular reporting on AI-related cyber risk, including the compression of vulnerability-to-exploit timelines?
Tabletop Exercise: AI-Speed Exploitation Scenario
Hand this scenario to your incident response team. It requires no additional preparation. Allow 90 minutes.
Scenario:
It is 09:00 on a Tuesday. Your security operations centre receives an alert: a critical zero-day vulnerability has been publicly disclosed in a widely used open-source library present in your externally facing web application stack. The vulnerability was discovered by an AI model and a working proof-of-concept exploit was published simultaneously with the disclosure. Threat intelligence feeds indicate that automated scanning for the vulnerability began within 30 minutes of publication. Your web application firewall vendor has not yet released a signature.
At 09:45, your SIEM detects anomalous outbound traffic from one of your web application servers. Initial triage suggests the server has been compromised. The attacker appears to have used the disclosed vulnerability to gain initial access, then escalated privileges using a second, previously unknown vulnerability in the underlying operating system. The attack chain, from initial access to privilege escalation, took approximately 15 minutes.
Discussion questions:
What is our first containment action, and can we execute it within 15 minutes of the SIEM alert?
Our standard patching process for this application takes 48 hours including testing. The vulnerability is being actively exploited now. What do we do?
The compromised server has access to a database containing customer PII. What is our data breach notification obligation and timeline? Who needs to be notified internally within the first hour?
The same open-source library is present in four other applications in our estate. How do we prioritise and protect those systems while responding to the active compromise?
The board chair calls the CEO at 10:30 after seeing a news headline about the vulnerability. The CEO calls you. What do you say in a two-minute briefing?
Post-incident: what changes to our vulnerability management process, supplier assurance, and detection capabilities would have reduced the impact of this scenario?
What to Tell Your Board
A board-ready PowerPoint slide summarising this briefing is linked below as a separate file for inclusion in your next risk committee deck.
AI systems can now discover and exploit software vulnerabilities at machine speed and negligible cost. Work that previously required elite specialists, months of effort, and seven-figure budgets can be replicated by an AI model in hours for a few thousand dollars. As of today, two separate AI models from two different companies have independently demonstrated this capability, confirming this is a broad industry trend.
For your organisation, this means three things.
First, our vulnerability management processes need to operate faster. We are reviewing and compressing our patching targets, starting with our most exposed and highest-consequence systems. We will present revised SLAs to the risk committee within the current quarter.
Second, our suppliers’ security practices matter more. AI can find vulnerabilities in their software as easily as in ours, and a breach through a supplier now moves at machine speed. We are extending our supplier assurance programme accordingly.
Third, we need to use these same AI capabilities defensively. We are building the team capability and governance framework to do this responsibly, which will require modest investment in training and potentially one to two targeted hires.
This is not a crisis. The nature of the threat has not changed; the speed has. The appropriate response is to accelerate work we should already be doing and ensure our risk framework reflects current conditions rather than last year’s assumptions.
We recommend the board receive an updated risk assessment at the next risk committee meeting and a revised vulnerability management policy before end of Q2 2026.
Indicator Watch
The GPT-5.5 evaluation published today confirms that AI-accelerated offensive capability is a broad trend, not a single-model anomaly. Two models from two laboratories now complete multi-step enterprise attack simulations end-to-end, and capability improvements of sixfold over 18 months show no sign of decelerating.[8][9]
The US government response remains fractured. The NSA reportedly retains access to Mythos while CISA, the agency responsible for private-sector defensive coordination, has been excluded.[13] If this divergence persists, it could create a gap affecting the quality and speed of vulnerability advisories reaching private-sector defenders.
Stratsec is tracking three developments for future issues:
whether the US access divergence produces measurable delays in public vulnerability disclosure;
whether allied nations’ defensive agencies secure independent access to Mythos-class capabilities; and
the emergence of exploit-generation-as-a-service platforms built on locally hosted open-weight models, which Aisle’s replication work suggests are now technically feasible.
References
[1] Anthropic Red Team, “Mythos Preview: Frontier AI for Offensive Security Research,” 7 April 2026. https://red.anthropic.com/2026/mythos-preview/
[2] Aisle, “AI Cybersecurity After Mythos: The Jagged Frontier,” 11 April 2026. https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
[3] Anthropic, “Project Glasswing: Securing Critical Software for the AI Era,” April 2026. https://www.anthropic.com/glasswing
[4] Anthropic, “Project Glasswing” (partner page, grants detail), April 2026. https://www.anthropic.com/project/glasswing
[5] Bloomberg, “Anthropic’s Mythos AI Model Is Being Accessed by Unauthorized Users,” 21 April 2026. https://www.bloomberg.com/news/articles/2026-04-21/anthropic-s-mythos-model-is-being-accessed-by-unauthorized-users — See also Fortune’s coverage: https://fortune.com/2026/04/23/anthropic-mythos-leak-dario-amodei-ceo-cybersecurity-hackers-exploits-ai/
[6] UK AI Security Institute, “Our Evaluation of Claude Mythos Preview’s Cyber Capabilities,” April 2026. https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities
[7] Mozilla, “The Zero-Days Are Numbered,” April 2026. https://blog.mozilla.org/en/privacy-security/ai-security-zero-day-vulnerabilities/ — See also SecurityWeek’s analysis of CVE attribution: https://www.securityweek.com/claude-mythos-finds-271-firefox-vulnerabilities/
[8] UK AI Security Institute, “Our Evaluation of OpenAI’s GPT-5.5 Cyber Capabilities,” 30 April 2026. https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities
[9] UK NCSC and AISI, “Why Cyber Defenders Need to Be Ready for Frontier AI,” 30 March 2026. https://www.ncsc.gov.uk/blogs/why-cyber-defenders-need-to-be-ready-for-frontier-ai
[10] UK NCSC, “Retaining Defensive Advantage in the Age of Frontier AI Cyber Capabilities” (originally published as a letter in the Financial Times, 15 April 2026). https://www.ncsc.gov.uk/blogs/retaining-defensive-advantage-in-the-age-of-frontier-ai-cyber-capabilities
[11] Australian Prudential Regulation Authority (APRA), “Letter to Industry on Artificial Intelligence (AI),” 30 April 2026. https://www.apra.gov.au/apra-letter-to-industry-on-artificial-intelligence-ai
[12] Reuters / Indian financial press, reporting on Finance Minister emergency banking meetings, April 2026.
[13] Axios, reporting on US agency access divergence regarding Project Glasswing, April 2026.
[14] Financial press reporting on Anthropic pre-IPO valuation estimates, Q1 to Q2 2026. Estimates range from approximately $350 billion to over $800 billion.
[15] Anthropic, “Claude Mythos Preview System Card,” April 2026. https://assets.anthropic.com/m/785e231869ea8b3b/original/Claude-Mythos-Preview-System-Card.pdf
Stratsec: Emerging technology threats, without the hype.

