Table of Contents
- The Coding Slop Problem: A Cautionary Tale
- Triage Slop: The SOC’s Version of the Same Problem
- The Downstream Cascade: How Coding Slop Feeds Triage Slop
- Why the Problem Is Architectural, Not Operational
- D3’s Experience: Vibe Coding from the Inside
- How Morpheus AI Prevents Triage Slop
- Questions for Your Evaluation
- Next Steps
- About D3 Security
Executive Summary
The software industry coined a term in 2025 for the torrent of low-quality, AI-generated code flooding production systems: slop. The word became so pervasive that Merriam-Webster named it Word of the Year. The pattern was unmistakable: junior developers using natural language interfaces to generate code they could not review, producing applications riddled with security vulnerabilities, logic errors, and architectural debt.
The same dynamic is now emerging in security operations. As vendors rush to bolt general-purpose large language models (LLMs) onto SOC workflows, a parallel category of low-quality output is appearing: triage slop (AI-generated alert classifications, investigation summaries, and response recommendations that look professional but lack the depth, context, and accuracy that security operations demand).
The parallels are structural, not superficial. In both domains, the failure mode is identical: an inexperienced operator uses a natural language interface to produce output they cannot critically evaluate. The result is confident-sounding work product that degrades the system it was meant to improve.
This paper traces the direct parallels between AI coding slop and AI triage slop, explains why the problem is architectural rather than operational, examines D3 Security’s direct experience with vibe coding on its own engineering teams, and details how Morpheus AI was purpose-built to prevent triage slop from reaching production SOC environments.
The Coding Slop Problem: A Cautionary Tale
What Is Vibe Coding?
In February 2025, Andrej Karpathy—former Senior Director of AI at Tesla and co-founder of OpenAI—coined a term for a new style of software development. He described it as a state where developers “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” The post garnered over 4.5 million views. Collins English Dictionary named “vibe coding” its Word of the Year for 2025.
The practice is straightforward: a developer describes what they want in natural language, an LLM generates the code, and the developer ships it. Often this happens without reading it, testing it, or understanding what it does. For prototypes and throwaway scripts, this works. For production systems, the results have been catastrophic.
The Evidence Is Unambiguous
CodeRabbit’s December 2025 analysis of 470 GitHub pull requests found that AI-authored code contains 1.7 times more major issues than human-written code. Specific vulnerability classes were dramatically overrepresented. AI-generated code was up to 2.7 times more likely to introduce XSS vulnerabilities and nearly twice as likely to create insecure deserialization flaws. Logic errors (incorrect dependencies, flawed control flow, misconfigurations) were 75% more frequent. Veracode’s 2025 GenAI Code Security Report found that 45% of AI-generated code samples failed security tests and introduced OWASP Top 10 vulnerabilities.
The velocity paradox tells the fuller story. Pull requests per author increased 20% year-over-year in 2025. But incidents per pull request rose 23.5%, and change failure rates climbed roughly 30%. Teams shipped faster and broke more.
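The compounding effect of those two figures is worth making explicit: if pull-request volume rises 20% and incidents per pull request rise 23.5%, total incidents rise by roughly 48%. A quick check:

```python
# Compounding the velocity paradox: more PRs, and more incidents per PR.
pr_growth = 1.20              # PRs per author up 20% year-over-year
incident_rate_growth = 1.235  # incidents per PR up 23.5%

total_incident_growth = pr_growth * incident_rate_growth
print(f"Total incident growth: {total_incident_growth - 1:.1%}")  # → 48.2%
```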
The Junior-Senior Divide
The data reveals a clear experience-dependent pattern. Multiple studies confirm that junior developers accept AI-generated code at significantly higher rates than senior engineers (CodeRabbit, 2025; Fastly, 2025). The gap is not trivial. Junior developers lack the pattern recognition to identify subtle architectural flaws, security holes, and logic errors that experienced engineers catch instinctively.
Senior developers, by contrast, initially resist LLM-assisted coding. But once they learn to treat the LLM as a drafting tool rather than a decision-maker, leveraging their architectural experience to direct, review, and refine AI output, productivity gains are substantial. The key insight: seniority does not make the tool less useful. It makes the operator qualified to use it.
Amazon: The Cautionary Case Study
Amazon’s experience provides the most public and consequential demonstration of the junior-senior divide in AI-assisted coding. In November 2025, Amazon issued an internal mandate establishing Kiro (its standardized AI coding assistant) with an 80% weekly usage target across engineering teams. What followed was a series of production disasters.
In Q3 2025, an AWS cost management feature experienced 13 hours of downtime, attributed to changes made via AI-assisted coding. On March 5, 2026, a six-hour outage knocked out checkout, login, and product pricing across Amazon’s shopping platform. The company lost an estimated 6.3 million orders.
Amazon’s response was telling. The company called an emergency company-wide “deep dive” meeting with senior engineers and issued a 90-day mandate: junior and mid-level engineers must obtain sign-off from a senior engineer before deploying any AI-assisted code changes to production. Amazon acknowledged that best practices and safeguards for generative AI coding had not been established before the usage mandate was issued.
Triage Slop: The SOC’s Version of the Same Problem
The structural parallels between AI coding slop and AI triage slop are not metaphorical. They are architectural.
The Same Failure Pattern, Different Domain
| Failure Mode | AI Coding Slop | AI Triage Slop |
|---|---|---|
| Inexperienced operator | Junior dev accepts AI code without review | L1 analyst accepts AI triage without validation |
| Confident-sounding output | Syntactically correct code with hidden flaws | Professional-looking triage reports with wrong conclusions |
| Removed quality gates | Code review and testing bypassed for speed | Human analyst review reduced or eliminated |
| “Almost right” problem | 66% of devs cite frustration with near-correct code | AI triage with incomplete context wastes analyst time on re-investigation |
| Volume over quality | PRs up 20%, incidents up 23.5% | More alerts triaged, more misclassifications in production |
| Downstream cascade | Vulnerable code floods CVE pipeline; NVD backlog exceeds 30,000 | Misclassified alerts lead to missed breaches and compliance failures |
How Triage Slop Manifests
1. General-Purpose LLMs Applied to Security Triage
Most vendors entering the AI SOC market are bolting general-purpose LLMs onto existing SOAR platforms and marketing the result as autonomous triage. The architecture is identical to the vibe coding pattern: a natural language interface that generates output the operator cannot critically evaluate. A general-purpose model can summarize a phishing alert. It cannot trace how a phishing payload transitions to credential theft, how those credentials enable lateral movement, or how each stage manifests differently across vendor telemetry.
Research confirms the risk. ACM and USENIX publications document that LLMs applied to specialized domains produce factually incorrect but syntactically fluent outputs: “confident errors” that are harder to catch than obvious mistakes. In cybersecurity, a hallucinated indicator of compromise or a fabricated MITRE ATT&CK mapping can misdirect an entire investigation.
2. Junior Analysts as Uncritical Consumers
The SOC analyst workforce mirrors the junior developer problem exactly. The average enterprise SOC receives over 4,400 alerts per day. Analysts spend 70 minutes fully investigating a single alert. The math forces shortcuts. When an AI tool presents a classification with a confidence score and a professional-looking summary, a Tier-1 analyst under time pressure will accept it, just as a junior developer accepts generated code without reviewing it.
The consequence is the same: false negatives that look like true negatives. Threats that the AI dismissed with a plausible-sounding explanation but without the multi-dimensional correlation that an experienced analyst would perform. The 61% of SOC teams that already report ignoring alerts later confirmed as genuine compromise are about to get a new mechanism for doing so (one that comes with an AI confidence score and a clean summary that makes the oversight feel rigorous).
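The workload arithmetic above shows why the shortcut pressure is structural rather than a matter of analyst discipline. Taking the figures at face value (and assuming 8-hour shifts, an assumption not stated in the sources):

```python
# Why full investigation of every alert is impossible at enterprise scale.
alerts_per_day = 4400    # average enterprise SOC alert volume
minutes_per_alert = 70   # time to fully investigate one alert

analyst_hours_per_day = alerts_per_day * minutes_per_alert / 60
analysts_needed = analyst_hours_per_day / 8  # assuming 8-hour shifts

print(f"{analyst_hours_per_day:,.0f} analyst-hours per day")  # ≈ 5,133
print(f"≈ {analysts_needed:.0f} full-time analysts required") # ≈ 642
```

No SOC staffs hundreds of analysts per shift, so the majority of alerts are never fully investigated, and an AI classification with a confident summary becomes the path of least resistance.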
3. Vibe-Coded Security Tools
The most dangerous intersection of these trends occurs when organizations or vendors use vibe coding to build the AI triage systems themselves. A December 2025 study found 69 vulnerabilities across 15 vibe-coded test applications, with approximately 6 rated critical. The documented failure modes include AI agents removing validation checks, relaxing database policies, and disabling authentication flows to resolve runtime errors.
The Downstream Cascade: How Coding Slop Feeds Triage Slop
These two problems are directly connected. AI-generated coding slop directly increases the burden on security triage systems.
On March 18, 2026, the Linux Foundation announced a $12.5 million initiative backed by Anthropic, AWS, GitHub, Google, Microsoft, and OpenAI to address the open-source security crisis driven by AI-generated code contributions. The National Vulnerability Database had analyzed fewer than 300 CVEs by March 2025, with over 30,000 backlogged. The cURL project ended its bug bounty program because maintainers could not keep pace with AI-generated vulnerability reports. CVE submissions have surged as AI-generated code floods open-source repositories faster than maintainers can triage the resulting security findings.
The implication for SOC operations is direct: more vulnerable code in production means more alerts. More alerts means more pressure on triage systems. More pressure on triage systems means more temptation to accept AI-generated triage decisions without review. The feedback loop is self-reinforcing.
Why the Problem Is Architectural, Not Operational
The coding slop crisis was not caused by bad developers. It was caused by tools that made it easy to produce output without understanding it. The triage slop crisis follows the same structural logic.
Three Architectural Failures That Produce Triage Slop
1. General-Purpose LLMs Lack Domain Knowledge
A general-purpose LLM trained on internet-scale data can generate text about cybersecurity. It cannot reason about how attacks propagate. It does not understand that a suspicious PowerShell execution on an endpoint, a new MFA registration from an unfamiliar geography, and a data transfer to an external domain are three nodes in a single attack chain. It treats each alert as an isolated text classification problem. This is the equivalent of vibe coding: surface-level fluency without structural understanding.
Cisco’s Foundation AI team demonstrated this gap directly. Their Foundation-sec-8b model (an 8-billion parameter LLM trained specifically on cybersecurity data) outperforms general-purpose models nearly 10 times its size on security benchmarks. Domain-specific training data produces domain-specific accuracy. No amount of prompt engineering closes the gap.
2. Static Playbooks Cannot Adapt to Context
Most AI-augmented SOAR platforms use LLMs to accelerate the authoring of static playbooks: the same rigid, pre-authored workflows that have limited SOAR effectiveness for a decade. A phishing playbook runs the same 15–20 steps whether the target is an intern or the VP of Finance, whether the payload is known or novel, and whether the attacker has already moved laterally. Adding a natural language interface to this architecture speeds up playbook creation. It does not fix the underlying inability to adapt to context.
Microsoft’s Sentinel Playbook Generator exemplifies this pattern. It uses generative AI to help analysts write playbooks using natural language, a genuine accessibility improvement. But the output is still a static playbook that must be tested, versioned, and maintained. The playbook engineering lifecycle remains intact. The AI accelerates one phase of that lifecycle; it does not eliminate it.
3. No Quality Framework for AI Triage Decisions
In software engineering, code review, automated testing, and CI/CD pipelines exist to catch slop before it reaches production. Vibe coding bypasses these gates, which is precisely why slop proliferates.
Most AI triage products have no equivalent quality framework. They classify alerts, generate summaries, and recommend actions without exposing their reasoning, without validating against known ground truth, and without providing the analyst with a visible framework to assess whether the classification is correct. The analyst is asked to trust the AI’s output the same way a junior developer is asked to trust the LLM’s code: on faith.
D3’s Experience: Vibe Coding from the Inside
D3 Security does not write about vibe coding from a distance. The Morpheus AI engineering team experienced the junior-senior divide firsthand during the platform’s 24-month development cycle.
What Happened with Junior Developers
Junior developers on the team adopted LLM coding assistants enthusiastically. They generated code rapidly, shipped pull requests at high volume, and reported feeling productive. Code review told a different story. The generated code frequently contained:
- Security vulnerabilities that passed superficial review but failed penetration testing
- Architectural patterns that conflicted with the platform’s design principles
- Logic errors that only surfaced under edge-case conditions
- Dependencies on libraries with known vulnerabilities
- Test coverage that appeared comprehensive but did not exercise critical code paths
The junior developers lacked the experience to distinguish between code that worked and code that was correct. This is the vibe coding failure mode in its purest form: the operator cannot evaluate the output.
What Happened with Senior Developers
Senior developers initially dismissed LLM coding tools as unreliable. When they eventually adopted them, treating the LLM as a drafting assistant under their architectural direction, the results were transformative. Experienced engineers who understood the platform’s architecture, security requirements, and design patterns used the LLM to:
- Generate boilerplate code matching established patterns, then refine it
- Explore implementation alternatives rapidly before selecting the best approach
- Automate repetitive tasks while maintaining full architectural oversight
- Produce documentation and test scaffolding at machine speed
The result: senior developers reported productivity gains of up to 10 times their baseline because the developer could evaluate, direct, and refine what the LLM produced. The tool amplified expertise. It did not replace it.
How Morpheus AI Prevents Triage Slop
D3 Security built Morpheus AI with the explicit goal of producing triage decisions that can be reviewed, validated, and trusted. Every architectural decision addresses one or more of the failure modes that produce slop.
1. Purpose-Built Cybersecurity LLM
Morpheus AI’s LLM was developed over 24 months by 60 specialists (red teamers, data scientists, AI engineers, and SOC analysts) trained specifically on cybersecurity telemetry, attack patterns, and investigation methodologies. It understands how phishing payloads transition to credential theft, how compromised credentials enable lateral movement, and how each attack stage manifests differently across vendor telemetry.
This is the equivalent of a senior developer directing the LLM: domain expertise is embedded in the model itself, not layered on top through prompts. A general-purpose model with a security prompt is vibe coding applied to triage. A purpose-built cybersecurity LLM is an experienced engineer who understands the domain.
2. Attack Path Discovery on Every Alert
Morpheus AI does not classify alerts in isolation. On every incoming alert, it performs multi-dimensional Attack Path Discovery (APD): tracing correlations vertically into the alert’s origin tool (process trees, registry keys, behavioral patterns) and horizontally across the full security stack (EDR, SIEM, identity, cloud, network). The output is a structured investigation report that maps the complete threat narrative, not a binary classification.
This is why triage slop cannot hide inside Morpheus AI. A general-purpose model classifies an alert as “malicious” or “benign” and moves on. Attack Path Discovery exposes every node, every connection, and every reasoning step. If a path is wrong, it is visually obvious—no statistical sampling required.
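To illustrate the difference between isolated classification and path discovery, here is a deliberately minimal sketch of horizontal correlation: grouping alerts that share an entity across tools into a single chain. The field names, matching logic, and alert data are hypothetical simplifications for illustration, not Morpheus AI's actual implementation.

```python
# Illustrative sketch only: linking alerts across tools by shared entity.
from collections import defaultdict

alerts = [
    {"id": "A1", "source": "email",    "entity": "jdoe",  "type": "phishing_click"},
    {"id": "A2", "source": "identity", "entity": "jdoe",  "type": "new_mfa_device"},
    {"id": "A3", "source": "edr",      "entity": "jdoe",  "type": "powershell_exec"},
    {"id": "A4", "source": "network",  "entity": "srv-9", "type": "external_transfer"},
]

# Horizontal correlation: group alerts sharing an entity across the stack.
by_entity = defaultdict(list)
for a in alerts:
    by_entity[a["entity"]].append(a)

for entity, chain in by_entity.items():
    if len(chain) > 1:  # multiple tools fired on the same entity
        path = " -> ".join(f"{a['source']}:{a['type']}" for a in chain)
        print(f"{entity}: {path}")
```

An isolated classifier would score each of the four alerts independently; a path-oriented view surfaces that three of them describe one user's progression from phishing click to execution.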
3. Contextual Playbook Generation
Static playbooks are the security equivalent of vibe-coded applications: rigid, brittle, and unable to adapt to context. Morpheus AI generates bespoke investigation and response playbooks at runtime from the evidence itself, reflecting the specific threat, the specific target, the organization’s tool stack, and its SOC preferences. No authoring. No versioning. No maintenance.
The contextual playbook is born from Attack Path Discovery. It addresses the complete attack (not individual alerts) and adapts in real time to novel patterns. This eliminates the gap between what the playbook author anticipated and what actually happened.
4. Self-Healing Integrations
Integration drift (where vendor API updates silently break SOAR playbooks) is the operational equivalent of technical debt in vibe-coded applications. Morpheus AI continuously monitors integration behavior across 800+ tools. When a vendor API changes, the platform detects the drift and generates corrective code autonomously, eliminating the silent-failure windows that plague static SOAR deployments.
With 50+ tools in a typical enterprise SOC and 4–6 major vendor updates per tool annually, teams face approximately 200–300 integration disruptions per year. Self-healing integrations transform this from crisis management to autonomous remediation.
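One simple way drift detection can work, sketched here as an assumption rather than a description of D3's mechanism, is to compare an integration's live response against the schema the playbook depends on:

```python
# Illustrative sketch: detecting integration drift via schema comparison.
# The field names and detection approach are hypothetical, not D3's actual design.
EXPECTED_FIELDS = {"alert_id", "severity", "timestamp", "source_ip"}

def detect_drift(response: dict) -> set:
    """Return expected fields missing from the vendor's API response."""
    return EXPECTED_FIELDS - response.keys()

# A vendor update renames source_ip to src_ip: the drift is flagged
# instead of a downstream playbook silently failing on the missing field.
drifted = detect_drift({"alert_id": "9001", "severity": "high",
                        "timestamp": "2026-03-18T10:00:00Z", "src_ip": "10.0.0.5"})
print(drifted)  # → {'source_ip'}
```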
5. The Deterministic/Indeterministic Trust Model
Morpheus AI embeds a trust boundary directly into the product. AI-generated decisions begin as indeterministic proposals, with full reasoning visible, that analysts must confirm or correct. As patterns stabilize and analysts repeatedly confirm specific decision types, those patterns can be hardened into deterministic rules using natural language in the UI.
This trust lifecycle is a direct anti-slop mechanism. Unlike vibe coding, where AI output goes straight to production, Morpheus AI forces every AI decision through human validation before it earns autonomous execution privileges. The hardening rate measures growing organizational trust in the AI, expressed through real operational behavior.
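The lifecycle can be sketched as a simple state progression. The threshold, decision-type names, and data model below are hypothetical; the point is that autonomy is earned through confirmations, never granted by default:

```python
# Illustrative sketch of a progressive-trust lifecycle for AI decisions.
HARDENING_THRESHOLD = 25  # hypothetical: confirmations before hardening

confirmations: dict[str, int] = {}
hardened_rules: set[str] = set()

def record_confirmation(decision_type: str) -> str:
    """Advance a decision type through the trust lifecycle."""
    if decision_type in hardened_rules:
        return "deterministic: executes autonomously"
    confirmations[decision_type] = confirmations.get(decision_type, 0) + 1
    if confirmations[decision_type] >= HARDENING_THRESHOLD:
        hardened_rules.add(decision_type)
        return "hardened: future decisions execute autonomously"
    return "indeterministic: proposal shown to analyst with full reasoning"

for _ in range(24):
    record_confirmation("benign_phishing_sim")
print(record_confirmation("benign_phishing_sim"))  # 25th confirmation hardens it
```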
6. Visible Code and Reasoning Chains
Morpheus AI provides full access to the back-end Python code for every AI-generated playbook. Every investigation produces step-by-step reasoning: what data was analyzed, what correlations were found, what was ruled out, and what the platform recommends. This is the opposite of slop. Slop thrives on opacity. Morpheus AI is designed for scrutiny.
7. Attack Simulation with Known Ground Truth
D3’s attack simulation infrastructure generates realistic multi-stage attacks across hundreds of integrated tools and measures whether Morpheus AI discovers the complete attack path. Because D3 generates the attack, D3 knows the ground truth. This is a validation capability that no general-purpose LLM overlay can replicate—and it is the security equivalent of the automated test suites that vibe coding bypasses.
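Known ground truth makes accuracy directly measurable rather than a matter of trust. A minimal sketch of the scoring idea, with hypothetical stage names:

```python
# Illustrative sketch: scoring a discovered attack path against the
# simulated ground truth. Stage names are hypothetical.
ground_truth = {"phish_click", "cred_theft", "lateral_move", "exfiltration"}
discovered   = {"phish_click", "cred_theft", "lateral_move"}

recall = len(discovered & ground_truth) / len(ground_truth)       # stages found
precision = len(discovered & ground_truth) / len(discovered)      # stages correct

print(f"recall={recall:.0%} precision={precision:.0%}")  # → recall=75% precision=100%
```

A system evaluated this way cannot hide slop behind confident summaries: a missed exfiltration stage shows up as a measurable recall gap.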
Questions for Your Evaluation
When evaluating AI triage platforms, these questions separate systems engineered to prevent triage slop from systems likely to produce it:
Is the AI model purpose-built for cybersecurity, or is it a general-purpose LLM with a security prompt layer? A general-purpose model applied to triage is the architectural equivalent of vibe coding: surface-level fluency without domain understanding.
Can analysts see the complete reasoning chain for every triage decision? If the AI classifies an alert and the analyst cannot trace exactly how it reached that conclusion, the system is producing slop by design.
Does the platform discover attack paths across the full security stack, or does it classify alerts in isolation? Binary classification without multi-dimensional correlation misses the lateral movement, credential compromise, and data exfiltration that define real attacks.
Does the platform generate response playbooks from evidence at runtime, or does it accelerate the authoring of static playbooks? Static playbooks with a natural language interface are still static playbooks.
How does the platform handle integration drift? If vendor API changes break the system silently, every triage decision made during the outage is slop: confident-looking output from a disconnected system.
Does the platform validate its own accuracy against known ground truth? If there is no attack simulation capability and no validation framework, the organization is trusting AI triage on faith, the same faith that junior developers place in vibe-coded output.
Can the organization harden AI decisions into deterministic rules over time? If the system offers no mechanism for progressive trust (analyst confirmation, pattern stabilization, hardening) it will produce the same quality of output on day 500 as day one.
Next Steps
The convergence of AI coding slop and AI triage slop is not theoretical. It is measurable, accelerating, and already affecting production SOC environments. Organizations that deploy AI triage without addressing the architectural causes of slop will automate their way into a false sense of security: faster triage numbers, worse actual outcomes.
Three actions for security leaders considering AI triage platforms:
Demand Transparency
Any AI triage system that cannot show you its complete reasoning chain for every decision is asking you to trust output you cannot evaluate. That is the definition of slop.
Require Validation Against Ground Truth
Ask vendors how they prove their triage works—not with borrowed statistics, but with attack simulation that tests whether the AI discovers complete attack paths in your environment.
Evaluate the Operator Model
Does the platform amplify your senior analysts’ expertise, or does it ask junior analysts to accept AI output on faith? The lesson from vibe coding is definitive: the tool’s value depends entirely on the operator’s ability to evaluate what it produces.
About D3 Security
D3 Security is the creator of Morpheus AI, an Autonomous SOC platform that replaces legacy SOAR with AI-driven investigation, contextual playbook generation, and self-healing integrations. Built on a purpose-trained cybersecurity LLM developed over 24 months by 60 domain specialists, Morpheus AI performs Attack Path Discovery on every incoming alert, delivering L2-quality investigation reports in under two minutes per alert across 800+ tool integrations.
Morpheus AI combines autonomous AI triage, a full built-in SOAR engine, and integrated case management in a single platform with predictable, flat-rate pricing (no token fees).