Top 10 OWASP Vulnerabilities in LLM (and How to Avoid Them in Development)

Top 10 OWASP-2300×1294

OWASP vulnerabilities in LLM are no longer theoretical. As teams wire large language models into products and workflows, the OWASP Top 10 for LLM Applications has become the baseline taxonomy for risks you’re expected to recognize and mitigate.  

In this article, we’ll walk you through the OWASP list item by item, explain the failure mode, and then show you what to implement to prevent/minimize disruption.  

Let’s put on our risk-identifying hat and let’s get going! 

Why LLM Security Matters 

Modern LLM systems aren’t just models; they’re applications with data flows, tools, plugins, and users. That means they inherit traditional app risks plus LLM-specific ones (e.g., prompt injection, insecure output handling). The right way to manage this is to treat LLM safety as an ongoing risk program, not a one-time checklist.  

NIST’s AI Risk Management Framework (AI RMF 1.0) provides that structure:

  

GOVERN → MAP → MEASURE → MANAGE 

And the Generative AI Profile (NIST.AI.600-1) adds concrete actions tailored to genAI. Together, they give product and security teams a shared language for policy, engineering controls, and measurement. 

MITRE’s ATLAS catalogs real adversary tactics against AI systems like data poisoning, model extraction, and prompt-based manipulation, so that you can do threat modeling with concrete techniques instead of hypotheticals. Pairing ATLAS with OWASP’s LLM Top 10 helps you connect what attackers do with what builders should defend. 

Finally, enterprise guardrails are maturing. Google’s Secure AI Framework (SAIF) distills AI-specific controls (identity, data, supply chain, monitoring) into a practitioner playbook you can map to your SDLC and cloud stack—useful when turning policy into repeatable engineering work. 

End-to-End Secure LLMs From OWASP-aligned design to evaluations and monitoring, Svitla builds LLM apps that are auditable and safe to scale. Talk to an expert

At-a-Glance: The Top 10 OWASP Vulnerabilities 

OWASP ID  Risk  One-line mitigation 
LLM01  Prompt injection Treat external content as untrusted, isolate from control prompts, enforce allowlisted tool scopes, and validate outputs before actions. 
LLM02 Insecure output handling Treat output as untrusted; require schemas/validation; sanitize/escape; gate side effects behind a policy layer. 
LLM03 Training data poisoning Prove provenance; allowlist sources; DLP/moderation before tuning/indexing; anomaly checks; threat-informed evals. 
LLM04 Model DoS Bound work (tokens/steps/time); rate-limit by tokens; preflight token estimates; circuit breakers; cache; monitor cost/latency. 
LLM05 Supply chain vulnerabilities Apply SSDF/SLSA; sign artifacts; SBOM/AI-BOM; pin/scan dependencies; least-privilege plugins; verify providers. 
LLM06 Sensitive information disclosure Minimize/ACL retrieval; DLP/redaction pre-inference; output filters; scrub logs; provider data policies; encrypt/isolate tenants. 
LLM07 Insecure plugin design Narrow scopes; schema validation at the boundary; isolate execution; harden against indirect injection; audit tool calls. 
LLM08 Excessive agency Limit autonomy by design; separate decide vs. do; approval gates for risky actions; evaluate agent behavior continuously. 
LLM09 Overreliance on output Ground answers; verify before acting; measure groundedness; continuous evaluations; keep output handling strict. 
LLM10 Model theft (extraction) Lock down registries/artifacts; quotas/rate limits; minimize high-information outputs; watermark forensics; monitor for extraction patterns. 

LLM01: Prompt Injection 

What it is. Prompt injection is when untrusted input from a user or any external source that the system reads alters the model’s behavior, overriding instructions, exfiltrating data, or triggering unintended tool or API calls. OWASP defines it as inputs that cause the LLM to produce harmful or unexpected actions or outputs, and it tops the OWASP Top 10 for LLM Applications

Why it happens. LLMs are optimized to follow instructions and incorporate context. When that context contains hidden or adversarial instructions (e.g., in PDFs, web pages, emails, retrieved documents), the model may prioritize them over your system prompt or policies. 

What it breaks. At minimum, it degrades answer quality; at worst, it executes unintended actions via tools (sending emails, changing records), leaks sensitive data, or runs up cost/latency by forcing long contexts and unnecessary tool loops. 

How to avoid it 

Treat all external content as untrusted. Separate “instructions” from “data.” Never mix retrieved content (RAG, web pages, PDFs, tickets) directly into the model’s control channel. Maintain strict context boundaries and use metadata to label untrusted segments so the system prompt can consistently down-rank or ignore instructions that originate from data

Constrain actions with allowlists and least-privilege scopes.  Agents should only call approved tools, with minimal scopes (e.g., read-only by default, write-only in narrow domains). Gate high-risk functions behind human approval (HITL). This reduces the blast radius even if the injection occurs. 

Validate and sanitize both input and output. 

  • Input: strip or neutralize instruction-like patterns in untrusted content before inclusion. 
  • Output: treat model output as untrusted until validated. Enforce strict schemas for tool/function calling, and reject outputs that attempt to escalate privileges or alter policy. 

Isolate tool execution and enforce policy at the boundary. Run tools in sandboxes or isolated microservices; verify that every tool invocation passes a policy check, not just a prompt check. Log all tool calls with inputs/outputs for forensics and rollback. 

Monitor for anomaly patterns (and cost spikes). Add observability for chain traces, token usage, and tool call sequences. Sudden context bloat, repeated tool retries, or unusual API targets are injection indicators.  

Test against known attack patterns before and after release. Incorporate red-team prompts and indirect-injection test corpora into CI/CD. Use adversarial examples embedded in web pages/docs to verify the guardrails. 

LLM02: Insecure Output Handling 

What it is. Passing an LLM’s raw output to downstream systems without validation or sanitization. Because model output can be influenced by prompts and retrieved data, it’s akin to giving users indirect access to functionality. That’s why OWASP classifies it as a top risk for LLM apps.  

Why it happens. Teams treat model output as “trusted,” then let it drive actions (e.g., tool calls, SQL, HTML/Markdown rendering) or persist it unfiltered. Long chains and agents amplify this: one unsafe step can cascade into others. 

What it breaks. Command/SQL injection via generated strings, XSS/HTML injection when rendering, unsafe tool/API calls (fund transfers, ticket edits), data corruption, and escalation to excessive agency or DoS via recursive actions.

How to avoid it 

  1. Treat model output as untrusted.  Enforce strict schemas for tool and function calls. Reject or repair malformed output. Gate any side effects behind a policy layer (allowlists, scopes, human approval for high-risk actions) 
  1. Sanitize before you render or execute. Escape/strip HTML, block dangerous protocols, parameterize queries, and never execute generated code or commands without a sandbox.  
  1. Separate “what to do” from “how to do it.” Don’t let free-text flow straight into interpreters, shells, or plugins. 
  1. Instrument & evaluate. Trace every step, add quality/safety checks, and alert on policy violations. 

LLM03: Training Data Poisoning 

What it is. Training data poisoning is when attackers (or poor curation) manipulate data used for pre-training, fine-tuning, or embedding so the model learns harmful patterns, biased behaviors, or latent backdoors that can be triggered later. OWASP lists it as LLM03 in the Top 10 for LLM Applications. 

Why it happens. LLM pipelines ingest vast, dynamic, and often external data. That creates opportunities to seed malicious content upstream, or to slip tainted samples into fine-tuning or RAG (retrieval-augmented generation) ingestion flows. Open ecosystems and adapter-based tuning add supply-chain exposure if weights, datasets, or loaders are not verified. 

What it breaks. Integrity and trust. Poisoned models can degrade accuracy, amplify bias, leak toxic content under triggers, or execute backdoored behaviors on specific prompts, problems that can remain invisible until deployment. 

How to avoid it 

Establish provenance for all data and weights. You should maintain dataset versioning, hashes, lineage, and approvals. You also need toerify that adapters and base weights come from trusted publishers, and pin exact versions. 

Curate and gate what enters training or tuning. It’s recommended that you use allowlists for data sources, block open web scraping for training without review, and require content moderation before data reaches the pipeline for enterprise RAG or fine-tuning. 

Detect tainted samples with automated checks. Run deduplication, perform outlier/anomaly detection, enforce label-consistency checks, and toxic/bias filters. Keep gold set and backdoor-trigger tests to catch targeted manipulation. 

Continuously evaluate models for poisoning symptoms. Before and after training, run threat-informed tests: trigger phrases, class-flip probes, and targeted behavior checks. Keep regression suites to spot sudden accuracy drops or biased outputs after data updates. 

Instrument post-training monitoring with rollback. Track drift, distribution shifts, accuracy/quality metrics, and unusual failure modes. If you detect issues, be able to reproduce the data slice, revert to a clean checkpoint, and quarantine the suspect contribution.

LLM04: Model Denial of Service 

What it is. Model DoS happens when an attacker (or just a bad prompt pattern) drives the LLM to consume excessive resources (very large inputs, long generations, deep tool loops), causing slowdowns, outages, or runaway bills. OWASP explicitly calls out how overloading LLMs with resource-heavy operations causes service disruption and increased costs. 

Why it happens. LLM inference is compute-intensive. Without hard limits, adversaries can force context bloat, trigger recursive actions, or spam high-cost operations.  

What it breaks. Availability SLAs, latency SLOs, multi-tenant fairness, and your budget. Because spend scales with tokens and calls, DoS can be both an uptime and a financial incident. 

How to avoid it 

Bound work per request. Enforce steps per task, tool calls per task, and wall-time. Reject or truncate oversize inputs before the model runs. This turns worst-case prompts into bounded cost/latency. 

Rate-limit and quota by user/tenant on tokens, not just requests. Apply RPM/TPM limits and sliding-window throttles; queue with backpressure.  

Pre-flight token estimation and early rejection. Estimate tokens before inference; reject or ask to summarize when predicted size exceeds policy. 

Circuit breakers, timeouts, and recursion guards. Add per-step timeouts, cumulative wall-time caps, and a maximum depth for agent loops. Trip a breaker on repeated tool failures or abnormal token growth and surface a safe fallback. 

Cache and deduplicate expensive work. 

Introduce response caching for idempotent queries and retrieval caching to avoid re-embedding/re-indexing hot content.  

Observability with cost & token telemetry. Trace every step (inputs, outputs, latency, token usage, tool calls) and alert on spikes (e.g., tokens/request, retries, recursion depth). 

Budget and per-project safeguards. Enforce monthly and/or weekly budget caps at the provider and project layer, where available, and mirror them in your gateway.  

Test for pathological inputs before and after release. Add load tests and adversarial “prompt bombs” (oversize inputs, excessive nested tasks, tool-loop triggers) to CI/CD. Keep regression suites so changes in prompts, routing, or model choice don’t silently re-open DoS paths.  

Anchor to governance and platform controls. Treat DoS as part of your AI risk program, not a one-off fix. 

LLM05: Supply Chain Vulnerabilities 

What it is. Supply chain in LLM apps spans models, datasets, adapters, plugins/tools, frameworks, containers, and build/inference infrastructure. A compromise anywhere in that chain (a poisoned dataset, a tampered model, a vulnerable plugin, an unsigned container) can subvert your system.  

Why it happens. Modern LLM stacks are assembled from third-party components and rapidly changing packages. Without provenance and hardening, you inherit unknown licenses, outdated or vulnerable dependencies, and the risk of pretrained model tampering or dataset poisoning.  

What it breaks. Integrity (silent model backdoors), confidentiality (malicious plugins exfiltrating secrets), availability (compromised dependencies), and compliance (unlicensed data/models). Because agentic systems act via tools, a single compromised connector can become a blast radius multiplier across CRMs/ERPs or cloud resources. 

How to avoid it 

  1. Adopt a secure SDLC baseline for AI components. Require the same controls you use for software:  
  • threat modeling 
  • secure design reviews 
  • dependency hygiene 
  • code reviews 
  • reproducible builds 
  • vulnerability management 
  1. Provenance & attestation. 
  • Implement SLSA levels for training and inference pipelines so artifacts carry tamper-evident provenance 
  • Sign containers and model bundles and verify on deploy 
  • Store checksums for weights and datasets, defending against model or image replacement in transit or at rest.  
  1. Curate, pin, and scan dependencies. 
    Maintain an allowlist of sources, pin exact versions, and continuously scan for CVEs. Treat plugins, connectors, and tools as untrusted code with least-privilege scopes, and isolate them in separate processes and namespaces with strict egress controls.  
  1. Verify provider posture. 
    When using managed model endpoints or hosted RAG, require attestations about data retention, isolation, logging, and update processes. 
  1. Defend the data pipeline. 
    Gate ingestion with allowlisted sources, run DLP and classification, and apply anomaly or outlier checks before fine-tuning or indexing. Keep versioned datasets with hashes and approvals, while quarantining contributions that fail checks. 
  1. Continuously evaluate and monitor for supply-chain indicators. 
  • Add tests for suspicious triggers and backdoors 
  • Track sudden behavior drift after updates 
  • Keep rollback plans for models and containers 
  • Use threat-informed techniques to design evaluations that mimic real adversaries 

LLM06: Sensitive Information Disclosure 

What it is. Sensitive Information Disclosure occurs when an LLM-enabled system exposes confidential or personal data, PII/PHI, secrets, proprietary content, or internal policies through its outputs or logs. OWASP lists it as LLM06, noting legal, competitive, and trust impacts when sensitive data leaks from prompts, retrieved knowledge, chain-of-thought, tools, or telemetry.  

Why it happens. 

  • Over-sharing context. Excessive or unfiltered retrieval injects sensitive documents into the prompt window, which the model may paraphrase back to users.  
  • Memorization/extraction risk. Models can regurgitate rare training examples under certain conditions. 
  • Leaky telemetry and storage. Prompts and outputs with PII or secrets end up in logs, analytics, or vendor traces if not redacted. 
  • Provider/data-policy mismatches. If you don’t verify vendor data handling, inference, and fine-tuning data might be retained or used in ways your policy forbids. 

What it breaks. Privacy regulations (e.g., GDPR), contractual NDAs, and IP protection. GDPR penalties can reach €20M or 4% of global turnover for severe violations, elevating disclosure from a “bug” to an enterprise risk. 

How to avoid it 

Minimize and scope context. Send only the data needed per request. Enforce retrieval filters (identity, role, tenant, document ACLs) before augmentation; avoid broad indexes.

Redact before inference; de-identify at source. Apply classification to prompts and retrieved passages before they reach the model. Use managed services to find and mask PII/PHI with custom identifiers for your org-specific secrets. 

Harden output handling (pair with LLM02). Treat model output as untrusted until post-processed: enforce allowlisted fields, validate schemas, and run policy filters that block sensitive categories from leaving the system or appearing to unauthorized users. 

Tighten provider data policies and residency. Choose providers with enterprise guarantees:  

  • Data not used to train by default 
  • Encrypted at rest 
  • Regional residency 
  • Short retention 
  • Deletion APIs 

Scrub telemetry and logs. Don’t log raw prompts and outputs that may contain PII/secrets. If logging is required, tokenize or hash sensitive fields and restrict access.  

Encrypt data and isolate tenants. Enforce encryption in transit and at rest; isolate tenants at the index and vector-store level; scope keys and roles to the minimal surface.  

Test for extraction and leakage before release. Add red-team prompts and evaluation suites that probe for verbatim regurgitation and sensitive-field leakage. Use threat-informed tests and research practical data extraction to calibrate your defenses. 

LLM07: Insecure Plugin (Tool) Design 

What it is. Defined by OWASP as LLM07, insecure plugins and tools extend an LLM app with real capabilities, turning them into high-impact attack paths. 

Why it happens. Plugins are often treated like helpers, not untrusted integration points. Indirect prompt injection can steer tools; missing least-privilege and weak policy checks let an agent perform unintended actions.  

What it breaks. Data exfiltration through connectors, unauthorized transactions and updates, cross-tenant access, and lateral movement through cloud and enterprise APIs. 

How to avoid it 

  1. Design for least privilege and explicit scopes. Plugins and tools should expose narrow capabilities with role-based access, short-lived tokens, and tenant isolation. Require human approval for high-risk methods (write/delete/transfer). 
  1. Validate inputs/outputs at the boundary. Treat both as untrusted: enforce schemas, parameterize downstream calls, sanitize anything rendered or executed, and block dangerous patterns.  
  1. Isolate execution. Run plugins in separate processes/namespaces with egress controls and audit logs; never let a plugin write beyond its data domain.  
  1. Harden against indirect prompt injection. Separate retrieved content from control channels; ignore/strip instruction-like tokens in untrusted data; deny tool calls that fall outside policy. 
  1. Monitor and test continuously. Trace tool invocations, scopes used, and unusual sequences. Add adversarial tests for tool misuse before release and after updates.  

LLM08: Excessive Agency

What it is. Excessive agency is when an LLM system is granted too much functional power, permission, or autonomy so the model can take actions beyond what’s intended or safe. OWASP defines it as unchecked autonomy that can lead to damaging actions, and attributes root causes to excessive functionality, excessive permissions, or excessive autonomy

Why it happens. Modern agents can plan multi-step tasks and call tools or APIs. If you couple that power with broad scopes, weak policy checks, or no human approval for risky steps, they can escalate into real-world side effects.  

What it breaks. Reliability, privacy, and trust. Excessive agency jeopardizes reliability, privacy, and user confidence precisely because actions are automated, not merely suggested.

How to avoid it 

Design for “least autonomy,” not just least privilege. Keep workflows simple and deterministic; prefer recommend → approve → execute over fire-and-forget. Constrain what the agent is allowed to do and when, even if credentials would technically permit more.

Scope tools narrowly with explicit policies. Expose only the methods the agent truly needs; bind each to allowlists, rate limits, and tenant-aware scopes. Block cross-system actions unless pre-approved. 

Separate “decide” from “do.” Treat model output as intent, then translate it into validated, policy-checked commands before any side effects. 

Put humans in the loop for risky actions. Require approvals on high-impact steps and then track who approved what, when, and why; retain auditable logs. 

Continuously evaluate agent behavior before and after release. Use standard eval suites that specifically probe agentic tasks and tool use.  

Harden against indirect prompt injection. Treat all external content as untrusted; strip or ignore instruction-like tokens in retrieved pages and files; deny tool calls that fall outside policy. 

LLM09: Overreliance on LLM Output 

What it is. Overreliance is trusting an LLM’s answer (or action plan) without independent verification. OWASP flags it as LLM09, noting that uncritically accepting model output can lead to bad decisions, security issues, and legal exposure. 

Why it happens. LLMs can present confident but wrong content (hallucinations/confabulations), and that fluency encourages users and systems to over-trust results. 

What it breaks. Decision quality, downstream system safety (when outputs drive tools), and compliance posture, especially if unverified content is stored, sent to customers, or used to modify records. 

How to avoid it 

  1. Make “verify-then-act” your default. Build workflows that require evidence before side effects. RAG and grounding reduce hallucinations by tying answers to trusted sources. For customer-facing or high-risk actions, require humans to review either the citations or the retrieved snippets. 
  1. Measure groundedness and factuality at runtime. Detect when responses don’t match their sources. 
  1. Separate “decide” from “do.” Treat model output as intent that must be translated into validated, policy-checked commands. This prevents free text from directly driving SQL, shell, plugins, or UI automations and reduces the damage even if a claim is wrong. 
  1. Continuously evaluate with standard suites. Don’t rely on one-time red-teaming. 

LLM10: Model Theft (Extraction) 

What it is. Model theft is the unauthorized copying or exfiltration of your model, either by stealing weights/artifacts outright or recreating your model’s functionality via large-scale queries. OWASP lists this as LLM10 and calls out both insider and infrastructure compromise, along with black-box extraction, as common routes. 

Why it happens. Two broad paths: 

  • Black-box extraction: research shows attackers can approximate proprietary models simply by querying an API and training a surrogate on input–output pairs.  
  • Direct theft of artifacts: weak access control on registries, storage, CI/CD, or training clusters allows exfiltration of weights, datasets, and adapters. 

What it breaks. Intellectual property, competitive advantage, compliance/licensing, and, if the training data is memorized, training-data extraction risks. 

How to avoid it  

Lock down artifacts and the MLOps surface.

  • Store models and datasets in a registry behind strong authentication (RBAC) 
  • Enforce short-lived credentials 
  • Use network allowlists 
  • Encrypt at rest 
  • Require code reviews and approvals for pull and push 
  • Log every artifact access 

Harden the API to raise the extraction cost. Enforce authN, per-account quotas and rate limits, and anomaly detection on query volume/entropy; disable or restrict high-information outputs that accelerate distillation. 

Watermark for deterrence and evidence (with caveats). Text watermarking can help detect mass-generated content used to train knockoffs or to prove provenance, but it is not a silver bullet and can be targeted. 

Secure the AI supply chain. Apply software-supply-chain controls to models, adapters, and datasets; pin versions; verify checksums at deploy time; isolate plugins/connectors with least privilege. 

Govern and measure continuously. Treat model theft as an ongoing risk with policy, monitoring, and incident playbooks. Use NIST’s AI RMF and the Generative AI Profile to formalize roles, controls, along with TEVV (test/eval/verification/validation) across the lifecycle. 

Threat-informed testing. Before and after releases, run extraction drills and watch for telltale patterns like spikes in token volume, unusual input distributions, repeated classifications, and more.  

Your Readiness Assessment Baseline your risks, prioritize fixes, and ship a secure pilot with Svitla’s engineering team. Start my assessment

From Theoretical to Operational: Treat LLM Safety as an Engineering Discipline 

Our word of advice? Always anchor your LLM program in well-recognized frameworks, building guardrails into architecture and validating/sanitizing every boundary. Equip your systems so that quality, safety, and cost are observable. Make evaluation and red-teaming part of the lifecycle, and keep your AI supply chain as rigorously governed as your software supply chain. 

How Svitla Systems helps: we design and build LLM applications that map directly to OWASP guidance and leading standards. Our teams implement secure SDLC practices for AI, threat modeling, alignment to OWASP and MITRE, governance with NIST AI RMF, data protection and retrieval hygiene, agent/tool isolation with least privilege, and production observability for quality, safety, latency, and token spend. We also set up continuous evaluations and red-team exercises, so defenses improve as your use cases grow. 

Let us scope a quick assessment for your specific workflows or products to baseline risks against OWASP, prioritize fixes, and design a secure pilot that is measurable, auditable, and ready to scale.  

FAQ

Is the OWASP LLM Top 10 the same as the classic OWASP Top 10?

No. It’s a dedicated list for LLM applications, covering risks like prompt injection, insecure output handling, and excessive agency that are unique to model-driven apps.

What’s the fastest way to reduce real risk in production?

Isolate untrusted content, gate tool use with least-privilege scopes, enforce schema validation on outputs, and add observability (traces, token usage, policy checks). You’ll eliminate several high-impact failure modes at once. 

Where should governance live: security or product?

Both. You can use the NIST AI RMF (Govern–Map–Measure–Manage) framework to divide responsibilities. Security sets policy and guardrails while product and engineering implement controls and continuous evaluations.