AI Security · Learn

What is RAG poisoning?

Planting bad content in the knowledge base an AI reads from, so the AI later retrieves it and acts on it as if it were trusted. Most enterprise AI features built on company documents are exposed.

AI Security · LearnAI Penetration Testing
TL;DR

RAG stands for retrieval-augmented generation, the standard way modern AI products answer questions from a company's own documents. RAG poisoning is the attack where someone plants malicious content in the corpus the AI reads from (a user-uploaded PDF, a scraped web page, a shared wiki). Two failure modes: the planted content instructs the AI to do something wrong, or it feeds the AI false facts the AI then confidently repeats.

By Rohit Hatagale, AI Security Lead, SecureLayer7Updated

What exactly is being poisoned in RAG poisoning?

The corpus. A retrieval-augmented generation system has three moving parts: a vector store of indexed chunks, a retriever that fetches relevant chunks for a query, and an LLM that answers the query using the retrieved chunks as context. RAG poisoning targets the first part.

Attacker goals fall into two buckets. The first is instruction injection, plant a chunk whose content is really a payload that hijacks the model when retrieved (a special case of indirect prompt injection, see our indirect-PI article). The second is knowledge corruption, plant chunks that contain false facts the model then asserts as ground truth in its answer.

How does attacker content end up in a production RAG corpus?

More easily than most teams expect. The vectors we see most often on engagements:

  • User-uploaded documents. Help-desk attachments, support-ticket bodies, customer-submitted PDFs. Any user input that ends up in the corpus.
  • Auto-ingested third-party feeds. RSS, partner APIs, scraped web content, public ticket boards.
  • Public web crawling. If the retriever fetches from the public web on the fly, every site the user mentions is a potential injection source.
  • Internal-but-untrusted sources. A shared wiki where any employee can edit, a Slack export, a code-comment harvest. The boundary you thought you had inside the org is often porous.
  • Embedding-space manipulation. Carefully crafted chunks designed to rank high for specific queries, sometimes through token-level adversarial methods against the embedding model.

What actually reduces RAG poisoning risk?

  • Ingestion-time provenance. Tag every chunk with its source, ingestion path, and trust level. Surface that provenance in the prompt so downstream defenses can reason about it.
  • Source-aware retrieval. Restrict which sources are eligible for which queries. A consumer-facing assistant should probably not retrieve from auto-ingested third-party feeds without an authority check.
  • Adversarial corpus review. Run scheduled scans for known payload patterns, anomalous embeddings, and chunks with suspicious instruction-language density.
  • Output verification against retrieved sources. When the model cites a fact, programmatically check that the cited chunk actually supports the claim. Catches a large fraction of knowledge-corruption attacks.
  • Least-privilege downstream actions. If the RAG system feeds an agent that can take actions, the action layer should not trust the retrieval result transitively.

How does SecureLayer7 test for RAG poisoning?

We enumerate every ingestion path into the corpus, classify each path's attacker reachability, and plant adversarial chunks of varying subtlety. We measure retrieval rank (do our chunks come back?) and impact (does the model honor the planted instruction, or assert the planted fact?). For systems with both retrieval and tool access, we chain the poisoned retrieval into a downstream action and prove blast radius. Every confirmed finding ships with the planted chunk, the query that retrieved it, the resulting model output, and an architectural recommendation.

References

  1. [1]OWASP LLM06:2025, Sensitive Information Disclosure(OWASP)
  2. [2]OWASP LLM08:2025, Vector and Embedding Weaknesses(OWASP)
  3. [3]Greshake et al., More Than You've Asked For(arXiv 2302.12173, 2023)
  4. [4]Zou et al., PoisonedRAG(arXiv 2402.07867, 2024)
Related terms

If your RAG corpus accepts content from anywhere outside your trust boundary, RAG poisoning is in your threat model.

Common questions

RAG poisoning, asked often

Need a RAG security review?

Scope an engagement

Audit your RAG corpus before someone else does.

We map every place untrusted content can reach your AI's knowledge base, plant adversarial test cases of varying subtlety, and document where retrieval, ranking, or output validation failed.

See our AI testing methodology30-min scoping call, fixed-price proposal in 48 hours.