Is RAG safe if I sanitize uploaded documents?

Sanitization helps for obvious payloads. It does not cover hidden Unicode, paraphrased instructions, multi-document chains, or third-party-feed content. Sanitization is one layer, not a perimeter.

We do not give the model tool access. Does this still apply?

Yes, but the blast radius is smaller. Without tools, indirect injection can still cause data leakage, deceive the user, or bypass content policies. With tools, it becomes remote action execution.

Can we detect indirect prompt injection in production?

Partially. You can detect known payload signatures, anomalous tool-call shapes, and output containing exfil markers. Detection is a tripwire, not prevention.

Do image OCR pipelines really get injected?

Yes. A payload printed into an uploaded image, lifted by OCR, and fed back to the model is treated as a legitimate instruction. We demonstrate this on client engagements regularly.

Does indirect prompt injection affect SOC 2 or ISO 27001 audits?

Indirectly. If your LLM feature touches customer data and is compromisable, that is an unaddressed risk under most security frameworks. Auditors increasingly ask for documented testing of LLM-backed features.

What is Indirect Prompt Injection? Definition, Real Cases, and Defenses

AI Security · LearnAI Penetration Testing Download PDF

TL;DR

Indirect prompt injection is the version of prompt injection where the attacker plants instructions in content the AI will read on someone else's behalf, like a web page the AI summarizes, a support ticket it reads, or a file it ingests. The user never sees the attack. This is the failure pattern that turns AI assistants from a user-experience risk into a real security risk.

By Rohit Hatagale, AI Security Lead, SecureLayer7Updated June 9, 2026

How is indirect prompt injection different from direct?

In direct prompt injection, the attacker types the payload into a field the model reads (chat, search, form). They are present at the keyboard. In indirect prompt injection, the attacker plants the payload somewhere the model will later read on someone else's behalf. The victim is a different person, often an internal employee or another customer, and the payload arrives through a channel that was never user-visible.

Greshake et al. introduced the term in 2023 and demonstrated the first end-to-end exploit against Bing Chat: a hostile web page told the model to extract the user's chat history and exfiltrate it through a markdown image URL. The user saw a normal answer. Every modern indirect-injection technique still reuses some variant of that pattern (Greshake et al., 2023).

Which channels carry indirect prompt-injection payloads in production?

The list grows every quarter. The ones we test on every engagement:

RAG-indexed documents. Anything the retrieval pipeline pulls into context. If users upload PDFs, scrape web content, or sync from third-party SaaS, every one of those is an instruction channel.
Tool / function responses. A search tool that returns attacker-controlled snippets, an email tool that returns inbox bodies, a database tool that returns user-supplied notes.
Markdown and HTML rendered to the model. Alt text, link labels, hidden Unicode, CSS-hidden spans, comments.
Image content with OCR. A payload printed inside the image bytes that the OCR pipeline lifts out and feeds back to the model.
JSON or YAML fields. A nested string the model is supposed to summarize, where the string itself is an instruction.
Calendar invites, contact card notes, file names. Anywhere the agent processes structured data that originated outside your trust boundary.

What real-world indirect prompt-injection cases should I read?

Greshake et al., 2023, the foundational paper. Demonstrated exfil through a markdown image link rendered by Bing Chat. Reading list day one.
Cloak and Honey Trap (USENIX Security '25), Ben-Gurion researchers classified 7 LLM-agent vulnerability classes, 6 attacker strategies, and 15 attack techniques targeting agentic systems. The CHeaT testbed reproduces every one of them.
Google Bard email leak (2024), indirect injection through a shared Google Doc caused the assistant to leak unrelated Gmail content.
Bing Chat history exfil (2023), the canonical exfil-via-rendered-link case, still relevant as a template for every UI that turns model output into a network request.

What actually reduces indirect-injection risk?

No single control closes the gap. A defensible stack combines:

Provenance tagging of every chunk in the prompt (system vs operator vs user vs retrieved), explicitly marked so downstream defenses can reason about source.
Render-boundary hardening: strip auto-resolving markdown links, sandbox image rendering, disallow inline scripts in model output that the UI honors.
Least-privilege tool wiring so that even a fully compromised model cannot perform actions the user has not authorized.
Second-model verification before any high-impact action (sending money, granting access, mutating production). The verifier sees only the action and a structured summary, not the original user input.
Adversarial monitoring that alerts on tool-call shapes you have never seen, or on output that contains exfil-channel markers (long base64 strings inside URLs, for example).

Each of these is partial. Combining them moves the cost of a successful exploit up, not to infinity.

How does SecureLayer7 test for indirect prompt injection?

We map every channel the model reads from before sending payloads: RAG corpus, tool response shapes, attached document parsers, OCR pipelines. We plant payloads in each channel and observe whether the model follows them. For agentic systems, the success criterion is not 'did the model say something it should not' but 'did the model perform an action it should not, or reach a resource outside its authorized scope.' Every confirmed finding ships with a reproducible transcript, the trust boundary that was crossed, and an architectural recommendation, not just a filter rule.