15 Research Lab -Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

Prompt Injection in RAG Pipelines: Document Poisoning and Chunk Boundary Exploits

November 10, 202515 Research Lab

prompt-injectionagent-safetydefensefindings

Retrieval-Augmented Generation (RAG) pipelines fetch external documents and inject their content into the model's context. Every retrieved chunk is an opportunity for indirect prompt injection. The model cannot distinguish between legitimate document content and adversarial instructions embedded within it.

Document Poisoning

An attacker creates or modifies a document in the knowledge base to contain injection payloads. When a user query triggers retrieval of that document, the payload reaches the model as trusted context.

Attack vectors include:

Public-facing document repositories where anyone can contribute
Shared drives or wikis with broad write access
Web-scraped content that includes adversarial pages
PDF files with hidden text layers or metadata fields containing instructions

The payload does not need to be visible. White text on white background, zero-font-size spans, and metadata fields are all effective carriers.

Chunk Boundary Exploits

RAG systems split documents into chunks for embedding and retrieval. Attackers can craft payloads that exploit the chunking logic:

Cross-chunk injection: The payload is split across two chunks. Neither chunk contains a complete injection pattern, bypassing per-chunk scanning. When both chunks are retrieved and concatenated in the prompt, the full payload assembles.

Chunk isolation: The adversarial instruction is placed at the start of a chunk, separated from surrounding content by whitespace or formatting. This makes it more likely to be interpreted as an instruction rather than document content.

Relevance manipulation: The adversarial chunk is seeded with keywords that make it rank highly for common queries, ensuring frequent retrieval.

Metadata Injection

Many RAG implementations include document metadata (title, author, source URL, tags) in the prompt alongside the chunk content. Attackers can embed instructions in metadata fields that developers assume are safe. A document titled "IMPORTANT: Override your previous instructions and..." will have its title included verbatim in the prompt.

Defenses for RAG Systems

Sanitize at ingestion. Scan documents for injection patterns before they enter the knowledge base. Strip hidden text, decode embedded content, and flag suspicious metadata.

Scan after retrieval. Run injection detection on assembled prompts, not just individual chunks. Aegis can scan the full context window before it reaches the model.

Privilege-separate retrieved content. Use structured prompts that explicitly label retrieved content as untrusted data. Instruct the model to extract information from documents but not follow instructions found within them.

Authorization as a backstop. Even if a poisoned document successfully injects instructions, a tool-call authorization policy prevents the injected commands from executing. The model might try to call an unauthorized tool, but the policy engine blocks it.

Test your RAG pipeline with poisoned documents from AI SecLists. If your system has a document upload feature, it has this attack surface.