
Dropbox security research: prompt injection via control characters in GPT-3.5 and GPT-4

Dropbox's security team found that user-supplied control characters in LLM prompt inputs can circumvent system-level instructions, enabling prompt injection attacks that cause models to disregard their context constraints or hallucinate.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · LLM injection threat identified

Dropbox's security team identifies injection attacks on LLM queries as a focus area for hardening internal infrastructure.

Tools used

GPT-3.5, GPT-4, ChatGPT, Python 3

Outcome

Dropbox demonstrated that prepending enough control characters to LLM prompt inputs causes GPT-3.5 and GPT-4 to disregard their system instructions and hallucinate. The team shared its findings with OpenAI and identified input sanitization as the primary mitigation.
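The input sanitization mentioned above could look roughly like the following. This is a minimal sketch, not Dropbox's actual code: the function name `sanitize_question`, the 500-character cap, and the choice to treat every Unicode control character (category "Cc") as whitespace are all assumptions made for illustration.

```python
import unicodedata

# Hypothetical length cap; the right value depends on the application.
MAX_QUESTION_CHARS = 500


def sanitize_question(raw: str, max_chars: int = MAX_QUESTION_CHARS) -> str:
    """Neutralize control characters and bound the length of a user question.

    Assumption: every character in Unicode category "Cc" (backspace,
    carriage return, newline, tab, and so on) is replaced by a space.
    """
    spaced = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch for ch in raw
    )
    # Collapse whitespace runs so a long control-character prefix
    # cannot survive as a long run of spaces.
    collapsed = " ".join(spaced.split())
    return collapsed[:max_chars]


if __name__ == "__main__":
    padded = "\b" * 400 + "What is  the\ncapital?"
    print(repr(sanitize_question(padded)))
```

Applying the filter to the question parameter before it is substituted into the prompt keeps the check independent of how the instructions themselves are worded.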

What failed first

The prompt template designed to constrain LLM queries to a specific context and prevent instruction leakage was defeated when sufficient control characters were prepended to the question parameter, regardless of instruction wording or formatting.
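To make the failure concrete, here is a small sketch of a context-constrained prompt template with a `question` parameter. The template wording, the function names, and the count of 400 backspaces are hypothetical and are not taken from the Dropbox writeup. The code makes no API call; it only shows how a control-character prefix ends up dominating the rendered prompt.

```python
import unicodedata

# Hypothetical template, illustrative only.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    'If the answer is not in the context, reply "I don\'t know".\n'
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer:"
)


def render_prompt(context: str, question: str) -> str:
    """Substitute user-controlled values into the template unchanged."""
    return PROMPT_TEMPLATE.format(context=context, question=question)


def control_char_count(text: str) -> int:
    """Count characters in Unicode category "Cc" (control characters)."""
    return sum(1 for ch in text if unicodedata.category(ch) == "Cc")


context = "Hello, this is a test."
benign = render_prompt(context, "What is this?")
# Arbitrary padding length chosen for the example.
padded = render_prompt(context, "\b" * 400 + "What is this?")

if __name__ == "__main__":
    print(control_char_count(benign), control_char_count(padded))
```

Counting control characters in the rendered prompt, as `control_char_count` does, is one simple way to flag suspicious inputs in logs before deciding how strictly to filter them.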

Results

Volume: 350 or more

Source

https://dropbox.tech/machine-learning/prompt-injection-with-control-characters-openai-chatgpt-llm


Grounding & classification

Source type: technical build writeup

17 fields verified against source quotes.

conversational ai, failure mode described, metric backed, named customer, source backed, tools described, workflow described, software, technical build writeup