Dropbox security research: prompt injection via control characters in GPT-3.5 and GPT-4
Dropbox's security team found that control characters in user-controlled LLM prompt input can circumvent system-level instructions, enabling prompt injection attacks that make the model ignore its context constraints or hallucinate.
Dropbox demonstrated that prepending enough control characters to LLM prompt input causes GPT-3.5 and GPT-4 to disregard their system instructions and hallucinate. The team shared its findings with OpenAI and identified input sanitization as the primary mitigation.
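As a rough illustration of the input sanitization mitigation, the sketch below strips control characters from a user-supplied question and caps its length before it reaches a prompt. The function name `sanitize_question`, the `MAX_QUESTION_CHARS` limit, and the choice to turn newlines and tabs into spaces are assumptions made for this example, not details taken from the Dropbox research.

```python
import unicodedata

# Hypothetical cap for illustration; the source does not specify a limit.
MAX_QUESTION_CHARS = 1000


def sanitize_question(text: str, max_chars: int = MAX_QUESTION_CHARS) -> str:
    """Remove control characters from user input, then cap its length."""
    kept = []
    for ch in text:
        # Unicode general category "Cc" covers the C0/C1 control characters,
        # for example backspace (\b), carriage return (\r), and NUL.
        if unicodedata.category(ch) == "Cc":
            # Assumption: newline and tab act as word separators, so they
            # become a single space; every other control character is dropped.
            if ch in "\n\t":
                kept.append(" ")
            continue
        kept.append(ch)
    return "".join(kept).strip()[:max_chars]


if __name__ == "__main__":
    raw = "\b" * 500 + "What is the capital of France?"
    print(repr(sanitize_question(raw)))
```

Filtering on the Unicode category rather than a hand-written list of characters is one possible design choice; an allowlist of expected characters would be a stricter alternative.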
A prompt template designed to constrain LLM queries to a specific context and to prevent instruction leakage was defeated once enough control characters were prepended to the question parameter, regardless of how the instructions were worded or formatted.
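To make the setup concrete, here is a minimal sketch of how a question parameter is typically interpolated into such a template, and how an application might measure the control characters it carries before sending anything to a model. The template wording, `render_prompt`, and `count_control_chars` are hypothetical and written for this example; the template used in the research is not reproduced here.

```python
import unicodedata

# Hypothetical template for illustration only.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    'If the answer is not in the context, reply "I don\'t know".\n'
    "Context: {context}\n"
    "Question: {question}\n"
)


def render_prompt(context: str, question: str) -> str:
    """Interpolate the user-controlled question into the template verbatim."""
    return PROMPT_TEMPLATE.format(context=context, question=question)


def count_control_chars(text: str) -> int:
    """Count characters in Unicode general category Cc (control characters)."""
    return sum(1 for ch in text if unicodedata.category(ch) == "Cc")


if __name__ == "__main__":
    question = "\b" * 300 + "Who wrote this document?"
    prompt = render_prompt("Hello, this is a test.", question)
    # The instructions are unchanged, but the control characters pass
    # straight through into the text the model receives.
    print(count_control_chars(question), len(prompt))
```

Because the template only concatenates strings, nothing in it can stop the control characters from reaching the model, which is why a check like `count_control_chars` has to run on the input before rendering.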
https://dropbox.tech/machine-learning/prompt-injection-with-control-characters-openai-chatgpt-llm