Skip to content

Proposal: Audit PII-handling system prompts for defense gaps #1933

@ppcvote

Description

@ppcvote

Context

Presidio excels at detecting and redacting PII. But when Presidio is used with LLMs (e.g., the PII de-identification pipeline), the system prompts that instruct the LLM how to handle PII are themselves a security surface.

If the system prompt lacks explicit data protection instructions, the LLM may:

  • Leak the PII it's supposed to redact
  • Follow injected instructions in the PII-containing text
  • Output the original sensitive data when asked

Proposal

Add a sample/recipe that audits PII-handling system prompts for defense gaps before deployment:

import subprocess, json

# The prompt you use to instruct the LLM for PII handling
pii_prompt = "You are a PII redaction assistant. Replace all personal information with [REDACTED]."

result = subprocess.run(
    ["npx", "prompt-defense-audit", "--json", pii_prompt],
    capture_output=True, text=True
)
audit = json.loads(result.stdout)

# Check specifically for data protection and indirect injection defenses
for check in audit["checks"]:
    if check["id"] in ["data-leakage", "indirect-injection"] and not check["defended"]:
        print(f"WARNING: {check['name']} defense missing in PII handling prompt")

Why this matters

We scanned 1,646 production system prompts and found 94.9% lack indirect injection defense. For PII-handling prompts, this is especially dangerous — an attacker could embed instructions in a document containing PII, causing the LLM to output the PII instead of redacting it.

Tool

prompt-defense-audit — MIT, zero dependencies, <5ms. Available on npm.

Happy to contribute a sample notebook if this direction is useful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions