Context
Presidio excels at detecting and redacting PII. But when Presidio is used with LLMs (e.g., the PII de-identification pipeline), the system prompts that instruct the LLM how to handle PII are themselves a security surface.
If the system prompt lacks explicit data protection instructions, the LLM may:
- Leak the PII it's supposed to redact
- Follow injected instructions in the PII-containing text
- Output the original sensitive data when asked
Proposal
Add a sample/recipe that audits PII-handling system prompts for defense gaps before deployment:
import subprocess, json
# The prompt you use to instruct the LLM for PII handling
pii_prompt = "You are a PII redaction assistant. Replace all personal information with [REDACTED]."
result = subprocess.run(
["npx", "prompt-defense-audit", "--json", pii_prompt],
capture_output=True, text=True
)
audit = json.loads(result.stdout)
# Check specifically for data protection and indirect injection defenses
for check in audit["checks"]:
if check["id"] in ["data-leakage", "indirect-injection"] and not check["defended"]:
print(f"WARNING: {check['name']} defense missing in PII handling prompt")
Why this matters
We scanned 1,646 production system prompts and found 94.9% lack indirect injection defense. For PII-handling prompts, this is especially dangerous — an attacker could embed instructions in a document containing PII, causing the LLM to output the PII instead of redacting it.
Tool
prompt-defense-audit — MIT, zero dependencies, <5ms. Available on npm.
Happy to contribute a sample notebook if this direction is useful.
Context
Presidio excels at detecting and redacting PII. But when Presidio is used with LLMs (e.g., the PII de-identification pipeline), the system prompts that instruct the LLM how to handle PII are themselves a security surface.
If the system prompt lacks explicit data protection instructions, the LLM may:
Proposal
Add a sample/recipe that audits PII-handling system prompts for defense gaps before deployment:
Why this matters
We scanned 1,646 production system prompts and found 94.9% lack indirect injection defense. For PII-handling prompts, this is especially dangerous — an attacker could embed instructions in a document containing PII, causing the LLM to output the PII instead of redacting it.
Tool
prompt-defense-audit — MIT, zero dependencies, <5ms. Available on npm.
Happy to contribute a sample notebook if this direction is useful.