Important
The source code for this repository was generated by claude.ai [Sonnet4.6]. It exists strictly to review code generated by AI providers and may be modified in the future.
Warning
This project is provided as-is without warranty of any kind. Review the code before use in any environment you care about. Use at your own risk.
Cross-platform, self-contained binary that scans files for sensitive data and produces three outputs: a full-detail report, a redacted management report, and a structured JSON dataset for security team triage.
- In scope: Detection of API keys, tokens, passwords, secrets, SSNs, credit card numbers, private keys, and connection strings via a user-configurable pattern library (`patterns.json`)
- In scope: Three outputs: full report (raw values, restricted), redacted report, and structured JSON dataset
- In scope: Native binaries for Linux and Windows; containerised execution via Docker
- Out of scope: Automatic remediation or secret rotation — detection only
- Out of scope: Real-time file system monitoring; this is a point-in-time scan
.
├── src/ # Go source code
│ ├── go.mod
│ ├── main.go # CLI parsing, orchestration
│ ├── scanner.go # File walking, pattern matching, entropy detection
│ ├── worker.go # Parallel worker pool
│ ├── reporter.go # Text report + JSON dataset writers
│ ├── reporter_html.go # Self-contained HTML report writer
│ └── patterns.go # Pattern struct, JSON loader, regex compiler
├── node/ # Original Node.js implementation (archived)
│ ├── scanner.js
│ ├── patterns.js
│ ├── scan.sh
│ ├── scan.ps1
│ └── Dockerfile
├── fixtures/ # Test data (fake credentials)
├── patterns.json # User-configurable pattern library
├── Dockerfile # Multi-stage Go build + runtime image
├── build.sh # Linux/macOS build script (uses Docker)
├── build.ps1 # Windows build script (uses Docker)
├── scan.sh # Linux/macOS run wrapper
└── scan.ps1 # Windows run wrapper
build/ is created by the build scripts and is git-ignored.
Docker is the only build requirement — no Go installation needed on the host.
chmod +x build.sh scan.sh
./build.sh # builds all targets
./build.sh linux # Linux amd64 only
./build.sh linux-arm64 # Linux arm64 only
./build.sh windows # Windows amd64 only

.\build.ps1 # builds all targets
.\build.ps1 -Target linux # Linux amd64 only
.\build.ps1 -Target linux-arm64 # Linux arm64 only
.\build.ps1 -Target windows # Windows amd64 only

Binaries are written to build/:
| File | Platform |
|---|---|
| `build/scanner-linux-amd64` | Linux x86-64 |
| `build/scanner-linux-arm64` | Linux ARM64 (Raspberry Pi, AWS Graviton, Apple Silicon via a Linux VM or Docker) |
| `build/scanner-windows-amd64.exe` | Windows x86-64 |
chmod +x scan.sh
./scan.sh /path/to/scan
./scan.sh /path/to/scan --ext .js,.env --exclude vendor,tmp

.\scan.ps1 C:\path\to\scan
.\scan.ps1 C:\path\to\scan -Ext ".js,.env" -Exclude "vendor,tmp" -Out "C:\reports"

Windows output files are automatically suffixed with -win to distinguish them from Linux/macOS runs.

./build/scanner-linux-amd64 /path/to/scan --patterns ./patterns.json

# Demo run against bundled fixtures:
docker build -t sensitive-data-scanner .
docker run --rm sensitive-data-scanner
# Scan a directory on the host:
docker run --rm \
-v /host/path/to/scan:/target:ro \
-v /host/output:/out \
sensitive-data-scanner /target --patterns /app/patterns.json --out /out

| Flag | Description |
|---|---|
| `--ext .js,.env,...` | Only scan files with these extensions (comma-separated; dot optional) |
| `--exclude dir1,dir2` | Additional directories to exclude |
| `--suffix <str>` | Append a suffix to all output filenames |
| `--out <path>` | Custom output directory (default: ./scan-output-<timestamp>) |
| `--patterns <path>` | Path to the patterns JSON file (default: patterns.json in working dir) |
| `--summary` | Print finding counts by type to stdout; skip writing output files |
| `--entropy` | Enable high-entropy string detection (catches secrets with no known prefix) |
| `--entropy-threshold <float>` | Entropy threshold in bits/char (default 4.5). Also enables --entropy. |
| `--entropy-min-len <int>` | Minimum token length for entropy check (default 20). Also enables --entropy. |
| `--threads <int>` | Parallel scan workers (default 1). Output order is always deterministic. |
| `-h, --help` | Show usage |
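The `--threads` description promises deterministic output order even with several workers. One common way to get that property is to have each worker write its result into a slot indexed by the input position. The sketch below illustrates the idea only; `scanAll` and the stand-in scan function are hypothetical names, and the real `worker.go` may be organised differently.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// scanAll runs scan over every path using a fixed number of workers.
// Each result is stored at the index of its input path, so the returned
// slice has the same order as paths regardless of which worker finishes first.
// (Hypothetical helper, not taken from the repository.)
func scanAll(paths []string, workers int, scan func(string) []string) [][]string {
	if workers < 1 {
		workers = 1
	}
	results := make([][]string, len(paths))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = scan(paths[i]) // each index is written by exactly one worker
			}
		}()
	}
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"a.env", "b.js", "c.txt"}
	// Stand-in for the real per-file scan; it only echoes the file name.
	fake := func(p string) []string { return []string{strings.ToUpper(p)} }
	for i, r := range scanAll(paths, 4, fake) {
		fmt.Println(paths[i], r)
	}
}
```

Because no two goroutines ever write the same slice element, the sketch needs no mutex, and the report writers can iterate `results` in input order.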
Each scan creates a timestamped directory scan-output-<timestamp>/ containing:
| File | Description | Access |
|---|---|---|
| `full-report.txt` | All findings with raw secret values | chmod 600 (RESTRICTED) |
| `redacted-report.txt` | Findings with partially redacted values (safe for management) | Unrestricted |
| `redacted-report.html` | Same as above in self-contained HTML (print-friendly) | Unrestricted |
| `findings.json` | Structured dataset for triage (no raw secrets) | Unrestricted |
| `skipped.log` | Binary or unreadable files (if any) | Unrestricted |
- SSN: `***-**-6789` (last 4 digits visible)
- Credit Card: `****-****-****-1111` (last 4 digits visible)
- Private Key: `[PRIVATE KEY DETECTED — see full report]`
- Everything else: `ABCD****WXYZ` (first 4 + last 4 chars)
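The redaction rules above can be sketched as a single function. This is an illustration under assumptions, not the code in `reporter.go`: the `kind` labels (`"ssn"`, `"credit-card"`, `"private-key"`) and the helper names are hypothetical, and fully masking values of 8 characters or fewer is an added assumption, since the README does not say how short values are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// lastDigits returns the final n digits of s, ignoring separators.
func lastDigits(s string, n int) string {
	var b strings.Builder
	for _, r := range s {
		if r >= '0' && r <= '9' {
			b.WriteRune(r)
		}
	}
	d := b.String()
	if len(d) > n {
		d = d[len(d)-n:]
	}
	return d
}

// redact masks a matched value according to the four rules listed above.
// The kind labels are hypothetical; the real pattern ids may differ.
func redact(kind, value string) string {
	switch kind {
	case "ssn":
		return "***-**-" + lastDigits(value, 4)
	case "credit-card":
		return "****-****-****-" + lastDigits(value, 4)
	case "private-key":
		return "[PRIVATE KEY DETECTED \u2014 see full report]"
	}
	r := []rune(value)
	if len(r) <= 8 {
		// Assumption: short values are masked entirely so that
		// "first 4 + last 4" never reveals the whole secret.
		return strings.Repeat("*", len(r))
	}
	return string(r[:4]) + "****" + string(r[len(r)-4:])
}

func main() {
	fmt.Println(redact("ssn", "123-45-6789"))
	fmt.Println(redact("credit-card", "4111 1111 1111 1111"))
	fmt.Println(redact("generic", "ABCD1234efghWXYZ"))
}
```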
Patterns are defined in patterns.json at the project root. Each entry follows this schema:
{
  "id": "aws-access-key-id",
  "name": "AWS Access Key ID",
  "description": "...",
  "pattern": "\\bAKIA[0-9A-Z]{16}\\b",
  "caseInsensitive": false,
  "captureGroup": 0
}

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier |
| `name` | string | Human-readable label used in reports |
| `description` | string | Documents what the pattern targets |
| `pattern` | string | RE2-compatible regex (JSON-escaped) |
| `caseInsensitive` | bool | Prepends (?i) when true |
| `captureGroup` | int | 0 = use full match; 1 = use first capture group |
| `validator` | string | (optional) Post-match validator: "ssn" (rejects invalid area/group/serial ranges) or "luhn" (Luhn check-digit for credit cards) |
Add, remove, or modify entries in patterns.json to customise detection without recompiling.
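To make the schema concrete, here is a minimal sketch of how one entry could be decoded and compiled with Go's standard `encoding/json` and `regexp` packages. The struct and function names (`Pattern`, `compile`, `extract`) are assumptions for illustration; the actual definitions in `patterns.go` may differ, and the top-level layout of `patterns.json` is not shown in this README, so the example parses a single entry.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Pattern mirrors the documented fields of one patterns.json entry.
type Pattern struct {
	ID              string `json:"id"`
	Name            string `json:"name"`
	Description     string `json:"description"`
	Pattern         string `json:"pattern"`
	CaseInsensitive bool   `json:"caseInsensitive"`
	CaptureGroup    int    `json:"captureGroup"`
	Validator       string `json:"validator,omitempty"`
}

// compile builds the regex, prepending (?i) when caseInsensitive is set,
// and rejects a captureGroup the expression does not define.
func compile(p Pattern) (*regexp.Regexp, error) {
	expr := p.Pattern
	if p.CaseInsensitive {
		expr = "(?i)" + expr
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, fmt.Errorf("pattern %q: %w", p.ID, err)
	}
	if p.CaptureGroup < 0 || p.CaptureGroup > re.NumSubexp() {
		return nil, fmt.Errorf("pattern %q: capture group %d not defined", p.ID, p.CaptureGroup)
	}
	return re, nil
}

// extract returns the configured capture group of the first match in line.
func extract(re *regexp.Regexp, group int, line string) (string, bool) {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[group], true
}

func main() {
	raw := `{"id":"aws-access-key-id","name":"AWS Access Key ID","description":"...",
	"pattern":"\\bAKIA[0-9A-Z]{16}\\b","caseInsensitive":false,"captureGroup":0}`
	var p Pattern
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		panic(err)
	}
	re, err := compile(p)
	if err != nil {
		panic(err)
	}
	v, ok := extract(re, p.CaptureGroup, "key = AKIAIOSFODNN7EXAMPLE")
	fmt.Println(v, ok)
}
```

Validating `captureGroup` at load time means a misconfigured entry fails fast instead of causing an index error during a scan.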
Note: Go uses RE2 syntax. Lookahead/lookbehind assertions are not supported. Use the `validator` field for post-match filtering instead (as done for SSN and credit cards).
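As an illustration of post-match filtering, the two validators named in the schema could look roughly like this. The function names and the `validators` map are hypothetical. The Luhn routine is the standard check-digit algorithm, and the SSN routine applies the well-known never-issued ranges (area 000, 666, and 900 to 999; group 00; serial 0000) to a `NNN-NN-NNNN` string. Whether the scanner accepts other SSN layouts is not stated in this README.

```go
package main

import (
	"fmt"
	"strconv"
)

// luhnValid applies the Luhn check-digit algorithm, skipping spaces and dashes.
func luhnValid(s string) bool {
	sum, n, double := 0, 0, false
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c == ' ' || c == '-' {
			continue
		}
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
		n++
	}
	return n > 0 && sum%10 == 0
}

// ssnValid rejects never-issued ranges in a NNN-NN-NNNN string.
func ssnValid(s string) bool {
	if len(s) != 11 {
		return false
	}
	for i := 0; i < len(s); i++ {
		if i == 3 || i == 6 {
			if s[i] != '-' {
				return false
			}
		} else if s[i] < '0' || s[i] > '9' {
			return false
		}
	}
	area, _ := strconv.Atoi(s[0:3])
	group, _ := strconv.Atoi(s[4:6])
	serial, _ := strconv.Atoi(s[7:11])
	if area == 0 || area == 666 || area >= 900 {
		return false
	}
	return group != 0 && serial != 0
}

// validators maps the schema's validator names to functions (hypothetical wiring).
var validators = map[string]func(string) bool{
	"luhn": luhnValid,
	"ssn":  ssnValid,
}

func main() {
	fmt.Println(validators["luhn"]("4111 1111 1111 1111"))
	fmt.Println(validators["ssn"]("000-12-3456"))
}
```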
| Pattern | Example |
|---|---|
| AWS Access Key ID | AKIA... |
| AWS Secret Access Key | aws_secret_access_key = ... |
| GCP API Key | AIza... |
| GCP Service Account Key | client_email: [email protected] |
| GitHub PAT (classic + fine-grained) | ghp_..., github_pat_... |
| Slack Bot/User/App Token | xoxb-... |
| Stripe Secret/Publishable Key | sk_live_..., pk_test_... |
| SendGrid API Key | SG.... |
| Twilio Auth Token | TWILIO_AUTH_TOKEN = ... |
| Bearer Token | Authorization: Bearer ... |
| JSON Web Token (JWT) | eyJ....eyJ.... |
| Generic API Key | api_key = ... |
| Generic Secret / Token | secret = ..., access_token = ... |
| Generic Password Field | password = ... |
| Private Key (PEM header) | -----BEGIN RSA PRIVATE KEY----- |
| Database Connection String | postgresql://user:pass@host/db |
| Azure Storage Connection String | DefaultEndpointsProtocol=...AccountKey=... |
| Social Security Number (SSN) | 123-45-6789 |
| Credit Card Number | Visa, Mastercard, Amex, Discover (Luhn-validated) |
| High-Entropy String | Any token ≥ 20 chars scoring ≥ 4.5 bits/char (requires --entropy) |
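The High-Entropy String row refers to a bits-per-character score. A minimal sketch, assuming the score is Shannon entropy over the character frequencies of a token (the README does not confirm the exact formula used in `scanner.go`), with hypothetical function names:

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per character,
// computed from the relative frequency of each character.
func shannonEntropy(s string) float64 {
	counts := map[rune]int{}
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	if n == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// isHighEntropy applies the two documented knobs: minimum token length
// (--entropy-min-len) and threshold (--entropy-threshold).
func isHighEntropy(tok string, threshold float64, minLen int) bool {
	return len(tok) >= minLen && shannonEntropy(tok) >= threshold
}

func main() {
	for _, tok := range []string{
		"aaaaaaaaaaaaaaaaaaaaaaaa",
		"abcdefghijklmnopqrstuvwxyz012345",
	} {
		fmt.Printf("%s %.2f %v\n", tok, shannonEntropy(tok), isHighEntropy(tok, 4.5, 20))
	}
}
```

Under this definition the maximum score for a token with k distinct characters is log2(k), so a token needs at least 23 distinct characters to reach the default threshold of 4.5 bits/char.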
.git, node_modules, .cache, dist, build, vendor, __pycache__, .yarn, .next, .nuxt, target, .venv, venv, .tox, coverage, .nyc_output, .parcel-cache, .turbo, .svelte-kit, out, .output
Additional directories can be excluded with --exclude.
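Directory exclusion and the `--ext` filter could be applied during the file walk roughly as follows. This is a sketch built on the standard `filepath.WalkDir`; `normalizeExts`, `walk`, and `defaultExcluded` are hypothetical names, matching excluded directories by base name is an assumption, and lowercasing extensions is an added assumption rather than documented behaviour.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// defaultExcluded holds the directory names listed above.
var defaultExcluded = []string{".git", "node_modules", ".cache", "dist", "build",
	"vendor", "__pycache__", ".yarn", ".next", ".nuxt", "target", ".venv", "venv",
	".tox", "coverage", ".nyc_output", ".parcel-cache", ".turbo", ".svelte-kit",
	"out", ".output"}

// normalizeExts turns "js, .env" into {".js", ".env"}; the leading dot is optional.
func normalizeExts(csv string) map[string]bool {
	exts := map[string]bool{}
	for _, e := range strings.Split(csv, ",") {
		e = strings.ToLower(strings.TrimSpace(e))
		if e == "" {
			continue
		}
		if !strings.HasPrefix(e, ".") {
			e = "." + e
		}
		exts[e] = true
	}
	return exts
}

// walk lists files under root, skipping excluded directory names and,
// when exts is non-empty, files whose extension is not listed.
func walk(root string, excluded, exts map[string]bool) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if path != root && excluded[d.Name()] {
				return filepath.SkipDir // prune the whole subtree
			}
			return nil
		}
		if len(exts) == 0 || exts[strings.ToLower(filepath.Ext(path))] {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

func main() {
	excluded := map[string]bool{}
	for _, name := range defaultExcluded {
		excluded[name] = true
	}
	excluded["tmp"] = true // equivalent of --exclude tmp
	files, err := walk(".", excluded, normalizeExts("js,.env"))
	fmt.Println(files, err)
}
```

`filepath.WalkDir` visits entries in lexical order, which also helps keep the scan order reproducible between runs.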
The full report contains raw, unredacted secret values.
- Do NOT commit it to version control
- Do NOT email or share it without encryption
- `chmod 600` is applied automatically on Linux/macOS
The original Node.js implementation lives in the node/ directory and requires Node.js 18+.
# Linux/macOS
node/scan.sh /path/to/scan
# Windows
node/scan.ps1 C:\path\to\scan
# Docker (built from node/ context)
docker build -t scanner-node -f node/Dockerfile node/
docker run --rm scanner-node

- Expand Azure patterns: SAS tokens, Cosmos DB connection strings
- Add unit tests for the pattern library (one fixture per pattern)