feat(malware): WASM-based YARA signature engine #102
Replace hardcoded regex patterns with a YARA scanning engine powered by libyara-wasm. Rules are fetched from the Kudu cloud API, cached to disk, and reloaded automatically. The existing regex/hash patterns remain as a fallback when YARA is unavailable (no rules cached or WASM fails to load).
- Add YaraEngine service wrapping libyara-wasm (WASM, zero native deps)
- Add YaraRulesStore for cloud fetch, SHA-256 integrity validation, and disk caching with periodic 6-hour update checks
- Integrate YARA into malware scanner Phase 3 with graceful fallback
- Add update-yara-rules cloud agent command for push-based rule updates
- Include cloud API contract doc for building the rules management backend
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c19062246c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ion, stale cache
- Always run SUSPICIOUS_FILENAMES checks regardless of YARA availability, so masquerading files (e.g. svchost.exe outside System32) are caught even when YARA is active
- Detect when all YARA rules fail to compile and report loaded=0 so the regex fallback activates instead of scanning with an empty ruleset
- Remove stale .yar files from disk cache when a new bundle arrives, so rules deleted server-side stop being enforced locally
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
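The fallback condition described in this commit can be sketched as a small decision function. This is only an illustration: `EngineState` and `pickEngine` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical shape; the real YaraEngine API is not shown in this thread.
interface EngineState {
  runtimeLoaded: boolean; // did the YARA runtime initialize?
  rulesLoaded: number; // number of rules that compiled successfully
}

type Engine = "yara" | "regex-fallback";

// Use YARA only when the runtime is up AND at least one rule compiled.
// Scanning with an empty ruleset would silently detect nothing.
function pickEngine(state: EngineState): Engine {
  return state.runtimeLoaded && state.rulesLoaded > 0 ? "yara" : "regex-fallback";
}
```

Treating loaded=0 as "unavailable" keeps the regex patterns active in the one case where YARA would otherwise report a clean scan for the wrong reason.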
💡 Codex Review
Reviewed commit: c0bcdd2942
…stalls
Add a CI step that fetches the latest YARA rules from the cloud API during release builds and bundles them as extraResources. Users who install without internet access get the rules that were current at build time. Cloud-cached rules still override bundled ones at runtime.
- Add scripts/fetch-yara-rules.js: fetches rules, validates SHA-256, writes to resources/yara-rules/ (gracefully skips if API unreachable)
- Add "Fetch latest YARA rules" step to release.yml before build
- Re-add bundled rule support: getBundledRulePaths() + getAllRulePaths() with cached-overrides-bundled precedence
- Add resources/yara-rules/ to .gitignore (CI-generated, not committed)
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
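One way to picture the cached-overrides-bundled precedence is a merge keyed by file name. The sketch below assumes precedence is resolved per file name, which the commit message does not state; `mergeRulePaths` is an illustrative name, not the PR's getAllRulePaths().

```typescript
// Merge two lists of rule file paths. When both lists contain the same
// file name, the cached (cloud) copy replaces the bundled one.
function mergeRulePaths(bundled: string[], cached: string[]): string[] {
  const baseName = (p: string): string => p.split(/[\\/]/).pop() ?? p;
  const byName = new Map<string, string>();
  for (const p of bundled) byName.set(baseName(p), p);
  for (const p of cached) byName.set(baseName(p), p); // overrides bundled
  return Array.from(byName.values());
}
```

Rules that exist only in the bundle survive the merge, so an offline install keeps its build-time coverage until a cloud copy of the same file arrives.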
💡 Codex Review
Reviewed commit: 14bd22c0a8
…ation, stale bundled rules
- Use a shared promise for YARA engine initialization so concurrent callers await the same init rather than getting a half-initialized singleton
- Validate YARA severity metadata against the allowed set (critical|high|medium|low), clamping unknown values to 'high'
- Clear stale .yar files in fetch-yara-rules.js before writing new bundle so removed rules don't persist across builds
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
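The shared-promise pattern from the first bullet can be sketched as follows. `FakeEngine`, `getEngine`, `resetEngine`, and `engineBuilds` are stand-ins invented for this example; the actual YaraEngine code is not shown in this thread.

```typescript
// Stand-in for the real engine; initialization is trivial on purpose.
class FakeEngine {
  ready = false;
  async init(): Promise<FakeEngine> {
    await Promise.resolve(); // pretend to load and compile rules
    this.ready = true;
    return this;
  }
}

let initPromise: Promise<FakeEngine> | null = null;
let builds = 0;

// Every caller receives the same in-flight promise, so nobody can observe
// an instance whose init() has not finished.
function getEngine(): Promise<FakeEngine> {
  if (initPromise === null) {
    builds += 1;
    initPromise = new FakeEngine().init();
  }
  return initPromise;
}

// Clearing only the promise makes the next caller build a fresh engine,
// while callers that already hold the old promise keep their instance.
function resetEngine(): void {
  initPromise = null;
}

function engineBuilds(): number {
  return builds;
}
```

Caching the promise rather than the instance is the key design choice: a second caller that arrives mid-initialization awaits the same work instead of starting its own or reading a half-built singleton.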
💡 Codex Review
Reviewed commit: 90bb71e699
…cts, scan-safe reset
- Keep abort timer active through full response body read, not just headers, so a slow-drip server can't hold the connection indefinitely
- Disable HTTP redirects in fetchAndCacheRules (redirect: 'error') to prevent SSRF bypass via 30x to loopback/private addresses
- resetYaraEngine() no longer disposes the current engine instance; it clears the init promise so the next scan creates a fresh one, while any in-flight scan keeps using its captured engine safely
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
💡 Codex Review
Reviewed commit: 5d7041d88c
…for all users
- ensureEmptyDir() now deletes existing .yar files so revoked rules aren't accidentally packaged when a fetch fails in reused workspaces
- Remove cloud API key gate from periodic YARA rule checks: rule downloads are public and should run for all users, not just cloud-linked ones
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
💡 Codex Review
Reviewed commit: 898f0a2958
…ules
The text.length check after response.text() was pointless: by that point the full body is already in memory. The content-length pre-check plus the 60s abort timer are the real guards. Removed the dead check and clarified the comment.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
All npm run package* scripts now run scripts/fetch-yara-rules.js first, which fetches rules if the API is reachable or creates the empty resources/yara-rules/ directory if not. This prevents electron-builder from failing on missing extraResources outside of release CI.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
💡 Codex Review
Reviewed commit: 6e9d2eb6b3
Show YARA signature database info in a new tab on the malware scanner page: engine status, rules loaded, signature version, last updated time, rule source (cloud/bundled), and rule file counts. Includes a "Check for Updates" button to manually trigger a cloud rule fetch.
- Add YaraRulesInfo type and MALWARE_YARA_INFO/UPDATE IPC channels
- Add getYaraRulesInfo() handler returning engine and rule metadata
- Add malwareYaraInfo/malwareYaraUpdate preload methods
- Add Database tab with 2x3 info grid and update button
- Add English translations for all database tab strings
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ference
- Use Uint8Array instead of toString('binary') when calling libyara-wasm
run(). The embind layer correctly preserves raw bytes from Uint8Array,
while string marshaling corrupts bytes 0x80-0xFF via UTF-8 encoding,
breaking hash rules and hex pattern matching on binary files.
- Move serverUrl declaration before its use in MALWARE_YARA_UPDATE
handler. A const binding stays in the temporal dead zone until its
declaration runs, so the previous ordering caused a ReferenceError
when clicking "Check for Updates" in the Database tab.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
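The byte corruption described in the first bullet can be reproduced without YARA at all. The sketch below assumes, as the commit describes, that a string crossing the WASM boundary is encoded as UTF-8; `toHex` is a helper written for this example.

```typescript
// Four raw bytes, two of them above 0x7F (common in binary files).
const raw = new Uint8Array([0x4d, 0x5a, 0x90, 0xff]);

// Map each byte to one UTF-16 code unit, which is what
// Buffer#toString('binary') does in Node.
let binaryString = "";
raw.forEach((b) => {
  binaryString += String.fromCharCode(b);
});

// Encoding that string as UTF-8 turns every byte >= 0x80 into two bytes.
const reencoded = new TextEncoder().encode(binaryString);

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

console.log(toHex(raw)); // 4d 5a 90 ff
console.log(toHex(reencoded)); // 4d 5a c2 90 c3 bf
```

The scanned buffer grows from four bytes to six and its offsets shift, which is enough to break hash rules and hex patterns. Passing the Uint8Array directly avoids the string round trip.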
💡 Codex Review
Reviewed commit: d6c091bd1a
getYaraRulesInfo() now awaits getYaraEngine() before reporting status, so the Database tab shows "YARA (WASM)" instead of "Regex Fallback" when rules are cached but no scan has been run yet.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Reject http: URLs for update-yara-rules in packaged builds: SHA-256 doesn't provide authenticity when both payload and hash come from the same unauthenticated HTTP response
- Stage rule cache updates in a .staging temp directory and swap into place atomically, so a failed write can't leave a partial ruleset that silently reduces detection coverage
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
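The staging approach from the second bullet might look roughly like this. It is a sketch under stated assumptions, not the PR's fetchAndCacheRules: `replaceRuleCache` and the demo paths are invented, file names are assumed to be plain names already validated elsewhere, and a directory swap built from two renames still leaves a brief moment with no cache directory.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write the new ruleset into a uniquely named staging directory next to the
// cache, then swap it in with renames so a failed write never leaves a
// partially written cache behind.
function replaceRuleCache(cacheDir: string, rules: Record<string, string>): void {
  const parent = path.dirname(cacheDir);
  fs.mkdirSync(parent, { recursive: true });
  const staging = fs.mkdtempSync(path.join(parent, ".staging-"));
  const previous = `${staging}-old`;
  try {
    for (const [name, content] of Object.entries(rules)) {
      fs.writeFileSync(path.join(staging, name), content, "utf8");
    }
    if (fs.existsSync(cacheDir)) fs.renameSync(cacheDir, previous);
    fs.renameSync(staging, cacheDir);
  } catch (err) {
    // Put the old cache back if the swap failed halfway, then clean up.
    if (!fs.existsSync(cacheDir) && fs.existsSync(previous)) {
      fs.renameSync(previous, cacheDir);
    }
    fs.rmSync(staging, { recursive: true, force: true });
    throw err;
  }
  fs.rmSync(previous, { recursive: true, force: true });
}

// Usage: the second update drops b.yar because it is absent from the bundle.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "yara-cache-demo-"));
const demoCache = path.join(demoRoot, "rules");
replaceRuleCache(demoCache, { "a.yar": "rule a { condition: true }", "b.yar": "rule b { condition: true }" });
replaceRuleCache(demoCache, { "a.yar": "rule a { condition: false }" });
const cachedNames = fs.readdirSync(demoCache).sort();
const leftovers = fs.readdirSync(demoRoot).filter((n) => n !== "rules");
```

Because the whole directory is replaced, rules removed server-side disappear locally as a side effect, with no separate stale-file cleanup pass.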
Log the server-provided hash vs client-computed hash and the sorted rule filenames with content lengths when integrity check fails, to help diagnose hash computation differences on the cloud side.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
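Hash mismatches of this kind usually come down to the two sides disagreeing on a canonical form. The sketch below shows one possible canonical form; the field names, the NUL separators, and the function name `sketchBundleHash` are assumptions, and the PR's real hashing scheme is not shown in this thread.

```typescript
import { createHash } from "node:crypto";

interface RuleFile {
  filename: string;
  content: string;
}

// Hypothetical canonical form: sort by filename using plain code-unit
// comparison (locale-independent), then hash name and content in order.
function sketchBundleHash(files: RuleFile[]): string {
  const sorted = [...files].sort((a, b) =>
    a.filename < b.filename ? -1 : a.filename > b.filename ? 1 : 0
  );
  const hash = createHash("sha256");
  for (const f of sorted) {
    hash.update(f.filename, "utf8");
    hash.update("\0"); // separator so "ab"+"c" differs from "a"+"bc"
    hash.update(f.content, "utf8");
    hash.update("\0");
  }
  return hash.digest("hex");
}
```

Client and server must agree on every detail here (sort order, separators, encoding), which is why logging the sorted filenames and content lengths is a useful first diagnostic.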
💡 Codex Review
Reviewed commit: 221b957350
…timeout
Revert scanBatch: concatenating files into one buffer breaks YARA's per-file context (filesize, hash rules, offset conditions), causing false positives and missed detections. Per-file scanBuffer() is correct; the extension filter already skips most files for performance.
Also fixes:
- Move pre-scan rule update after sendProgress is defined to avoid TDZ ReferenceError on first run (when no metadata exists)
- Keep fetch script abort timer active through response.json() so a stalling server can't hang CI builds indefinitely
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
💡 Codex Review
Reviewed commit: e6630e0cd8
Windows:
- Add Program Files, Program Files (x86), C:\Users\Public, C:\Windows\Temp
- Add user profile root for files dropped directly in home
- Add .sys, .drv, .ocx extensions for driver/component scanning
macOS:
- Add /Applications, /Users/Shared, Application Support
- Add /private/tmp, /var/tmp, user home root
- Add .kext, .pl, .js extensions
Linux:
- Add /var/tmp, /dev/shm, /opt, /usr/local/bin
- Add ~/.config, ~/.local/share, ~/.config/autostart
- Add user home root
- Add .ko, .deb, .rpm, .jar, .desktop, .js extensions
All platforms:
- Increase scan depth from 4 to 8 levels (catches deep AppData paths)
- Increase per-path file cap from 5,000 to 10,000
- Update YARA_SCAN_EXTS to include all new extensions
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
💡 Codex Review
Reviewed commit: bd417aac12
…ter)
Replace libyara-wasm (WASM, recompiles rules per scan ~428ms) with @litko/yara-x (native napi-rs bindings for yara-x). Rules compile once at startup, subsequent scans take ~0.01ms per file.
- Rewrite YaraEngine to use @litko/yara-x compile-once API: addRuleSource/addRuleFile for compilation, scan/scanFile for matching
- Use scanFile() to read from disk directly (avoids JS buffer copies)
- Remove YARA_SCAN_EXTS extension filter: scanning is now fast enough to scan all collected files, not just executables
- Remove libyara-wasm dependency
- Update integration tests for the new @litko/yara-x API
Benchmark: 5000 files in ~50ms (was ~35 minutes with libyara-wasm).
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
💡 Codex Review
Reviewed commit: eb34409dcc
…istent URLs
- Validate and lowercase severity in _convertMatch() before casting, so YARA rules with uppercase severity like "CRITICAL" are handled correctly instead of passing through as invalid values
- Use unique staging directory names (timestamp + random suffix) to prevent races between concurrent fetchAndCacheRules calls. Rename old dir before swapping new one in, and clean up on failure.
- Extract CLOUD_SERVER_URL constant: pre-scan update, manual update, and periodic checks all use the same URL instead of duplicated ternaries
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
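The severity handling in the first bullet, combined with the clamp-to-'high' rule from an earlier commit, can be sketched as a single normalizer. `normalizeSeverity` is an illustrative name, and the whitespace trimming is an extra assumption not mentioned in the commits.

```typescript
const SEVERITIES = ["critical", "high", "medium", "low"] as const;
type Severity = (typeof SEVERITIES)[number];

// Lowercase first, then validate against the allowed set. Anything unknown
// or missing clamps to "high" rather than passing through unchecked.
function normalizeSeverity(raw: unknown): Severity {
  const value = typeof raw === "string" ? raw.trim().toLowerCase() : "";
  const match = SEVERITIES.find((s) => s === value);
  return match ?? "high";
}
```

Validating before the type cast matters because a cast alone would let a value like "CRITICAL" reach the UI typed as a valid severity while matching none of the expected cases.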
💡 Codex Review
Reviewed commit: cfea8bbe20
The YARA engine compiles 1400+ rule files at startup (~5-8ms each), which blocked the main Electron process for 10+ seconds, freezing the UI completely.
Fix: loadRules() now yields to the event loop every 20 files via setImmediate, keeping the UI responsive during compilation. Progress is pushed to the renderer via IPC.
- loadRules() is now async with chunked compilation + onProgress callback
- getYaraRulesInfo() is non-blocking: reports current state without triggering compilation
- ensureYaraEngineStarted() kicks off background compilation at app startup (in registerMalwareScannerIpc)
- New IPC channel MALWARE_YARA_COMPILE_PROGRESS pushes progress to UI
- Database tab shows a progress bar during compilation: "Compiling signature rules (345/1400)..."
- Engine status shows "Compiling..." (blue dot) instead of freezing
- Auto-refreshes info when compilation completes
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
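Chunked compilation with yields can be sketched as below. The `compileOne` callback is a hypothetical stand-in for the per-file compile step, and `setTimeout(..., 0)` stands in for the setImmediate call the commit mentions so the example does not depend on Node-only globals.

```typescript
// Split a list into fixed-size groups.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Give pending UI and IPC work a chance to run between chunks.
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));

async function compileInChunks(
  files: readonly string[],
  compileOne: (file: string) => void, // hypothetical synchronous compile step
  chunkSize: number,
  onProgress: (done: number, total: number) => void
): Promise<void> {
  let done = 0;
  for (const group of chunk(files, chunkSize)) {
    for (const file of group) compileOne(file);
    done += group.length;
    onProgress(done, files.length);
    await yieldToEventLoop();
  }
}
```

Yielding only keeps the UI responsive if each chunk is short, since the work inside a chunk still blocks the thread. That is why the chunk size is a parameter here.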
Remove ensureYaraEngineStarted() from app startup: it was blocking the main process for 10+ seconds even with setImmediate yields, causing a black screen on launch. YARA compilation now happens lazily on first scan. The app loads instantly.
Also reduce chunk size from 20 to 5 files per yield (~35ms per chunk) so the UI stays smooth during compilation.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… compiled
Engine status was showing "Regex Fallback" when rules exist on disk but haven't been compiled yet (compilation is lazy). Now shows "Kudu Cloud Signatures" when cached or bundled rules are present, regardless of compilation state.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
💡 Codex Review
Reviewed commit: 0c1cf31ba6
The renderer was checking yaraInfo.available (false before compilation)
instead of yaraInfo.engine ('yara' when rules exist on disk). Now
correctly shows "Kudu Cloud Signatures" when cached rules are present.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add a prominent "Scan" button below the empty state description so users don't have to find the small button in the top-right corner.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
💡 Codex Review
Reviewed commit: 8145dbb2ef
… init
The pre-scan rule update had a 60-second timeout that could stall the scan for over a minute on slow/unreachable servers. Removed it entirely: rule freshness is handled by the periodic 6-hour check and the Database tab's manual update button.
Phase 1 now shows "Compiling signature rules..." during YARA initialization so the user sees what's happening instead of a stuck "Initializing scan engines..." message.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
💡 Codex Review
Reviewed commit: 6d43315dab
…onds
addRuleFile() recompiles the entire accumulated ruleset on every call, so total compile time grows roughly quadratically with the number of files (1374 files took 8.4 minutes). Switch to concatenating all rule sources and calling compile() once (~2 seconds).
If the bulk compile fails (bad rule syntax), falls back to per-file validation to find and exclude broken files (~6 seconds), then compiles the rest in a single call.
Benchmark: 1374 rule files
- Before: 502 seconds (8.4 minutes, blocked UI entirely)
- After: ~2 seconds (fast path) or ~8 seconds (with bad rules)
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
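The fast-path and fallback flow can be sketched with the compiler injected as a function. The `Compile` type is a hypothetical stand-in that throws on invalid rules; it is not the @litko/yara-x API, whose exact signatures are not shown in this thread.

```typescript
type Compile = (source: string) => void; // hypothetical: throws on bad rules

interface RuleSource {
  name: string;
  source: string;
}

interface CompileResult {
  compiled: string[];
  rejected: string[];
}

function compileWithFallback(files: RuleSource[], compile: Compile): CompileResult {
  const join = (list: RuleSource[]): string => list.map((f) => f.source).join("\n");
  try {
    compile(join(files)); // fast path: one compile over everything
    return { compiled: files.map((f) => f.name), rejected: [] };
  } catch {
    // Slow path: validate each file alone, then compile the survivors once.
    const good: RuleSource[] = [];
    const rejected: string[] = [];
    for (const f of files) {
      try {
        compile(f.source);
        good.push(f);
      } catch {
        rejected.push(f.name);
      }
    }
    compile(join(good));
    return { compiled: good.map((f) => f.name), rejected };
  }
}
```

Both paths cost a number of compile calls that is linear in the file count, whereas recompiling an accumulated ruleset after each of n files does work proportional to 1 + 2 + ... + n. Note that the final compile in the slow path can still throw if files are valid alone but conflict when combined, for example through duplicate rule names.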
💡 Codex Review
Reviewed commit: 9673a422b0
Replace the flat 10,000-file global cap with per-path depth and file limits tuned by risk level:
- High-risk (Downloads, Desktop, Temp): depth 6, up to 10k files
- Medium-risk (AppData, ProgramData): depth 4-5, up to 8k files
- Lower-risk (Program Files): depth 2, up to 3k files
- User home root: depth 1 only (catch dropped files, skip subdirs)
Total scannable files across all paths: ~65k (was 10k hard cap). Safe locations scan shallow to avoid wasting time on known-good directories while high-risk locations get thorough deep scans.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Filter rule files by filename before compiling: files containing _windows_, _linux_, _macos_, or _darwin_ in their name are skipped if they don't match the current OS. This happens client-side after the integrity-checked bundle download, so no cloud changes needed.
Impact: On Windows skips 266 Linux/macOS files (19%), on macOS skips 611 files (44%), on Linux skips 425 files (31%). Reduces both compilation time and memory usage proportionally.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
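The filename filter can be sketched as a pure function. The token list comes from the commit message. The lowercase comparison and the treatment of files carrying several OS tokens (kept if any token matches) are assumptions, and `ruleAppliesTo` is an illustrative name.

```typescript
type Platform = "win32" | "darwin" | "linux";

// Filename tokens from the commit message, mapped to Node platform names.
const OS_TOKENS: Record<string, Platform> = {
  _windows_: "win32",
  _linux_: "linux",
  _macos_: "darwin",
  _darwin_: "darwin",
};

function ruleAppliesTo(filename: string, platform: Platform): boolean {
  const lower = filename.toLowerCase();
  const tagged = Object.keys(OS_TOKENS).filter((t) => lower.includes(t));
  if (tagged.length === 0) return true; // untagged rules apply everywhere
  return tagged.some((t) => OS_TOKENS[t] === platform);
}
```

In an Electron main process the second argument would typically come from process.platform. Filtering after the integrity check means the bundle hash still covers every file, including the ones skipped locally.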
💡 Codex Review
Reviewed commit: 8ed17dc5ac
…6 SSRF
- Fix getScanPaths() to return string[] for path validators (quarantine, delete, restore) while getScanDirs() returns MalwareScanDir[] for file discovery. Prevents ERR_INVALID_ARG_TYPE on remediation actions.
- Handle filenameOnly YARA matches with path context (skip only in system dirs on Windows) instead of discarding them entirely. Cloud-pushed masquerade rules now produce detections when YARA is active.
- Use deterministic sort (< >) instead of localeCompare in the build fetch script, matching computeBundleHash to prevent false SHA mismatches.
- Block private IPv6 ranges (fc00::/7, fe80::/10, ::ffff: mapped) in the YARA update SSRF check, closing the IPv6 literal bypass.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
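A check for the IPv6 ranges listed in the last bullet could look like the sketch below. `isBlockedIpv6Host` is an invented name, and the loopback and unspecified addresses (::1 and ::) are added here as an assumption since the commit only lists the three ranges. The input is assumed to be a hostname in the compressed form a URL parser produces, for example `[fe80::1]`.

```typescript
// Returns true for IPv6 literals in ranges that should never be fetched
// from: unique-local, link-local, IPv4-mapped, loopback, unspecified.
function isBlockedIpv6Host(hostname: string): boolean {
  const h = hostname.replace(/^\[|\]$/g, "").toLowerCase();
  if (!h.includes(":")) return false; // not an IPv6 literal
  if (h === "::1" || h === "::") return true; // loopback / unspecified
  if (h.startsWith("::ffff:")) return true; // IPv4-mapped addresses
  const first = parseInt(h.split(":")[0] || "0", 16); // first 16-bit group
  if ((first & 0xfe00) === 0xfc00) return true; // fc00::/7 (unique local)
  if ((first & 0xffc0) === 0xfe80) return true; // fe80::/10 (link-local)
  return false;
}
```

Masking the first group avoids string-prefix mistakes: fe80::/10 covers fe80 through febf, which a check like startsWith("fe80") would miss.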
💡 Codex Review
Reviewed commit: 289c972406
  engine: 'File Discovery'
  })
- await collectFiles(scanPath, 4, 5000, files, scannableExts, exclusions)
+ await collectFiles(dir.path, dir.maxDepth, dir.maxFiles, files, scannableExts, exclusions)
Preserve per-root limits when collecting scan files
collectFiles stops when files.length >= maxFiles, but this call passes each directory’s own dir.maxFiles while reusing one shared files array across all roots. After an earlier root contributes more files than a later root’s cap (e.g., 600 files before a root capped at 500), subsequent roots short-circuit immediately and are never traversed, creating deterministic scan-coverage gaps in later locations.
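One possible fix for the issue above is to count files per root instead of comparing against the shared array's length. In this sketch `listFiles` is a stand-in for the recursive directory walk, and `collectAll` and `ScanRoot` are illustrative names rather than the PR's collectFiles.

```typescript
interface ScanRoot {
  path: string;
  maxFiles: number;
}

function collectAll(roots: ScanRoot[], listFiles: (root: string) => string[]): string[] {
  const files: string[] = [];
  for (const root of roots) {
    let takenFromRoot = 0; // per-root counter, independent of files.length
    for (const f of listFiles(root.path)) {
      if (takenFromRoot >= root.maxFiles) break;
      files.push(f);
      takenFromRoot += 1;
    }
  }
  return files;
}
```

With the review's own numbers, a root that contributes 600 files no longer prevents a later root capped at 500 from being traversed. The later root still contributes up to 500 of its own files.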
{ path: 'C:\\Windows\\Temp', maxDepth: 3, maxFiles: 5000 },
{ path: 'C:\\Users\\Public', maxDepth: 4, maxFiles: 3000 },
Keep remote malware allowlist in sync with added scan roots
Windows scan roots now include locations like C:\Windows\Temp, C:\Users\Public, and Program Files*, but remote remediation still validates paths through createWin32MalwarePaths().isAllowedMalwarePath, which only allows Downloads/Desktop/Documents/AppData/Temp/ProgramData. This means cloud malware-quarantine/malware-delete can reject threats found by the scanner in the newly added roots, so detections become non-actionable remotely until the allowlist is expanded.
Summary
- … libyara-wasm) to the malware scanner, replacing hardcoded regex patterns with extensible .yar rules
- … update-yara-rules cloud agent command for push-based rule updates via Pusher
- … docs/cloud-yara-rules-prompt.md) for building the server-side rules management system
New files
- src/main/services/yara-engine.ts: WASM engine wrapper (load, compile, scan)
- src/main/services/yara-rules-store.ts: cloud fetch, SHA-256 validation, disk caching, periodic checks
- src/main/services/yara-engine.test.ts: 10 tests (conversion logic + libyara-wasm integration)
- src/main/services/yara-rules-store.test.ts: 30 tests (validation, hashing, integrity, metadata)
- docs/cloud-yara-rules-prompt.md: full prompt for building the cloud-side API + rule management
Modified files
- malware-scanner.ipc.ts: Phase 3 uses YARA when available, falls back to regex if not
- cloud-agent-types.ts: new update-yara-rules command type
- cloud-agent.ts: handler with SSRF protection, fetches rules and resets engine
- package.json: add libyara-wasm dependency
Test plan
- … npm run build)
🤖 Generated with Claude Code