
Reduce numeric-identifier noise in threat actor similarity report #1172

Merged
adulau merged 1 commit into main from codex/update-threat_actor_similarity_report.py-for-name-handling
Apr 12, 2026

@adulau commented Apr 12, 2026

Motivation

  • Threat-actor names often share a textual stem and differ only in a numeric identifier (e.g., apt 28 vs apt 29). Such pairs score as highly similar and add noise to the report. This change filters out those false positives so that comparisons stay focused on non-numeric content.
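
To illustrate the noise described above, here is a minimal sketch that scores the example pair with difflib's SequenceMatcher, which the Description names as the existing similarity logic. The `similarity` helper is hypothetical and is not taken from the tool itself.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: plain SequenceMatcher ratio between two names."""
    return SequenceMatcher(None, a, b).ratio()


# The two names share the 5-character block "apt 2" out of 12 total
# characters, so the ratio is 2 * 5 / 12.
score = similarity("apt 28", "apt 29")
print(f"{score:.2f}")  # prints 0.83
```

A score this high would pass most similarity thresholds, even though the two names refer to different actors.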

Description

  • Add strip_numeric_tokens() to remove digit sequences from normalized names, and numeric_tokens() to extract those digit sequences.
  • Update find_similar_name_pairs() to skip a candidate pair when both names contain numeric tokens, share the same non-numeric stem, and differ in their numeric sequences.
  • Document the numeric-variant filtering behavior in the function docstring and keep all other similarity logic (length pre-filter, SequenceMatcher) unchanged.
  • No changes were made to the CLI or output formats; Markdown and JSON outputs remain the same.
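
The filtering rule described above can be sketched roughly as follows. The function names strip_numeric_tokens(), numeric_tokens(), and find_similar_name_pairs() come from this description, but their bodies, signatures, the `threshold` parameter, and the `is_numeric_variant` helper are assumptions made for illustration. The merged code may differ, and the length pre-filter is omitted here.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from typing import Iterable, List, Tuple

_DIGITS = re.compile(r"\d+")


def strip_numeric_tokens(name: str) -> str:
    """Remove digit sequences and collapse the leftover whitespace."""
    return " ".join(_DIGITS.sub(" ", name).split())


def numeric_tokens(name: str) -> Tuple[str, ...]:
    """Return the digit sequences in a name, in order of appearance."""
    return tuple(_DIGITS.findall(name))


def is_numeric_variant(a: str, b: str) -> bool:
    """Hypothetical helper: same stem, both numbered, different numbers."""
    nums_a, nums_b = numeric_tokens(a), numeric_tokens(b)
    return (
        bool(nums_a)
        and bool(nums_b)
        and strip_numeric_tokens(a) == strip_numeric_tokens(b)
        and nums_a != nums_b
    )


def find_similar_name_pairs(
    names: Iterable[str], threshold: float = 0.8  # assumed default
) -> List[Tuple[str, str, float]]:
    """Return pairs scoring at or above threshold, minus numeric variants."""
    pairs = []
    for a, b in combinations(list(names), 2):
        if is_numeric_variant(a, b):
            continue  # e.g. "apt 28" vs "apt 29"
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


pairs = find_similar_name_pairs(["apt 28", "apt 29", "fancy bear", "fancy bears"])
```

In this sketch a pair is skipped only when both names carry numbers, so a numbered name is still compared against an unnumbered one (for example, apt 28 vs apt).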

Testing

  • Ran python3 -m py_compile tools/threat_actor_similarity_report.py, which succeeded.
  • Ran python3 tools/threat_actor_similarity_report.py --help | head -n 20 to validate the CLI help, which printed successfully.
  • Ran python3 tools/threat_actor_similarity_report.py --max-results 5 --markdown-output /tmp/threat_report.md --json-output /tmp/threat_report.json, which wrote the Markdown report and JSON file and printed the analysis summary (reports written) with no errors.

Codex Task

@adulau adulau merged commit 30e4b8c into main Apr 12, 2026
3 checks passed