Add "inside loop" feature to basic blocks in Binexport2 extractor. by larchchen · Pull Request #3075 · mandiant/capa

larchchen · 2026-05-15T14:53:25Z

Detecting API calls that occur inside a loop is an effective way to identify exploits that leverage race conditions.

A classic example is Dirty COW (CVE-2016-5195), which can be detected by matching madvise calls inside a loop with the MADV_DONTNEED argument:

  scopes:
    static: basic block
  features:
    - and:
      - api: madvise
      - number: 4 # Constant for MADV_DONTNEED
      - characteristic: inside loop

A more recent example, CVE-2024-50066, can be covered by:

  - and:
    - api: madvise
    - number: 25 # MADV_COLLAPSE
    - characteristic: inside loop
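
To illustrate what the `and` clause requires at basic block scope, here is a minimal sketch in Python. It is a toy model and not capa's rule engine: the `(kind, value)` feature tuples, the `RULE_DIRTYCOW` set, and the `matches_all` helper are hypothetical names invented for this illustration.

```python
# Toy model (not capa's rule engine): a basic block is represented by the set
# of features extracted from it, and an `and` rule matches only when every
# required feature is present in that same block.

# Hypothetical feature encoding: (kind, value) tuples.
RULE_DIRTYCOW = {
    ("api", "madvise"),
    ("number", 4),  # MADV_DONTNEED
    ("characteristic", "inside loop"),
}


def matches_all(required, block_features):
    """Return True when every required feature appears in the block."""
    return required <= block_features


# A block that calls madvise(..., MADV_DONTNEED) from inside a loop.
racing_block = {
    ("api", "madvise"),
    ("number", 4),
    ("number", 100),
    ("characteristic", "inside loop"),
}

# The same call made once, outside any loop.
one_shot_block = {
    ("api", "madvise"),
    ("number", 4),
}

print(matches_all(RULE_DIRTYCOW, racing_block))    # True
print(matches_all(RULE_DIRTYCOW, one_shot_block))  # False
```

The loop characteristic is what separates the racing pattern from an ordinary one-off `madvise` call in this model.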

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

This submission includes AI-generated code and I have provided details in the description.

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

CHANGELOG updated or no update needed, thanks! 😄

gemini-code-assist

Code Review

This pull request introduces the 'inside loop' characteristic for basic blocks in the BinExport2 extractor. It adds a new utility module, capa/features/extractors/loops.py, which uses networkx to identify vertices within cycles. Feedback identifies a structural issue in the new loops module where code precedes the license header and lacks necessary imports. Additionally, suggestions were made to move a local import to the top level for PEP 8 compliance and to simplify edge collection logic using a list comprehension.
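
For readers unfamiliar with the idea of "vertices within cycles", the sketch below shows one way to compute them from a list of control-flow edges. According to the review summary, the PR uses networkx for this. The version here swaps in a standard-library-only implementation of Kosaraju's strongly connected components algorithm so that it runs without dependencies. The function name `vertices_in_cycles` and the edge format are assumptions made for illustration and are not taken from the PR's code.

```python
from collections import defaultdict


def vertices_in_cycles(edges):
    """Return the set of vertices that lie on at least one directed cycle.

    A vertex is on a cycle when its strongly connected component has more
    than one member, or when it has an edge to itself.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        pred[dst].append(src)
        nodes.update((src, dst))

    # Pass 1 (Kosaraju): record vertices in order of DFS completion.
    order = []
    seen = set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                # All successors explored: this vertex is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse completion order; each walk
    # collects exactly one strongly connected component.
    cyclic = set()
    assigned = set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component = [root]
        work = [root]
        while work:
            node = work.pop()
            for prv in pred[node]:
                if prv not in assigned:
                    assigned.add(prv)
                    component.append(prv)
                    work.append(prv)
        if len(component) > 1 or root in succ[root]:
            cyclic.update(component)
    return cyclic


# Block 1 is the entry, blocks 2 and 3 form a loop, block 4 loops on itself.
edges = [(1, 2), (2, 3), (3, 2), (3, 4), (4, 4)]
print(sorted(vertices_in_cycles(edges)))  # [2, 3, 4]
```

The self-edge check matters because a single basic block that jumps back to its own start forms a component of size one, which a size-only test would miss.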

mike-hunhoff · 2026-05-15T17:55:39Z

Thank you for iterating on this! After reviewing the implementation and thinking about how to best align this with capa's architecture and performance constraints, I suggest a unified path forward that simplifies the rule authoring experience and minimizes performance overhead.

Here is the suggested implementation strategy:

1. Reuse the existing `loop` characteristic

Instead of introducing a new name like `inside loop` at the basic block scope, let's just yield the existing `Characteristic("loop")` at the instruction scope for instructions contained within a cycle.

Why: This avoids inflating the rule vocabulary and maintains perfect backward compatibility. By extracting it at the narrowest scope (instruction), it will automatically bubble up to basic block and function scopes properly.

2. Perform Graph Analysis Once Per Function (Performance)

To avoid the heavy performance penalty of running networkx graph analysis repeatedly, I suggest computing the loop vertices once at the function level and sharing them with the instruction extractor:

- In the function extractor (where we already build edges for the current `has_loop` check), use your SCC logic to extract the set of basic block addresses that participate in cycles.
- Store this set in the transient `FunctionHandle.ctx` dictionary (e.g., `fh.ctx["cyclic_blocks"]`).
- In the instruction extractor, simply check if the instruction's parent basic block address is in that set. This is a fast $O(1)$ lookup with no heavy object creation at the instruction level, and the memory is freed as soon as capa moves to the next function.
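
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not capa's actual code: `FunctionHandle` and `InsnHandle` are simplified stand-ins, the `bb_address` field, the `get_edges` callback, and the helper names are hypothetical, and the cycle finder is a simple reachability check standing in for the SCC logic. Only the `fh.ctx["cyclic_blocks"]` key comes from the suggestion itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Set, Tuple

Edge = Tuple[int, int]


@dataclass
class FunctionHandle:
    """Stand-in for a function handle; only `address` and `ctx` are modeled."""
    address: int
    ctx: Dict[str, Any] = field(default_factory=dict)


@dataclass
class InsnHandle:
    """Stand-in for an instruction handle; `bb_address` is a hypothetical field."""
    address: int
    bb_address: int


def find_cyclic_blocks(edges: List[Edge]) -> Set[int]:
    """Simple stand-in for the SCC logic: a block is cyclic if it can reach itself."""
    succ: Dict[int, List[int]] = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)

    cyclic: Set[int] = set()
    for start in succ:
        work = list(succ[start])
        seen: Set[int] = set()
        while work:
            node = work.pop()
            if node == start:
                cyclic.add(start)
                break
            if node not in seen:
                seen.add(node)
                work.extend(succ.get(node, []))
    return cyclic


def get_cyclic_blocks(fh: FunctionHandle, get_edges: Callable[[FunctionHandle], List[Edge]]) -> Set[int]:
    """Compute the cyclic-block set on first use, then reuse the cached copy."""
    if "cyclic_blocks" not in fh.ctx:
        fh.ctx["cyclic_blocks"] = find_cyclic_blocks(get_edges(fh))
    return fh.ctx["cyclic_blocks"]


def extract_insn_loop(fh: FunctionHandle, ih: InsnHandle, get_edges) -> Iterator[Tuple[Tuple[str, str], int]]:
    """Instruction scope: a set-membership test against the cached result."""
    if ih.bb_address in get_cyclic_blocks(fh, get_edges):
        yield ("characteristic", "loop"), ih.address


# Hypothetical CFG for one function: 0x1010 and 0x1020 form a loop.
CFG = {0x1000: [(0x1000, 0x1010), (0x1010, 0x1020), (0x1020, 0x1010), (0x1020, 0x1030)]}
calls = []


def get_edges(fh: FunctionHandle) -> List[Edge]:
    calls.append(fh.address)  # count how often the graph is actually rebuilt
    return CFG[fh.address]


fh = FunctionHandle(address=0x1000)
insns = [InsnHandle(0x1012, 0x1010), InsnHandle(0x1034, 0x1030)]
hits = [addr for ih in insns for _, addr in extract_insn_loop(fh, ih, get_edges)]
print([hex(a) for a in hits])  # ['0x1012']
print(len(calls))              # 1
```

The sketch computes the set lazily on first access rather than in a dedicated function-scope pass, so it does not depend on the order in which the scopes are extracted. Whether that matters for a given backend is something to confirm against the real extractor.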

3. Cross-Backend Consistency

Since capa rules strive to be backend-agnostic, this pattern should be applied across all static backends (Vivisect, Ghidra, IDA, etc.) that already support the function-level loop characteristic.

This approach fits perfectly into capa's style of using transient handle contexts for performance optimization while keeping the rule language intuitive. Let me know what you think!

github-actions Bot previously requested changes May 15, 2026

gemini-code-assist Bot reviewed May 15, 2026

Comment thread capa/features/extractors/loops.py Outdated

Comment thread capa/features/extractors/binexport2/__init__.py Outdated

Comment thread capa/features/extractors/binexport2/__init__.py Outdated

larchchen force-pushed the inside-loop-characteristics branch from 4c071eb to 66d9ff9 Compare May 15, 2026 15:02

larchchen force-pushed the inside-loop-characteristics branch from 66d9ff9 to 9b38843 Compare May 15, 2026 15:17

larchchen wants to merge 1 commit into mandiant:master from larchchen:inside-loop-characteristics