Add "inside loop" feature to basic blocks in Binexport2 extractor.#3075

Open
larchchen wants to merge 1 commit into mandiant:master from larchchen:inside-loop-characteristics

Conversation

@larchchen
Contributor

@larchchen larchchen commented May 15, 2026

Detecting API calls that occur inside a loop is an effective way to flag exploits that leverage race conditions.

A classic example is DirtyCow (CVE-2016-5195), which can be detected by matching madvise calls made inside a loop with the MADV_DONTNEED argument.

  scopes:
    static: basic block
  features:
    - and:
      - api: madvise
      - number: 4 # Constant for MADV_DONTNEED
      - characteristic: inside loop

A more recent example, CVE-2024-50066, can be covered by:

  - and:
    - api: madvise
    - number: 25 # MADV_COLLAPSE
    - characteristic: inside loop

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed
  • This submission includes AI-generated code and I have provided details in the description.


@github-actions github-actions Bot left a comment


Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@github-actions github-actions Bot dismissed their stale review May 15, 2026 14:53

CHANGELOG updated or no update needed, thanks! 😄

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces the 'inside loop' characteristic for basic blocks in the BinExport2 extractor. It adds a new utility module, capa/features/extractors/loops.py, which uses networkx to identify vertices within cycles. Feedback identifies a structural issue in the new loops module where code precedes the license header and lacks necessary imports. Additionally, suggestions were made to move a local import to the top level for PEP 8 compliance and to simplify edge collection logic using a list comprehension.
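The cycle detection summarized in this review can be illustrated with a minimal, dependency-free sketch. The PR itself is described as using networkx; the version below swaps in a hand-written iterative Tarjan strongly connected components pass instead, so it is not the PR's code. The function name `vertices_in_cycles` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def vertices_in_cycles(edges):
    """Return the set of vertices that lie on at least one cycle.

    `edges` is an iterable of (src, dst) pairs, e.g. basic block addresses.
    Hypothetical helper: an iterative Tarjan SCC pass, not capa's or the PR's code.
    """
    graph = defaultdict(list)
    cyclic = set()
    for src, dst in edges:
        graph[src].append(dst)
        graph.setdefault(dst, [])  # register sink vertices as well
        if src == dst:
            cyclic.add(src)  # a self-loop is a cycle of length one

    index, lowlink = {}, {}
    stack, on_stack = [], set()
    counter = 0

    for root in list(graph):
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        work = [(root, iter(graph[root]))]
        while work:
            node, successors = work[-1]
            descended = False
            for succ in successors:
                if succ not in index:
                    # first visit: descend into the successor
                    index[succ] = lowlink[succ] = counter
                    counter += 1
                    stack.append(succ)
                    on_stack.add(succ)
                    work.append((succ, iter(graph[succ])))
                    descended = True
                    break
                if succ in on_stack:
                    # back edge into the component currently being built
                    lowlink[node] = min(lowlink[node], index[succ])
            if descended:
                continue
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:
                # node is the root of a strongly connected component
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                if len(component) > 1:
                    cyclic.update(component)
    return cyclic
```

A component with more than one vertex always contains a cycle through each of its members, which is why single-vertex components are only counted when they carry a self-loop.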

Comment thread capa/features/extractors/loops.py Outdated
Comment thread capa/features/extractors/binexport2/__init__.py Outdated
Comment thread capa/features/extractors/binexport2/__init__.py Outdated
@github-actions github-actions Bot dismissed stale reviews from themself May 15, 2026 14:57

CHANGELOG updated or no update needed, thanks! 😄

@larchchen larchchen force-pushed the inside-loop-characteristics branch from 4c071eb to 66d9ff9 Compare May 15, 2026 15:02
@larchchen larchchen force-pushed the inside-loop-characteristics branch from 66d9ff9 to 9b38843 Compare May 15, 2026 15:17
@mike-hunhoff
Collaborator

Thank you for iterating on this! After reviewing the implementation and thinking about how to best align this with capa's architecture and performance constraints, I suggest a unified path forward that simplifies the rule authoring experience and minimizes performance overhead.

Here is the suggested implementation strategy:

1. Reuse the existing loop characteristic

Instead of introducing a new name like inside loop at the basic block scope, let's just yield the existing Characteristic("loop") at the instruction scope for instructions contained within a cycle.

  • Why: This avoids inflating the rule vocabulary and maintains perfect backward compatibility. By extracting it at the narrowest scope (instruction), it will automatically bubble up to basic block and function scopes properly.

2. Perform Graph Analysis Once Per Function (Performance)

To avoid the heavy performance penalty of running networkx graph analysis repeatedly, I suggest computing the loop vertices once at the function level and sharing them with the instruction extractor:

  • In the function extractor (where we already build edges for the current has_loop check), use your SCC logic to extract the set of basic block addresses that participate in cycles.
  • Store this set in the transient FunctionHandle.ctx dictionary (e.g., fh.ctx["cyclic_blocks"]).
  • In the instruction extractor, simply check if the instruction's parent basic block address is in that set. This is a fast $O(1)$ lookup with no heavy object creation at the instruction level, and the memory is freed as soon as capa moves to the next function.
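The caching pattern suggested above could look roughly like the following sketch. Everything here is a simplified stand-in: the `FunctionHandle`, `BBHandle`, and `InsnHandle` classes only mimic the shape of capa's handles, features are represented as plain tuples rather than `Characteristic` objects, `inner` is assumed to hold the function's control-flow edges, and `find_cyclic_blocks` is a naive reachability check standing in for the PR's SCC logic. The set is computed lazily on first access so the sketch does not depend on the order in which the extractors run.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Set, Tuple

LOOP = ("characteristic", "loop")  # stand-in for Characteristic("loop")


@dataclass
class FunctionHandle:  # hypothetical, simplified handle
    address: int
    inner: Any  # assumed here: list of (src_bb, dst_bb) edges
    ctx: Dict[str, Any] = field(default_factory=dict)


@dataclass
class BBHandle:  # hypothetical, simplified handle
    address: int


@dataclass
class InsnHandle:  # hypothetical, simplified handle
    address: int


def find_cyclic_blocks(edges: Iterable[Tuple[int, int]]) -> Set[int]:
    """Blocks that can reach themselves by following at least one edge."""
    successors: Dict[int, Set[int]] = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
    cyclic = set()
    for start in successors:
        seen, todo = set(), list(successors[start])
        while todo:
            node = todo.pop()
            if node == start:
                cyclic.add(start)
                break
            if node not in seen:
                seen.add(node)
                todo.extend(successors.get(node, ()))
    return cyclic


def get_cyclic_blocks(fh: FunctionHandle) -> Set[int]:
    """Compute the cyclic block set once per function and cache it in fh.ctx."""
    if "cyclic_blocks" not in fh.ctx:
        fh.ctx["cyclic_blocks"] = find_cyclic_blocks(fh.inner)
    return fh.ctx["cyclic_blocks"]


def extract_function_loop(fh: FunctionHandle) -> Iterator[Tuple[Any, int]]:
    if get_cyclic_blocks(fh):
        yield LOOP, fh.address


def extract_insn_loop(
    fh: FunctionHandle, bbh: BBHandle, ih: InsnHandle
) -> Iterator[Tuple[Any, int]]:
    # constant-time set membership test; no graph work per instruction
    if bbh.address in get_cyclic_blocks(fh):
        yield LOOP, ih.address
```

Because the set lives on the per-function handle, it is discarded together with the handle, and each instruction pays only for a set lookup.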

3. Cross-Backend Consistency

Since capa rules strive to be backend-agnostic, this pattern should be applied across all static backends (Vivisect, Ghidra, IDA, etc.) that already support the function-level loop characteristic.

This approach fits perfectly into capa's style of using transient handle contexts for performance optimization while keeping the rule language intuitive. Let me know what you think!
