Which version of swebench is used by sb-cli? #14

@daa233

I ran the SWE-bench Multimodal evaluation with both OpenHands and sb-cli, but I got different results:

  • With OpenHands eval_infer.py, the final result is 25 / 94 (26.60%).
  • With sb-cli submit (following the instructions here), the final result is 14 / 94 (14.89%).

The discrepancy is also described in the issue OpenHands/OpenHands#10452.
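To narrow down which instances account for the gap, the two sets of resolved instance IDs could be compared directly. The following is only a minimal sketch: the function name `compare_resolved` is hypothetical, and it assumes the resolved IDs have already been extracted from each tool's report by hand, since the report formats are not described here.

```python
def compare_resolved(openhands_ids, sbcli_ids, total):
    """Compare two collections of resolved instance IDs.

    openhands_ids / sbcli_ids: IDs each harness marked as resolved
    (assumed to be extracted from the respective reports beforehand).
    total: number of instances submitted (94 in the runs above).
    """
    a, b = set(openhands_ids), set(sbcli_ids)
    return {
        # Resolve rates as percentages, rounded to two decimals.
        "openhands_rate": round(100 * len(a) / total, 2),
        "sbcli_rate": round(100 * len(b) / total, 2),
        # Instances on which the two harnesses disagree.
        "only_openhands": sorted(a - b),
        "only_sbcli": sorted(b - a),
        # Instances both harnesses agree are resolved.
        "both": sorted(a & b),
    }
```

The `only_openhands` and `only_sbcli` lists are the instances worth inspecting first, since they are where the two evaluations disagree.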

Could you please tell me which version of swebench sb-cli uses?

Thanks a lot!
