I ran the SWE-Bench Multimodal evaluation with both OpenHands and sb-cli, but got different results:
- With OpenHands eval_infer.py (swebench = "^3.0.8"), the final result is 25 / 94 (26.60%).
- With sb-cli submit, as described here, the final result is 14 / 94 (14.89%).

The difference is also described in issue OpenHands/OpenHands#10452.
Could you please tell me which version of swebench sb-cli is using?
Thanks a lot!