benchmarks: harden longmemeval runner for Windows encoding #1204
mschultheiss83 wants to merge 1 commit into MemPalace:develop from …
Nice fix on both layers. The same dual pattern (bare …
The issue framing ("audit benchmark runners", plural) puts them in scope, FWIW.

Audited the other benchmark runners for the same … Same failure shape on any non-ASCII content. Up to you whether to expand scope here or leave them for a follow-up; both are reasonable, and the LongMemEval fix is a clean unit on its own. Thanks for the explicit-UTF-8 + ASCII-console-separator hardening; that's the right pattern.

Thanks, this is helpful. I intentionally kept this PR narrow to the reproducible LongMemEval Windows failure from #1203 so the change stayed easy to review and validate. I agree the same pattern exists in …
|
A follow-up PR is the right call: it keeps this one focused on the reproducible failure you opened it for, and the other three runners can land separately without complicating review here. If it's useful, I can take a swing at the follow-up since the audit is already done, but I'm happy to leave it to you if you'd rather keep ownership of the benchmark hardening thread. Either works.
|
Thanks, @jphein. Agreed, a follow-up PR is the right call. If you're up for it, please take a swing at the follow-up and own the changes for …
Summary
Narrow Windows fix for the LongMemEval benchmark runner.
This PR addresses two reproducible failures in benchmarks/longmemeval_bench.py on native Windows:
- UnicodeDecodeError while reading the dataset
- UnicodeEncodeError on a default cp1252 console

Changes
- tests/benchmarks/test_longmemeval_bench.py

Reproduction
Before this change, the runner on Windows could fail with a UnicodeDecodeError while reading the dataset. Once the file I/O was fixed, the same runner could still fail with a UnicodeEncodeError when printing separators to a non-UTF-8 console.

After this change, the runner completes on the same machine without forcing UTF-8 mode.
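A minimal sketch of both failure modes and of the explicit-UTF-8 plus ASCII-separator pattern. It names the cp1252 codec explicitly instead of relying on the Windows locale, so it behaves the same on any OS. The helper names load_dataset_utf8 and print_separator, the sample record, the "═" separator, and the assumption that the dataset is UTF-8 are all illustrative; this is not code from benchmarks/longmemeval_bench.py.

```python
import io
import json
import os
import tempfile

# Hypothetical sample record. "Á" is b"\xc3\x81" in UTF-8, and byte 0x81 is
# undefined in cp1252, so decoding this file as cp1252 raises. Many other
# inputs would instead decode silently into mojibake.
RECORD = {"question": "Árvíztűrő tükörfúrógép?"}


def load_dataset_utf8(path):
    """Read a JSON dataset with an explicit encoding, not the locale default."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def print_separator(stream, width=40):
    """Write an ASCII-only separator that any console code page can encode."""
    print("=" * width, file=stream)


def demo():
    # Assumption: the dataset file on disk is UTF-8 encoded.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(RECORD, fh, ensure_ascii=False)

    results = {}
    try:
        # Failure 1: decoding with cp1252, the locale default on many
        # Western-locale Windows setups.
        try:
            with open(path, "r", encoding="cp1252") as fh:
                json.load(fh)
            results["decode_failed"] = False
        except UnicodeDecodeError:
            results["decode_failed"] = True
        # Fix 1: an explicit UTF-8 read round-trips the record.
        results["loaded"] = load_dataset_utf8(path)
    finally:
        os.remove(path)

    # Failure 2: a strict cp1252 text stream (standing in for a Windows
    # console) cannot encode box-drawing characters such as "═".
    console = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
    try:
        print("═" * 40, file=console)
        console.flush()
        results["encode_failed"] = False
    except UnicodeEncodeError:
        results["encode_failed"] = True

    # Fix 2: the ASCII separator encodes cleanly under the same code page.
    console = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
    print_separator(console)
    console.flush()
    results["separator_bytes"] = console.buffer.getvalue()
    return results


if __name__ == "__main__":
    print(demo())
```

The split mirrors the reasoning in the review thread: pin the encoding on file I/O, where the data format is known, and keep console output ASCII, where the code page is not. Neither step depends on the user enabling Python's UTF-8 mode (PYTHONUTF8=1 or python -X utf8).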
Validation
Ran:
Result:
Ran:
Result:
Scope / Notes
benchmarks/longmemeval_bench.py
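For the follow-up audit of the other runners, a rough static check along these lines can surface both patterns. This is a hypothetical helper written for illustration: find_encoding_hazards and its heuristics are assumptions, not part of this PR or of MemPalace. It only inspects source syntax through Python's ast module, so it can miss dynamic cases and flag calls whose encoding is handled some other way.

```python
import ast

# Hypothetical audit helper, not part of this PR or of MemPalace.
# Heuristic only: it inspects literal syntax and cannot see runtime values.
TEXT_IO_METHODS = {"read_text", "write_text"}
MODE_CHARS = set("rwxabt+")


def _call_name(func):
    """Return the bare name of a call target: open, read_text, print, ..."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def _open_mode(call):
    """Return a literal mode string if one is visible, else "r" (assume text)."""
    candidates = [kw.value for kw in call.keywords if kw.arg == "mode"]
    candidates += call.args[:2]  # open(path, mode) or Path.open(mode)
    for node in candidates:
        if (isinstance(node, ast.Constant) and isinstance(node.value, str)
                and node.value and set(node.value) <= MODE_CHARS):
            return node.value
    return "r"


def _is_os_open(func):
    # os.open() is a byte-level file-descriptor API and takes no encoding.
    return (isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "os")


def find_encoding_hazards(source):
    """Return sorted (lineno, message) pairs for likely Windows encoding hazards."""
    hazards = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _call_name(node.func)
        has_encoding = any(kw.arg == "encoding" for kw in node.keywords)
        if name == "open" and not _is_os_open(node.func):
            # Text-mode open() without encoding= falls back to the locale default.
            if "b" not in _open_mode(node) and not has_encoding:
                hazards.append((node.lineno, "open() without encoding="))
        elif name in TEXT_IO_METHODS and not has_encoding:
            hazards.append((node.lineno, f"{name}() without encoding="))
        elif name == "print":
            # Non-ASCII literals may not be encodable on a cp1252 console.
            values = list(node.args) + [kw.value for kw in node.keywords]
            if any(isinstance(sub, ast.Constant) and isinstance(sub.value, str)
                   and not sub.value.isascii()
                   for value in values for sub in ast.walk(value)):
                hazards.append((node.lineno, "print() with non-ASCII literal"))
    return sorted(hazards)
```

Feeding it each runner's source and reviewing the hits by hand keeps the judgment with the reviewer; the check only narrows where to look.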