Console logs on stdout corrupt JSON output in benchmark scripts #498

When running benchmarks (e.g., swebenchmultimodal), the scripts are often expected to emit pure JSON on stdout so the output can be piped to tools like `jq`.

However, `benchmarks/utils/console_logging.py` currently configures the console handler to write to `stdout` when rich logging is enabled (explicitly or by default).

This causes issues when libraries (like OpenTelemetry) log errors or warnings. For example, a `Failed to detach context` error from OpenTelemetry is printed to stdout, which then causes `jq` to fail with:
`jq: parse error: Invalid numeric literal`

This happens because the log message is interleaved with the JSON output on the same stream, so `jq` tries to parse the log text as JSON.
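The failure mode can be reproduced without `jq` or OpenTelemetry. The sketch below is only an illustration: `emit_report` is a hypothetical helper, the `{"resolved": 3}` payload is made up, and `json.loads` stands in for `jq`. It writes a log line and a JSON document first to one shared stream, then to two separate streams.

```python
import io
import json
import logging


def emit_report(log_stream, out_stream):
    """Hypothetical script body: log a warning, then write a JSON report."""
    logger = logging.getLogger("repro")
    logger.handlers.clear()
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(log_stream))
    # Stand-in for the message a library such as OpenTelemetry might log.
    logger.warning("Failed to detach context")
    out_stream.write(json.dumps({"resolved": 3}) + "\n")


# Shared stream (logs and JSON both on "stdout"): the payload no longer parses.
shared = io.StringIO()
emit_report(shared, shared)
try:
    json.loads(shared.getvalue())
    shared_parses = True
except json.JSONDecodeError:
    shared_parses = False

# Separate streams (logs on "stderr", JSON on "stdout"): the payload stays clean.
logs, out = io.StringIO(), io.StringIO()
emit_report(logs, out)
separate_report = json.loads(out.getvalue())
```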

**Fix**: Ensure all console logs are written to `stderr`, leaving `stdout` exclusively for the script's intended output (JSON).
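A minimal sketch of the proposed fix, using only the standard library. The contents of `console_logging.py` are not shown in this issue, so `configure_console_logging` and the format string are assumptions rather than the module's actual API; the point is only that the console handler is bound to `sys.stderr`.

```python
import logging
import sys


def configure_console_logging(level=logging.INFO):
    """Hypothetical helper: route root-logger console output to stderr."""
    root = logging.getLogger()

    # Remove any stream handler that is currently bound to stdout.
    for existing in list(root.handlers):
        if (
            isinstance(existing, logging.StreamHandler)
            and getattr(existing, "stream", None) is sys.stdout
        ):
            root.removeHandler(existing)

    # Attach a handler that writes to stderr, leaving stdout for JSON only.
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    root.addHandler(handler)
    root.setLevel(level)
    return handler
```

If the rich-based handler is kept, the same effect should be achievable by constructing the console with `rich.console.Console(stderr=True)` and passing it to `RichHandler(console=...)`. With either approach, library warnings such as `Failed to detach context` go to stderr and a pipeline into `jq` sees only the JSON.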
