fix: unify RPC error response JSON key to "error" across server and s… by HT-Yuan · Pull Request #1019 · inclusionAI/AReaL

HT-Yuan · 2026-03-11T14:55:25Z

Description

Unify RPC error response JSON key to "error" across server and schedulers.

The rpc_server.py uses "error" as the JSON key in error responses (42 out of 45 places), but 3 places in the /configure endpoint incorrectly use "detail". On the consumer side, local.py and slurm.py read the error message using .get("detail", "Unknown error") in most places, which fails to extract the actual error message from the server, always falling back to "Unknown error".

This mismatch makes it impossible to debug RPC failures — the real error message from the worker is silently lost.
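The mismatch can be reproduced with a plain dictionary. This is a minimal sketch, assuming the server's error body has the shape `{"error": "..."}` described above; the message text is made up for illustration.

```python
# Shape of an error body as described for rpc_server.py; the message is illustrative.
server_payload = {"error": "engine onload failed"}

# Old consumer-side lookup: the "detail" key is absent, so the default is returned.
before = server_payload.get("detail", "Unknown error")

# Lookup after the fix: the key matches what the server sends.
after = server_payload.get("error", "Unknown error")

print(before)  # Unknown error
print(after)   # engine onload failed
```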

Changes:

- rpc_server.py: Changed 3 occurrences of "detail" → "error" in /configure endpoint (aligning with the other 42 uses of "error")
- local.py: Changed 8 occurrences of .get("detail", ...) → .get("error", ...)
- slurm.py: Changed 8 occurrences of .get("detail", ...) → .get("error", ...)
- test_local_scheduler.py: Updated 3 mock response payloads to use "error" key

Note: proxy_gateway.py (FastAPI) intentionally uses "detail" following FastAPI/OpenAPI conventions and is not affected by this change.

Related Issue

N/A

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Code refactoring (no functional changes)
Performance improvement
Test coverage improvement

Checklist

I have read the Contributing Guide
I have run formatting tools (pre-commit or manual)
I have run relevant unit tests and they pass
I have added tests for new functionality
I have updated documentation if needed
My branch is up to date with main
This PR introduces breaking changes (if yes, fill out details below)
If this PR changes documentation, I have built and previewed it locally with jb build docs
No critical issues raised by AI reviewers (/gemini review)

Additional Context

Before this fix, any RPC error (e.g., engine onload failure) would show as "Unknown error" in the traceback because local.py/slurm.py read .get("detail") while rpc_server.py returns {"error": "..."}. This made distributed debugging extremely difficult.

gemini-code-assist · 2026-03-11T15:02:33Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an inconsistency in how RPC error responses are formatted, specifically unifying the JSON key used for error messages to 'error'. This change ensures that error messages are correctly propagated and improves the debuggability of RPC failures.

Highlights

Error Response Unification: This PR unifies the JSON key used for error responses across the server and schedulers to consistently use 'error' instead of the inconsistent 'detail'.
Improved Debugging: The change resolves an issue where the actual error message from the worker was silently lost, making debugging RPC failures difficult.
Code Modifications: The PR modifies rpc_server.py, local.py, slurm.py, and test_local_scheduler.py to ensure consistent error key usage.

Changelog

local.py
- Standardized error message extraction using the 'error' key.
rpc_server.py
- Fixed inconsistent error key usage in /configure endpoint.
slurm.py
- Standardized error message extraction using the 'error' key.
test_local_scheduler.py
- Updated mock response payloads to use the 'error' key.

Activity

The PR author, HT-Yuan, has addressed an inconsistency in RPC error responses.
The changes involve modifications to rpc_server.py, local.py, slurm.py, and test_local_scheduler.py.
The PR includes unit tests and documentation updates.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

The code changes replace the string 'detail' with 'error' in the json responses for error messages. The code review identified critical insecure deserialization vulnerabilities in the RPC server and both the local and Slurm schedulers, where responses from worker processes are deserialized using deserialize_value, potentially leading to Remote Code Execution (RCE) if a compromised worker returns a malicious payload. Additionally, the review suggests consolidating error handling logic for status codes 400 and 500 in local.py and slurm.py to reduce code duplication, and consolidating the checks for required fields in rpc_server.py into a loop for better maintainability.

areal/infra/rpc/rpc_server.py

areal/infra/scheduler/local.py

areal/infra/scheduler/slurm.py

gemini-code-assist · 2026-03-11T15:08:16Z

areal/infra/rpc/rpc_server.py

        config = data.get("config")
        if config is None:
-            return jsonify({"detail": "Missing 'config' field in request"}), 400
+            return jsonify({"error": "Missing 'config' field in request"}), 400

        rank = data.get("rank")
        if rank is None:
-            return jsonify({"detail": "Missing 'rank' field in request"}), 400
+            return jsonify({"error": "Missing 'rank' field in request"}), 400


To improve maintainability and reduce repetition, you can consolidate the checks for required fields into a loop. This makes it easier to add more required fields in the future.

        for field in ("config", "rank"):
            if data.get(field) is None:
                return jsonify({"error": f"Missing '{field}' field in request"}), 400
        config = data["config"]
        rank = data["rank"]
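The suggested loop can be tried without a web framework. The sketch below is not the project's code: `validate_configure_request` and `REQUIRED_FIELDS` are hypothetical names, and the function returns a plain `(body, status)` tuple in place of the `jsonify(...)` response used in rpc_server.py.

```python
from typing import Any, Dict, Optional, Tuple

# Fields named in the diff above; the constant name is an assumption.
REQUIRED_FIELDS = ("config", "rank")


def validate_configure_request(
    data: Dict[str, Any],
) -> Optional[Tuple[Dict[str, str], int]]:
    """Return an (error_body, status) pair for the first missing field, else None."""
    for field in REQUIRED_FIELDS:
        # "is None" keeps falsy but valid values such as rank 0 or an empty config.
        if data.get(field) is None:
            return {"error": f"Missing '{field}' field in request"}, 400
    return None


print(validate_configure_request({"config": {}}))
print(validate_configure_request({"config": {}, "rank": 0}))
```

Checking `is None` rather than truthiness matters here, because a rank of 0 is a legitimate value and would otherwise be rejected as missing.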

gemini-code-assist · 2026-03-11T15:08:16Z

areal/infra/scheduler/local.py

            elif response.status_code == 400:
-                error_detail = response.json().get("detail", "Unknown error")
+                error_detail = response.json().get("error", "Unknown error")
                raise WorkerConfigurationError(worker_id, error_detail, str(400))
            elif response.status_code == 500:
-                error_detail = response.json().get("detail", "Unknown error")
+                error_detail = response.json().get("error", "Unknown error")
                raise WorkerConfigurationError(worker_id, error_detail, str(500))


The error handling logic for status codes 400 and 500 is identical. You can combine these elif blocks to reduce code duplication and improve readability.

Suggested change

-            elif response.status_code == 400:
-                error_detail = response.json().get("error", "Unknown error")
-                raise WorkerConfigurationError(worker_id, error_detail, str(400))
-            elif response.status_code == 500:
-                error_detail = response.json().get("error", "Unknown error")
-                raise WorkerConfigurationError(worker_id, error_detail, str(500))
+            elif response.status_code in (400, 500):
+                error_detail = response.json().get("error", "Unknown error")
+                raise WorkerConfigurationError(worker_id, error_detail, str(response.status_code))
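The combined branch can be exercised in isolation with stubs. In this sketch, `FakeResponse` and `raise_for_configure_error` are hypothetical names, and the `WorkerConfigurationError` class is a stand-in whose constructor only mirrors the three-argument call seen in the diff; the real exception in the scheduler may be defined differently.

```python
class WorkerConfigurationError(Exception):
    """Stand-in for the scheduler's exception; the real class may differ."""

    def __init__(self, worker_id, message, code):
        super().__init__(f"{worker_id}: {message} (HTTP {code})")
        self.worker_id = worker_id
        self.message = message
        self.code = code


class FakeResponse:
    """Hypothetical stub exposing only status_code and json()."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


def raise_for_configure_error(worker_id, response):
    """Raise for 400/500 using the unified "error" key; do nothing otherwise."""
    if response.status_code in (400, 500):
        error_detail = response.json().get("error", "Unknown error")
        raise WorkerConfigurationError(
            worker_id, error_detail, str(response.status_code)
        )


try:
    raise_for_configure_error(
        "worker-0", FakeResponse(500, {"error": "engine onload failed"})
    )
except WorkerConfigurationError as exc:
    print(exc.message, exc.code)
```

Passing `str(response.status_code)` keeps the reported code tied to the actual response, so the single branch behaves the same as the two original ones.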

gemini-code-assist · 2026-03-11T15:08:16Z

areal/infra/scheduler/slurm.py

            elif response.status_code == 400:
-                error_detail = response.json().get("detail", "Unknown error")
+                error_detail = response.json().get("error", "Unknown error")
                raise WorkerConfigurationError(worker_id, error_detail, str(400))
            elif response.status_code == 500:
-                error_detail = response.json().get("detail", "Unknown error")
+                error_detail = response.json().get("error", "Unknown error")
                raise WorkerConfigurationError(worker_id, error_detail, str(500))


The error handling for status codes 400 and 500 is identical. To improve conciseness and maintainability, you can combine these elif blocks.

Suggested change

-            elif response.status_code == 400:
-                error_detail = response.json().get("error", "Unknown error")
-                raise WorkerConfigurationError(worker_id, error_detail, str(400))
-            elif response.status_code == 500:
-                error_detail = response.json().get("error", "Unknown error")
-                raise WorkerConfigurationError(worker_id, error_detail, str(500))
+            elif response.status_code in (400, 500):
+                error_detail = response.json().get("error", "Unknown error")
+                raise WorkerConfigurationError(worker_id, error_detail, str(response.status_code))

HT-Yuan · 2026-03-11T15:17:14Z

Thanks for the review. These comments are about pre-existing code
(deserialize_value), not the changes in this PR. This PR only fixes
the JSON key mismatch ("detail" → "error") between the RPC server
and schedulers. The deserialization logic is unchanged.

Regarding the security concern: the RPC communication is internal
(scheduler ↔ workers it spawned), within a trusted boundary.

garrett4wade

LGTM

fix: unify RPC error response JSON key to "error" across server and schedulers

50929e0

gemini-code-assist bot reviewed Mar 11, 2026

Merge branch 'main' into bugfix/unify-rpc-error-response-key

5c90806

garrett4wade approved these changes Mar 13, 2026

garrett4wade merged commit 7cad4da into inclusionAI:main Mar 13, 2026
6 checks passed

garrett4wade merged 2 commits into inclusionAI:main from HT-Yuan:bugfix/unify-rpc-error-response-key

2 participants
