fix: unify RPC error response JSON key to "error" across server and s…#1019
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an inconsistency in how RPC error responses are formatted, specifically unifying the JSON key used for error messages to 'error'. This change ensures that error messages are correctly propagated and improves the debuggability of RPC failures. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The code changes replace the string 'detail' with 'error' in the json responses for error messages. The code review identified critical insecure deserialization vulnerabilities in the RPC server and both the local and Slurm schedulers, where responses from worker processes are deserialized using deserialize_value, potentially leading to Remote Code Execution (RCE) if a compromised worker returns a malicious payload. Additionally, the review suggests consolidating error handling logic for status codes 400 and 500 in local.py and slurm.py to reduce code duplication, and consolidating the checks for required fields in rpc_server.py into a loop for better maintainability.
| config = data.get("config") | ||
| if config is None: | ||
| return jsonify({"detail": "Missing 'config' field in request"}), 400 | ||
| return jsonify({"error": "Missing 'config' field in request"}), 400 | ||
|
|
||
| rank = data.get("rank") | ||
| if rank is None: | ||
| return jsonify({"detail": "Missing 'rank' field in request"}), 400 | ||
| return jsonify({"error": "Missing 'rank' field in request"}), 400 |
There was a problem hiding this comment.
To improve maintainability and reduce repetition, you can consolidate the checks for required fields into a loop. This makes it easier to add more required fields in the future.
for field in ("config", "rank"):
if data.get(field) is None:
return jsonify({"error": f"Missing '{field}' field in request"}), 400
config = data["config"]
rank = data["rank"]| elif response.status_code == 400: | ||
| error_detail = response.json().get("detail", "Unknown error") | ||
| error_detail = response.json().get("error", "Unknown error") | ||
| raise WorkerConfigurationError(worker_id, error_detail, str(400)) | ||
| elif response.status_code == 500: | ||
| error_detail = response.json().get("detail", "Unknown error") | ||
| error_detail = response.json().get("error", "Unknown error") | ||
| raise WorkerConfigurationError(worker_id, error_detail, str(500)) |
There was a problem hiding this comment.
The error handling logic for status codes 400 and 500 is identical. You can combine these elif blocks to reduce code duplication and improve readability.
| elif response.status_code == 400: | |
| error_detail = response.json().get("detail", "Unknown error") | |
| error_detail = response.json().get("error", "Unknown error") | |
| raise WorkerConfigurationError(worker_id, error_detail, str(400)) | |
| elif response.status_code == 500: | |
| error_detail = response.json().get("detail", "Unknown error") | |
| error_detail = response.json().get("error", "Unknown error") | |
| raise WorkerConfigurationError(worker_id, error_detail, str(500)) | |
| elif response.status_code in (400, 500): | |
| error_detail = response.json().get("error", "Unknown error") | |
| raise WorkerConfigurationError(worker_id, error_detail, str(response.status_code)) |
| elif response.status_code == 400: | ||
| error_detail = response.json().get("detail", "Unknown error") | ||
| error_detail = response.json().get("error", "Unknown error") | ||
| raise WorkerConfigurationError(worker_id, error_detail, str(400)) | ||
| elif response.status_code == 500: | ||
| error_detail = response.json().get("detail", "Unknown error") | ||
| error_detail = response.json().get("error", "Unknown error") | ||
| raise WorkerConfigurationError(worker_id, error_detail, str(500)) |
There was a problem hiding this comment.
The error handling for status codes 400 and 500 is identical. To improve conciseness and maintainability, you can combine these elif blocks.
| elif response.status_code == 400: | |
| error_detail = response.json().get("detail", "Unknown error") | |
| error_detail = response.json().get("error", "Unknown error") | |
| raise WorkerConfigurationError(worker_id, error_detail, str(400)) | |
| elif response.status_code == 500: | |
| error_detail = response.json().get("detail", "Unknown error") | |
| error_detail = response.json().get("error", "Unknown error") | |
| raise WorkerConfigurationError(worker_id, error_detail, str(500)) | |
| elif response.status_code in (400, 500): | |
| error_detail = response.json().get("error", "Unknown error") | |
| raise WorkerConfigurationError(worker_id, error_detail, str(response.status_code)) |
|
Thanks for the review. These comments are about pre-existing code Regarding the security concern: the RPC communication is internal |
Description
Unify RPC error response JSON key to
"error"across server and schedulers.The
rpc_server.pyuses"error"as the JSON key in error responses (42 out of 45 places), but 3 places in the/configureendpoint incorrectly use"detail". On the consumer side,local.pyandslurm.pyread the error message using.get("detail", "Unknown error")in most places, which fails to extract the actual error message from the server, always falling back to"Unknown error".This mismatch makes it impossible to debug RPC failures — the real error message from the worker is silently lost.
Changes:
rpc_server.py: Changed 3 occurrences of"detail"→"error"in/configureendpoint (aligning with the other 42 uses of"error")local.py: Changed 8 occurrences of.get("detail", ...)→.get("error", ...)slurm.py: Changed 8 occurrences of.get("detail", ...)→.get("error", ...)test_local_scheduler.py: Updated 3 mock response payloads to use"error"keyRelated Issue
N/A
Type of Change
Checklist
jb build docs/gemini review)Additional Context
Before this fix, any RPC error (e.g., engine
onloadfailure) would show as"Unknown error"in the traceback becauselocal.py/slurm.pyread.get("detail")whilerpc_server.pyreturns{"error": "..."}. This made distributed debugging extremely difficult.