Set maximum length for fields #1235

Open
samdoran wants to merge 4 commits into lightspeed-core:main from samdoran:infer-max

@samdoran samdoran commented Feb 27, 2026

Description

The /v1/infer API accepts arbitrary data in the post body. Set maximum limits on the question, stdin, attachment contents, and terminal output fields to avoid overwhelming the server.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: Claude

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

pytest tests/integration/endpoints/test_rlsapi_v1_integration.py::test_infer_size_limit
pytest tests/unit/models/rlsapi/test_requests.py::test_value_max_length

Summary by CodeRabbit

  • New Features

    • Enforced input size limits: inference questions (10,240 chars); attachments, terminal output, and context stdin (65,536 chars). Oversized requests return 422 validation errors.
  • Tests

    • Added and updated integration and unit tests to verify these size limits and boundary conditions are enforced.
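
As a rough illustration of what these limits mean at the boundary, the sketch below restates them in plain Python. The constant and function names are hypothetical and not taken from the PR, which enforces the limits through Pydantic `max_length` validators. The sketch also assumes the limit counts characters (as Python's `len` does), not encoded bytes.

```python
"""Boundary arithmetic for the size limits described above (illustrative only)."""

# Hypothetical names; the values are the limits quoted in the summary.
QUESTION_MAX_CHARS = 10_240  # 10 * 1024
CONTEXT_MAX_CHARS = 65_536   # 2 ** 16


def within_limit(value: str, limit: int) -> bool:
    """Return True when value has at most `limit` characters."""
    return len(value) <= limit


# A value exactly at the limit is accepted; one character more is not.
at_limit = "a" * CONTEXT_MAX_CHARS
over_limit = at_limit + "a"

# Assumption: limits count characters, so the UTF-8 byte size can be larger.
non_ascii = "\u00e9" * QUESTION_MAX_CHARS
utf8_bytes = len(non_ascii.encode("utf-8"))  # two bytes per U+00E9
```

Under that assumption, a field that passes the character limit can still occupy more bytes on the wire when it contains non-ASCII text.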

We do not want to accept unbounded amounts of input.
Use base 2 numbers because they are cool and nerdy.
@coderabbitai
coderabbitai bot commented Feb 27, 2026

Walkthrough

Added max_length string validators to four Rlsapi V1 request fields (attachment.contents, terminal.output, context.stdin: 65,536; infer.question: 10,240). New unit and integration tests cover boundary and oversized payload cases. Removed pytest.mark.asyncio decorators from many async unit tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Model Validation Constraints: src/models/rlsapi/requests.py | Added max_length constraints to four string fields: RlsapiV1Attachment.contents (65,536), RlsapiV1Terminal.output (65,536), RlsapiV1Context.stdin (65,536), RlsapiV1InferRequest.question (10,240). |
| Unit Test Validation: tests/unit/models/rlsapi/test_requests.py | Added parameterized test_value_max_length to assert boundary acceptance at max length and ValidationError when exceeding max length for the four constrained fields. |
| Integration Test Coverage: tests/integration/endpoints/test_rlsapi_v1_integration.py | Added test_infer_size_limit integration test POSTing oversized question, stdin, attachment contents, and terminal output to /v1/infer, asserting 422 with string_too_long detail. |
| Async Test Marker Cleanup: tests/unit/app/endpoints/test_rlsapi_v1.py | Removed pytest.mark.asyncio decorators from multiple async test functions; tests remain async def and rely on test runner's async handling. |
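
The boundary cases these tests exercise can be sketched as a small helper that builds one payload per constrained field. The helper name and the `LIMITS` table are hypothetical; the payload shapes follow the integration test parametrization quoted in the review comments on this PR.

```python
"""Build /v1/infer payloads at or beyond each field's limit (illustrative sketch)."""

# Limits from the walkthrough above; the dotted-path keys are hypothetical labels.
LIMITS = {
    "question": 10_240,
    "context.stdin": 65_536,
    "context.attachments.contents": 65_536,
    "context.terminal.output": 65_536,
}


def build_payload(path: str, length: int) -> dict:
    """Return a request body whose field at `path` holds `length` characters."""
    payload: dict = {"question": "Q"}
    keys = path.split(".")
    node = payload
    # Walk (and create) the nested dictionaries down to the target field.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = "x" * length
    return payload


# One at-limit (expected to pass) and one limit + 1 (expected 422) case per field.
cases = [
    (path, build_payload(path, limit), build_payload(path, limit + 1))
    for path, limit in LIMITS.items()
]
```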

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'Set maximum length for fields' accurately summarizes the main objective of the changeset, which adds max_length validation constraints to four string fields across multiple model classes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/models/rlsapi/requests.py (1)

32-35: Centralize max-length values into shared constants.

The limits are correct, but repeating raw numbers across fields makes drift likely. Move these values to shared constants (preferably in constants.py) and reference them here and in tests.

Suggested refactor
+import constants
+
 class RlsapiV1Attachment(ConfigurationBase):
@@
     contents: str = Field(
         default="",
-        max_length=65_536,
+        max_length=constants.RLSAPI_CONTEXT_MAX_LENGTH,
@@
 class RlsapiV1Terminal(ConfigurationBase):
@@
     output: str = Field(
         default="",
-        max_length=65_536,
+        max_length=constants.RLSAPI_CONTEXT_MAX_LENGTH,
@@
 class RlsapiV1Context(ConfigurationBase):
@@
     stdin: str = Field(
         default="",
-        max_length=65_536,
+        max_length=constants.RLSAPI_CONTEXT_MAX_LENGTH,
@@
 class RlsapiV1InferRequest(ConfigurationBase):
@@
     question: str = Field(
         ...,
         min_length=1,
-        max_length=10_240,
+        max_length=constants.RLSAPI_QUESTION_MAX_LENGTH,

As per coding guidelines "Check constants.py for shared constants before defining new ones".

Also applies to: 52-55, 132-135, 176-180
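
One way to read this suggestion: define each limit once and derive both the validator arguments and the test boundaries from it. The sketch below uses a `SimpleNamespace` as a stand-in for the project's constants.py; the two constant names come from the suggested diff above, and everything else is hypothetical.

```python
"""Deriving test boundaries from shared constants instead of raw literals (sketch)."""
from types import SimpleNamespace

# Stand-in for constants.py; names are taken from the suggested refactor above.
constants = SimpleNamespace(
    RLSAPI_CONTEXT_MAX_LENGTH=65_536,
    RLSAPI_QUESTION_MAX_LENGTH=10_240,
)

# Hypothetical mapping of model field -> limit, shared by code and tests.
FIELD_LIMITS = {
    "RlsapiV1Attachment.contents": constants.RLSAPI_CONTEXT_MAX_LENGTH,
    "RlsapiV1Terminal.output": constants.RLSAPI_CONTEXT_MAX_LENGTH,
    "RlsapiV1Context.stdin": constants.RLSAPI_CONTEXT_MAX_LENGTH,
    "RlsapiV1InferRequest.question": constants.RLSAPI_QUESTION_MAX_LENGTH,
}


def boundary_lengths(limit: int) -> tuple[int, int]:
    """Return (largest accepted length, smallest rejected length)."""
    return limit, limit + 1


# Changing a constant updates every derived boundary; no literal is repeated.
boundaries = {name: boundary_lengths(limit) for name, limit in FIELD_LIMITS.items()}
```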

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/rlsapi/requests.py` around lines 32 - 35, Several Pydantic Field
max_length literals (e.g., the contents Field with max_length=65_536 and the
other occurrences at the noted ranges) should be replaced with shared constants:
add appropriately named constants (e.g., RLSAPI_CONTEXT_MAX_LENGTH and
RLSAPI_QUESTION_MAX_LENGTH) to constants.py, import those
symbols into src/models/rlsapi/requests.py, and use them in the Field(...,
max_length=...) calls (update the Field declarations for identifiers like
contents and the other max_length fields referenced at 52-55, 132-135, 176-180);
also update any tests to reference the new constants from constants.py instead
of the raw numbers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/endpoints/test_rlsapi_v1_integration.py`:
- Around line 14-17: Reorder the imports so all FastAPI-related imports are
grouped together: make the lines with TestClient and HTTPException/status
adjacent (e.g., place from fastapi import HTTPException, status next to from
fastapi.testclient import TestClient) so TestClient, HTTPException, and status
are in the same import block and no other imports separate them; this will
satisfy the pylint C0412 import grouping rule.

In `@tests/unit/models/rlsapi/test_requests.py`:
- Around line 604-633: Add a local pylint suppression and missing docstring:
annotate the TestMaxLengthValidation class with a pylint-disable comment for
R0903 (too-few-public-methods) and add a short method docstring to
test_value_at_max_length describing the test purpose; reference the class name
TestMaxLengthValidation and the method test_value_at_max_length so the linter
warnings (R0903 and C0116) are suppressed and the CI will pass.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a6024a and fcdee35.

📒 Files selected for processing (4)
  • src/models/rlsapi/requests.py
  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/unit/app/endpoints/test_rlsapi_v1.py
  • tests/unit/models/rlsapi/test_requests.py
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_rlsapi_v1.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/integration/endpoints/test_rlsapi_v1_integration.py (1)

512-516: Assert status before parsing detail to improve failure clarity.

Line 513 dereferences response.json()["detail"] before Line 515 validates the expected status, which can hide the root cause when the response shape differs.

Proposed change
 def test_infer_size_limit(integration_http_client: TestClient, json) -> None:
     """Test that a field exceeding limit is rejected."""
     response = integration_http_client.post("/v1/infer", json=json)
-    detail = response.json()["detail"]
-
     assert response.status_code == status.HTTP_422_UNPROCESSABLE_CONTENT
+    detail = response.json()["detail"]
     assert "string_too_long" in {item["type"] for item in detail}
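
To illustrate why the ordering matters, here is a minimal sketch using a fake response object. `FakeResponse` and the helper functions are hypothetical stand-ins, not the project's test client; the point is only that parsing `detail` first turns an unexpected response into a `KeyError` rather than a status-code failure.

```python
"""Why asserting the status code first gives a clearer failure (sketch)."""
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the HTTP client's response object."""

    status_code: int
    body: dict = field(default_factory=dict)

    def json(self) -> dict:
        return self.body


def check_detail_first(response: FakeResponse) -> None:
    detail = response.json()["detail"]  # KeyError if the body has no "detail"
    assert response.status_code == 422
    assert "string_too_long" in {item["type"] for item in detail}


def check_status_first(response: FakeResponse) -> None:
    assert response.status_code == 422, f"unexpected status {response.status_code}"
    detail = response.json()["detail"]
    assert "string_too_long" in {item["type"] for item in detail}


def failure_name(check, response: FakeResponse) -> str:
    """Run a check and return the name of the exception it raises, or "ok"."""
    try:
        check(response)
    except (AssertionError, KeyError) as exc:
        return type(exc).__name__
    return "ok"


# An unexpected response with a different body shape (illustrative data).
broken = FakeResponse(status_code=500, body={"error": "boom"})
```

With the `broken` response, the detail-first check fails on the missing key, while the status-first check fails on the status assertion and reports the actual code.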
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/endpoints/test_rlsapi_v1_integration.py` around lines 512 -
516, The test reads response.json()["detail"] into detail before asserting the
response.status_code, which can mask schema errors; change the order in the test
around the integration_http_client.post call so you first assert
response.status_code == status.HTTP_422_UNPROCESSABLE_CONTENT and only then
parse response.json() to extract detail, and keep the existing assertion that
"string_too_long" is in {item["type"] for item in detail}.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/models/rlsapi/test_requests.py`:
- Around line 607-612: The parametrized test ids array contains a truncated id
"infer-request-questio"; update the ids list in
tests/unit/models/rlsapi/test_requests.py (the ids used for the parametrized
test) to the full, descriptive value "infer-request-question" so test reports
and -k filtering work correctly.

📥 Commits

Reviewing files that changed from the base of the PR and between fcdee35 and 91ff83c.

📒 Files selected for processing (3)
  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/unit/app/endpoints/test_rlsapi_v1.py
  • tests/unit/models/rlsapi/test_requests.py
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_rlsapi_v1.py

@major major left a comment

No objections here other than the typo that coderabbit found

@tisnik tisnik left a comment

Agree, LGTM, but please fix the typo.

The project configures pytest asyncio mode to auto, so it is unnecessary
to mark individual tests as async.
@samdoran
Contributor Author

I corrected the test ID.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
tests/integration/endpoints/test_rlsapi_v1_integration.py (1)

500-516: Tighten failure matching to the specific oversized field.
Good boundary test. To prevent false positives, assert the loc for the expected field in each case (not just type == "string_too_long").

Proposed refinement
 @pytest.mark.parametrize(
-    "json",
+    ("json", "expected_loc"),
     (
-        ({"question": "?" * 10_241}),
-        ({"question": "Q", "context": {"stdin": "a" * 65_537}}),
-        ({"question": "Q", "context": {"attachments": {"contents": "A" * 65_537}}}),
-        ({"question": "Q", "context": {"terminal": {"output": "T" * 65_537}}}),
+        ({"question": "?" * 10_241}, ["body", "question"]),
+        ({"question": "Q", "context": {"stdin": "a" * 65_537}}, ["body", "context", "stdin"]),
+        (
+            {"question": "Q", "context": {"attachments": {"contents": "A" * 65_537}}},
+            ["body", "context", "attachments", "contents"],
+        ),
+        (
+            {"question": "Q", "context": {"terminal": {"output": "T" * 65_537}}},
+            ["body", "context", "terminal", "output"],
+        ),
     ),
     ids=["question", "stdin", "attachment_contents", "terminal_output"],
 )
-def test_infer_size_limit(integration_http_client: TestClient, json) -> None:
+def test_infer_size_limit(
+    integration_http_client: TestClient, json, expected_loc
+) -> None:
     """Test that a field exceeding limit is rejected."""
     response = integration_http_client.post("/v1/infer", json=json)
     detail = response.json()["detail"]

     assert response.status_code == status.HTTP_422_UNPROCESSABLE_CONTENT
-    assert "string_too_long" in {item["type"] for item in detail}
+    assert any(
+        item.get("type") == "string_too_long" and item.get("loc") == expected_loc
+        for item in detail
+    )
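
The false positive this refinement guards against can be shown with plain dictionaries. The sample `detail` list is illustrative data modelled on the `type` and `loc` keys the proposed assertion reads; `has_error` is a hypothetical helper, not part of the PR.

```python
"""Matching a validation error on both type and location (sketch)."""


def has_error(detail: list[dict], error_type: str, loc: list[str]) -> bool:
    """Return True if any item has the given type and exactly the given loc."""
    return any(
        item.get("type") == error_type and item.get("loc") == loc
        for item in detail
    )


# Illustrative 422 detail: stdin is oversized, the question is not.
detail = [
    {"type": "string_too_long", "loc": ["body", "context", "stdin"]},
]

# A type-only check passes even though the question was never the problem.
type_only = "string_too_long" in {item["type"] for item in detail}

# The stricter check distinguishes the two fields.
stdin_match = has_error(detail, "string_too_long", ["body", "context", "stdin"])
question_match = has_error(detail, "string_too_long", ["body", "question"])
```

A test case meant to cover the question limit would pass the type-only check against this response, but not the type-and-loc check.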
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/endpoints/test_rlsapi_v1_integration.py` around lines 500 -
516, The test_infer_size_limit currently only checks for type ==
"string_too_long" which can produce false positives; update the parametrization
to include the expected location for each payload case (e.g., ("question",
["body", "question"]), ("context.stdin", ["body", "context", "stdin"]),
("context.attachments.contents", ["body", "context", "attachments",
"contents"]), ("context.terminal.output", ["body", "context", "terminal",
"output"])), then send the request via integration_http_client.post("/v1/infer",
json=json) and assert that response.status_code is
HTTP_422_UNPROCESSABLE_CONTENT and that the response.json()["detail"] contains
an item whose "type" == "string_too_long" and whose "loc" equals the expected
location for that parametrized case.

📥 Commits

Reviewing files that changed from the base of the PR and between 91ff83c and d15b85d.

📒 Files selected for processing (3)
  • tests/integration/endpoints/test_rlsapi_v1_integration.py
  • tests/unit/app/endpoints/test_rlsapi_v1.py
  • tests/unit/models/rlsapi/test_requests.py
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_rlsapi_v1.py
