MLE-27077 Added fix for invalid header for empty doc#1899
Conversation
|
Copyright Validation Results ⏭️ Skipped (Excluded) Files
✅ Valid Files
✅ All files have valid copyright headers! |
There was a problem hiding this comment.
Pull request overview
Adds a client-side workaround for malformed Content-Disposition headers (notably on empty documents) and updates/extends existing tests around document paging and URI handling.
Changes:
- Refactors
ReadDocumentPageTestimports/class base and adds a new test for an empty text document. - Updates
OkHttpServices#getHeaderUrito recover fromjakarta.mailparsing failures caused by a trailingformat=parameter. - Adds a helper to parse
Content-Dispositionand extractfilenamesafely after cleaning.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| marklogic-client-api/src/test/java/com/marklogic/client/test/document/ReadDocumentPageTest.java | Updates test structure and adds an empty text document test case related to the header parsing fix. |
| marklogic-client-api/src/main/java/com/marklogic/client/impl/OkHttpServices.java | Implements defensive parsing for malformed Content-Disposition to avoid ParseException and extract filenames reliably. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * See MLE-15748, which pertains to issues with javax.mail only allowing US-ASCII characters. | ||
| */ | ||
| @Test | ||
| void test() { |
There was a problem hiding this comment.
The method name test is too generic for a test suite and makes failures harder to interpret. Rename it to something intent-revealing (e.g., readsUriWithNonAsciiCharacters or searchHandlesNonAsciiUriFilename) to reflect what the test is validating.
| void test() { | |
| void readsUriWithNonAsciiCharacters() { |
| void emptyTextDocument() { | ||
| final String uri = "/sample/empty-file.txt"; | ||
|
|
||
| try (DatabaseClient client = Common.newClient()) { | ||
| JSONDocumentManager documentManager = client.newJSONDocumentManager(); | ||
| StructuredQueryDefinition query = new StructuredQueryBuilder().document(uri); | ||
| DocumentRecord documentRecord; | ||
| try (DocumentPage documentPage = documentManager.search(query, 1)) { | ||
| assertTrue(documentPage.hasNext(), "Expected a document in the page, but none was found."); | ||
| documentRecord = documentPage.next(); | ||
| } | ||
| String actualUri = documentRecord.getUri(); | ||
| assertEquals(uri, actualUri, "The URI of the empty document should match the one written."); | ||
| } | ||
| } |
There was a problem hiding this comment.
This test asserts the document exists but does not create/write /sample/empty-file.txt within the test. That makes the test dependent on external state and likely flaky. Make the test self-contained by writing an empty document at uri before searching (and cleaning it up afterward), or by using a known fixture/setup that guarantees the document exists.
| void test() { | ||
| Common.deleteUrisWithPattern("/aaa-page/*"); | ||
|
|
||
| final String uri = "/aaa-page/太田佳伸のXMLファイル.xml"; | ||
| DocumentRecord documentRecord; | ||
| try (DatabaseClient client = Common.newClient()) { |
There was a problem hiding this comment.
The test writes/reads under the /aaa-page/ URI space, but there’s no cleanup visible in the updated method. To keep the suite repeatable and avoid cross-test interference, add a cleanup step (e.g., delete the created URI(s) in a finally block or in an @AfterEach) so reruns don’t depend on prior test state.
| static private String getHeaderUri(BodyPart part) { | ||
| try { | ||
| if (part != null) { | ||
| return part.getFileName(); | ||
| if (part == null) { | ||
| return null; | ||
| } | ||
| // if it's not found, just return null | ||
|
|
||
| try { | ||
| String filename = part.getFileName(); | ||
| if (filename != null) { | ||
| return filename; | ||
| } | ||
| } catch (ParseException e) { | ||
| // Jakarta Mail's parser failed due to malformed Content-Disposition header. | ||
| // Check if MarkLogic sent a malformed "format=" parameter at the end, which violates RFC 2183. | ||
| String contentDisposition = getHeader(part, "Content-Disposition"); | ||
| if (contentDisposition != null && contentDisposition.matches(".*;\\s*format\\s*=\\s*$")) { | ||
| // Remove the trailing "; format=" to fix the malformed header | ||
| String cleaned = contentDisposition.replaceFirst(";\\s*format\\s*=\\s*$", "").trim(); | ||
| logger.debug("Removed trailing 'format=' from malformed Content-Disposition header: {} -> {}", contentDisposition, cleaned); | ||
| return extractFilenameFromContentDisposition(cleaned); | ||
| } | ||
| throw e; | ||
| } |
There was a problem hiding this comment.
The new ParseException recovery path is important behavior (it changes failure into a best-effort filename extraction). Add a focused test that reproduces a malformed Content-Disposition ending in ; format= and asserts that getHeaderUri returns the expected filename (and does not throw).
| static private String extractFilenameFromContentDisposition(String contentDisposition) { | ||
| if (contentDisposition == null) { | ||
| return null; | ||
| } | ||
| try { | ||
| // Use Jakarta Mail's ContentDisposition parser to extract the filename parameter. This is the class | ||
| // that throws an error when "format=" exists in the value, but that has been removed already. | ||
| ContentDisposition cd = new ContentDisposition(contentDisposition); | ||
| return cd.getParameter("filename"); | ||
| } catch (ParseException e) { | ||
| logger.warn("Failed to parse cleaned Content-Disposition header: {}; cause: {}", | ||
| contentDisposition, e.getMessage()); | ||
| return null; | ||
| } | ||
| } |
There was a problem hiding this comment.
This fallback only checks the filename parameter. For valid Content-Disposition headers, servers may provide filename* (RFC 5987/2231) instead of filename. In that case this method will return null even though a filename exists. Consider checking for filename* as well (and decoding it appropriately) before returning null.
1a98321 to
4c6216a
Compare
4c6216a to
b0058d3
Compare
No description provided.