[FEATURE] Add a way to split Log4j audit log messages when resolve_indices is enabled for clusters with many indices to prevent audit_trace_indices/audit_trace_resolved_indices fields making messages too big #5976

@parislarkins

What is the bug?
On clusters with a large number of indices (about 11,000 in the case where I first observed this), the audit_trace_indices and audit_trace_resolved_indices fields of audit log entries for requests that touch every index can become extremely long. This happens when the plugins.security.audit.config.resolve_indices OpenSearch setting is enabled and the Log4j setting appender.rolling_audit.layout.maxMessageLength=0 is used to disable log message truncation. For example, 11,000 indices with 100-character names require 1,100,000 characters just to represent the index names, plus more for the quotes around each name and the commas between them.
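The arithmetic above can be checked with a short Python sketch. The index names are synthetic placeholders (zero-padded numbers), chosen only so that each one is exactly 100 characters long; the count and length are the figures quoted in this issue.

```python
import json

# Figures quoted in this issue; the names themselves are made up.
index_count = 11_000
name_length = 100

# Synthetic 100-character names: zero-padded decimal numbers.
names = [f"{i:0{name_length}d}" for i in range(index_count)]

# Characters needed for the bare names alone.
raw_chars = sum(len(name) for name in names)

# Characters needed once the names are serialized as a compact JSON array
# (two quotes per name, one comma between names, two brackets).
json_chars = len(json.dumps(names, separators=(",", ":")))

print(raw_chars)   # 1100000
print(json_chars)  # 1133001
```

This is the size of one such array only; an audit message carries the rest of its fields on top of it.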

These messages can grow large enough to cause problems in downstream parts of a logging pipeline (for example, Apache Kafka's default maximum message size is about 1 MB). In such cases the usual recommendation is to split large messages into smaller ones, which can be handled more efficiently.

It would be ideal if OpenSearch could be configured to split audit messages with huge numbers of index names (such as by specifying a maximum number of index name characters per log message) into multiple smaller messages, keeping all other audit message fields the same apart from audit_trace_indices and audit_trace_resolved_indices. For a simple example, if we could set the maximum index name characters per message to 18, this original message:

{
  "audit_trace_indices": [
    "index*"
  ],
  "audit_trace_resolved_indices": [
    "index1",
    "index2",
    "index3",
    "index4",
    "index5",
    "index6",
    "index7",
    "index8"
  ],
  "audit_category": "AUTHENTICATED"
}

would then be split into the following 3 messages:

{
  "audit_trace_indices": [
    "index*"
  ],
  "audit_trace_resolved_indices": [
    "index1",
    "index2",
    "index3"
  ],
  "audit_category": "AUTHENTICATED"
}
{
  "audit_trace_resolved_indices": [
    "index4",
    "index5",
    "index6"
  ],
  "audit_category": "AUTHENTICATED"
}
{
  "audit_trace_resolved_indices": [
    "index7",
    "index8"
  ],
  "audit_category": "AUTHENTICATED"
}

Together, these 3 split messages contain exactly the same information as the source message, so nothing is lost, although reconstructing the original event takes more effort.
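A minimal sketch of the proposed splitting, written in Python for brevity rather than in the plugin's Java. Nothing here is an existing OpenSearch API: split_audit_message, merge_audit_messages and max_index_chars are hypothetical names used only for illustration. As a simplification, the sketch chunks only audit_trace_resolved_indices and keeps audit_trace_indices on the first part; a real implementation would presumably need to handle both fields.

```python
RESOLVED = "audit_trace_resolved_indices"
REQUESTED = "audit_trace_indices"


def split_audit_message(message, max_index_chars):
    """Split one audit message into parts whose resolved index names
    total at most max_index_chars characters (hypothetical setting)."""
    chunks, current, used = [], [], 0
    for name in message.get(RESOLVED, []):
        # Start a new chunk when this name would exceed the budget.
        # A single name longer than the budget still gets its own chunk.
        if current and used + len(name) > max_index_chars:
            chunks.append(current)
            current, used = [], 0
        current.append(name)
        used += len(name)
    if current or not chunks:
        chunks.append(current)

    parts = []
    for i, chunk in enumerate(chunks):
        part = {}
        for key, value in message.items():
            if key == RESOLVED:
                part[key] = chunk
            elif key == REQUESTED and i > 0:
                continue  # only the first part carries the requested patterns
            else:
                part[key] = value
        parts.append(part)
    return parts


def merge_audit_messages(parts):
    """Rebuild the original message from its parts, in order."""
    merged = dict(parts[0])
    merged[RESOLVED] = [name for part in parts for name in part.get(RESOLVED, [])]
    return merged


original = {
    REQUESTED: ["index*"],
    RESOLVED: [f"index{n}" for n in range(1, 9)],
    "audit_category": "AUTHENTICATED",
}
parts = split_audit_message(original, max_index_chars=18)
print(len(parts))                               # 3
print(merge_audit_messages(parts) == original)  # True
```

The parts carry no correlation field, so a consumer could only merge them if it already knows which messages belong together. A real implementation would likely need to add something like a shared event ID and a part number, which is outside what this sketch covers.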

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Enable Log4j audit logging on your cluster with plugins.security.audit.config.resolve_indices enabled and an unlimited maximum message size, e.g. appender.rolling_audit.layout.maxMessageLength=0.
  2. Create 11,000 indexes with names at least 100 characters long.
  3. Perform a simple request that relates to every index, e.g. "GET /*"
  4. Observe the giant audit logs produced in the log4j output file.
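For step 2, the index names can be generated rather than written by hand. The sketch below only builds the names; make_index_names, the "audit-repro-" prefix and the "x" padding are arbitrary choices made for this illustration, not anything taken from the affected cluster.

```python
def make_index_names(count=11_000, length=100, prefix="audit-repro-"):
    """Return `count` unique lowercase index names, each padded with 'x'
    to `length` characters (names are not shortened if the stem is longer)."""
    names = []
    for i in range(count):
        stem = f"{prefix}{i:06d}-"
        names.append(stem.ljust(length, "x"))
    return names


names = make_index_names()
print(len(names), len(names[0]))
print(names[0][:24])
```

Each name could then be created with a PUT /<name> request against a test cluster before issuing the GET /* request from step 3.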

What is the expected behavior?
It would be ideal if OpenSearch could be configured to split audit messages with huge numbers of index names (such as by specifying a maximum number of index name characters per log message) into multiple smaller messages, keeping all other audit message fields the same apart from audit_trace_indices and audit_trace_resolved_indices.

What is your host/environment?

  • OS: Debian 12

Do you have any additional context?
Somewhat relates to #5363

Labels: bug, untriaged
