Feature: Optionally Include Training Data in SklearnSerializer Output #47

@Gnpd

Summary:
Add support for optionally including training data (X, y) in the output of SklearnSerializer.serialize. This would allow users to pass the original training features and targets when serializing a model, resulting in a training_data key in the serialized dictionary.

Motivation:

  • Enables reproducibility and easier model inspection by storing the data used for training alongside the model parameters and attributes.
  • Facilitates downstream tasks such as model validation, auditing, and sharing, where access to the original training data is beneficial.

Proposed API Change:

  • Update the serialize method to accept optional X and y parameters.
  • If provided, include a training_data key in the output dictionary:
    {
        ...existing keys...,
        "training_data": {
            "X": <serialized X>,
            "y": <serialized y>
        }
    }
  • If not provided, omit the training_data key.
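
The change above could be sketched roughly as follows. This is a minimal illustration on a stand-in class, not the real SklearnSerializer: `SketchSerializer`, `_serialize_model`, and `_convert` are hypothetical names standing in for the existing serialization logic and conversion utilities. It also assumes that when only one of X or y is passed, only that entry is written, which the proposal does not specify.

```python
from typing import Any, Dict, Optional


class SketchSerializer:
    """Stand-in illustrating the proposed signature; not the real SklearnSerializer."""

    def _serialize_model(self, model: Any) -> Dict[str, Any]:
        # Hypothetical placeholder for the existing serialization logic.
        return {"estimator_class": type(model).__name__}

    def _convert(self, data: Any) -> Any:
        # Hypothetical placeholder for the existing conversion utilities.
        # Objects exposing tolist() (e.g. NumPy arrays) become plain lists.
        return data.tolist() if hasattr(data, "tolist") else data

    def serialize(
        self,
        model: Any,
        X: Optional[Any] = None,
        y: Optional[Any] = None,
    ) -> Dict[str, Any]:
        result = self._serialize_model(model)

        # Assumption: write only the entries that were actually passed.
        training_data: Dict[str, Any] = {}
        if X is not None:
            training_data["X"] = self._convert(X)
        if y is not None:
            training_data["y"] = self._convert(y)

        # Omit the key entirely when no training data was given,
        # so existing callers see unchanged output.
        if training_data:
            result["training_data"] = training_data
        return result
```

Using `None` as the default, rather than a truthiness check, lets empty arrays be passed without ambiguity and keeps `serialize(model)` behaving as it does today.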

Example Usage:

serializer = SklearnSerializer()
serialized = serializer.serialize(model, X=X_train, y=y_train)

Notes:

  • Training data should be serialized using the existing conversion utilities to ensure compatibility with JSON and other formats.
  • This addition should be fully optional and backward compatible: calling serialize(model) without X or y should produce the same output as before.
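
As a rough illustration of the JSON-compatibility note, the sketch below converts training data to plain lists and round-trips it through `json`. The `to_jsonable` helper is hypothetical and only stands in for the library's existing conversion utilities; the sample values are made up for the example.

```python
import json
from typing import Any


def to_jsonable(data: Any) -> Any:
    """Hypothetical helper; the existing conversion utilities would be used instead."""
    if hasattr(data, "tolist"):  # e.g. numpy.ndarray or pandas.Series
        return data.tolist()
    if isinstance(data, (list, tuple)):
        return [to_jsonable(item) for item in data]
    return data


# Tuples stand in for an array-like input that JSON cannot represent directly.
X_train = ((5.1, 3.5), (4.9, 3.0))
y_train = [0, 1]

payload = {
    "training_data": {
        "X": to_jsonable(X_train),
        "y": to_jsonable(y_train),
    }
}

# The converted payload survives a JSON round trip unchanged.
restored = json.loads(json.dumps(payload))
```

Converting before building the dictionary keeps the `training_data` entry as plain lists and numbers, so it can be written by the same code path as the other keys.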

Metadata

Assignees

Labels

enhancement: New feature or request

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
