Conversation
|
|
There was a problem hiding this comment.
Pull request overview
Refactors ProblemDefinition to be a Pydantic model to improve parsing/validation, and updates documentation/changelog to reflect the new behavior.
Changes:
- Converted
ProblemDefinitionto inherit frompydantic.BaseModeland added field definitions/validators. - Replaced the
directory_pathdeprecation logger warning with aDeprecationWarning. - Updated docs and changelog to describe the Pydantic-based implementation.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
src/plaid/problem_definition.py |
Migrates ProblemDefinition to Pydantic, adds model config/validators, and updates deprecated directory_path handling. |
docs/source/core_concepts/problem_definition.md |
Notes that ProblemDefinition is Pydantic-based and discusses validation behavior. |
CHANGELOG.md |
Records the Pydantic refactor and directory_path deprecation note. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "`directory_path` is deprecated, use `path` instead.", | ||
| DeprecationWarning, | ||
| ) |
There was a problem hiding this comment.
The raise line is indented with an extra space, which violates the repo’s Ruff E111 indentation rule (indentation not a multiple of four). Align the indentation inside this if block.
| @field_validator("task") | ||
| @classmethod | ||
| def validate_task(cls, v: Optional[str]) -> Optional[str]: |
There was a problem hiding this comment.
Same indentation issue here: the raise statement has indentation that isn’t a multiple of four, which will be flagged by Ruff (E111).
| @classmethod | ||
| def validate_task(cls, v: Optional[str]) -> Optional[str]: | ||
| """Validate that the task is among the authorized tasks.""" |
There was a problem hiding this comment.
This error message refers to "authorized tasks" but this validator is for score_function. Update the message to mention authorized score functions (and ideally include AUTHORIZED_SCORE_FUNCTIONS).
| "Arguments `path` and `directory_path` cannot be both set. Use only `path` as `directory_path` is deprecated." | ||
| ) | ||
| else: | ||
| path = directory_path | ||
| logger.warning( | ||
| "DeprecationWarning: 'directory_path' is deprecated, use 'path' instead." | ||
| ) | ||
| warnings.warn( | ||
| "`directory_path` is deprecated, use `path` instead.", | ||
| DeprecationWarning, | ||
| ) | ||
| path = directory_path | ||
|
|
||
| if path is not None: | ||
| path = Path(path) | ||
| self._load_from_dir_(path) | ||
|
|
There was a problem hiding this comment.
New behavior: task and score_function can now be validated via Pydantic on model construction. There are existing tests for set_task/set_score_function, but none covering ProblemDefinition(task=...) / ProblemDefinition(score_function=...) success and failure cases. Add tests to lock in the new validation path.
| - outputs: list of FeatureIdentifiers | ||
| - split: arbitrary named splits (train/val/test, etc.) stored as JSON | ||
|
|
||
| `ProblemDefinition` is built on top of [Pydantic](https://docs.pydantic.dev/), providing robust data validation. Authorized tasks and score functions are strictly enforced. |
There was a problem hiding this comment.
The docs claim tasks/score functions are "strictly enforced", but the current implementation can bypass Pydantic validation when values are assigned after initialization (e.g., when loading from disk) unless assignment validation is enabled. Either adjust the wording or ensure the implementation validates assignments during load.
| `ProblemDefinition` is built on top of [Pydantic](https://docs.pydantic.dev/), providing robust data validation. Authorized tasks and score functions are strictly enforced. | |
| `ProblemDefinition` is built on top of [Pydantic](https://docs.pydantic.dev/), providing robust data validation when defining and updating problems through its public API. Authorized tasks and score functions are validated accordingly. |
| from typing import Any, Dict, List, Optional, Sequence, Union | ||
|
|
||
| import yaml |
There was a problem hiding this comment.
ruff is configured with Pyflakes (F401) and isort (I) (see ruff.toml). This import block is currently unsorted, and PrivateAttr/model_validator appear to be unused in this file; this will fail lint. Reorder the typing/pydantic imports and drop unused names (or start using them).
| class ProblemDefinition(BaseModel): | ||
| """Gathers all necessary informations to define a learning problem.""" | ||
|
|
||
| model_config = ConfigDict(arbitrary_types_allowed=True, extra="ignore") |
There was a problem hiding this comment.
ProblemDefinition is mutating validated fields during _load_from_dir_/_initialize_from_problem_infos_dict via property setters after super().__init__(). With the current model_config, assignments won’t trigger Pydantic validators, so invalid task/score_function values loaded from disk can slip through. Consider enabling validate_assignment=True (or refactor loading to use model_validate) to make validation effective beyond initial construction.
| model_config = ConfigDict(arbitrary_types_allowed=True, extra="ignore") | |
| model_config = ConfigDict( | |
| arbitrary_types_allowed=True, | |
| extra="ignore", | |
| validate_assignment=True, | |
| ) |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Hi, thank you for your contribution :-) Can you sign the CLA, look into the coverage issue and the suggestions from Copilot ? |
Thanks for contributing! Please make sure your PR title and content follow the guidelines.
Leave this checklist below in your PR description and tick the corresponding boxes.
Checklist