feat: Support for Multiple Calibrations in VA-Spec Model Exports #658
Open
bencap wants to merge 4 commits into release-2026.1.1 from
…ve test infrastructure

This commit introduces major enhancements to the annotation system, including support for multiple score calibrations, improved test infrastructure, and alignment with VA-Spec standards.

- Refactor annotation system to support multiple score calibrations per score set
- Add calibration selection logic based on evidence strength and classification conflicts
- Implement `select_strongest_functional_calibration()` and `select_strongest_pathogenicity_calibration()`
- Update classification functions to accept an explicit `score_calibration` parameter
- Add `score_calibration_may_be_used_for_annotation()` utility for eligibility checks
- Support both research-use-only and production calibrations with an opt-in flag
- Add `src/mavedb/lib/annotation/direction.py` for evidence direction determination
- Implement `aggregate_direction_of_evidence()` for combining evidence lines
- Add `direction_of_support_for_functional_classification()` mapping
- Add `direction_of_support_for_pathogenicity_classification()` mapping
- Create `tests/helpers/mocks/` package with comprehensive factory functions
- Add `mock_utilities.py` with `MockObjectWithPydanticFunctionality` and `MockVariantCollection`
- Add `factories.py` with 20+ factory functions for all MaveDB models
- Add documentation in `tests/helpers/mocks/README.md` for usage patterns
- Update statement/evidence line generation to use all eligible calibrations
- Refactor contribution modules to remove `excalibr_calibration_agent()`
- Add score calibration contributions with URN and metadata
- Update datetime handling to use native datetime objects instead of strings
- Add SPDX license support with `score_set_license_to_mappable_concept()`
- Implement MODERATE_PLUS to MODERATE mapping for VA-Spec compatibility
- Add `serialize_evidence_items()` for consistent evidence serialization
- Add `sequence_feature_for_mapped_variant()` for gene/transcript extraction
- Add `target_for_variant()` for multi-target score set handling
- Add `SequenceFeature` named tuple for structured feature representation
- Rename `/functional-impact` → `/functional-statement`
- Rename `/clinical-evidence` → `/pathogenicity-statement`
- Rename `/functional-study-result` → `/study-result`
- Update response models from `EvidenceLine` to `VariantPathogenicityStatement`
- Convert `ScoreCalibrationRelation` to a str-based enum for JSON serialization
- Update classification functions to return a tuple with range and classification
- Add gene context qualifier to pathogenicity propositions
- Refactor all annotation tests into a class-based structure using `@pytest.mark.unit`
- Add comprehensive module docstrings explaining test purpose and scope
- Add descriptive docstrings for all test methods
- Organize tests into logical groups (Unit/Integration)
- Add tests for `direction.py`: aggregation, functional, and pathogenicity mappings
- Add tests for `constants.py`: `GENERIC_DISEASE_MEDGEN_CODE` and `MEDGEN_SYSTEM`
- Add tests for `exceptions.py`: `MappingDataDoesntExistException`
- Add tests for `contribution.py`: creator/modifier tests with dates and resource types
- Add tests for `classification.py`: MODERATE_PLUS mapping validation
- Update `conftest.py` with annotation-specific fixtures
- Update `tests/lib/conftest.py` to use new factory functions
- Add `pytest.mark.integration` marker to `pyproject.toml`
- Create `tests/lib/annotation/conftest.py` with annotation fixtures
- Update `mypy_stubs/ga4gh/va_spec/acmg_2015.pyi` with proper inheritance
- Add `VariantPathogenicityStatement` and `AcmgClassification` classes
- Fix `EvidenceLine` inheritance hierarchy
- Add comprehensive inline documentation for MODERATE_PLUS mapping rationale
- Add performance TODOs for ORM relationship optimization
- Document calibration selection logic and conflict resolution strategies
- Add usage examples in mock infrastructure README
- Classification functions now require an explicit `score_calibration` parameter
- Contribution functions now use native datetime objects instead of formatted strings
- API routes renamed to align with VA-Spec terminology
- Add TODO comments for ORM optimization in `classification.py`
- Document need to avoid eager loading of variant relationships
- Suggest pre-resolving classification IDs for O(1) lookups
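The direction helpers listed above (`aggregate_direction_of_evidence()` and the two `direction_of_support_for_*` mappings) are not shown in this description. The following is a minimal sketch of one plausible aggregation rule, assuming a local `Direction` enum with VA-Spec's `supports`/`disputes`/`neutral` values and a hypothetical `aggregate_direction()` helper; the actual MaveDB rule may differ. Mixing `str` into the enum is one way to build the kind of str-based enum the commit describes for `ScoreCalibrationRelation`: members compare equal to, and serialize as, plain strings.

```python
from enum import Enum
from typing import Iterable


class Direction(str, Enum):
    """Hypothetical local copy of VA-Spec direction values.

    Mixing in `str` makes each member behave as its string value,
    so `json.dumps` emits "supports" rather than failing on the enum.
    """

    SUPPORTS = "supports"
    DISPUTES = "disputes"
    NEUTRAL = "neutral"


def aggregate_direction(directions: Iterable[Direction]) -> Direction:
    """Combine per-evidence-line directions into one overall direction.

    Assumed rule (not MaveDB's confirmed logic): non-neutral evidence that
    all points one way wins; conflicting or absent evidence is neutral.
    """
    seen = set(directions)
    has_support = Direction.SUPPORTS in seen
    has_dispute = Direction.DISPUTES in seen
    if has_support and not has_dispute:
        return Direction.SUPPORTS
    if has_dispute and not has_support:
        return Direction.DISPUTES
    # Both directions present (a conflict), only neutral lines, or no lines at all.
    return Direction.NEUTRAL
```

Treating a support/dispute conflict as neutral is a conservative choice for this sketch; a real implementation might instead weigh lines by evidence strength.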
be01fd8 to 67e3aed
957d5cd to 6c94041
Comment from the author (Collaborator):
Bumped Pandas to v2.2+ in these changes to avoid build failures caused by the removal of `pkg_resources` from setuptools.
This pull request introduces significant enhancements and refactoring to the variant annotation logic, especially around the handling of functional and pathogenicity statements, calibration selection, and evidence strength mapping. The changes improve support for multiple calibrations, provide more accurate and VA-Spec-compliant evidence strength reporting, and streamline the annotation flow for both functional and pathogenicity contexts.
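As a minimal sketch of the evidence strength handling described here, assuming simplified local enums (`InternalStrength`, `VaSpecStrength`), a hypothetical `FunctionalRange` type with half-open score intervals, and hypothetical helpers `to_va_spec_strength()` and `classify()`: the classifier returns the matched range, which keeps the internal `MODERATE_PLUS` grade, together with the coarser VA-Spec strength. None of these names are MaveDB's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple


class InternalStrength(str, Enum):
    """Hypothetical internal strength levels, including the finer MODERATE_PLUS grade."""

    SUPPORTING = "supporting"
    MODERATE = "moderate"
    MODERATE_PLUS = "moderate_plus"
    STRONG = "strong"
    VERY_STRONG = "very_strong"


class VaSpecStrength(str, Enum):
    """Hypothetical stand-in for the coarser levels a VA-Spec export can express."""

    SUPPORTING = "supporting"
    MODERATE = "moderate"
    STRONG = "strong"
    VERY_STRONG = "very_strong"


def to_va_spec_strength(strength: InternalStrength) -> VaSpecStrength:
    """Collapse MODERATE_PLUS onto MODERATE; other levels map to the same-named member."""
    if strength is InternalStrength.MODERATE_PLUS:
        return VaSpecStrength.MODERATE
    return VaSpecStrength[strength.name]


@dataclass(frozen=True)
class FunctionalRange:
    """Hypothetical calibrated score interval [lower, upper) with its internal strength."""

    label: str
    lower: float
    upper: float
    strength: InternalStrength


def classify(
    score: float, ranges: Sequence[FunctionalRange]
) -> Tuple[Optional[FunctionalRange], Optional[VaSpecStrength]]:
    """Return (matched range, VA-Spec strength), or (None, None) if no range contains the score."""
    for score_range in ranges:
        if score_range.lower <= score < score_range.upper:
            return score_range, to_va_spec_strength(score_range.strength)
    return None, None
```

Returning the matched range alongside the exported strength lets callers keep the internal grade for display or auditing while emitting only VA-Spec-compatible values.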
Key improvements include:
Annotation API and Logic Refactoring:
- … `variant_functional_impact_statement` and replaced `variant_pathogenicity_evidence` with `variant_pathogenicity_statement`, now supporting multiple calibrations and returning richer, VA-Spec-compliant objects. Calibration selection logic now prioritizes the strongest available calibration, with optional inclusion of research-use-only calibrations. (`src/mavedb/lib/annotation/annotate.py` L34-R134)
Evidence Strength and Classification Handling:
- Updated classification functions to accept an explicit `ScoreCalibration` and return both the matched functional classification and the VA-Spec classification, enabling precise mapping between internal and external evidence strength levels.
- Mapped the internal `MODERATE_PLUS` evidence strength to the VA-Spec `MODERATE` level, ensuring external compatibility while preserving internal granularity. (`src/mavedb/lib/annotation/classification.py` L90-R129)
Type and Model Updates:
- Added the `VariantPathogenicityStatement` and `AcmgClassification` classes, and refactored `VariantPathogenicityEvidenceLine` to inherit from `EvidenceLine` for improved type consistency.
Validation and Error Handling:
- Added validation that `mavedb_vrs_agent` is not `None`, improving robustness. (`src/mavedb/lib/annotation/agent.py` R35-R37)
- Removed the `excalibr_calibration_agent` function to clean up the codebase. (`src/mavedb/lib/annotation/agent.py` L58-L71)
Testing and Documentation:
These changes collectively modernize the annotation pipeline, improve standards compliance, and set the stage for more flexible and accurate variant annotation workflows.
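The calibration selection rule summarized under "Annotation API and Logic Refactoring" (prefer the strongest calibration, and consider research-use-only calibrations only when requested) can be sketched as follows. `Calibration`, `STRENGTH_RANK`, and `select_strongest_calibration()` are hypothetical simplifications, not the real signatures of `select_strongest_functional_calibration()` or `select_strongest_pathogenicity_calibration()`, and the classification-conflict handling mentioned in the commit message is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical ordering of evidence strengths, weakest to strongest.
STRENGTH_RANK = {
    "supporting": 1,
    "moderate": 2,
    "moderate_plus": 3,
    "strong": 4,
    "very_strong": 5,
}


@dataclass(frozen=True)
class Calibration:
    """Hypothetical, simplified stand-in for a MaveDB ScoreCalibration."""

    urn: str
    strength: str  # strongest evidence strength this calibration can assign
    research_use_only: bool = False


def select_strongest_calibration(
    calibrations: Sequence[Calibration], allow_research_use_only: bool = False
) -> Optional[Calibration]:
    """Return the eligible calibration with the highest strength rank, or None if none qualify."""
    # Research-use-only calibrations are skipped unless the caller opts in.
    eligible = [
        c for c in calibrations if allow_research_use_only or not c.research_use_only
    ]
    if not eligible:
        return None
    # max() returns the first of several equally strong calibrations, so input order breaks ties.
    return max(eligible, key=lambda c: STRENGTH_RANK[c.strength])
```

Falling back to input order on ties keeps the sketch deterministic, but a production rule would likely need an explicit tie-breaking or conflict policy when equally strong calibrations disagree.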