Labels: app: backend, type: enhancement, type: maintenance
Description
Problem
Score set API requests (GET /api/v1/score-sets/{urn}) are experiencing performance degradation due to the categorical calibration feature. The issue:
- Nested data loading: Score sets include `score_calibrations` → `functional_classifications` → `variants`
- Large variant lists: Each functional classification can contain thousands of variants
- Unnecessary overhead: Most API consumers only need calibration metadata, not full variant lists
- Serialization cost: Pydantic serializes all variants to JSON, creating massive response payloads (5+ second response times)
Current structure:
    ScoreSet
    └── score_calibrations []
        └── functional_classifications []
            └── variants [] ← THOUSANDS OF VARIANTS (performance bottleneck)
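To make the serialization cost concrete, the following minimal sketch compares the JSON size of a toy score set payload with and without embedded variants. The payload shape, URNs, and field names other than `score_calibrations`, `functional_classifications`, `variants`, and `variant_count` are assumptions for illustration; real variant records carry more fields, so the gap is likely larger in practice.

```python
import json


def build_score_set(n_variants: int, include_variants: bool) -> dict:
    """Build a toy score set payload with one calibration and one classification."""
    classification = {"id": 1, "label": "abnormal"}  # hypothetical metadata fields
    if include_variants:
        # Current behavior: every variant is embedded in the score set response.
        classification["variants"] = [
            {"urn": f"urn:example:variant#{i}", "score": i / n_variants}
            for i in range(n_variants)
        ]
    else:
        # Proposed behavior: only a summary count is embedded.
        classification["variant_count"] = n_variants
    return {
        "urn": "urn:example:score-set",
        "score_calibrations": [{"functional_classifications": [classification]}],
    }


full = json.dumps(build_score_set(5000, include_variants=True))
slim = json.dumps(build_score_set(5000, include_variants=False))
print(f"with variants: {len(full):,} bytes; metadata only: {len(slim):,} bytes")
```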
Proposed Solution
Move variants to dedicated endpoints while keeping calibration metadata in score set responses:
✅ Benefits:
- An estimated 5-10x or greater speedup for score set endpoints (for example, 5000 ms → 500 ms)
- Clean separation of concerns (metadata vs. variant data)
- Scalable architecture (can add pagination/filtering later)
- Clients fetch only what they need
⚠️ Trade-offs:
- Breaking change for clients relying on variants in calibrations
- Requires 2+ API calls for clients needing both metadata and variants
- Client migration effort required
Implementation Tasks
Phase 1: View Models
- Remove `variants` field from `SavedFunctionalClassification` and `FunctionalClassification` view models
  - File: `src/mavedb/view_models/score_calibration.py` (lines 247, 262)
- Add `variant_count: int` field to provide summary information
- Create new view model `FunctionalClassificationVariants` for variant endpoint responses
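The view model split described in Phase 1 could look roughly like the sketch below. It uses standard-library dataclasses in place of the project's Pydantic base classes so that it runs on its own; the `label` and `functional_classification_id` fields are hypothetical, and the real models in `score_calibration.py` will have additional fields.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class FunctionalClassification:
    """Classification metadata as embedded in score set responses."""

    id: int
    label: str  # hypothetical metadata field
    variant_count: int  # summary that replaces the embedded `variants` list


@dataclass
class FunctionalClassificationVariants:
    """Payload for the dedicated variants endpoint."""

    functional_classification_id: int  # hypothetical field name
    variants: list[dict] = field(default_factory=list)


meta = FunctionalClassification(id=7, label="abnormal", variant_count=2)
detail = FunctionalClassificationVariants(
    functional_classification_id=7,
    variants=[{"urn": "urn:example:variant#1"}, {"urn": "urn:example:variant#2"}],
)
print(asdict(meta))
```

Keeping the count on the metadata model lets clients decide whether a second request is worthwhile before making it.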
Phase 2: Business Logic
- Create `count_variants_for_functional_classification()` function
  - File: `src/mavedb/lib/score_calibrations.py` (add after line 754)
  - Uses efficient SQL COUNT query instead of loading all variants
- Modify `create_functional_classification()` to compute counts instead of loading variants (lines 30-94)
- Keep `variants_for_functional_classification()` unchanged (will be used by new endpoints)
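The idea behind the count function is to let the database do the counting rather than loading variant rows into Python. The sketch below illustrates this with the standard-library `sqlite3` module purely to stay self-contained; the `classification_variants` table, its columns, and the function signature are assumptions, not the project's schema or data access layer. For range-based classifications the `WHERE` clause would presumably filter on score instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE classification_variants (   -- hypothetical association table
        classification_id INTEGER NOT NULL,
        variant_id INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO classification_variants VALUES (?, ?)",
    [(1, v) for v in range(3000)] + [(2, v) for v in range(10)],
)


def count_variants_for_functional_classification(conn, classification_id: int) -> int:
    """Return the number of variants without materializing any variant rows."""
    row = conn.execute(
        "SELECT COUNT(*) FROM classification_variants WHERE classification_id = ?",
        (classification_id,),
    ).fetchone()
    return row[0]


print(count_variants_for_functional_classification(conn, 1))  # 3000
```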
Phase 3: New API Endpoints
- Add endpoint: `GET /api/v1/score-calibrations/{urn}/functional-classifications/{classification_id}/variants`
  - Returns variants for a specific functional classification
  - File: `src/mavedb/routers/score_calibrations.py`
- Add endpoint: `GET /api/v1/score-calibrations/{urn}/variants`
  - Convenience endpoint returning all variants across all classifications
  - File: `src/mavedb/routers/score_calibrations.py`
- Implement permission checks for both new endpoints
- Handle range-based and class-based classifications correctly
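The last two tasks can be sketched as framework-free handler logic: check read permission first, then select variants by score range or by class label. Everything here is an assumption for illustration: the `Classification` shape, the owner-only permission rule, the `score` and `class` variant keys, and the bound handling (lower inclusive, upper exclusive). The real router would rely on the project's existing permission helpers and on `variants_for_functional_classification()`.

```python
from dataclasses import dataclass
from typing import Optional


class PermissionDenied(Exception):
    """Stands in for an HTTP 403/404 response in a real router."""


@dataclass
class Classification:
    id: int
    private: bool
    owner: str
    # Exactly one of the two selectors is expected to be set (assumption).
    score_range: Optional[tuple[float, float]] = None  # range-based
    class_label: Optional[str] = None  # class-based


def select_variants(
    classification: Classification, variants: list[dict], user: Optional[str]
) -> list[dict]:
    """Check read permission, then pick the variants belonging to the classification."""
    if classification.private and user != classification.owner:
        raise PermissionDenied(f"user {user!r} may not read classification {classification.id}")
    if classification.score_range is not None:
        low, high = classification.score_range
        # Assumed bounds: lower inclusive, upper exclusive.
        return [v for v in variants if low <= v["score"] < high]
    return [v for v in variants if v.get("class") == classification.class_label]


variants = [
    {"urn": "v1", "score": 0.1, "class": "normal"},
    {"urn": "v2", "score": 0.9, "class": "abnormal"},
]
by_range = Classification(id=1, private=False, owner="alice", score_range=(0.5, 1.0))
by_class = Classification(id=2, private=True, owner="alice", class_label="normal")
print([v["urn"] for v in select_variants(by_range, variants, user=None)])  # ['v2']
print([v["urn"] for v in select_variants(by_class, variants, user="alice")])  # ['v1']
```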
Phase 4: Testing
- Update existing tests that expect `variants` in calibration responses
  - File: `tests/routers/test_score_calibrations.py`
  - File: `tests/routers/test_score_sets.py`
- Add new tests for variant endpoints:
  - `test_get_functional_classification_variants()`
  - `test_get_calibration_all_variants()`
  - Permission tests for new endpoints
- Verify all existing tests pass
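Updated score set tests will likely need to assert the new response shape in several places. A small shared helper, sketched below, is one way to do that; `assert_metadata_only` and the hand-written payload are hypothetical and stand in for the JSON body a test client would return from `GET /api/v1/score-sets/{urn}`.

```python
def assert_metadata_only(score_set: dict) -> None:
    """Fail if any functional classification still embeds variants or lacks a count."""
    for calibration in score_set.get("score_calibrations", []):
        for classification in calibration.get("functional_classifications", []):
            assert "variants" not in classification, "variants should not be embedded"
            assert isinstance(classification.get("variant_count"), int), "variant_count missing"


def test_score_set_excludes_variants():
    # Hand-written payload standing in for a real API response body.
    payload = {
        "urn": "urn:example:score-set",
        "score_calibrations": [
            {"functional_classifications": [{"id": 1, "variant_count": 3000}]}
        ],
    }
    assert_metadata_only(payload)


test_score_set_excludes_variants()
```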
Phase 5: Documentation
- Update API documentation (auto-generated from FastAPI docstrings)
- Create migration guide in CHANGELOG or docs:
  - Explain what changed and why
  - Provide before/after code examples
  - Document new `variant_count` field
  - Show how to use new endpoints
Phase 6: Optional Database Optimization
- (Optional) Add `variant_count` column to `score_calibration_functional_classifications` table for caching
  - File: `src/mavedb/models/score_calibration_functional_classification.py`
  - Create Alembic migration if implemented
Acceptance Criteria
- ✅ Score set API responses are 5-10x faster for score sets with large calibrations
- ✅ Functional classifications include `variant_count` field
- ✅ Functional classifications do NOT include `variants` field in default responses
- ✅ New endpoints successfully return variant data with correct permissions
- ✅ All tests pass (after updates)
- ✅ New endpoint tests added and passing
- ✅ No regressions in other API functionality
- ✅ Migration guide published
Verification Steps
Manual Testing:
    # 1. Verify score set response is fast and excludes variants
    curl http://localhost:8000/api/v1/score-sets/{urn}
    # Check: < 500ms response time, variant_count present, variants absent
    # 2. Verify new variant endpoints work
    curl http://localhost:8000/api/v1/score-calibrations/{urn}/functional-classifications/{id}/variants
    # Check: variants returned correctly
    # 3. Performance test
    time curl http://localhost:8000/api/v1/score-sets/{urn_with_large_calibration}
    # Expected: 5-10x improvement
Automated Testing:
    poetry run pytest tests/routers/test_score_calibrations.py -v
    poetry run pytest tests/routers/test_score_sets.py -v
    poetry run pytest tests/ -v  # Full test suite
Files to Modify
| File | Changes |
|---|---|
| `src/mavedb/view_models/score_calibration.py` | Remove `variants`, add `variant_count`, create `FunctionalClassificationVariants` |
| `src/mavedb/lib/score_calibrations.py` | Add count function, modify `create_functional_classification()` |
| `src/mavedb/routers/score_calibrations.py` | Add 2 new variant endpoints |
| `tests/routers/test_score_calibrations.py` | Update tests, add new endpoint tests |
| `tests/routers/test_score_sets.py` | Update calibration structure tests |
Migration Example
Before (no longer works after this change):
    # `client.get()` is assumed to return the decoded JSON body as a dict
    score_set = client.get(f"/score-sets/{urn}")
    variants = score_set["score_calibrations"][0]["functional_classifications"][0]["variants"]
After:
    # Get score set metadata (fast)
    score_set = client.get(f"/score-sets/{urn}")
    calibration = score_set["score_calibrations"][0]
    classification = calibration["functional_classifications"][0]
    calibration_urn = calibration["urn"]
    classification_id = classification["id"]
    variant_count = classification["variant_count"]
    # Fetch variants only when needed
    variants = []
    if variant_count > 0:
        variants_response = client.get(
            f"/score-calibrations/{calibration_urn}/functional-classifications/{classification_id}/variants"
        )
        variants = variants_response["variants"]
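Clients that previously walked every classification will need one extra request per non-empty classification. The following sketch generalizes the example above to all calibrations of a score set. `fetch_all_variants` and `FakeClient` are hypothetical names, and the client is assumed to return decoded JSON from `get()`; where it fits, the proposed convenience endpoint `GET /api/v1/score-calibrations/{urn}/variants` would reduce this to one call per calibration.

```python
class FakeClient:
    """In-memory stand-in whose get() returns decoded JSON for a known path."""

    def __init__(self, routes: dict):
        self.routes = routes
        self.calls = []  # records requested paths so the call count can be inspected

    def get(self, path: str) -> dict:
        self.calls.append(path)
        return self.routes[path]


def fetch_all_variants(client, score_set_urn: str) -> dict:
    """Map classification id -> variant list, issuing no request for empty classifications."""
    score_set = client.get(f"/score-sets/{score_set_urn}")
    result = {}
    for calibration in score_set["score_calibrations"]:
        for classification in calibration["functional_classifications"]:
            if classification["variant_count"] == 0:
                result[classification["id"]] = []
                continue
            response = client.get(
                f"/score-calibrations/{calibration['urn']}"
                f"/functional-classifications/{classification['id']}/variants"
            )
            result[classification["id"]] = response["variants"]
    return result


client = FakeClient(
    {
        "/score-sets/ss1": {
            "score_calibrations": [
                {
                    "urn": "cal1",
                    "functional_classifications": [
                        {"id": 1, "variant_count": 2},
                        {"id": 2, "variant_count": 0},
                    ],
                }
            ]
        },
        "/score-calibrations/cal1/functional-classifications/1/variants": {
            "variants": [{"urn": "v1"}, {"urn": "v2"}]
        },
    }
)
all_variants = fetch_all_variants(client, "ss1")
print(all_variants)
```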