Performance: Move Calibration Variants to Separate Endpoints #659

@bencap

Problem

Score set API requests (GET /api/v1/score-sets/{urn}) are experiencing performance degradation due to the categorical calibration feature. The issue:

  • Nested data loading: Score sets include score_calibrations → functional_classifications → variants
  • Large variant lists: Each functional classification can contain thousands of variants
  • Unnecessary overhead: Most API consumers only need calibration metadata, not full variant lists
  • Serialization cost: Pydantic serializes all variants to JSON, creating massive response payloads (5+ second response times)

Current structure:

ScoreSet
└── score_calibrations []
    └── functional_classifications []
        └── variants [] ← THOUSANDS OF VARIANTS (performance bottleneck)
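
To get a feel for the serialization cost described above, the following sketch builds a synthetic score set with the same nesting and compares the JSON payload size with and without the variant lists. The variant fields (`urn`, `hgvs_nt`, `score`) and the sizes are invented for illustration; only the nesting mirrors the structure shown above.

```python
import json


def build_score_set(n_classifications: int, n_variants: int, include_variants: bool) -> dict:
    """Build a synthetic score set dict with the nesting shown above."""
    classifications = []
    for c in range(n_classifications):
        classification = {"id": c, "label": f"class-{c}", "variant_count": n_variants}
        if include_variants:
            # Hypothetical variant fields, used only to give each entry a realistic size.
            classification["variants"] = [
                {"urn": f"urn:example:variant-{c}-{v}", "hgvs_nt": f"c.{v}A>G", "score": v / 1000}
                for v in range(n_variants)
            ]
        classifications.append(classification)
    return {
        "urn": "urn:example:score-set",
        "score_calibrations": [{"functional_classifications": classifications}],
    }


full = json.dumps(build_score_set(4, 5000, include_variants=True))
slim = json.dumps(build_score_set(4, 5000, include_variants=False))
print(f"with variants: {len(full):,} bytes; metadata only: {len(slim):,} bytes")
```

The absolute numbers depend on the real variant schema, but the ratio between the two payloads shows why dropping the nested lists matters.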

Proposed Solution

Move variants to dedicated endpoints while keeping calibration metadata in score set responses.

Benefits:

  • Roughly 5-10x faster responses from score set endpoints (for example, 5000ms → 500ms)
  • Clean separation of concerns (metadata vs. variant data)
  • Scalable architecture (can add pagination/filtering later)
  • Clients fetch only what they need

⚠️ Trade-offs:

  • Breaking change for clients relying on variants in calibrations
  • Requires 2+ API calls for clients needing both metadata and variants
  • Client migration effort required

Implementation Tasks

Phase 1: View Models

  • Remove variants field from SavedFunctionalClassification and FunctionalClassification view models
    • File: src/mavedb/view_models/score_calibration.py (lines 247, 262)
  • Add variant_count: int field to provide summary information
  • Create new view model FunctionalClassificationVariants for variant endpoint responses
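
A minimal sketch of the intended shape of the two view models. Plain dataclasses stand in for the project's Pydantic models so the example has no dependencies, and every field other than `variant_count` and `variants` (for example `label` and `functional_classification_id`) is a hypothetical placeholder.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class FunctionalClassification:
    """Classification metadata as embedded in score set responses (no variant list)."""

    id: int
    label: str  # hypothetical field
    variant_count: int


@dataclass
class FunctionalClassificationVariants:
    """Response body for the dedicated variants endpoint."""

    functional_classification_id: int  # hypothetical field
    variants: list[dict] = field(default_factory=list)


summary = FunctionalClassification(id=1, label="abnormal", variant_count=3)
detail = FunctionalClassificationVariants(
    functional_classification_id=1,
    variants=[{"urn": f"urn:example:variant-{i}"} for i in range(3)],
)
print(asdict(summary))
```

Keeping the variant list on a separate model means the score set response can never accidentally pull it back in through nesting.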

Phase 2: Business Logic

  • Create count_variants_for_functional_classification() function
    • File: src/mavedb/lib/score_calibrations.py (add after line 754)
    • Uses efficient SQL COUNT query instead of loading all variants
  • Modify create_functional_classification() to compute counts instead of loading variants (lines 30-94)
  • Keep variants_for_functional_classification() unchanged (will be used by new endpoints)
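
The counting step might look roughly like the sketch below. It uses the standard-library `sqlite3` module rather than the project's actual database layer, and it assumes a hypothetical association table (`classification_variants`); the real schema and the real signature of `count_variants_for_functional_classification()` may differ, particularly for range-based classifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE variants (id INTEGER PRIMARY KEY, urn TEXT);
    -- Hypothetical association table linking classifications to variants.
    CREATE TABLE classification_variants (classification_id INTEGER, variant_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO variants (id, urn) VALUES (?, ?)",
    [(i, f"urn:example:variant-{i}") for i in range(1, 5001)],
)
conn.executemany(
    "INSERT INTO classification_variants VALUES (?, ?)",
    [(1, i) for i in range(1, 5001)],
)


def count_variants_for_functional_classification(conn: sqlite3.Connection, classification_id: int) -> int:
    """Count member variants in the database without loading any variant rows."""
    row = conn.execute(
        "SELECT COUNT(*) FROM classification_variants WHERE classification_id = ?",
        (classification_id,),
    ).fetchone()
    return row[0]


print(count_variants_for_functional_classification(conn, 1))
```

Only a single integer crosses the database boundary, regardless of how many variants the classification contains.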

Phase 3: New API Endpoints

  • Add endpoint: GET /api/v1/score-calibrations/{urn}/functional-classifications/{classification_id}/variants
    • Returns variants for a specific functional classification
    • File: src/mavedb/routers/score_calibrations.py
  • Add endpoint: GET /api/v1/score-calibrations/{urn}/variants
    • Convenience endpoint returning all variants across all classifications
    • File: src/mavedb/routers/score_calibrations.py
  • Implement permission checks for both new endpoints
  • Handle range-based and class-based classifications correctly
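
One way to picture the range-based versus class-based distinction is the sketch below. The issue does not spell out the semantics, so everything here is an assumption: the `Variant` and `Classification` types, the field names, and in particular the half-open interval `[low, high)` used for range membership.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    urn: str
    score: Optional[float] = None
    assigned_class: Optional[str] = None


@dataclass
class Classification:
    label: str
    score_range: Optional[tuple[float, float]] = None  # set for range-based classifications
    class_name: Optional[str] = None  # set for class-based classifications


def variants_in(classification: Classification, variants: list[Variant]) -> list[Variant]:
    """Select the variants belonging to a classification of either kind."""
    if classification.score_range is not None:
        low, high = classification.score_range
        # Assumed convention: lower bound inclusive, upper bound exclusive.
        return [v for v in variants if v.score is not None and low <= v.score < high]
    if classification.class_name is not None:
        return [v for v in variants if v.assigned_class == classification.class_name]
    return []  # neither kind configured: nothing matches


demo = [Variant("a", score=0.2), Variant("b", score=0.9), Variant("c", assigned_class="abnormal")]
print([v.urn for v in variants_in(Classification("low", score_range=(0.0, 0.5)), demo)])
```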

Phase 4: Testing

  • Update existing tests that expect variants in calibration responses
    • File: tests/routers/test_score_calibrations.py
    • File: tests/routers/test_score_sets.py
  • Add new tests for variant endpoints:
    • test_get_functional_classification_variants()
    • test_get_calibration_all_variants()
    • Permission tests for new endpoints
  • Verify all existing tests pass
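
The updated response-shape assertions could be sketched as follows. The response payload here is a hand-written dict rather than output from the project's test client, and `assert_calibration_shape` is a hypothetical helper name.

```python
def assert_calibration_shape(score_set: dict) -> None:
    """Check that every functional classification exposes a count and no variant list."""
    for calibration in score_set["score_calibrations"]:
        for classification in calibration["functional_classifications"]:
            assert "variants" not in classification
            assert isinstance(classification["variant_count"], int)
            assert classification["variant_count"] >= 0


def test_score_set_excludes_variants() -> None:
    # Stand-in for the JSON body of GET /api/v1/score-sets/{urn}.
    response_json = {
        "score_calibrations": [
            {"functional_classifications": [{"id": 1, "variant_count": 1200}]}
        ]
    }
    assert_calibration_shape(response_json)
```

In the real suite the dict would come from the test client's response, and the same helper could be reused by both test files listed above.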

Phase 5: Documentation

  • Update API documentation (auto-generated from FastAPI docstrings)
  • Create migration guide in CHANGELOG or docs:
    • Explain what changed and why
    • Provide before/after code examples
    • Document new variant_count field
    • Show how to use new endpoints

Phase 6: Optional Database Optimization

  • (Optional) Add variant_count column to score_calibration_functional_classifications table for caching
    • File: src/mavedb/models/score_calibration_functional_classification.py
    • Create Alembic migration if implemented
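
If the cached column is adopted, the migration needs to add it and backfill existing rows. The sketch below shows that pair of statements using the standard-library `sqlite3` module; the association table `classification_variants` is hypothetical, and a real Alembic migration would express the same steps through its own operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE score_calibration_functional_classifications (id INTEGER PRIMARY KEY, label TEXT);
    -- Hypothetical association table linking classifications to variants.
    CREATE TABLE classification_variants (classification_id INTEGER, variant_id INTEGER);
    INSERT INTO score_calibration_functional_classifications (id, label) VALUES (1, 'normal'), (2, 'abnormal');
    INSERT INTO classification_variants VALUES (1, 10), (1, 11), (2, 12);
    """
)

# Step 1: add the cache column with a safe default for existing rows.
conn.execute(
    "ALTER TABLE score_calibration_functional_classifications "
    "ADD COLUMN variant_count INTEGER NOT NULL DEFAULT 0"
)

# Step 2: backfill the cache from the association table.
conn.execute(
    """
    UPDATE score_calibration_functional_classifications
    SET variant_count = (
        SELECT COUNT(*) FROM classification_variants cv
        WHERE cv.classification_id = score_calibration_functional_classifications.id
    )
    """
)

counts = dict(conn.execute("SELECT id, variant_count FROM score_calibration_functional_classifications"))
print(counts)
```

A cached count also has to be refreshed whenever a calibration's variants change, which is the main cost of this optional phase.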

Acceptance Criteria

  • ✅ Score set API responses are 5-10x faster for score sets with large calibrations
  • ✅ Functional classifications include variant_count field
  • ✅ Functional classifications do NOT include variants field in default responses
  • ✅ New endpoints successfully return variant data with correct permissions
  • ✅ All tests pass (after updates)
  • ✅ New endpoint tests added and passing
  • ✅ No regressions in other API functionality
  • ✅ Migration guide published

Verification Steps

Manual Testing:

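# Placeholders: set URN, CALIBRATION_URN, CLASSIFICATION_ID, and URN_WITH_LARGE_CALIBRATION first.
# (curl treats literal {braces} in a URL as glob patterns, so shell variables are used instead.)

# 1. Verify score set response is fast and excludes variants
curl -s -w '\ntime_total: %{time_total}s\n' "http://localhost:8000/api/v1/score-sets/$URN"
# Check: time_total < 0.5s, variant_count present, variants absent

# 2. Verify new variant endpoints work
curl -s "http://localhost:8000/api/v1/score-calibrations/$CALIBRATION_URN/functional-classifications/$CLASSIFICATION_ID/variants"
# Check: variants returned correctly

# 3. Performance test
time curl -s -o /dev/null "http://localhost:8000/api/v1/score-sets/$URN_WITH_LARGE_CALIBRATION"
# Expected: 5-10x improvement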
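
A single `time curl` run can be noisy, so the 5-10x comparison may be easier to judge from a median over several requests. The helper below is a sketch under that assumption; `median_latency_ms` and `speedup` are hypothetical names, and the request is passed in as a callable so any HTTP client can be used.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(request: Callable[[], object], repeats: int = 5) -> float:
    """Run `request` several times and return the median wall-clock latency in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        request()
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the old latency to the new one (10.0 means ten times faster)."""
    return before_ms / after_ms


# Example with the figures quoted in this issue.
print(speedup(5000, 500))
```

In practice `request` would wrap a call such as `urllib.request.urlopen(url).read()` against the score set endpoint, measured once before and once after the change.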

Automated Testing:

poetry run pytest tests/routers/test_score_calibrations.py -v
poetry run pytest tests/routers/test_score_sets.py -v
poetry run pytest tests/ -v  # Full test suite

Files to Modify

| File | Changes |
| --- | --- |
| src/mavedb/view_models/score_calibration.py | Remove variants, add variant_count, create FunctionalClassificationVariants |
| src/mavedb/lib/score_calibrations.py | Add count function, modify create_functional_classification() |
| src/mavedb/routers/score_calibrations.py | Add 2 new variant endpoints |
| tests/routers/test_score_calibrations.py | Update tests, add new endpoint tests |
| tests/routers/test_score_sets.py | Update calibration structure tests |

Migration Example

Before (no longer works after this change):

score_set = client.get(f"/score-sets/{urn}")
variants = score_set["score_calibrations"][0]["functional_classifications"][0]["variants"]

After:

# Get score set metadata (fast); client.get is assumed to return the parsed JSON body
score_set = client.get(f"/score-sets/{urn}")
calibration = score_set["score_calibrations"][0]
classification = calibration["functional_classifications"][0]
calibration_urn = calibration["urn"]
classification_id = classification["id"]
variant_count = classification["variant_count"]

# Fetch variants only when needed; default to an empty list so `variants` is always defined
variants = []
if variant_count > 0:
    variants_response = client.get(
        f"/score-calibrations/{calibration_urn}/functional-classifications/{classification_id}/variants"
    )
    variants = variants_response["variants"]
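
Clients that previously read every nested variant list can wrap the two-step pattern above in a helper. The sketch below assumes a `get` callable that takes a path and returns parsed JSON, and that the variants endpoint responds with a `variants` key as in the example above; `fetch_all_variants` and the fake client are hypothetical.

```python
from typing import Callable


def fetch_all_variants(get: Callable[[str], dict], urn: str) -> dict:
    """Return {(calibration_urn, classification_id): [variants]} for one score set."""
    score_set = get(f"/score-sets/{urn}")
    result = {}
    for calibration in score_set["score_calibrations"]:
        for classification in calibration["functional_classifications"]:
            key = (calibration["urn"], classification["id"])
            if classification["variant_count"] == 0:
                result[key] = []  # skip the extra request for empty classifications
                continue
            response = get(
                f"/score-calibrations/{calibration['urn']}"
                f"/functional-classifications/{classification['id']}/variants"
            )
            result[key] = response["variants"]
    return result


# Fake client for demonstration: one non-empty and one empty classification.
calls = []


def fake_get(path: str) -> dict:
    calls.append(path)
    if path.startswith("/score-sets/"):
        return {
            "score_calibrations": [
                {
                    "urn": "urn:example:cal-1",
                    "functional_classifications": [
                        {"id": 1, "variant_count": 2},
                        {"id": 2, "variant_count": 0},
                    ],
                }
            ]
        }
    return {"variants": [{"urn": "urn:example:v1"}, {"urn": "urn:example:v2"}]}


all_variants = fetch_all_variants(fake_get, "urn:example:ss-1")
print(all_variants)
```

Checking `variant_count` first keeps the number of requests proportional to the number of non-empty classifications.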

Labels: app: backend, type: enhancement, type: maintenance
