[DRAFT] Benchmark Platform for Nature Methods Paper #60

Draft
trissim wants to merge 25 commits into main from benchmark-platform

trissim commented Dec 19, 2025

Benchmark Platform for Nature Methods Paper

Overview

Complete architectural plans for benchmarking OpenHCS against CellProfiler, ImageJ, and Python scripts for the Nature Methods publication.

Plans Included

✅ plan_01_benchmark_infrastructure.md

Orchestration layer and comparison engine

  • Declarative API (run_benchmark())
  • BenchmarkRunner orchestration
  • FileResultStorage (immutable, append-only)
  • ComparisonEngine with statistical analysis
  • TableGenerator (Nature Methods, LaTeX, Markdown)
  • PlotGenerator (bar charts, line plots, heatmaps)

✅ plan_02_dataset_acquisition.md

Automatic dataset download and validation

  • DatasetRegistry (declarative BBBC datasets)
  • CacheManager (atomic operations, disk space checks)
  • DownloadManager (resume, checksums, progress)
  • ExtractionManager (zip/tar support)
  • VerificationManager (file existence, image validation)
  • AcquisitionOrchestrator (ties everything together)
  • Complete fail-loud error hierarchy
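
One way the checksum and atomic-write behaviour listed above could fit together is sketched below. This is a minimal illustration, not the code from plan_02: `store_verified` and `ChecksumMismatchError` are hypothetical names, and the download step is left out so the example stays self-contained.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class ChecksumMismatchError(Exception):
    """Raised when the bytes do not match the expected SHA-256 digest."""


def store_verified(data: bytes, dest: Path, expected_sha256: str) -> Path:
    """Write data to dest atomically, but only if its SHA-256 matches."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        # Fail loud: a corrupt payload never reaches the cache directory.
        raise ChecksumMismatchError(f"expected {expected_sha256}, got {actual}")

    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it into
    # place, so readers never observe a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, dest)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return dest
```

Writing the temporary file next to the destination matters here: `os.replace` is only atomic when both paths live on the same filesystem.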

✅ plan_03_tool_adapters.md

Normalize heterogeneous tools to uniform interface

  • ToolAdapter protocol
  • OpenHCSAdapter (native execution)
  • CellProfilerAdapter (subprocess + .cppipe generation)
  • ImageJAdapter (subprocess + macro generation)
  • PythonScriptAdapter (in-process execution)
  • PipelineGenerator (translates configs to tool formats)
  • ResultParser (normalizes tool outputs)
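
To make the "uniform interface" idea concrete, here is a small sketch of what a structural `ToolAdapter` protocol might look like. The method names (`validate_installation`, `run`), the `RunResult` fields, and the toy `EchoAdapter` are assumptions made for illustration; the actual protocol in plan_03 may differ.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class RunResult:
    """Hypothetical normalized result returned by every adapter."""
    tool: str
    success: bool
    outputs: dict[str, Any]


@runtime_checkable
class ToolAdapter(Protocol):
    """Assumed shape of the adapter contract (illustrative only)."""
    name: str

    def validate_installation(self) -> None: ...

    def run(self, dataset_path: Path, pipeline: str) -> RunResult: ...


class EchoAdapter:
    """Toy adapter that only demonstrates structural conformance."""
    name = "echo"

    def validate_installation(self) -> None:
        return None  # a real adapter would raise if the tool is missing

    def run(self, dataset_path: Path, pipeline: str) -> RunResult:
        return RunResult(self.name, True, {"dataset": str(dataset_path), "pipeline": pipeline})


# EchoAdapter never inherits from ToolAdapter, yet it satisfies the protocol.
adapter = EchoAdapter()
print(isinstance(adapter, ToolAdapter))
```

With a protocol, a new tool only has to provide the right attributes and methods; it does not need to import or subclass anything from the benchmark package.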

✅ plan_04_metric_collectors.md

Context manager metrics for transparent collection

  • MetricCollector protocol
  • TimeMetric (high-precision timing)
  • MemoryMetric (peak RAM via psutil)
  • GPUMetric (peak GPU memory via pynvml)
  • CorrectnessMetric (ground truth comparison)
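
A context-manager metric can be quite small. The sketch below shows the general pattern for a wall-clock timer built on `time.perf_counter`; the attribute name `elapsed_seconds` is an assumption, and the class is not taken from plan_04.

```python
import time
from typing import Optional


class TimeMetric:
    """Wall-clock timer used as a context manager (illustrative sketch)."""

    def __init__(self) -> None:
        self._start: Optional[float] = None
        self.elapsed_seconds: Optional[float] = None

    def __enter__(self) -> "TimeMetric":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.elapsed_seconds = time.perf_counter() - self._start
        return False  # never swallow exceptions, in line with fail-loud


# Usage: the measured code does not need to know it is being timed.
with TimeMetric() as timer:
    sum(range(100_000))
print(f"{timer.elapsed_seconds:.6f} s")
```

Returning `False` from `__exit__` keeps the measurement transparent: a failure inside the block still propagates, but the elapsed time is recorded first.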

✅ plan_05_pipeline_equivalence.md

Equivalent analysis pipelines across all tools

  • PipelineRegistry (abstract pipeline specs)
  • Nuclei segmentation pipeline (Gaussian → Otsu → measure)
  • Translation to OpenHCS, CellProfiler, ImageJ, Python
  • Ensures fair comparison (same algorithm, different tools)

Architecture Highlights

Orthogonal Concerns

Each plan solves ONE problem completely:

  • Dataset acquisition ≠ tool execution ≠ metric collection
  • Any of them can change without touching the others

Declarative API

from benchmark import run_benchmark, BBBCDataset
from benchmark.adapters import OpenHCSAdapter, CellProfilerAdapter
from benchmark.metrics import Time, Memory, GPU

results = run_benchmark(
    datasets=[BBBCDataset.BBBC021, BBBCDataset.BBBC022],
    tools=[OpenHCSAdapter(), CellProfilerAdapter()],
    metrics=[Time(), Memory(), GPU()],
)

results.generate_figure("figure_5_performance.pdf")
results.generate_table(format="nature")

Fail-Loud Philosophy

  • No silent fallbacks
  • Explicit error types for every failure mode
  • Validation at every step

Platform, Not Application

  • Adding new dataset = one dataclass declaration
  • Adding new tool = implement ToolAdapter protocol
  • Adding new metric = implement MetricCollector protocol
  • Adding new pipeline = one pipeline spec

Diagrams Included

All plans include:

  • UML class diagrams - Show relationships and protocols
  • Flow diagrams - Show execution paths and error handling
  • Sequence diagrams - Show interactions between components
  • Architecture diagrams - Show system layers and data flow

Implementation Status

  • Complete architectural plans
  • UML/flow/sequence diagrams
  • Complete implementation code in plans
  • Smell-loop approval
  • Implementation
  • Testing
  • Benchmarking
  • Paper figures

Next Steps

  1. Smell-loop review of all plans
  2. Sequential implementation (plan_01 → plan_02 → ... → plan_05)
  3. Acquire BBBC datasets (BBBC021, BBBC022, BBBC038, BBBC039)
  4. Run benchmarks against CellProfiler, ImageJ, Python
  5. Generate figures for Nature Methods paper
  6. Write results section with quantitative data

Expected Results

Projected from the OpenHCS architecture (to be confirmed by the benchmark runs):

  • 5-10× faster than CellProfiler (GPU acceleration)
  • 10-100× speedup for re-runs (intelligent caching)
  • Zero dimensional errors (vs common manual pipeline errors)
  • Linear scaling with dataset size

This is a DRAFT PR for planning purposes. Implementation will follow smell-loop approval.


Pull Request opened by Augment Code with guidance from the PR author

- plan_01: Benchmark infrastructure with orchestration, storage, comparison
- plan_02: Dataset acquisition with fail-loud validation and caching
- plan_03: Tool adapters for OpenHCS, CellProfiler, ImageJ, Python
- plan_04: Metric collectors (Time, Memory, GPU, Correctness)
- plan_05: Pipeline equivalence system for fair comparison

All plans include:
- UML class diagrams
- Flow diagrams
- Sequence diagrams
- Complete implementation code
- Integration examples

Ready for implementation following smell-loop approval.
trissim and others added 24 commits December 19, 2025 19:22
Research findings from publications using BBBC datasets:
- Complete BBBC021/022/038 dataset specifications with real URLs, sizes, formats
- Real CellProfiler pipeline parameters from actual analysis.cppipe files
- Evaluation metrics from NuSeT (2020), Cimini et al. (2023), and other benchmarking papers
- Illumination correction parameters from Singh et al. (2014)
- Ground truth availability and usage strategies
- Preprocessing pipelines and subsetting approaches

Files added:
- plan_02_ADDENDUM_real_dataset_specs.md: Complete BBBC dataset specs, download strategies, validation without checksums
- plan_03_ADDENDUM_real_pipelines.md: Real CellProfiler pipeline from BBBC021 analysis.cppipe with all 27 modules
- plan_04_ADDENDUM_correctness_metrics.md: Pixel-level and object-level evaluation metrics from publications
- RESEARCH_SUMMARY.md: Complete investigation report with all sources cited

All findings sourced from publications, GitHub repos, and BBBC downloads. No handwaving.

Remaining gaps (require downloads to fill):
- BBBC022 filename pattern (need to download 1 plate to reverse-engineer)
- Dataset checksums (not provided by Broad, will compute or skip)
- File manifests (impractical to list 39,600 files, will use count validation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implement proper ABC-compliant handlers for BBBC datasets:

BBBC021Handler (ImageXpress format):
- Pattern: {Well}_{Site}_{Channel}{UUID}.tif (e.g., G10_s1_w1BEDC2073...tif)
- Channels: w1=DAPI, w2=Tubulin, w4=Actin
- FilenameParser with regex for Well/Site/Channel extraction
- MetadataHandler for CSV metadata (BBBC021_v1_image.csv)
- No virtual mapping needed (already flat structure)

BBBC038Handler (Kaggle nuclei, PNG format):
- Folder-based organization: stage1_train/{ImageId}/images/{ImageId}.png
- No structured filename pattern (uses ImageId as identifier)
- FilenameParser accepts .png files, extracts ImageId from path
- MetadataHandler for metadata.xlsx and CSV labels
- Handles segmentation masks in separate masks/ folders

Both handlers:
- Implement all abstract methods from MicroscopeHandler ABC
- Define compatible_backends (DISK only)
- Auto-register via _microscope_type class attribute
- Support FileManager abstraction throughout

No handwaving - ready for benchmark platform integration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implement ABC-compliant handlers with PASSING TESTS for BBBC datasets:

BBBC021Handler (ImageXpress-like with UUID):
- Parses: G10_s1_w1{UUID}.tif (original files in Week#/Week#_##### subdirectories)
- Constructs: G10_s1_w1_z001_t001.tif (virtual workspace with all components)
- Pattern handles BOTH original (with UUID) and virtual (with z/t) filenames
- Flattens Week#/Week#_##### folder structure to plate root
- Adds default z_index=1, timepoint=1 for pattern discovery consistency
- Channels: w1=DAPI, w2=Tubulin, w4=Actin (w3 not used)
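
The parse/construct roundtrip described above can be sketched with a single regular expression. This is an illustration under assumptions, not the handler's actual pattern: the well range `[A-P]`, the UUID layout (8-4-4-4-12 hex digits), and the function names are guesses, and the UUID in the usage line is made up.

```python
import re
from typing import Optional

# Assumed pattern: well, site, channel, then EITHER a UUID (original files)
# OR _z###_t### (virtual workspace files).
_BBBC021_RE = re.compile(
    r"^(?P<well>[A-P]\d{2})_s(?P<site>\d+)_w(?P<channel>\d)"
    r"(?:(?P<uuid>[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})"
    r"|_z(?P<z>\d+)_t(?P<t>\d+))\.tif$"
)


def parse_bbbc021(filename: str) -> Optional[dict]:
    """Return the filename components, or None if the name does not match."""
    m = _BBBC021_RE.match(filename)
    if m is None:
        return None
    return {
        "well": m["well"],
        "site": int(m["site"]),
        "channel": int(m["channel"]),
        # Original files carry no z/t, so default both to 1.
        "z_index": int(m["z"]) if m["z"] else 1,
        "timepoint": int(m["t"]) if m["t"] else 1,
    }


def construct_bbbc021(c: dict) -> str:
    """Build the virtual-workspace filename containing all components."""
    return (f"{c['well']}_s{c['site']}_w{c['channel']}"
            f"_z{c['z_index']:03d}_t{c['timepoint']:03d}.tif")


# Roundtrip: parse -> construct -> parse (the UUID here is invented).
original = "G10_s1_w1" + "0A1B2C3D-0000-4000-8000-123456789ABC" + ".tif"
parsed = parse_bbbc021(original)
virtual = construct_bbbc021(parsed)
print(virtual, parse_bbbc021(virtual) == parsed)
```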

BBBC038Handler (Kaggle nuclei, PNG):
- Parses: {hex_id}.png from stage1_train/{ImageId}/images/ subdirectories
- ImageId treated as unique "well" identifier
- Single channel, single site, no Z or timepoint
- Flattens folder structure to stage1_train/ directory

Both handlers:
- Follow virtual workspace architecture: ALL components in constructed filenames
- Implement all MicroscopeHandler ABC methods
- Auto-register via _microscope_type
- Compatible backends: [DISK]
- Ready for benchmark platform integration

Tests included:
- BBBC021: 6 real filenames from BBBC021_v1_image.csv (ALL PASS)
- BBBC038: 3 hex ID filenames (ALL PASS)
- Roundtrip: parse → construct → parse (ALL PASS)

No handwaving - tested with actual BBBC filenames.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

MICROSCOPE DETECTION & REGISTRATION:
- Add MetadataDetectMixin: reusable detect() implementation delegating to metadata handler
- Add TiffPixelSizeMixin: extract pixel size and channel names from TIFF tags
- BBBC021Handler: implement detect() via filename pattern matching
- BBBC038Handler: implement detect() via stage1_train folder detection
- ImageXpressHandler, OperaPhenixHandler, OpenHCSMicroscopeHandler: use MetadataDetectMixin
- Remove hardcoded handler registration at end of bbbc.py (now automatic via metaclass)

METADATA CACHING:
- Simplify MetadataCache: remove per-file mtime tracking and validation checks
- Cache is now explicit-clear-only (no automatic invalidation)
- Reduces complexity while maintaining correctness for single-plate workflows

REGISTRY DISCOVERY:
- LazyDiscoveryDict: skip cache when secondary registries present
- After discovery, populate secondary registries via _register_secondary hook
- Prevents stale cache from blocking secondary registry population

SIGNAL BATCHING (ImageBrowser performance):
- ColumnFilterWidget.select_all/select_none: always block signals during batch updates
- Emit single filter_changed signal at end instead of N signals
- Fixes signal storm when clicking 'None' button on 96-well filter (96 -> 1 signal)

DEPENDENCIES:
- Add tqdm>=4.66.5 for progress indication

This refactor improves:
- Microscope detection: deterministic, side-effect-free, testable
- Code reuse: mixins eliminate duplication across handlers
- Performance: signal batching prevents UI thrashing
- Maintainability: explicit registration removed, automatic via metaclass

ARCHITECTURE:
- Contracts: ToolAdapter, MetricCollector, DatasetSpec (immutable specs)
- Datasets: Registry of BBBC021, BBBC022, BBBC038 with download/extract/validate
- Pipelines: Registry of benchmark pipelines (nuclei_segmentation)
- Metrics: TimeMetric (perf_counter), MemoryMetric (RSS sampling)
- Adapters: OpenHCSAdapter implementing ToolAdapter contract
- Runner: Orchestrates tool validation, dataset acquisition, execution

DATASET ACQUISITION:
- Download with progress bars (tqdm)
- Extract zip archives atomically
- Validate by image count (±5% tolerance) or manifest
- Cache to ~/.cache/openhcs/benchmark_datasets/{id}/
- Fast path: skip re-download if cached and valid
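
The count-based validation rule (±5% tolerance) is easy to express directly. The sketch below is an assumption about how such a check might look; `validate_image_count` and `DatasetValidationError` are illustrative names.

```python
class DatasetValidationError(Exception):
    """Raised when an extracted dataset does not look complete."""


def validate_image_count(found: int, expected: int, tolerance: float = 0.05) -> None:
    """Raise unless found is within the relative tolerance of expected."""
    if expected <= 0:
        raise ValueError("expected image count must be positive")
    deviation = abs(found - expected) / expected
    if deviation > tolerance:
        raise DatasetValidationError(
            f"found {found} images, expected {expected} "
            f"(deviation {deviation:.1%} exceeds {tolerance:.0%})"
        )


# 700 of 720 images is about 2.8% off, so this passes silently.
validate_image_count(found=700, expected=720)
```

A relative tolerance keeps the same rule usable for a 720-image plate and for a dataset with tens of thousands of files, where listing every file in a manifest is impractical.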

OPENHCS ADAPTER:
- Validates OpenHCS installation
- Creates FileManager and microscope handler
- Runs minimal segmentation pipeline: blur → threshold → label
- Supports parameter validation (threshold_method, declump_method, diameter_range)
- Collects metrics via context managers
- Returns normalized BenchmarkResult with provenance

METRICS:
- TimeMetric: wall-clock execution time (perf_counter)
- MemoryMetric: peak RSS memory in background thread (psutil)
- Both implement MetricCollector ABC (context manager pattern)
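
Peak-memory sampling in a background thread can be sketched as follows. To keep the example dependency-free, the reading comes from an injected callable; with psutil one would presumably pass something like `lambda: psutil.Process().memory_info().rss`. The class name `PeakSampler` and its parameters are assumptions, not the commit's `MemoryMetric`.

```python
import threading
from typing import Callable, Optional


class PeakSampler:
    """Polls a sampler callable in a background thread and keeps the maximum."""

    def __init__(self, sample: Callable[[], float], interval_s: float = 0.01) -> None:
        self._sample = sample
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.peak: float = 0.0

    def _poll(self) -> None:
        while not self._stop.is_set():
            self.peak = max(self.peak, self._sample())
            # wait() doubles as an interruptible sleep between samples
            self._stop.wait(self._interval_s)

    def __enter__(self) -> "PeakSampler":
        self.peak = self._sample()  # baseline reading before the thread starts
        self._stop.clear()
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._stop.set()
        self._thread.join()
        self.peak = max(self.peak, self._sample())  # final reading after the join
        return False  # do not swallow exceptions
```

Sampling can miss short spikes that fall between two polls, so the reported peak is a lower bound whose accuracy depends on `interval_s`.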

PIPELINES:
- NUCLEI_SEGMENTATION: Otsu threshold + morphological operations
- Parameters: opening_radius, diameter_range, fill_holes
- Extensible: easy to add CELL_PAINTING, etc.

DATASETS:
- BBBC021_SINGLE_PLATE: 720 images, 839 MB
- BBBC022_SINGLE_PLATE_DNA: 3,456 images, 7.8 GB
- BBBC038_FULL: 33,215 images, 382 MB
- All with validation rules and microscope type

This enables:
- Reproducible benchmarking across tools
- Standardized metrics collection
- Dataset caching and validation
- Easy tool adapter implementation
- Extensible pipeline registry
…tion

SUMMARY
=======
Add complete CellProfiler conversion infrastructure for benchmarking OpenHCS
against CellProfiler. Uses a two-phase approach: one-time library absorption
(LLM converts entire CellProfiler library), then instant .cppipe conversion
(registry lookup, no LLM needed at conversion time).

CONVERTER INFRASTRUCTURE (benchmark/converter/)
===============================================
- absorb.py: CLI for one-time library absorption
  python -m benchmark.converter.absorb --model google/gemini-3-flash-preview

- library_absorber.py: Core absorption logic
  - Scans cellprofiler_source/library/modules/_*.py
  - LLM converts each to OpenHCS format
  - Validates: syntax, @numpy decorator, 'image' first param, no relative imports
  - Writes to cellprofiler_library/functions/
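
The four validation checks named above (syntax, `@numpy` decorator, `image` as first parameter, no relative imports) can all be done statically with the standard `ast` module. The function below is a sketch of that idea, not the code in library_absorber.py; its name and its error strings are invented for illustration.

```python
import ast


def validate_converted_source(source: str) -> list[str]:
    """Return a list of problems; an empty list means the source passes."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    problems = []

    # Relative imports appear as ImportFrom nodes with level > 0.
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            problems.append(f"relative import on line {node.lineno}")

    def is_numpy_decorator(dec: ast.expr) -> bool:
        # Accept both the bare form @numpy and the call form @numpy(...).
        target = dec.func if isinstance(dec, ast.Call) else dec
        return isinstance(target, ast.Name) and target.id == "numpy"

    decorated = [
        n for n in tree.body
        if isinstance(n, ast.FunctionDef)
        and any(is_numpy_decorator(d) for d in n.decorator_list)
    ]
    if not decorated:
        problems.append("no top-level function decorated with @numpy")

    for fn in decorated:
        params = fn.args.posonlyargs + fn.args.args
        if not params or params[0].arg != "image":
            problems.append(f"{fn.name}: first parameter must be 'image'")

    return problems
```

Because the source is only parsed and never executed, LLM output that would crash on import can still be checked safely.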

- llm_converter.py: Dual-backend LLM converter
  - Ollama (local): model names like 'qwen2.5-coder:7b'
  - OpenRouter (cloud): model names like 'google/gemini-3-flash-preview'
  - Auto-detects backend from model name format (org/model = OpenRouter)
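
The backend auto-detection rule stated above (an `org/model` name means OpenRouter) reduces to a one-line check. The function name and the returned strings below are assumptions made for illustration.

```python
def detect_backend(model: str) -> str:
    """Guess the LLM backend from the model name format.

    Rule taken from the commit message: 'org/model' means OpenRouter,
    anything else is treated as a local Ollama model.
    """
    if not model.strip():
        raise ValueError("model name must not be empty")
    return "openrouter" if "/" in model else "ollama"


print(detect_backend("qwen2.5-coder:7b"))               # local model name
print(detect_backend("google/gemini-3-flash-preview"))  # org/model name
```

The heuristic is lossy if a locally served model name also contains a slash, so an explicit override flag may be worth having alongside it.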

- system_prompt.py: Comprehensive first-principles OpenHCS explanation (~470 lines)
  - Dimensional dataflow architecture
  - ProcessingContract semantics (PURE_2D, PURE_3D, FLEXIBLE, VOLUMETRIC_TO_SLICE)
  - Multi-input operations (stack along dim 0, unstack inside function)
  - special_outputs/special_inputs for labels and measurements
  - Conversion rules and template

- contract_inference.py: Runtime contract inference
- source_locator.py: CellProfiler source code locator
- parser.py: .cppipe file parser
- pipeline_generator.py: Generate OpenHCS pipelines
- settings_binder.py: Bind .cppipe settings to function kwargs
- convert.py: CLI for .cppipe conversion

ABSORBED LIBRARY (benchmark/cellprofiler_library/)
=================================================
26 CellProfiler modules converted to OpenHCS functions:
closing, colortogray, combineobjects, convertimagetoobjects,
convertobjectstoimage, correctilluminationapply, crop, dilateimage,
enhanceedges, enhanceorsuppressfeatures, erodeimage, erodeobjects,
expandorshrinkobjects, fillobjects, gaussianfilter, measureimageoverlap,
measureobjectsizeshape, medialaxis, medianfilter, morphologicalskeleton,
opening, overlayobjects, reducenoise, savecroppedobjects, threshold, watershed

CELLPROFILER SOURCE (benchmark/cellprofiler_source/)
====================================================
Extracted CellProfiler source code for LLM reference:
- modules/: 90 module class files
- library/modules/: 27 pure algorithm implementations
- library/functions/: Core utility functions
- library/opts/: Enums and options

EXAMPLE PIPELINES
=================
- benchmark/cellprofiler_pipelines/: Original .cppipe files + converted
- benchmark/pipelines/: OpenHCS benchmark pipelines (numpy, cupy, gpu variants)

EXPERIMENTAL - may be reverted.

- flash_config.py: Remove max_fps cap (None instead of 60)
- geometry_tracking.py: New orthogonal geometry tracking
  - WidgetSizeMonitor: Detects size changes in watched widgets
  - AutoGeometryTracker: Discovers geometry-affecting widgets
  - FlashGeometryTracker: Queues flashes during layout changes
  - Eliminates timing race conditions by reacting to state transitions instead of relying on arbitrary delays

CHANGES:
- system_prompt.py: Request structured JSON output with contract, category, confidence, reasoning
- llm_converter.py: Parse JSON response, populate ConversionResult with LLM-inferred metadata
- library_absorber.py: Use LLM-inferred values instead of hardcoded pure_2d/0.5 defaults
- pipeline_generator.py: Map category → variable_components (z_projection→Z_INDEX, channel_operation→CHANNEL)
- Removed LLM fallback mode - purely deterministic conversion from absorbed library
- Deleted broken ExampleHuman_openhcs.py (garbage from early LLM run)

CONTRACTS.JSON NOW INCLUDES:
- contract: PURE_2D | PURE_3D | FLEXIBLE | VOLUMETRIC_TO_SLICE
- category: image_operation | z_projection | channel_operation
- confidence: 0.0-1.0 (LLM's confidence in inference)
- reasoning: Why this contract/category was chosen

PIPELINE GENERATION:
- Fail-loud if modules missing from absorbed library (no fallback)
- variable_components derived from LLM-inferred category
…bsorbed modules

Implemented LLM-powered converter system that transpiles CellProfiler pipelines (.cppipe) into native OpenHCS pipelines. Successfully absorbed all 88 CellProfiler modules using Claude Opus 4.5 and converted both benchmark pipelines (ExampleHuman and ExampleFly) to runnable OpenHCS code.

Three-phase system: (1) Absorption - LLM extracts pure algorithms from CellProfiler source, infers contracts and categories; (2) Parsing - deterministic .cppipe parsing; (3) Generation - maps modules to OpenHCS functions with proper variable_components.

Key features: ROI+CSV materialization for segmentation, infrastructure module handling (LoadData/ExportToSpreadsheet), retry logic, registry system with contracts.json.

Results: 88 absorbed modules (segmentation, measurements, image processing, morphology, projections, transformations), 2 converted pipelines (ExampleHuman 4 modules, ExampleFly 9 modules).

Technical highlights: CamelCase registry fix, dual-axis resolution integration, special I/O handling, fail-loud error handling.

- Fixed parameter name normalization to exactly match SettingsBinder logic
  - Remove parenthetical content before normalization (e.g., '(Min,Max)')
  - This fixes mapping of tuple parameters like 'Typical diameter (Min,Max)' -> [min_diameter, max_diameter]
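
As a rough illustration of the normalization fix, the sketch below strips parenthetical content before building a lookup key and then splits a tuple-valued setting across two kwargs. The exact SettingsBinder rules are not shown in this PR, so the normalization details, the `TUPLE_MAPPINGS` table, and the `"10,40"` value format are all assumptions.

```python
import re


def normalize_setting_name(name: str) -> str:
    """Drop parenthetical content, then lowercase and join words with '_'."""
    without_parens = re.sub(r"\([^)]*\)", "", name)
    words = re.findall(r"[A-Za-z0-9]+", without_parens.lower())
    return "_".join(words)


# Hypothetical mapping from a normalized setting name to function kwargs.
TUPLE_MAPPINGS = {"typical_diameter": ("min_diameter", "max_diameter")}


def bind_tuple_setting(name: str, value: str) -> dict[str, int]:
    """Split a 'min,max' setting value across the mapped kwargs."""
    targets = TUPLE_MAPPINGS[normalize_setting_name(name)]  # KeyError if unmapped
    parts = [int(p) for p in value.split(",")]
    if len(parts) != len(targets):
        raise ValueError(f"{name!r}: expected {len(targets)} values, got {len(parts)}")
    return dict(zip(targets, parts))


print(bind_tuple_setting("Typical diameter (Min,Max)", "10,40"))
```

Without the parenthetical stripping, the key would come out as `typical_diameter_min_max` and miss the mapping, which matches the bug this commit describes.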

- Fixed FunctionStep API usage to use tuple pattern: func=(function, {kwargs})
  - Previously, kwargs were incorrectly passed directly to FunctionStep
  - Now correctly passes kwargs dict as second element of tuple

- Backfilled parameter mappings for 83/88 absorbed CellProfiler functions
  - Used Gemini 3.0 Flash Preview to generate mappings from original source + absorbed function
  - Mappings stored in function docstrings as single source of truth
  - Added backfill_parameter_mappings.py script

- Generated pipelines now have proper kwargs instead of comments
  - ExampleFly: min_diameter=10, max_diameter=40 correctly mapped from tuple
  - ExampleHuman: min_diameter=8, max_diameter=80 correctly mapped from tuple
  - All other parameters properly translated using docstring mappings
…semantics

Used LLM (Gemini 3.0 Flash Preview) to analyze all 88 absorbed functions and determine
correct categories based on input shape expectations and iteration semantics.

Changes:
- Created recategorize_functions.py script for LLM-based recategorization
- Updated contracts.json with 7 category changes (81 unchanged)

Category changes:
  z_projection (3 functions):
    - MakeProjection: Processes z-stacks (D, H, W) → (H, W) projections
    - Morphologicalskeleton: Has volumetric parameter for 3D processing
    - TrackObjects: Processes temporal sequences (frames over time)

  channel_operation (4 functions):
    - CorrectIlluminationCalculate: Per-channel illumination correction
    - IdentifyPrimaryObjects: Segment same marker across all sites per channel
    - RescaleIntensity: Per-channel intensity normalization
    - Tile: Assembles sites into montage per channel

Impact:
- IdentifyPrimaryObjects now uses VariableComponents.CHANNEL instead of SITE
- MakeProjection now uses VariableComponents.Z_INDEX instead of SITE
- Generated pipelines have semantically correct iteration order
- Functions receive correct input shapes based on their processing semantics

All changes verified against OpenHCS PURE_2D contract behavior:
- PURE_2D unstacks dim 0 and calls function on each (H, W) slice
- variable_components controls what dim 0 represents (sites, channels, or z-slices)
- Total function calls remain the same, only iteration order changes
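
The claim that only the iteration order changes can be checked with a toy model. The sketch below is not OpenHCS code: it represents images as a plain dict keyed by `(site, channel)`, and `build_stacks` and `run_pure_2d` are hypothetical helpers that imitate "the variable component becomes dim 0" and "PURE_2D calls the function once per slice".

```python
from collections import defaultdict


def build_stacks(images: dict, variable: str) -> dict:
    """Group 2D slices so that the variable component forms dim 0.

    images maps (site, channel) -> 2D slice; variable is 'site' or 'channel'.
    The result maps the remaining (fixed) component to a list of slices.
    """
    axis = {"site": 0, "channel": 1}[variable]
    stacks = defaultdict(list)
    for key in sorted(images):
        fixed = key[1 - axis]
        stacks[fixed].append(images[key])
    return dict(stacks)


def run_pure_2d(func, stacks: dict):
    """Unstack dim 0 and call func once per 2D slice; also count the calls."""
    calls = 0
    out = {}
    for fixed, stack in stacks.items():
        out[fixed] = [func(slice_2d) for slice_2d in stack]
        calls += len(stack)
    return out, calls


# 3 sites x 2 channels, each "image" a 1x1 slice.
images = {(s, c): [[float(10 * s + c)]] for s in range(3) for c in range(2)}
by_site = build_stacks(images, "site")        # 2 stacks of 3 slices
by_channel = build_stacks(images, "channel")  # 3 stacks of 2 slices
_, calls_site = run_pure_2d(lambda x: x, by_site)
_, calls_channel = run_pure_2d(lambda x: x, by_channel)
print(calls_site, calls_channel)
```

Both groupings visit every slice exactly once, so the call count is the same while the stack shapes differ.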
…ents semantics

Updated LLM recategorization prompt with correct dimensional dataflow semantics:
- image_operation (SITE): Single-channel operations across all sites (default)
- z_projection (Z_INDEX): Functions that NEED z-stacks (projections, 3D ops)
- channel_operation (CHANNEL): Functions that NEED multiple channels simultaneously

Results:
- channel_operation (4): ColorToGray, GrayToColorRgb, MeasureColocalization, UnmixColors
- z_projection (2): MakeProjection, Morphologicalskeleton
- image_operation (82): Everything else (single-channel operations)

Fixed incorrect categorizations from previous run:
- IdentifyPrimaryObjects: channel_operation → image_operation ✓
- CorrectIlluminationCalculate: channel_operation → image_operation ✓
- RescaleIntensity: channel_operation → image_operation ✓
- Tile: channel_operation → image_operation ✓
- TrackObjects: z_projection → image_operation ✓ (time-lapse uses sequential_components)

Added correct categorizations:
- ColorToGray: image_operation → channel_operation ✓
- MeasureColocalization: image_operation → channel_operation ✓
- UnmixColors: image_operation → channel_operation ✓
- GrayToColorRgb: image_operation → channel_operation ✓ (manual fix)

Regenerated pipelines with correct variable_components.

UnmixColors has PURE_2D contract, which means it receives (H, W) and processes
each site independently. PURE_2D with channel_operation would unstack dimension 0
and process each channel independently, which defeats the purpose.

Dimensional dataflow rule:
- PURE_2D contract → ALWAYS image_operation (processes each site independently)
- FLEXIBLE/PURE_3D contract → can be channel_operation or z_projection (processes dim 0 together)

Final categorizations:
- channel_operation (3): ColorToGray, GrayToColorRgb, MeasureColocalization
  All have FLEXIBLE contract and process multiple channels together
- z_projection (2): MakeProjection, Morphologicalskeleton
  Process z-stacks (volumetric data)
- image_operation (83): Everything else, including all PURE_2D functions
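
The dimensional dataflow rule above lends itself to an automated check over the registry. The sketch below assumes entries shaped like the contracts.json fields described earlier (`contract`, `category`); the function and exception names are invented for illustration.

```python
class CategoryRuleError(ValueError):
    """Raised when a contract/category combination violates the rule."""


# Categories that only make sense when dim 0 is processed together.
MULTI_SLICE_CATEGORIES = {"channel_operation", "z_projection"}


def check_entry(name: str, contract: str, category: str) -> None:
    """PURE_2D functions see one (H, W) slice at a time, so they must be image_operation."""
    if contract == "PURE_2D" and category in MULTI_SLICE_CATEGORIES:
        raise CategoryRuleError(
            f"{name}: PURE_2D cannot be {category}; use image_operation "
            f"or change the contract to FLEXIBLE/PURE_3D"
        )


def check_registry(entries: dict) -> None:
    """Validate every entry and fail loud on the first violation."""
    for name, meta in entries.items():
        check_entry(name, meta["contract"], meta["category"])


# Entries mirroring the final categorizations listed in this commit.
check_registry({
    "ColorToGray": {"contract": "FLEXIBLE", "category": "channel_operation"},
    "UnmixColors": {"contract": "PURE_2D", "category": "image_operation"},
})
```

Running a check like this after every LLM recategorization pass would presumably have caught the earlier UnmixColors miscategorization automatically.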
Key changes:
1. measure_colocalization: Added channel_1/channel_2 params for arbitrary N-channel input
2. gray_to_color_rgb: Added red/green/blue_channel params for arbitrary N-channel input
3. gray_to_color_cmyk: Added channel selection params for arbitrary N-channel input
4. Fixed @numpy decorator: Removed invalid contract=ProcessingContract.X usage
5. Removed unused ProcessingContract imports from all 88 functions
6. Rewrote __init__.py with dynamic function loading from contracts.json
7. Regenerated pipelines with correct variable_components

The dimensional dataflow compiler perspective:
- Dimension 0 can be of ARBITRARY size (1, 2, 3, 4, 5, ... N)
- Functions should parameterize channel selection, not hardcode indices
- ProcessingContract is orthogonal to variable_components

1. Removed unused ProcessingContract import from header template
2. Removed duplicate imports in header template
3. Changed to dynamic function loading with get_function()
4. Fixed measurecolocalization parameter mapping:
   - 'Select images to measure' -> (pipeline-handled) (requires pipeline context)
   - 'Run all metrics?' -> (pipeline-handled) (multi-param not auto-mappable)
5. Regenerated ExampleFly and ExampleHuman pipelines with clean parameters

1. Removed duplicate parameter mapping from _outline helper function
2. Added correct mapping to identify_tertiary_objects docstring
3. Object selection settings ('Select the larger/smaller identified objects')
   are now (pipeline-handled) since they're @special_inputs
4. Only shrink_primary is an actual function parameter

In OpenHCS, @special_inputs are wired at compile time by name matching,
not passed as string parameters. CellProfiler's object naming convention
doesn't map directly to function kwargs.

- Categorized all 88 absorbed functions into FLEXIBLE vs PURE_3D contracts
- Identified critical architectural issues:
  * Contract mismatch (PURE_2D vs PURE_3D)
  * Tuple handling bug in _execute_pure_2d
  * Inconsistent special outputs format
- Created phased refactoring plan with timeline and risk mitigation
- Documented 14 FLEXIBLE functions (support true 3D + slice-by-slice)
- Documented 74 PURE_3D functions (always internal slicing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Comprehensive design document covering:
- Architecture comparison (CellProfiler vs OpenHCS)
- Identified abstraction leaks (A1-A3, B1-B4, C1-C4)
- What we're certain about (contract system, aggregation orthogonality)
- Design proposal: AggregationSpec and compile-time symbol resolution
- Implementation phases
- Open questions for further discussion

Detailed mapping of:
- Core concept mapping (pipeline, data containers, object model)
- Semantic gaps requiring new concepts (ObjectRegistry, etc.)
- Adapter layer design for CellProfiler modules
- ProcessingContract mapping
- Measurement naming conventions
- Settings system mapping
- Abstraction leak analysis
…sign doc

Includes:
- Essential files to read (OpenHCS core + CellProfiler integration)
- Detailed execution flow diagram
- ProcessingContract implementation with code snippets
- Special outputs system explanation
- CellProfiler workspace structure
- Absorbed function patterns (current buggy vs required)
- Key terms glossary
- Quick reference: what to read when