[DRAFT] Benchmark Platform for Nature Methods Paper by trissim · Pull Request #60 · OpenHCSDev/openhcs

trissim · 2025-12-19T21:36:21Z

Benchmark Platform for Nature Methods Paper

Overview

Complete architectural plans for benchmarking OpenHCS against CellProfiler, ImageJ, and Python scripts for the Nature Methods publication.

Plans Included

✅ plan_01_benchmark_infrastructure.md

Orchestration layer and comparison engine

Declarative API (run_benchmark())
BenchmarkRunner orchestration
FileResultStorage (immutable, append-only)
ComparisonEngine with statistical analysis
TableGenerator (Nature Methods, LaTeX, Markdown)
PlotGenerator (bar charts, line plots, heatmaps)

✅ plan_02_dataset_acquisition.md

Automatic dataset download and validation

DatasetRegistry (declarative BBBC datasets)
CacheManager (atomic operations, disk space checks)
DownloadManager (resume, checksums, progress)
ExtractionManager (zip/tar support)
VerificationManager (file existence, image validation)
AcquisitionOrchestrator (ties everything together)
Complete fail-loud error hierarchy

✅ plan_03_tool_adapters.md

Normalize heterogeneous tools to uniform interface

ToolAdapter protocol
OpenHCSAdapter (native execution)
CellProfilerAdapter (subprocess + .cppipe generation)
ImageJAdapter (subprocess + macro generation)
PythonScriptAdapter (in-process execution)
PipelineGenerator (translates configs to tool formats)
ResultParser (normalizes tool outputs)

✅ plan_04_metric_collectors.md

Context manager metrics for transparent collection

MetricCollector protocol
TimeMetric (high-precision timing)
MemoryMetric (peak RAM via psutil)
GPUMetric (peak GPU memory via pynvml)
CorrectnessMetric (ground truth comparison)
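
A minimal sketch of the context-manager pattern this plan describes, assuming the `MetricCollector` protocol amounts to `__enter__`, `__exit__`, and a `result()` accessor. The `result()` name and the exact protocol shape are assumptions for illustration; the plan document defines the real interface.

```python
import time
from typing import Optional, Protocol


class MetricCollector(Protocol):
    """Assumed protocol shape: enter, exit, then read a numeric result."""

    def __enter__(self) -> "MetricCollector": ...
    def __exit__(self, exc_type, exc, tb) -> bool: ...
    def result(self) -> float: ...


class TimeMetric:
    """Wall-clock timer built on time.perf_counter()."""

    def __init__(self) -> None:
        self._start: Optional[float] = None
        self._elapsed: Optional[float] = None

    def __enter__(self) -> "TimeMetric":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._elapsed = time.perf_counter() - self._start
        return False  # never swallow exceptions raised by the measured block

    def result(self) -> float:
        # Fail loud instead of returning a placeholder value.
        if self._elapsed is None:
            raise RuntimeError("TimeMetric.result() called before the block finished")
        return self._elapsed


with TimeMetric() as timer:
    sum(range(100_000))  # stand-in for a tool run

elapsed_seconds = timer.result()
```

Returning `False` from `__exit__` keeps the collector transparent: a failing tool run still propagates its exception while the elapsed time is recorded.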

✅ plan_05_pipeline_equivalence.md

Equivalent analysis pipelines across all tools

PipelineRegistry (abstract pipeline specs)
Nuclei segmentation pipeline (Gaussian → Otsu → measure)
Translation to OpenHCS, CellProfiler, ImageJ, Python
Ensures fair comparison (same algorithm, different tools)

Architecture Highlights

Orthogonal Concerns

Each plan solves ONE problem completely:

Dataset acquisition ≠ tool execution ≠ metric collection
Can change any without touching others

Declarative API

from benchmark import run_benchmark, BBBCDataset
from benchmark.adapters import OpenHCSAdapter, CellProfilerAdapter
from benchmark.metrics import Time, Memory, GPU

results = run_benchmark(
    datasets=[BBBCDataset.BBBC021, BBBCDataset.BBBC022],
    tools=[OpenHCSAdapter(), CellProfilerAdapter()],
    metrics=[Time(), Memory(), GPU()],
)

results.generate_figure("figure_5_performance.pdf")
results.generate_table(format="nature")

Fail-Loud Philosophy

No silent fallbacks
Explicit error types for every failure mode
Validation at every step

Platform, Not Application

Adding new dataset = one dataclass declaration
Adding new tool = implement ToolAdapter protocol
Adding new metric = implement MetricCollector protocol
Adding new pipeline = one pipeline spec
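
As a rough illustration of these extension points, the sketch below declares a dataset as a frozen dataclass and a tool as a structural `Protocol` implementation. All field names, the method names `validate_installation` and `run`, and the `EchoAdapter` class are hypothetical; the real contracts live in the plan documents.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class DatasetSpec:
    """Immutable dataset declaration (field names are illustrative)."""

    id: str
    url: str
    expected_images: int


@runtime_checkable
class ToolAdapter(Protocol):
    """Assumed uniform interface that every tool adapter exposes."""

    name: str

    def validate_installation(self) -> None: ...
    def run(self, dataset_path: Path) -> dict: ...


class EchoAdapter:
    """Toy adapter: satisfies the protocol structurally, without inheritance."""

    name = "echo"

    def validate_installation(self) -> None:
        return None  # nothing to check for the toy tool

    def run(self, dataset_path: Path) -> dict:
        return {"tool": self.name, "dataset": str(dataset_path)}


# Hypothetical entry; the URL is a placeholder, not a real download link.
TOY_DATASET = DatasetSpec(id="toy", url="https://example.invalid/toy.zip", expected_images=3)

adapter = EchoAdapter()
is_adapter = isinstance(adapter, ToolAdapter)
result = adapter.run(Path("toy"))
```

Because the protocol is structural, a new tool only needs the right attributes and methods; nothing in the runner has to import or subclass a base class.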

Diagrams Included

All plans include:

UML class diagrams - Show relationships and protocols
Flow diagrams - Show execution paths and error handling
Sequence diagrams - Show interactions between components
Architecture diagrams - Show system layers and data flow

Implementation Status

Next Steps

Smell-loop review of all plans
Sequential implementation (plan_01 → plan_02 → ... → plan_05)
Acquire BBBC datasets (BBBC021, BBBC022, BBBC038, BBBC039)
Run benchmarks against CellProfiler, ImageJ, Python
Generate figures for Nature Methods paper
Write results section with quantitative data

Expected Results

Based on OpenHCS architecture:

5-10× faster than CellProfiler (GPU acceleration)
10-100× speedup for re-runs (intelligent caching)
Zero dimensional errors (vs common manual pipeline errors)
Linear scaling with dataset size

This is a DRAFT PR for planning purposes. Implementation will follow smell-loop approval.

Pull Request opened by Augment Code with guidance from the PR author

- plan_01: Benchmark infrastructure with orchestration, storage, comparison
- plan_02: Dataset acquisition with fail-loud validation and caching
- plan_03: Tool adapters for OpenHCS, CellProfiler, ImageJ, Python
- plan_04: Metric collectors (Time, Memory, GPU, Correctness)
- plan_05: Pipeline equivalence system for fair comparison

All plans include:
- UML class diagrams
- Flow diagrams
- Sequence diagrams
- Complete implementation code
- Integration examples

Ready for implementation following smell-loop approval.

Research findings from publications using BBBC datasets:
- Complete BBBC021/022/038 dataset specifications with real URLs, sizes, formats
- Real CellProfiler pipeline parameters from actual analysis.cppipe files
- Evaluation metrics from NuSeT (2020), Cimini et al. (2023), and other benchmarking papers
- Illumination correction parameters from Singh et al. (2014)
- Ground truth availability and usage strategies
- Preprocessing pipelines and subsetting approaches

Files added:
- plan_02_ADDENDUM_real_dataset_specs.md: Complete BBBC dataset specs, download strategies, validation without checksums
- plan_03_ADDENDUM_real_pipelines.md: Real CellProfiler pipeline from BBBC021 analysis.cppipe with all 27 modules
- plan_04_ADDENDUM_correctness_metrics.md: Pixel-level and object-level evaluation metrics from publications
- RESEARCH_SUMMARY.md: Complete investigation report with all sources cited

All findings sourced from publications, GitHub repos, and BBBC downloads. No handwaving.

Remaining gaps (require downloads to fill):
- BBBC022 filename pattern (need to download 1 plate to reverse-engineer)
- Dataset checksums (not provided by Broad, will compute or skip)
- File manifests (impractical to list 39,600 files, will use count validation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implement proper ABC-compliant handlers for BBBC datasets:

BBBC021Handler (ImageXpress format):
- Pattern: {Well}_{Site}_{Channel}{UUID}.tif (e.g., G10_s1_w1BEDC2073...tif)
- Channels: w1=DAPI, w2=Tubulin, w4=Actin
- FilenameParser with regex for Well/Site/Channel extraction
- MetadataHandler for CSV metadata (BBBC021_v1_image.csv)
- No virtual mapping needed (already flat structure)

BBBC038Handler (Kaggle nuclei, PNG format):
- Folder-based organization: stage1_train/{ImageId}/images/{ImageId}.png
- No structured filename pattern (uses ImageId as identifier)
- FilenameParser accepts .png files, extracts ImageId from path
- MetadataHandler for metadata.xlsx and CSV labels
- Handles segmentation masks in separate masks/ folders

Both handlers:
- Implement all abstract methods from MicroscopeHandler ABC
- Define compatible_backends (DISK only)
- Auto-register via _microscope_type class attribute
- Support FileManager abstraction throughout

No handwaving - ready for benchmark platform integration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implement ABC-compliant handlers with PASSING TESTS for BBBC datasets:

BBBC021Handler (ImageXpress-like with UUID):
- Parses: G10_s1_w1{UUID}.tif (original files in Week#/Week#_##### subdirectories)
- Constructs: G10_s1_w1_z001_t001.tif (virtual workspace with all components)
- Pattern handles BOTH original (with UUID) and virtual (with z/t) filenames
- Flattens Week#/Week#_##### folder structure to plate root
- Adds default z_index=1, timepoint=1 for pattern discovery consistency
- Channels: w1=DAPI, w2=Tubulin, w4=Actin (w3 not used)

BBBC038Handler (Kaggle nuclei, PNG):
- Parses: {hex_id}.png from stage1_train/{ImageId}/images/ subdirectories
- ImageId treated as unique "well" identifier
- Single channel, single site, no Z or timepoint
- Flattens folder structure to stage1_train/ directory

Both handlers:
- Follow virtual workspace architecture: ALL components in constructed filenames
- Implement all MicroscopeHandler ABC methods
- Auto-register via _microscope_type
- Compatible backends: [DISK]
- Ready for benchmark platform integration

Tests included:
- BBBC021: 6 real filenames from BBBC021_v1_image.csv (ALL PASS)
- BBBC038: 3 hex ID filenames (ALL PASS)
- Roundtrip: parse → construct → parse (ALL PASS)

No handwaving - tested with actual BBBC filenames.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
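
The parse → construct → parse roundtrip for BBBC021 can be sketched with a single regular expression that accepts both filename forms. This is an illustration only: the regex, the function names, and the all-zero UUID in the example are assumptions, not the handler's actual code or a real BBBC021 filename.

```python
import re

# Accepts both forms described in the commit message:
#   original: G10_s1_w1<36-character UUID>.tif
#   virtual:  G10_s1_w1_z001_t001.tif
BBBC021_PATTERN = re.compile(
    r"^(?P<well>[A-Z]\d{2})_s(?P<site>\d+)_w(?P<channel>\d)"
    r"(?:_z(?P<z>\d+)_t(?P<t>\d+)|(?P<uuid>[0-9A-Fa-f-]{36}))?"
    r"\.tif$"
)


def parse(filename: str) -> dict:
    """Extract components; z_index and timepoint default to 1 when absent."""
    match = BBBC021_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Not a BBBC021 filename: {filename!r}")
    return {
        "well": match["well"],
        "site": int(match["site"]),
        "channel": int(match["channel"]),
        "z_index": int(match["z"] or 1),
        "timepoint": int(match["t"] or 1),
    }


def construct(components: dict) -> str:
    """Build the virtual-workspace filename carrying every component."""
    return (
        f"{components['well']}_s{components['site']}_w{components['channel']}"
        f"_z{components['z_index']:03d}_t{components['timepoint']:03d}.tif"
    )


# Made-up UUID for illustration only.
original = "G10_s1_w1" + "00000000-0000-0000-0000-000000000000" + ".tif"
virtual = construct(parse(original))
roundtrip_ok = parse(virtual) == parse(original)
```

Fixing the channel at one digit keeps the match unambiguous even when the UUID itself starts with a digit.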

MICROSCOPE DETECTION & REGISTRATION:
- Add MetadataDetectMixin: reusable detect() implementation delegating to metadata handler
- Add TiffPixelSizeMixin: extract pixel size and channel names from TIFF tags
- BBBC021Handler: implement detect() via filename pattern matching
- BBBC038Handler: implement detect() via stage1_train folder detection
- ImageXpressHandler, OperaPhenixHandler, OpenHCSMicroscopeHandler: use MetadataDetectMixin
- Remove hardcoded handler registration at end of bbbc.py (now automatic via metaclass)

METADATA CACHING:
- Simplify MetadataCache: remove per-file mtime tracking and validation checks
- Cache is now explicit-clear-only (no automatic invalidation)
- Reduces complexity while maintaining correctness for single-plate workflows

REGISTRY DISCOVERY:
- LazyDiscoveryDict: skip cache when secondary registries present
- After discovery, populate secondary registries via _register_secondary hook
- Prevents stale cache from blocking secondary registry population

SIGNAL BATCHING (ImageBrowser performance):
- ColumnFilterWidget.select_all/select_none: always block signals during batch updates
- Emit single filter_changed signal at end instead of N signals
- Fixes signal storm when clicking 'None' button on 96-well filter (96 -> 1 signal)

DEPENDENCIES:
- Add tqdm>=4.66.5 for progress indication

This refactor improves:
- Microscope detection: deterministic, side-effect-free, testable
- Code reuse: mixins eliminate duplication across handlers
- Performance: signal batching prevents UI thrashing
- Maintainability: explicit registration removed, automatic via metaclass

ARCHITECTURE:
- Contracts: ToolAdapter, MetricCollector, DatasetSpec (immutable specs)
- Datasets: Registry of BBBC021, BBBC022, BBBC038 with download/extract/validate
- Pipelines: Registry of benchmark pipelines (nuclei_segmentation)
- Metrics: TimeMetric (perf_counter), MemoryMetric (RSS sampling)
- Adapters: OpenHCSAdapter implementing ToolAdapter contract
- Runner: Orchestrates tool validation, dataset acquisition, execution

DATASET ACQUISITION:
- Download with progress bars (tqdm)
- Extract zip archives atomically
- Validate by image count (±5% tolerance) or manifest
- Cache to ~/.cache/openhcs/benchmark_datasets/{id}/
- Fast path: skip re-download if cached and valid

OPENHCS ADAPTER:
- Validates OpenHCS installation
- Creates FileManager and microscope handler
- Runs minimal segmentation pipeline: blur → threshold → label
- Supports parameter validation (threshold_method, declump_method, diameter_range)
- Collects metrics via context managers
- Returns normalized BenchmarkResult with provenance

METRICS:
- TimeMetric: wall-clock execution time (perf_counter)
- MemoryMetric: peak RSS memory in background thread (psutil)
- Both implement MetricCollector ABC (context manager pattern)

PIPELINES:
- NUCLEI_SEGMENTATION: Otsu threshold + morphological operations
- Parameters: opening_radius, diameter_range, fill_holes
- Extensible: easy to add CELL_PAINTING, etc.

DATASETS:
- BBBC021_SINGLE_PLATE: 720 images, 839 MB
- BBBC022_SINGLE_PLATE_DNA: 3,456 images, 7.8 GB
- BBBC038_FULL: 33,215 images, 382 MB
- All with validation rules and microscope type

This enables:
- Reproducible benchmarking across tools
- Standardized metrics collection
- Dataset caching and validation
- Easy tool adapter implementation
- Extensible pipeline registry
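
Count-based validation with a ±5% tolerance could look roughly like the sketch below. The function name, the error class, and the list of image suffixes are assumptions for illustration, not the platform's actual code.

```python
from pathlib import Path


class DatasetValidationError(RuntimeError):
    """Raised when a cached dataset does not match its declared image count."""


def validate_image_count(
    root,
    expected: int,
    tolerance: float = 0.05,
    suffixes=(".tif", ".tiff", ".png"),
) -> int:
    """Count image files under root and fail loud if outside expected ± tolerance."""
    actual = sum(
        1
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )
    allowed_deviation = expected * tolerance
    if abs(actual - expected) > allowed_deviation:
        raise DatasetValidationError(
            f"{root}: found {actual} images, expected {expected} "
            f"(allowed deviation: {allowed_deviation:.1f})"
        )
    return actual
```

For a dataset declared with 720 images, this accepts any count from 684 to 756 and raises otherwise, so a truncated download is caught before any tool runs.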

…tion

SUMMARY
=======
Add complete CellProfiler conversion infrastructure for benchmarking OpenHCS against CellProfiler. Uses a two-phase approach: one-time library absorption (LLM converts entire CellProfiler library), then instant .cppipe conversion (registry lookup, no LLM needed at conversion time).

CONVERTER INFRASTRUCTURE (benchmark/converter/)
===============================================
- absorb.py: CLI for one-time library absorption
  python -m benchmark.converter.absorb --model google/gemini-3-flash-preview
- library_absorber.py: Core absorption logic
  - Scans cellprofiler_source/library/modules/_*.py
  - LLM converts each to OpenHCS format
  - Validates: syntax, @numpy decorator, 'image' first param, no relative imports
  - Writes to cellprofiler_library/functions/
- llm_converter.py: Dual-backend LLM converter
  - Ollama (local): model names like 'qwen2.5-coder:7b'
  - OpenRouter (cloud): model names like 'google/gemini-3-flash-preview'
  - Auto-detects backend from model name format (org/model = OpenRouter)
- system_prompt.py: Comprehensive first-principles OpenHCS explanation (~470 lines)
  - Dimensional dataflow architecture
  - ProcessingContract semantics (PURE_2D, PURE_3D, FLEXIBLE, VOLUMETRIC_TO_SLICE)
  - Multi-input operations (stack along dim 0, unstack inside function)
  - special_outputs/special_inputs for labels and measurements
  - Conversion rules and template
- contract_inference.py: Runtime contract inference
- source_locator.py: CellProfiler source code locator
- parser.py: .cppipe file parser
- pipeline_generator.py: Generate OpenHCS pipelines
- settings_binder.py: Bind .cppipe settings to function kwargs
- convert.py: CLI for .cppipe conversion

ABSORBED LIBRARY (benchmark/cellprofiler_library/)
=================================================
26 CellProfiler modules converted to OpenHCS functions: closing, colortogray, combineobjects, convertimagetoobjects, convertobjectstoimage, correctilluminationapply, crop, dilateimage, enhanceedges, enhanceorsuppressfeatures, erodeimage, erodeobjects, expandorshrinkobjects, fillobjects, gaussianfilter, measureimageoverlap, measureobjectsizeshape, medialaxis, medianfilter, morphologicalskeleton, opening, overlayobjects, reducenoise, savecroppedobjects, threshold, watershed

CELLPROFILER SOURCE (benchmark/cellprofiler_source/)
====================================================
Extracted CellProfiler source code for LLM reference:
- modules/: 90 module class files
- library/modules/: 27 pure algorithm implementations
- library/functions/: Core utility functions
- library/opts/: Enums and options

EXAMPLE PIPELINES
=================
- benchmark/cellprofiler_pipelines/: Original .cppipe files + converted
- benchmark/pipelines/: OpenHCS benchmark pipelines (numpy, cupy, gpu variants)
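
The four validation checks listed for library_absorber.py (syntax, @numpy decorator, 'image' first parameter, no relative imports) can be approximated statically with the standard `ast` module. The sketch below is one possible reading, not the absorber's actual code: the function names are invented, and applying the decorator and parameter checks to every public top-level function is an assumption.

```python
import ast


def _decorator_name(node: ast.expr) -> str:
    """Return the bare name of a decorator such as @numpy, @numpy(...), or @pkg.numpy."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def validate_converted_source(source: str) -> list:
    """Return a list of problems; an empty list means every check passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    problems = []

    # Relative imports have level > 0, e.g. `from . import helpers`.
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            problems.append(f"relative import on line {node.lineno}")

    # Assumption: only public top-level functions are pipeline entry points.
    public = [
        node for node in tree.body
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_")
    ]
    if not public:
        problems.append("no public function found")
    for func in public:
        if not any(_decorator_name(d) == "numpy" for d in func.decorator_list):
            problems.append(f"{func.name}: missing @numpy decorator")
        params = func.args.posonlyargs + func.args.args
        if not params or params[0].arg != "image":
            problems.append(f"{func.name}: first parameter must be 'image'")
    return problems
```

Collecting every problem instead of stopping at the first gives a retry loop the full list of defects to feed back in one pass.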

EXPERIMENTAL - may be reverted.
- flash_config.py: Remove max_fps cap (None instead of 60)
- geometry_tracking.py: New orthogonal geometry tracking
  - WidgetSizeMonitor: Detects size changes in watched widgets
  - AutoGeometryTracker: Discovers geometry-affecting widgets
  - FlashGeometryTracker: Queues flashes during layout changes
- Eliminates timing race conditions by state transitions, not arbitrary delays

CHANGES:
- system_prompt.py: Request structured JSON output with contract, category, confidence, reasoning
- llm_converter.py: Parse JSON response, populate ConversionResult with LLM-inferred metadata
- library_absorber.py: Use LLM-inferred values instead of hardcoded pure_2d/0.5 defaults
- pipeline_generator.py: Map category → variable_components (z_projection→Z_INDEX, channel_operation→CHANNEL)
- Removed LLM fallback mode - purely deterministic conversion from absorbed library
- Deleted broken ExampleHuman_openhcs.py (garbage from early LLM run)

CONTRACTS.JSON NOW INCLUDES:
- contract: PURE_2D | PURE_3D | FLEXIBLE | VOLUMETRIC_TO_SLICE
- category: image_operation | z_projection | channel_operation
- confidence: 0.0-1.0 (LLM's confidence in inference)
- reasoning: Why this contract/category was chosen

PIPELINE GENERATION:
- Fail-loud if modules missing from absorbed library (no fallback)
- variable_components derived from LLM-inferred category
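
The category-to-variable_components lookup with fail-loud behaviour might be sketched as follows. The enum here is a local stand-in for the OpenHCS `VariableComponents` enum (its member values are made up), and the error class, function name, and contracts.json entry layout are assumptions.

```python
from enum import Enum


class VariableComponents(Enum):
    """Local stand-in for the OpenHCS enum; member values are illustrative."""

    SITE = "site"
    CHANNEL = "channel"
    Z_INDEX = "z_index"


CATEGORY_TO_COMPONENT = {
    "image_operation": VariableComponents.SITE,
    "z_projection": VariableComponents.Z_INDEX,
    "channel_operation": VariableComponents.CHANNEL,
}


class MissingModuleError(LookupError):
    """Raised when a .cppipe module has no entry in the absorbed library."""


def resolve_variable_component(module_name: str, contracts: dict) -> VariableComponents:
    """Look up a module's category and map it; never fall back to a default."""
    if module_name not in contracts:
        raise MissingModuleError(f"{module_name} is not in the absorbed library")
    category = contracts[module_name]["category"]
    if category not in CATEGORY_TO_COMPONENT:
        raise ValueError(f"{module_name}: unknown category {category!r}")
    return CATEGORY_TO_COMPONENT[category]


# Hypothetical excerpt shaped like the contracts.json fields listed above.
contracts = {
    "MakeProjection": {
        "contract": "FLEXIBLE",
        "category": "z_projection",
        "confidence": 0.9,
        "reasoning": "reduces a z-stack to one plane",
    }
}
component = resolve_variable_component("MakeProjection", contracts)
```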

…bsorbed modules

Implemented LLM-powered converter system that transpiles CellProfiler pipelines (.cppipe) into native OpenHCS pipelines. Successfully absorbed all 88 CellProfiler modules using Claude Opus 4.5 and converted both benchmark pipelines (ExampleHuman and ExampleFly) to runnable OpenHCS code.

Three-phase system:
(1) Absorption - LLM extracts pure algorithms from CellProfiler source, infers contracts and categories
(2) Parsing - deterministic .cppipe parsing
(3) Generation - maps modules to OpenHCS functions with proper variable_components

Key features: ROI+CSV materialization for segmentation, infrastructure module handling (LoadData/ExportToSpreadsheet), retry logic, registry system with contracts.json.

Results: 88 absorbed modules (segmentation, measurements, image processing, morphology, projections, transformations), 2 converted pipelines (ExampleHuman 4 modules, ExampleFly 9 modules).

Technical highlights: CamelCase registry fix, dual-axis resolution integration, special I/O handling, fail-loud error handling.

- Fixed parameter name normalization to exactly match SettingsBinder logic
  - Remove parenthetical content before normalization (e.g., '(Min,Max)')
  - This fixes mapping of tuple parameters like 'Typical diameter (Min,Max)' -> [min_diameter, max_diameter]
- Fixed FunctionStep API usage to use tuple pattern: func=(function, {kwargs})
  - Previously was incorrectly passing kwargs directly to FunctionStep
  - Now correctly passes kwargs dict as second element of tuple
- Backfilled parameter mappings for 83/88 absorbed CellProfiler functions
  - Used Gemini Flash 3.0 to generate mappings from original source + absorbed function
  - Mappings stored in function docstrings as single source of truth
  - Added backfill_parameter_mappings.py script
- Generated pipelines now have proper kwargs instead of comments
  - ExampleFly: min_diameter=10, max_diameter=40 correctly mapped from tuple
  - ExampleHuman: min_diameter=8, max_diameter=80 correctly mapped from tuple
  - All other parameters properly translated using docstring mappings
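
To make the normalization fix concrete, here is a small sketch of stripping parenthetical content before normalizing a setting name, then expanding a tuple setting into separate kwargs. The function names and the lowercase-with-underscores convention are assumptions; the actual SettingsBinder logic is not shown in this PR text.

```python
import re


def normalize_setting_name(name: str) -> str:
    """Drop parenthetical content, then lowercase and join words with underscores."""
    without_parens = re.sub(r"\([^)]*\)", "", name)
    words = re.findall(r"[A-Za-z0-9]+", without_parens.lower())
    return "_".join(words)


def bind_setting(name: str, value, mappings: dict) -> dict:
    """Translate one .cppipe setting into function kwargs using a mapping table."""
    key = normalize_setting_name(name)
    if key not in mappings:
        raise KeyError(f"No parameter mapping for setting {name!r} (key {key!r})")
    target = mappings[key]
    if isinstance(target, list):
        # Tuple-valued setting: one value per target parameter.
        if len(target) != len(value):
            raise ValueError(f"{name!r}: expected {len(target)} values, got {len(value)}")
        return dict(zip(target, value))
    return {target: value}


# Hypothetical mapping table; the commit stores the real mappings in docstrings.
mappings = {"typical_diameter": ["min_diameter", "max_diameter"]}
kwargs = bind_setting("Typical diameter (Min,Max)", (10, 40), mappings)
```

With the ExampleFly values from the commit message, `kwargs` comes out as `{"min_diameter": 10, "max_diameter": 40}`, which is the dict that would sit in the second slot of the `func=(function, {kwargs})` tuple.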

…semantics

Used LLM (Gemini 3.0 Flash Preview) to analyze all 88 absorbed functions and determine correct categories based on input shape expectations and iteration semantics.

Changes:
- Created recategorize_functions.py script for LLM-based recategorization
- Updated contracts.json with 7 category changes (81 unchanged)

Category changes:

z_projection (3 functions):
- MakeProjection: Processes z-stacks (D, H, W) → (H, W) projections
- Morphologicalskeleton: Has volumetric parameter for 3D processing
- TrackObjects: Processes temporal sequences (frames over time)

channel_operation (4 functions):
- CorrectIlluminationCalculate: Per-channel illumination correction
- IdentifyPrimaryObjects: Segment same marker across all sites per channel
- RescaleIntensity: Per-channel intensity normalization
- Tile: Assembles sites into montage per channel

Impact:
- IdentifyPrimaryObjects now uses VariableComponents.CHANNEL instead of SITE
- MakeProjection now uses VariableComponents.Z_INDEX instead of SITE
- Generated pipelines have semantically correct iteration order
- Functions receive correct input shapes based on their processing semantics

All changes verified against OpenHCS PURE_2D contract behavior:
- PURE_2D unstacks dim 0 and calls function on each (H, W) slice
- variable_components controls what dim 0 represents (sites, channels, or z-slices)
- Total function calls remain the same, only iteration order changes
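
The PURE_2D behaviour described here (unstack dimension 0, call the function on each (H, W) slice, restack) can be sketched with NumPy. `apply_pure_2d` is a hypothetical helper written for illustration, not the OpenHCS executor.

```python
import numpy as np


def apply_pure_2d(func, stack: np.ndarray) -> np.ndarray:
    """Call func once per (H, W) plane along dim 0 and restack the results."""
    if stack.ndim != 3:
        raise ValueError(f"expected a 3D stack (N, H, W), got shape {stack.shape}")
    return np.stack([func(plane) for plane in stack])


def subtract_mean(plane: np.ndarray) -> np.ndarray:
    """Toy 2D operation: centre each plane on zero."""
    return plane - plane.mean()


# Dim 0 could be sites, channels, or z-slices; the function cannot tell which.
stack = np.arange(24, dtype=float).reshape(3, 2, 4)
result = apply_pure_2d(subtract_mean, stack)
```

The function only ever sees a 2D plane, so what dimension 0 means is decided entirely by whoever builds the stack, which is the role variable_components plays in the description above.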

…ents semantics

Updated LLM recategorization prompt with correct dimensional dataflow semantics:
- image_operation (SITE): Single-channel operations across all sites (default)
- z_projection (Z_INDEX): Functions that NEED z-stacks (projections, 3D ops)
- channel_operation (CHANNEL): Functions that NEED multiple channels simultaneously

Results:
- channel_operation (4): ColorToGray, GrayToColorRgb, MeasureColocalization, UnmixColors
- z_projection (2): MakeProjection, Morphologicalskeleton
- image_operation (82): Everything else (single-channel operations)

Fixed incorrect categorizations from previous run:
- IdentifyPrimaryObjects: channel_operation → image_operation ✓
- CorrectIlluminationCalculate: channel_operation → image_operation ✓
- RescaleIntensity: channel_operation → image_operation ✓
- Tile: channel_operation → image_operation ✓
- TrackObjects: z_projection → image_operation ✓ (time-lapse uses sequential_components)

Added correct categorizations:
- ColorToGray: image_operation → channel_operation ✓
- MeasureColocalization: image_operation → channel_operation ✓
- UnmixColors: image_operation → channel_operation ✓
- GrayToColorRgb: image_operation → channel_operation ✓ (manual fix)

Regenerated pipelines with correct variable_components.

UnmixColors has PURE_2D contract, which means it receives (H, W) and processes each site independently. PURE_2D with channel_operation would unstack dimension 0 and process each channel independently, which defeats the purpose.

Dimensional dataflow rule:
- PURE_2D contract → ALWAYS image_operation (processes each site independently)
- FLEXIBLE/PURE_3D contract → can be channel_operation or z_projection (processes dim 0 together)

Final categorizations:
- channel_operation (3): ColorToGray, GrayToColorRgb, MeasureColocalization
  All have FLEXIBLE contract and process multiple channels together
- z_projection (2): MakeProjection, Morphologicalskeleton
  Process z-stacks (volumetric data)
- image_operation (83): Everything else, including all PURE_2D functions

Key changes:
1. measure_colocalization: Added channel_1/channel_2 params for arbitrary N-channel input
2. gray_to_color_rgb: Added red/green/blue_channel params for arbitrary N-channel input
3. gray_to_color_cmyk: Added channel selection params for arbitrary N-channel input
4. Fixed @numpy decorator: Removed invalid contract=ProcessingContract.X usage
5. Removed unused ProcessingContract imports from all 88 functions
6. Rewrote __init__.py with dynamic function loading from contracts.json
7. Regenerated pipelines with correct variable_components

The dimensional dataflow compiler perspective:
- Dimension 0 can be of ARBITRARY size (1, 2, 3, 4, 5, ... N)
- Functions should parameterize channel selection, not hardcode indices
- ProcessingContract is orthogonal to variable_components
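
A sketch of what parameterized channel selection could look like for an N-channel stack. The signature mirrors the red/green/blue_channel parameters named above, but the body, the (H, W, 3) output layout, and the bounds check are assumptions rather than the absorbed function's real implementation.

```python
import numpy as np


def gray_to_color_rgb(
    image: np.ndarray,
    red_channel: int = 0,
    green_channel: int = 1,
    blue_channel: int = 2,
) -> np.ndarray:
    """Pick three planes from a (C, H, W) stack and return an (H, W, 3) RGB image."""
    if image.ndim != 3:
        raise ValueError(f"expected (C, H, W), got shape {image.shape}")
    n_channels = image.shape[0]
    for label, index in (("red", red_channel), ("green", green_channel), ("blue", blue_channel)):
        # Reject out-of-range and negative indices instead of wrapping silently.
        if not 0 <= index < n_channels:
            raise IndexError(f"{label}_channel={index} is out of range for {n_channels} channels")
    return np.stack(
        [image[red_channel], image[green_channel], image[blue_channel]], axis=-1
    )


# Five-channel input: plane i is filled with the value i.
five_channels = np.stack([np.full((2, 2), i, dtype=float) for i in range(5)])
rgb = gray_to_color_rgb(five_channels, red_channel=4, green_channel=0, blue_channel=2)
```

The same function then works for 3, 5, or N input channels, with the caller choosing which planes feed each colour.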

1. Removed unused ProcessingContract import from header template
2. Removed duplicate imports in header template
3. Changed to dynamic function loading with get_function()
4. Fixed measurecolocalization parameter mapping:
   - 'Select images to measure' -> (pipeline-handled) (requires pipeline context)
   - 'Run all metrics?' -> (pipeline-handled) (multi-param not auto-mappable)
5. Regenerated ExampleFly and ExampleHuman pipelines with clean parameters

1. Removed duplicate parameter mapping from _outline helper function
2. Added correct mapping to identify_tertiary_objects docstring
3. Object selection settings ('Select the larger/smaller identified objects') are now (pipeline-handled) since they're @special_inputs
4. Only shrink_primary is an actual function parameter

In OpenHCS, @special_inputs are wired at compile time by name matching, not passed as string parameters. CellProfiler's object naming convention doesn't map directly to function kwargs.

- Categorized all 88 absorbed functions into FLEXIBLE vs PURE_3D contracts
- Identified critical architectural issues:
  * Contract mismatch (PURE_2D vs PURE_3D)
  * Tuple handling bug in _execute_pure_2d
  * Inconsistent special outputs format
- Created phased refactoring plan with timeline and risk mitigation
- Documented 14 FLEXIBLE functions (support true 3D + slice-by-slice)
- Documented 74 PURE_3D functions (always internal slicing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Comprehensive design document covering:
- Architecture comparison (CellProfiler vs OpenHCS)
- Identified abstraction leaks (A1-A3, B1-B4, C1-C4)
- What we're certain about (contract system, aggregation orthogonality)
- Design proposal: AggregationSpec and compile-time symbol resolution
- Implementation phases
- Open questions for further discussion

Detailed mapping of:
- Core concept mapping (pipeline, data containers, object model)
- Semantic gaps requiring new concepts (ObjectRegistry, etc.)
- Adapter layer design for CellProfiler modules
- ProcessingContract mapping
- Measurement naming conventions
- Settings system mapping
- Abstraction leak analysis

…sign doc

Includes:
- Essential files to read (OpenHCS core + CellProfiler integration)
- Detailed execution flow diagram
- ProcessingContract implementation with code snippets
- Special outputs system explanation
- CellProfiler workspace structure
- Absorbed function patterns (current buggy vs required)
- Key terms glossary
- Quick reference: what to read when

trissim and others added 24 commits December 19, 2025 19:22

chore: tighten benchmark platform plans

5febdae

docs: Move CellProfiler refactor plan to plans folder

811b1cb

docs: Move CellProfiler refactor plan to root plans folder

17e1608

Merge main into benchmark-platform

4169c07

trissim wants to merge 25 commits into main from benchmark-platform