Add Zarr v2 archive format support #286
Implements a native Zarr v2 archive (ZarrArchive) that requires no external library beyond standard C++17. Each field is stored as a Zarr array in a subdirectory named `<prefix>_<field>.zarr/` under the archive directory. Multiple saves of the same field are tracked via a leading time dimension; individual saves map to separate chunk files (`<id>.0.0...0`) following the Zarr v2 chunk naming convention.

Changes:
- src/serialbox/core/archive/ZarrArchive.h: new archive class
- src/serialbox/core/archive/ZarrArchive.cpp: full implementation
  * pure C++17, no external dependencies
  * native byte-order dtype strings in .zarray metadata
  * handles both contiguous and strided (padded) StorageViews
  * supports Read / Write / Append open modes
  * writeToFile / readFromFile for stateless single-save I/O
- src/serialbox/core/archive/ArchiveFactory.cpp: register Zarr; add .zarr extension mapping in archiveFromExtension
- src/serialbox/core/archive/ArchiveFactory.h: update docstring
- src/serialbox/core/CMakeLists.txt: compile ZarrArchive sources
- test/serialbox/core/archive/UnittestZarrArchive.cpp: unit tests mirroring the NetCDF test suite (construction, metadata validation, .zarray content, writeToFile/readFromFile, typed read/write round-trips)
- test/serialbox/core/CMakeLists.txt: include new test file

https://claude.ai/code/session_012z6neCsMd8cRDKcYFRaqrz
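For readers unfamiliar with the chunk naming and dtype conventions mentioned in the description, here is a minimal sketch of how they could be produced in plain C++17. The helper names `chunkKey`, `isLittleEndian`, and `dtypeString` are hypothetical and are not taken from the PR; the sketch only assumes that each save occupies one chunk along the leading time dimension and that dtype strings use the NumPy-style `<`, `>`, `|` byte-order prefix.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the PR): chunk file name for save `id` of a
// field with `numDataDims` data dimensions. The save index is the leading
// coordinate; every data dimension contributes a ".0" because the whole
// field is assumed to fit in a single chunk per save.
inline std::string chunkKey(std::size_t id, std::size_t numDataDims) {
  std::string key = std::to_string(id);
  for (std::size_t i = 0; i < numDataDims; ++i)
    key += ".0";
  return key;
}

// Runtime endianness probe; C++17 has no std::endian (that arrives in C++20).
inline bool isLittleEndian() {
  const std::uint16_t probe = 1;
  return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}

// Hypothetical helper (not from the PR): NumPy-style dtype string for the
// host byte order. `kind` is 'i', 'u', 'f' or 'b'; single-byte types use '|'
// because byte order does not apply to them.
inline std::string dtypeString(char kind, std::size_t numBytes) {
  const char order = (numBytes == 1) ? '|' : (isLittleEndian() ? '<' : '>');
  return std::string(1, order) + kind + std::to_string(numBytes);
}
```

Under these assumptions, the fourth save (index 3) of a two-dimensional field would land in a chunk file named `3.0.0`, and a `double` field on a little-endian host would be described as `<f8`.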
Pull request overview
This PR adds support for the Zarr v2 storage format to Serialbox as a new archive backend. Zarr is a cloud-friendly, chunked array storage format that enables efficient I/O and interoperability with scientific Python tools. The implementation follows established patterns from existing archive backends (BinaryArchive and NetCDFArchive) and integrates seamlessly with the existing ArchiveFactory infrastructure.
Changes:
- Implements a new `ZarrArchive` class that stores fields as Zarr v2 arrays in subdirectories, with support for multiple saves per field
- Integrates the Zarr archive into `ArchiveFactory` for archive creation and file extension resolution
- Adds comprehensive unit tests covering construction, metadata handling, read/write operations, and various data types/dimensions
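To make the "multiple saves per field" point concrete, the following sketch builds the `.zarray` metadata document that a Zarr v2 reader expects. It is an illustration, not the PR's code: `jsonArray` and `makeZarray` are hypothetical names, and the layout (a leading time dimension of length `numSaves` with a chunk extent of 1 along it, no compressor, C order, zero fill value) is an assumption inferred from the PR description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: render a list of extents as a JSON array.
inline std::string jsonArray(const std::vector<std::size_t>& values) {
  std::string out = "[";
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i != 0)
      out += ", ";
    out += std::to_string(values[i]);
  }
  return out + "]";
}

// Hypothetical helper: .zarray content for a field saved `numSaves` times.
// Assumption: shape = [numSaves, dims...] and chunks = [1, dims...], so each
// save is exactly one chunk file.
inline std::string makeZarray(std::size_t numSaves,
                              const std::vector<std::size_t>& dims,
                              const std::string& dtype) {
  std::vector<std::size_t> shape{numSaves};
  shape.insert(shape.end(), dims.begin(), dims.end());
  std::vector<std::size_t> chunks{1};
  chunks.insert(chunks.end(), dims.begin(), dims.end());

  // These eight keys are the ones required by the Zarr v2 array metadata.
  std::string json = "{\n";
  json += "  \"zarr_format\": 2,\n";
  json += "  \"shape\": " + jsonArray(shape) + ",\n";
  json += "  \"chunks\": " + jsonArray(chunks) + ",\n";
  json += "  \"dtype\": \"" + dtype + "\",\n";
  json += "  \"compressor\": null,\n";
  json += "  \"fill_value\": 0,\n";
  json += "  \"order\": \"C\",\n";
  json += "  \"filters\": null\n";
  json += "}\n";
  return json;
}
```

With this layout, appending a save would only require bumping the first entry of `shape` and writing one more chunk file; whether the PR rewrites `.zarray` in exactly this way is not stated in the overview.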
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/serialbox/core/archive/ZarrArchive.h | Header defining the ZarrArchive class with Archive interface implementation, static utility methods, and helper functions |
| src/serialbox/core/archive/ZarrArchive.cpp | Implementation of ZarrArchive including endianness detection, data serialization, metadata management, and Zarr v2 format compliance |
| src/serialbox/core/archive/ArchiveFactory.h | Updated documentation to include .zarr extension mapping |
| src/serialbox/core/archive/ArchiveFactory.cpp | Integrated ZarrArchive into factory methods for creation and file I/O |
| src/serialbox/core/CMakeLists.txt | Added ZarrArchive source files to build configuration |
| test/serialbox/core/CMakeLists.txt | Added ZarrArchive unit test to test suite |
| test/serialbox/core/archive/UnittestZarrArchive.cpp | Comprehensive tests for ZarrArchive covering construction, metadata validation, read/write operations, and multiple data types/dimensions |
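The data serialization step listed for `ZarrArchive.cpp` has to cope with strided (padded) storage as well as contiguous storage, according to the PR description. A rough sketch of the idea follows: gather the logical elements of a padded buffer into a dense, row-major byte array that can be written as one chunk. The function `packContiguous` and its parameters are hypothetical stand-ins; Serialbox's actual `StorageView` API is not used here, and strides are assumed to be given in elements.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a StorageView copy (not Serialbox's API).
// `dims` are the logical extents, `strides` the distance in elements between
// consecutive indices along each dimension. The output is dense, with the
// last index varying fastest (row-major / C order).
inline std::vector<char> packContiguous(const char* base,
                                        const std::vector<std::size_t>& dims,
                                        const std::vector<std::size_t>& strides,
                                        std::size_t bytesPerElement) {
  std::size_t total = 1;
  for (std::size_t d : dims)
    total *= d;

  std::vector<char> out(total * bytesPerElement);
  std::vector<std::size_t> idx(dims.size(), 0);

  for (std::size_t n = 0; n < total; ++n) {
    // Element offset of the current multi-index in the padded source buffer.
    std::size_t offset = 0;
    for (std::size_t k = 0; k < dims.size(); ++k)
      offset += idx[k] * strides[k];

    std::memcpy(out.data() + n * bytesPerElement,
                base + offset * bytesPerElement, bytesPerElement);

    // Advance the multi-index like an odometer, last dimension first.
    for (std::size_t k = dims.size(); k-- > 0;) {
      if (++idx[k] < dims[k])
        break;
      idx[k] = 0;
    }
  }
  return out;
}
```

For a 2x3 field stored with one padding element per row, the call would use `dims = {2, 3}` and `strides = {4, 1}`. A contiguous view is just the special case where the strides match the dense layout, so a real implementation could short-circuit that case with a single copy.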
    const std::size_t numDataDims = activeDims.size();

    // Create directory
    std::filesystem::create_directories(fieldDir);

The create_directories call should be wrapped in a try-catch block to handle std::filesystem::filesystem_error exceptions consistently with the constructor (lines 187-189) and the write method (lines 290-294). This ensures that filesystem errors are properly caught and converted to Serialbox Exception types with appropriate error messages.
Summary
This PR adds support for the Zarr v2 storage format as a new archive backend in Serialbox. Zarr is a cloud-friendly, chunked array storage format that enables efficient I/O and interoperability with scientific Python tools.
Key Changes
New ZarrArchive class (`src/serialbox/core/archive/ZarrArchive.h/cpp`):
- Implements the `Archive` interface for Zarr v2 format
- Stores each field under `<prefix>_<field>.zarr/`
- Writes metadata files (`.zarray` for Zarr metadata, `ArchiveMetaData-<prefix>.json` for Serialbox metadata)

Core Features:
- Stateless single-save I/O (`writeToFile`/`readFromFile`)

Integration:
- Extends `ArchiveFactory` to recognize and create Zarr archives

Directory Layout:
Implementation Details
https://claude.ai/code/session_012z6neCsMd8cRDKcYFRaqrz