
Chardata first working on feature branch #6975

Open
pp-mo wants to merge 65 commits into SciTools:FEATURE_chardata from pp-mo:chardata_plus_encoded_datasets

Conversation

@pp-mo (Member) commented Mar 11, 2026

Successor to #6898
Now targeting the (new) FEATURE_chardata feature branch in the main repo

TODO: please check that any remaining unresolved issues on #6898 are now resolved here

pp-mo added 30 commits January 19, 2026 11:49
Get 'create_cf_data_variable' to call 'create_generic_cf_array_var': Mostly working?
Rename; add in parts of old investigation; add temporary notes.
@pp-mo pp-mo requested a review from ukmo-ccbunney March 11, 2026 01:35
@pp-mo pp-mo changed the title Chardata plus encoded datasets Chardata first working on feature branch Mar 11, 2026
@ukmo-ccbunney (Contributor) left a comment


Looking pretty good. 👍🏼
I've got a few comments, questions and suggestions.

I have not looked at the tests yet - just the main Iris code. I thought it was worth submitting the review at this point so you can see the comments. I'll take a look at the tests next.

Also - remind me - what are we doing in the case where data is stored as a netCDF string type - i.e. the variable length string type? At the moment that just loads in as an object array in numpy. Were we just leaving that as-is? We can't write that kind of datatype in Iris.

Edit: Discussed with @pp-mo and he reminded me that we never intended to handle the variable length string cases.
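To make the distinction above concrete: fixed-width netCDF "char" data is an array of single bytes with a trailing character dimension, whereas variable-length string data loads as an object array (and stays out of scope here). A minimal sketch of the char-decoding idea, in plain Python with hypothetical names (not the Iris or netCDF4 API), assuming UTF-8 and NUL padding:

```python
# Illustrative sketch only; decode_char_rows is a hypothetical helper, not
# part of Iris. NetCDF "char" data is a fixed-width array of single bytes;
# decoding it means joining the bytes along the last (character) dimension,
# dropping NUL padding, and applying a character encoding (assumed UTF-8).

def decode_char_rows(rows, encoding="utf-8"):
    """Join fixed-width single-byte cells into strings, dropping NUL padding."""
    return [b"".join(row).rstrip(b"\x00").decode(encoding) for row in rows]

# A (2, 4) "char" array, written out as nested lists of 1-byte cells.
char_array = [
    [b"a", b"b", b"c", b"\x00"],
    [b"x", b"y", b"\x00", b"\x00"],
]

decoded = decode_char_rows(char_array)  # ["abc", "xy"]
```

A variable-length string variable, by contrast, already arrives as Python `str` objects in an object-dtype array, so no byte-level decode applies.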


codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 92.92308% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.18%. Comparing base (043b0bc) to head (e4242a1).

Files with missing lines                              | Patch % | Lines
...ib/iris/fileformats/netcdf/_bytecoding_datasets.py | 93.36%  | 9 Missing and 4 partials ⚠️
lib/iris/fileformats/netcdf/saver.py                  | 91.95%  | 5 Missing and 2 partials ⚠️
lib/iris/fileformats/netcdf/_thread_safe_nc.py        | 81.25%  | 3 Missing ⚠️
Additional details and impacted files
@@                 Coverage Diff                  @@
##           FEATURE_chardata    #6975      +/-   ##
====================================================
+ Coverage             90.11%   90.18%   +0.07%     
====================================================
  Files                    91       92       +1     
  Lines                 24912    25092     +180     
  Branches               4675     4689      +14     
====================================================
+ Hits                  22449    22629     +180     
- Misses                 1684     1688       +4     
+ Partials                779      775       -4     

☔ View full report in Codecov by Sentry.

@ukmo-ccbunney (Contributor) left a comment


Tests look sensible as far as I can tell, with the expectation that more coverage will be added as part of #6898.

    # Except if it already is one, since they forbid "re-wrapping".
    if not hasattr(self._dataset, "THREAD_SAFE_FLAG"):
-       self._dataset = _thread_safe_nc.DatasetWrapper.from_existing(
+       self._dataset = bytecoding_datasets.DatasetWrapper.from_existing(
@pp-mo (Member, Author) Mar 13, 2026


Oops.
I think this should possibly (also) be an 'EncodedDataset'.
I need to think about this one; I guess it depends on what kind of 'dataset-like' is passed in here.

Suggested change:
-       self._dataset = bytecoding_datasets.DatasetWrapper.from_existing(
+       self._dataset = bytecoding_datasets.EncodedDataset.from_existing(
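For readers outside the PR: the `hasattr` guard quoted above implements a "wrap once" rule, since the wrappers forbid re-wrapping. A self-contained sketch of that pattern, with illustrative class and attribute names standing in for the Iris internals:

```python
# Hypothetical sketch of the "no re-wrapping" guard; the class name,
# THREAD_SAFE_FLAG sentinel, and from_existing() are illustrative
# stand-ins for the Iris wrappers, not the real implementation.

class DatasetWrapper:
    # Class-level sentinel: any instance of a wrapper exposes this
    # attribute, which is what the hasattr() guard checks for.
    THREAD_SAFE_FLAG = True

    def __init__(self, inner):
        self._inner = inner

    @classmethod
    def from_existing(cls, dataset):
        # Re-wrapping is forbidden, so an existing wrapper is returned as-is.
        if hasattr(dataset, "THREAD_SAFE_FLAG"):
            return dataset
        return cls(dataset)
```

Under this scheme the open question in the comment above reduces to: which wrapper class should own the sentinel check for the particular "dataset-like" object being passed in.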

    # Create a data-writeable object that we can stream into, which
    # encapsulates the file to be opened + variable to be written.
-   write_wrapper = _thread_safe_nc.NetCDFWriteProxy(
+   write_wrapper = bytecoding_datasets.EncodedNetCDFWriteProxy(
@pp-mo (Member, Author)


Suggested change:
-   write_wrapper = bytecoding_datasets.EncodedNetCDFWriteProxy(
+   # Note: we do *not* support selectable string encoding for writes,
+   # so this never needs to be a _thread_safe_nc.NetCDFWriteProxy.
+   write_wrapper = bytecoding_datasets.EncodedNetCDFWriteProxy(
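The core job of an encoding write proxy like the one named above is to turn Python strings into fixed-width byte rows before they reach the char variable. A minimal, purely illustrative sketch of that encode-and-pad step (the class name, `write` method, and UTF-8 default are assumptions, not the `bytecoding_datasets` API, which streams into a real netCDF variable):

```python
# Hypothetical sketch of the encode-on-write idea: strings handed to the
# writer are encoded and padded to the fixed character-dimension width.
# This is NOT the EncodedNetCDFWriteProxy implementation.

class EncodedWriteProxy:
    """Encode strings to fixed-width byte rows before storage."""

    def __init__(self, width, encoding="utf-8"):
        self.width = width          # length of the char dimension
        self.encoding = encoding    # assumed selectable at construction
        self.rows = []              # stand-in for the target variable

    def write(self, strings):
        for s in strings:
            # Encode, truncate to the char-dimension width...
            data = s.encode(self.encoding)[: self.width]
            # ...and pad short values with NUL bytes.
            self.rows.append(data.ljust(self.width, b"\x00"))
```

This also motivates the suggested comment above: the encoding choice lives on the write proxy itself, so the plain `_thread_safe_nc.NetCDFWriteProxy` is never needed on the write path.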

Co-authored-by: Patrick Peglar <patrick.peglar@metoffice.gov.uk>

ukmo-ccbunney commented Mar 13, 2026

Nearly there, I think.
There is just your decision on this comment and a failing doctest/docs build to address.

Edit: The docs failures might be a transient error; they seem to be related to a link failure in the intersphinx links:

intersphinx inventory 'https://pandas.pydata.org/docs/objects.inv' not fetchable due to <class 'requests.exceptions.HTTPError'>: 522 Server Error: <none> for url: https://pandas.pydata.org/docs/objects.inv

https://pandas.pydata.org seems to be unresponsive at the time of writing.
