Chardata plus encoded datasets by pp-mo · Pull Request #6898 · SciTools/iris

pp-mo · 2026-01-19T13:49:34Z

Closes #6309 + various

Successor to #6850
now incorporating #6851

+ netcdf load+save now integrated, so that they use encoded datasets

pp-mo · 2026-01-28T18:19:06Z

lib/iris/fileformats/netcdf/_bytecoding_datasets.py

+    string_width: int  # string lengths when viewing as strings (i.e. "Uxx")
+
+    def __init__(self, cf_var):
+        """Get all the info from an netCDF4 variable (or similar wrapper object).


Suggested change

-        """Get all the info from an netCDF4 variable (or similar wrapper object).
+        """Get all the info from a netCDF4 variable.

It actually must be "at least" a threadsafe wrapped variable (or a real netCDF4.Variable) and not an EncodedVariable, since we inspect its '.dtype' etc.

pp-mo · 2026-01-28T18:19:42Z

lib/iris/fileformats/netcdf/_bytecoding_datasets.py

+    read_encoding: str  # *always* a valid encoding from the codecs package
+    write_encoding: str  # *always* a valid encoding from the codecs package
+    n_chars_dim: int  # length of associated character dimension
+    string_width: int  # string lengths when viewing as strings (i.e. "Uxx")


These are now only set if "is_chardata" -- see init code
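To make the role of "n_chars_dim" concrete, here is a minimal sketch of fixed-width character storage, assuming NUL padding and a single-byte or utf-8 encoding. The helper names `encode_fixed_width` and `decode_fixed_width` are hypothetical and are not taken from the PR; the overlength check only mirrors the "overlength writes" tests mentioned in the commit list.

```python
def encode_fixed_width(text: str, n_chars_dim: int, encoding: str = "ascii") -> bytes:
    """Encode text and pad it with NULs to exactly n_chars_dim bytes.

    Hypothetical helper for illustration only, not the Iris implementation.
    """
    raw = text.encode(encoding)
    if len(raw) > n_chars_dim:
        # An "overlength" write: the encoded bytes do not fit the dimension.
        raise ValueError(
            f"{text!r} needs {len(raw)} bytes in {encoding!r}, "
            f"but the character dimension holds only {n_chars_dim}"
        )
    return raw.ljust(n_chars_dim, b"\x00")


def decode_fixed_width(raw: bytes, encoding: str = "utf-8") -> str:
    """Strip trailing NUL padding, then decode.

    Assumption: the encoding never produces NUL bytes inside real characters,
    which holds for ascii and utf-8 but NOT for utf-16 or utf-32.
    """
    return raw.rstrip(b"\x00").decode(encoding)


stored = encode_fixed_width("abc", n_chars_dim=5)
restored = decode_fixed_width(stored)
```

Here `stored` occupies the full 5 bytes of the character dimension, while `restored` is the original 3-character string, which is the distinction the "n_chars_dim" and "string_width" fields appear to capture.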

pp-mo · 2026-01-28T18:21:39Z

lib/iris/fileformats/netcdf/_bytecoding_datasets.py

+DECODE_TO_STRINGS_ON_READ = NetcdfStringDecodeSetting()
+DEFAULT_READ_ENCODING = "utf-8"
+DEFAULT_WRITE_ENCODING = "ascii"


These should be made available in the public API,
probably by importing them in iris.fileformats.netcdf and including them in its __all__?
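For illustration, a sketch of how the two defaults could combine with the "always a valid encoding from the codecs package" guarantee noted on the read_encoding and write_encoding fields. The helper `resolve_encoding` is hypothetical, the default values are simply copied from the diff, and a plain UserWarning stands in for whatever categorised warning Iris actually uses.

```python
import codecs
import warnings

# Values copied from the diff; defined locally so the sketch is self-contained.
DEFAULT_READ_ENCODING = "utf-8"
DEFAULT_WRITE_ENCODING = "ascii"


def resolve_encoding(requested, default):
    """Return a codec name known to the codecs package (hypothetical helper).

    Falls back to 'default' when nothing is requested or the name is unknown.
    """
    if requested is None:
        return default
    try:
        # codecs.lookup raises LookupError for unknown names and returns a
        # CodecInfo whose .name is the normalised codec name.
        return codecs.lookup(requested).name
    except LookupError:
        warnings.warn(
            f"Unknown encoding {requested!r}; using {default!r} instead.",
            UserWarning,
            stacklevel=2,
        )
        return default


read_encoding = resolve_encoding("UTF8", DEFAULT_READ_ENCODING)
write_encoding = resolve_encoding(None, DEFAULT_WRITE_ENCODING)
```

With this shape, a caller never holds an encoding string that a later encode or decode call would reject as unknown.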

ukmo-ccbunney

Just one comment at this time.

ukmo-ccbunney · 2026-01-30T10:56:41Z

lib/iris/fileformats/netcdf/_bytecoding_datasets.py

+        encoding = self.read_encoding
+        if "utf-16" in encoding:
+            # Each char needs at least 2 bytes -- including a terminator char
+            strlen = (strlen // 2) - 1


Do we really need to account for a terminating char on "utf-32" and "utf-16" encodings?
When writing to a netCDF file, surely the terminator isn't written? This is just something that is used when storing strings in memory, is it not?

OK - this looks to be the case. Certainly encoding a byte string to "utf-16" or "utf-32" does appear to add an extra null terminator...

And, from my experiments, omitting the extra byte breaks a reverse 'decode' operation.
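The byte counts discussed here can be checked directly. In CPython, the extra code unit produced by the generic "utf-16" and "utf-32" codecs is a byte-order mark (BOM) at the start of the output rather than a trailing terminator, which still fits the "- 1" in the snippet above. The sketch below is a hedged illustration: `max_chars_for_bytes` is a hypothetical helper, its utf-16 branch copies the arithmetic from the diff, and its utf-32 branch is an assumption by analogy.

```python
import codecs


def max_chars_for_bytes(n_bytes: int, encoding: str) -> int:
    """Upper bound on characters that fit in n_bytes (hypothetical helper)."""
    if "utf-16" in encoding:
        # 2 bytes per code unit, minus one code unit for the BOM.
        return (n_bytes // 2) - 1
    if "utf-32" in encoding:
        # Assumption by analogy: 4 bytes per code unit, minus one for the BOM.
        return (n_bytes // 4) - 1
    return n_bytes


encoded16 = "abc".encode("utf-16")       # BOM + 3 code units
encoded32 = "abc".encode("utf-32")       # BOM + 3 code units
encoded16_le = "abc".encode("utf-16-le") # explicit byte order: no BOM

has_bom16 = encoded16.startswith(codecs.BOM_UTF16)
has_bom32 = encoded32.startswith(codecs.BOM_UTF32)
roundtrip = encoded16.decode("utf-16")
```

Note that the explicit-endian variants such as "utf-16-le" write no BOM, so a substring test like `"utf-16" in encoding` is conservative for them by one character. Characters outside the Basic Multilingual Plane also take two utf-16 code units, so the result is an upper bound rather than an exact count.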

pp-mo · 2026-03-06T10:37:58Z

Update

merged from main to unblock CI testing

pp-mo added 28 commits January 19, 2026 11:49

Initial tests.

041af2d

Get 'create_cf_data_variable' to call 'create_generic_cf_array_var': …

65bd9dd

…Mostly working? Get 'create_cf_data_variable' to call 'create_generic_cf_array_var': Mostly working?

Reinstate decode on load, now in-Iris coded.

d75a7a7

Revert and amend.

07efc06

Hack to preserve the existing order of attributes on saved Coords and…

2321077

… Cubes.

Fix for dataless; avoid FUTURE global state change from temporary tests.

0174e53

Further fix to attribute ordering.

035e28b

Fixes for data packing.

80c4776

Latest test-chararrays.

d4d3ebd

Fix search+replace error.

3f10cc1

Tiny fix in crucial place! (merge error?).

ee2fe4c

Extra mock property prevents weird test crashes.

744826d

Fix another mock problem.

a3e1217

Initial dataset wrappers.

1a4f2f2

Rename; add in parts of old investigation; add temporary notes.

Various notes, choices + changes: Beginnings of encoded-dataset testing.

0148f43

Replace use of encoding functions with test-specific function: Test f…

20a5be2

…or overlength writes.

Radically simplify 'make_bytesarray', by using a known specified byte…

9b621bf

…width.

Add read tests.

b366fd2

Remove iris width control (not in this layer).

cf048b2

more notes

e684d1d

Merge branch 'encoded_datasets' into chardata_plus_encoded_datasets

28b124c

Remove temporary test code.

a20cc45

Use iris categorised warnings for unknown encodings.

c995a8d

Clarify the temporary load/save exercising tests (a bit).

f118c18

Use bytecoded_datasets in nc load+save, begin fixes.

c8a27df

Further attempt to satisfy warning category checker.

c4a31a4

Fix overlength error tests.

10831d7

Get temporary iris load/save exercises working (todo: proper tests).

042028e

scitools-ci bot added this to 🚴 Peloton Jan 20, 2026

pp-mo mentioned this pull request Jan 20, 2026

Chardata plus #6850

Closed

pp-mo commented Jan 28, 2026

ukmo-ccbunney reviewed Jan 30, 2026

pp-mo mentioned this pull request Feb 3, 2026

Fix iris handling of netcdf character array variables #6309

Open

pp-mo added 5 commits February 27, 2026 16:46

Fix mock patches.

2dbdcba

Fix patches in test_CFReader.

a34ea09

Fix variable creation in odd cases.

aa1fe03

Ignore attribute reordering in scaling-packed saves.

f5d50ee

Fix test for refactored proxy constructor.

b2c6d51

pp-mo mentioned this pull request Feb 27, 2026

Chardata plus encoded datasets pp-mo/iris#122

Closed

pp-mo added 2 commits February 27, 2026 18:56

Fix get_cf_var_data to support vlen-string.

dfd4d91

Add back new test results, folder removed in error.

274fae4

pp-mo force-pushed the chardata_plus_encoded_datasets branch from 274fae4 to 31884e9 on March 6, 2026 10:37

pp-mo force-pushed the chardata_plus_encoded_datasets branch 2 times, most recently from e328f94 to 2800dc1 on March 6, 2026 12:31

Merge branch 'latest' into chardata_plus_encoded_datasets

09137c3

pp-mo force-pushed the chardata_plus_encoded_datasets branch from 2800dc1 to 09137c3 on March 6, 2026 12:52

pp-mo mentioned this pull request Mar 6, 2026

Chardata plus encoded datasets prerebase pp-mo/iris#124

Closed

pp-mo added 2 commits March 6, 2026 17:16

Fix string-type check in cf to suit any of the new dtypes.

122dc92

Remove non-working no-unit for label variables.

0bb70e1

pp-mo force-pushed the chardata_plus_encoded_datasets branch from c4a60d5 to 0bb70e1 on March 6, 2026 17:18

Separate asserts for ruff PT018.

3c44c8b

Chardata plus encoded datasets #6898
pp-mo wants to merge 48 commits into SciTools:main from pp-mo:chardata_plus_encoded_datasets
