Add support for reading/writing VTK XML ImageData (.vti) format#6032
Add support for reading/writing VTK XML ImageData (.vti) format#6032dzenanz wants to merge 5 commits intoInsightSoftwareConsortium:mainfrom
Conversation
|
It would be good to use an XML parsing library like expat which is already in ITK. |
|
Force-pushed CI failures fixed
Review feedback addressed
The XML header parsing is now done by expat (the same library
Switching to expat removes all the ad-hoc Tests addedThe previous commit shipped no VTI tests at all. This commit adds 1. Round-trip via
2. Behavior tests:
3. Hand-crafted-file readability tests for code paths the writer never produces but the reader must support (cross-checked against VTK's
This matches the relevant subset of VTK's Local resultsTest output (every assertion passes):
Out-of-scope items (potential follow-ups)
|
|
We should add a few test files converted into .vti format by ParaView, and regression test them against the existing .nrrd/.mha versions. And of course, manually review this. |
|
Legacy removed tests failed: Modules/IO/VTK/test/itkVTIImageIOTest.cxx:100:56: warning: 'itk::ImageConstIterator::IndexType itk::ImageConstIterator::GetIndex() const [with TImage = itk::Image<unsigned char, 1>; itk::ImageConstIterator::IndexType = itk::Index<1>]' is deprecated: Please use |
Closes InsightSoftwareConsortium#6030's parent issue thread; replaces the WIP implementation in the original PR with one that addresses both the CI failures and the review feedback. == Why a rewrite == The original WIP commit on this branch failed CI on every C++ platform and on Python wrapping, and Bradley Lowekamp's review asked for the XML parsing to use ITK's existing expat library rather than ad-hoc string matching. Rather than patch the WIP implementation, this commit restarts the file from the same upstream/main base with: - expat-based XML parsing for the file header - Python wrapping registration - KWStyle-clean doxygen comments - an ENH: prefix that satisfies kwrobot/ghostflow - a comprehensive round-trip and corner-case test suite == Algorithm == VTIImageIO inherits from itk::ImageIOBase and supports the serial-piece subset of the VTK XML ImageData (.vti) format. Reading 1. The whole file is slurped into memory. 2. If the file contains an `<AppendedData>` element, the XML view fed to expat is truncated at that element and replaced with a self-closing `<AppendedData/></VTKFile>` so the parser sees a well-formed document. The byte offset of the `_` marker that introduces the raw binary block is recorded for later seek-and-read. 3. The truncated XML view is parsed with expat (XML_Parser / XML_SetElementHandler / XML_SetCharacterDataHandler). Element handlers populate a VTIParseState with the VTKFile, ImageData, PointData, and active DataArray attributes, and the character data handler captures inline ASCII or base64 contents. 4. After parsing, geometry/spacing/origin/component-type/pixel-type are populated on the ImageIOBase from the captured state. 5. Read() then routes to one of three branches: - ASCII: parse the cached character data via ReadBufferAsASCII - base64: decode the cached base64 string, strip the UInt32/UInt64 block-size header, memcpy into the user buffer, then byte-swap if file != host endianness - raw appended: open the file, seek to m_AppendedDataOffset + m_DataArrayOffset + headerBytes, read, then byte-swap if necessary The expat-based reader handles the things string matching cannot: attribute reordering, optional whitespace, XML comments at any depth, the standalone <?xml ... ?> declaration, and PointData with multiple DataArrays where only one is the active scalar/vector/tensor. Writing - ASCII or binary (base64) DataMode, selectable via SetFileType(). - SymmetricSecondRankTensor pixels are expanded from ITK's 6-component storage to VTK's full 9-component layout for ASCII output. Binary tensor writing is intentionally rejected with an exception because the on-disk NumberOfComponents="9" header would silently disagree with a 6-component memory buffer; this is verified by a test. - Output is always written in the host byte order; the file declares its byte_order accordingly. - The block-size header for base64 binary is currently UInt32 (the UInt64 reader path is exercised by a hand-crafted test file below). Compression (zlib/lz4) and the VTK direction matrix are intentionally out of scope for this PR. == CI failures fixed == - itkVTIImageIO.h:124 KWStyle: comment doesn't have \class The DataEncoding `enum class` had a `/** ... */` doxygen comment; KWStyle's class-comment rule treats `enum class` like a class. Replaced with a plain `//` comment. - KeyError: 'VTIImageIOFactory' (Python tests) Added Modules/IO/VTK/wrapping/itkVTIImageIO.wrap registering both VTIImageIO and VTIImageIOFactory with the auto-loaded wrap module. - ghostflow-check-main: 'WIP:' prefix not in allowed list Commit message now uses ENH: prefix. - Module dependency: ITKExpat added as a PRIVATE_DEPENDS of ITKIOVTK so the new XML parser actually links. Updated the FACTORY_NAMES to include `ImageIO::VTI`. == Test strategy and coverage == The new itkVTIImageIOTest.cxx exercises the filter through 3 classes of cases: 1. Round-trip tests via ImageFileWriter -> ImageFileReader for the common pixel type and dimensionality combinations: uchar 1D / 2D, short 3D, float 3D, double 3D, RGB 2D, vector<float,3> in 3D, both ASCII and binary (base64) encodings where applicable. Each round-trip checks region, spacing, origin, and per-pixel bit-equivalence. 2. Behavior tests: - Symmetric tensor ASCII round-trip (writes 9-component layout, reads back into 6-component ITK layout). - Symmetric tensor binary write must throw (silent layout corruption guard). 3. Hand-crafted-file readability tests for code paths the writer never produces but the reader must support: - XML robustness: comments at multiple positions, attribute reordering, multiple DataArrays in PointData with the active Scalars selector pointing at the second array. Verifies dimensions, spacing, origin, AND that the correct active array is selected. - header_type="UInt64": base64 file with an 8-byte block-size header; verifies the dual UInt32/UInt64 header path. - format="appended" with raw binary in <AppendedData>: verifies the file-truncation + offset-seek path. - byte_order="BigEndian": base64 file with the data and the block-size header pre-swapped to big-endian; verifies the byte-swap-on-read path. - CanReadFile / CanWriteFile sanity. This matches the relevant subset of VTK's own TestDataObjectXMLIO.cxx coverage matrix (DataMode x ByteOrder x HeaderType x DataObjectType) for the features this PR claims. ZLIB/LZ4 compression and the VTK direction matrix are intentionally not exercised since this PR does not claim those features. == Local results == $ cmake --build build-ssim -j48 --target ITKIOVTKTestDriver [4/4] Linking CXX executable bin/ITKIOVTKTestDriver $ ctest -R itkVTIImageIOTest --output-on-failure Test InsightSoftwareConsortium#1259: itkVTIImageIOTest .... Passed 0.03 sec 100% tests passed, 0 tests failed out of 1 pre-commit (gersemi, clang-format, kw-pre-commit) clean on every touched file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This appears when legacy code is removed. The exact error message was: Modules/IO/VTK/test/itkVTIImageIOTest.cxx:100:56: warning: 'itk::ImageConstIterator::IndexType itk::ImageConstIterator::GetIndex() const [with TImage = itk::Image<unsigned char, 1>; itk::ImageConstIterator::IndexType = itk::Index<1>]' is deprecated: Please use ComputeIndex() instead, or use an iterator with index, like ImageIteratorWithIndex! [-Wdeprecated-declarations]
|
@greptileai review this. |
|
| Filename | Overview |
|---|---|
| Modules/IO/VTK/src/itkVTIImageIO.cxx | Core implementation: two P1 bugs — tensor read maps wrong indices, byte-swap no-op on BE hosts reading LE files; P2 issues include entire-file slurp for binary paths, uint32_t block-size truncation, and base64-appended encoding not distinguished from raw-appended. |
| Modules/IO/VTK/include/itkVTIImageIO.h | New header following ITK conventions with SmartPointer, itkNewMacro, itkOverrideGetNameOfClassMacro; private state members well-documented; no issues found. |
| Modules/IO/VTK/itk-module.cmake | Correctly adds ITKExpat as PRIVATE_DEPENDS and registers the VTI factory name; description updated. |
| Modules/IO/VTK/test/itkVTIImageIOTest.cxx | Comprehensive round-trip tests for scalar/vector/RGB/tensor types; however tensor value correctness is not verified (test only checks no exception), missing coverage for the broken tensor mapping bug. |
| Modules/IO/VTK/test/itkVTIImageIOReadWriteTest.cxx | Integration test using real ITK data files with pixel-level comparison; well structured. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[VTIImageIO::ReadImageInformation] --> B[SlurpFile into memory]
B --> C{AppendedData in file?}
C -- No --> D[Parse full XML with expat]
C -- Yes --> E[Truncate XML at AppendedData tag / Record _ marker offset]
E --> D
D --> F{DataArray format?}
F -- ascii --> G[Store ASCII text in m_AsciiDataContent]
F -- binary/base64 --> H[Store base64 text in m_Base64DataContent]
F -- appended --> I[Store m_AppendedDataOffset + m_DataArrayOffset]
J[VTIImageIO::Read] --> K{m_DataEncoding?}
K -- ASCII --> L[ReadBufferAsASCII from cached string]
K -- Base64 --> M[DecodeBase64 / skip block-size header / memcpy]
K -- RawAppended --> N[Seek to offset in file / read raw bytes]
M --> O[SwapBufferIfNeeded]
N --> O
O --> P{fileBigEndian == sysBigEndian?}
P -- Yes --> Q[No swap needed]
P -- No --> R[SwapRangeFromSystemToBigEndian - no-op on BE host]
S[VTIImageIO::Write] --> T{IOFileEnum?}
T -- ASCII --> U[WriteBufferAsASCII / Expand tensor 6 to 9 components]
T -- Binary --> V[Prepend UInt32 block header / EncodeBase64]
Reviews (1): Last reviewed commit: "COMP: Fix GetIndex() is deprecated: Plea..." | Re-trigger Greptile
| if (isTensor) | ||
| { | ||
| // VTK tensors are 3x3 = 9 components on disk; ITK uses 6 (symmetric). | ||
| this->SetPixelType(IOPixelEnum::SYMMETRICSECONDRANKTENSOR); | ||
| this->SetNumberOfComponents(6); | ||
| } |
There was a problem hiding this comment.
Tensor round-trip produces wrong values
SetNumberOfComponents(6) after finding a 9-component (3×3) tensor DataArray causes ReadBufferAsASCII to read only 6 of the 9 per-pixel ASCII values, mapping them to the wrong tensor indices.
The writer emits rows as t[0] t[1] t[2] | t[1] t[3] t[4] | t[2] t[4] t[5] (full 3×3), so the flat stream is [t0,t1,t2,t1,t3,t4,t2,t4,t5]. Reading 6 of these yields [t0,t1,t2,t1,t3,t4], not [t0,t1,t2,t3,t4,t5]. Symmetric indices 3, 4, and 5 are silently wrong after round-trip. The test only checks that no exception is thrown, not that pixel values are correct.
| SwapBufferIfNeeded(void * buffer, std::size_t componentSize, std::size_t numComponents, IOByteOrderEnum fileOrder) | ||
| { | ||
| const bool fileBigEndian = (fileOrder == IOByteOrderEnum::BigEndian); | ||
| const bool sysBigEndian = ByteSwapper<uint16_t>::SystemIsBigEndian(); | ||
| if (fileBigEndian == sysBigEndian) | ||
| { | ||
| return; | ||
| } | ||
| switch (componentSize) | ||
| { | ||
| case 1: | ||
| break; | ||
| case 2: | ||
| ByteSwapper<uint16_t>::SwapRangeFromSystemToBigEndian(static_cast<uint16_t *>(buffer), numComponents); | ||
| break; | ||
| case 4: | ||
| ByteSwapper<uint32_t>::SwapRangeFromSystemToBigEndian(static_cast<uint32_t *>(buffer), numComponents); | ||
| break; | ||
| case 8: | ||
| ByteSwapper<uint64_t>::SwapRangeFromSystemToBigEndian(static_cast<uint64_t *>(buffer), numComponents); | ||
| break; | ||
| default: | ||
| { | ||
| ExceptionObject e_(__FILE__, __LINE__, "Unknown component size for byte swap.", ITK_LOCATION); | ||
| throw e_; | ||
| } | ||
| } |
There was a problem hiding this comment.
Byte-swap is a no-op when reading a little-endian file on a big-endian system
SwapRangeFromSystemToBigEndian performs a byte reversal only when the host is little-endian; on a big-endian host it is a no-op. The guard (fileBigEndian != sysBigEndian) correctly enters the swap branch when file=LE, sys=BE, but then the selected swap function does nothing, leaving the data in little-endian order while the caller expects system (big-endian) byte order.
The correct call for the file=LE path is SwapRangeFromSystemToLittleEndian, which byte-reverses on a big-endian host:
if (fileBigEndian)
{
// data is BE, system is LE
ByteSwapper<uintN_t>::SwapRangeFromSystemToBigEndian(ptr, n);
}
else
{
// data is LE, system is BE
ByteSwapper<uintN_t>::SwapRangeFromSystemToLittleEndian(ptr, n);
}| else if (std::strcmp(name, "AppendedData") == 0) | ||
| { | ||
| st->sawAppendedData = true; | ||
| // We do not consume any character data from inside AppendedData via | ||
| // expat -- the binary content following the `_` marker is XML-illegal | ||
| // and is read directly from the file by the caller. We don't even | ||
| // get this far on a real raw-appended file because the binary bytes | ||
| // would have caused a parser error before reaching this start tag. | ||
| // The caller pre-truncates the XML at <AppendedData> to handle that. | ||
| } |
There was a problem hiding this comment.
Base64-encoded appended data silently misread as raw binary
The encoding attribute of <AppendedData> (e.g., encoding="base64") is not captured. Any file with format="appended" and encoding="base64" in the AppendedData element is assumed to be raw binary and will produce garbage pixel data without any error or warning.
Consider capturing the encoding attribute and either handling base64-appended or throwing a clear itkExceptionMacro("AppendedData encoding=\"base64\" is not supported…").
| const std::string content = SlurpFile(m_FileName); | ||
| if (content.empty()) | ||
| { | ||
| itkExceptionMacro("Cannot open or read file: " << m_FileName); | ||
| } | ||
|
|
There was a problem hiding this comment.
Entire file slurped into memory for raw-appended paths
SlurpFile reads the complete file content (including all binary pixel data) into a std::string for every call path, including raw-appended files where only the XML header preceding <AppendedData> is needed. For large 3-D volumes this can double peak memory use. Consider reading only up to (and including) the <AppendedData …> tag when the appended-data path is detected.
There was a problem hiding this comment.
I just manually checked this. This remark is not accurate. But the entire file will be read twice.
There was a problem hiding this comment.
This could of course be more performant. But a good first step would be to just make it work.
There was a problem hiding this comment.
content variable at line 492 is local to InternalReadImageInformation method, so it will go out of scope, thus not increasing the peak memory requirement.
| file << ">\n"; | ||
|
|
||
| // Prepend a UInt32 block-size header (number of raw data bytes). | ||
| const auto blockSize = static_cast<uint32_t>(totalBytes); |
There was a problem hiding this comment.
uint32_t block-size header silently truncates for images >~4 GB
static_cast<uint32_t>(totalBytes) wraps around for images whose serialised byte count exceeds UINT32_MAX (~4 GiB). The resulting header value is wrong but no exception is thrown. An explicit range check would make the overflow visible:
if (totalBytes > static_cast<SizeType>(std::numeric_limits<uint32_t>::max()))
{
itkExceptionMacro("Image exceeds 4 GB; use header_type=\"UInt64\" (not yet supported for writing).");
}
const auto blockSize = static_cast<uint32_t>(totalBytes);
Prompt for Claude Sonnet 4.6:
Add support for .vti image file format. Look at how similar .vtk image file format is implemented in itk::VTKImageIO class.Full implementation supporting:
Note: I have not reviewed this myself. Opening PR to check automated testing results.