@JohannesKarlsen99

This PR adds support for using pre-calculated checksums during E-ARK SIP generation, avoiding redundant checksum calculations for large files.

Fixes #368

Changes

Interface Changes

  • IPFileInterface.java: Added getChecksum(), getChecksumAlgorithm(), and hasPreCalculatedChecksum(String algorithm) methods to the interface
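As a rough illustration of the three methods listed above, the sketch below declares them on a stand-in interface and gives one possible implementation. The names `ChecksumAware` and `SketchFile` are hypothetical, and the matching rule in `hasPreCalculatedChecksum` (non-blank checksum plus a case-insensitive algorithm match) is an assumption, not necessarily what the PR implements.

```java
/** Hypothetical stand-in for the three methods added to IPFileInterface. */
interface ChecksumAware {
    String getChecksum();
    String getChecksumAlgorithm();
    boolean hasPreCalculatedChecksum(String algorithm);
}

/** Illustrative implementation only; this is not the library's IPFileShallow. */
final class SketchFile implements ChecksumAware {
    private final String checksum;
    private final String checksumAlgorithm;

    SketchFile(String checksum, String checksumAlgorithm) {
        this.checksum = checksum;
        this.checksumAlgorithm = checksumAlgorithm;
    }

    @Override
    public String getChecksum() {
        return checksum;
    }

    @Override
    public String getChecksumAlgorithm() {
        return checksumAlgorithm;
    }

    // Assumed semantics: a checksum is usable only when it is non-blank and
    // was produced with the algorithm the caller asks for.
    @Override
    public boolean hasPreCalculatedChecksum(String algorithm) {
        return checksum != null && !checksum.isBlank()
                && checksumAlgorithm != null
                && checksumAlgorithm.equalsIgnoreCase(algorithm);
    }
}
```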

Implementation Changes

  • IPFileShallow.java: Implemented the new interface methods
  • METSFileTypeZipEntryInfo.java: Added constructor that accepts pre-calculated checksum
  • ZIPUtils.java:
    • Added overloaded addFileTypeFileToZip() method accepting pre-calculated checksum
    • Modified zip() to skip checksum calculation when valid pre-calculated checksum exists
    • Added copyWithoutChecksum() helper method
    • Replaced deprecated DatatypeConverter.printHexBinary with HexFormat
  • FolderWriteStrategy.java:
    • Modified writeFileToPath() to skip checksum calculation when valid pre-calculated checksum exists
    • Replaced deprecated DatatypeConverter.printHexBinary with HexFormat
  • EARKUtils.java: Updated all addFileTypeFileToZip() calls to pass checksum from IPFile
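The skip-when-present behaviour described for `zip()` and `writeFileToPath()` could look roughly like the sketch below. `ChecksumCopySketch` and its `copy` method are hypothetical and are not the PR's `copyWithoutChecksum()`; the sketch only shows the branch between a plain copy and a digesting copy. It also shows `HexFormat.of().withUpperCase().formatHex(...)` (Java 17+), which produces the same uppercase hex string as the deprecated `DatatypeConverter.printHexBinary`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper; not the library's ZIPUtils or FolderWriteStrategy. */
final class ChecksumCopySketch {

    /**
     * Copies in to out and returns the checksum to record.
     * If a pre-calculated value is supplied, the bytes are copied without hashing.
     */
    static String copy(InputStream in, OutputStream out, String algorithm, String preCalculated)
            throws IOException, NoSuchAlgorithmException {
        if (preCalculated != null && !preCalculated.isBlank()) {
            // Plain copy: no digest work for large files.
            in.transferTo(out);
            return preCalculated;
        }
        // Fallback: hash the bytes while they are being copied.
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        DigestInputStream digesting = new DigestInputStream(in, digest);
        digesting.transferTo(out);
        // Uppercase hex, matching what printHexBinary used to return.
        return HexFormat.of().withUpperCase().formatHex(digest.digest());
    }
}
```

The caller keeps ownership of both streams, so the sketch does not close them.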

Bug Fix

Fixed a bug where the call file.setChecksum(sip.getChecksum()) overwrote the pre-calculated checksum with the algorithm name before the checksum could be used. The fix saves the pre-calculated checksum values before this call.
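The ordering issue can be pictured with the following minimal sketch. `MutableFileSketch` and `OverwriteFixSketch.resolve` are hypothetical names, and the `setChecksum(sipAlgorithm)` line only stands in for the overwriting call described above; the actual code in the PR may be structured differently.

```java
/** Hypothetical mutable file holder used only for this illustration. */
final class MutableFileSketch {
    private String checksum;
    private String checksumAlgorithm;

    MutableFileSketch(String checksum, String checksumAlgorithm) {
        this.checksum = checksum;
        this.checksumAlgorithm = checksumAlgorithm;
    }

    String getChecksum() { return checksum; }
    String getChecksumAlgorithm() { return checksumAlgorithm; }
    void setChecksum(String checksum) { this.checksum = checksum; }
}

final class OverwriteFixSketch {

    /** Returns the checksum to reuse, or null when it has to be calculated. */
    static String resolve(MutableFileSketch file, String sipAlgorithm) {
        // Save the pre-calculated values BEFORE anything overwrites them.
        String savedChecksum = file.getChecksum();
        String savedAlgorithm = file.getChecksumAlgorithm();

        // Stand-in for the call that replaced the checksum with the algorithm name.
        file.setChecksum(sipAlgorithm);

        boolean usable = savedChecksum != null && !savedChecksum.isBlank()
                && sipAlgorithm.equalsIgnoreCase(savedAlgorithm);
        return usable ? savedChecksum : null;
    }
}
```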

Usage Example

// Create a file with pre-calculated checksum
IPFile representationFile = new IPFile(Paths.get("large_video.mkv"));
representationFile.setChecksum("ABC123DEF456...");  // Your pre-calculated checksum
representationFile.setChecksumAlgorithm("SHA-256");  // Must match SIP's algorithm
representation.addFile(representationFile);

// When building the SIP, the checksum won't be recalculated
Path sipPath = sip.build(writeStrategy);
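For completeness, one way a caller might obtain the value passed to setChecksum() is sketched below using only the JDK. `PreCalculateSketch.checksumOf` is a hypothetical helper, and the uppercase hex output is an assumption about the format the library expects, chosen because the previous `printHexBinary` call produced uppercase.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper for computing a checksum once, ahead of SIP generation. */
final class PreCalculateSketch {

    static String checksumOf(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        // Stream the file in chunks so large files are never fully loaded into memory.
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        // Assumed output format: uppercase hexadecimal.
        return HexFormat.of().withUpperCase().formatHex(digest.digest());
    }
}
```

The returned string could then be stored alongside the file and handed to setChecksum() later, with "SHA-256" passed to setChecksumAlgorithm().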

Testing

Added a testPreCalculatedChecksumSupport() test that:

  1. Sets a fake checksum on an IPFile
  2. Builds a SIP
  3. Extracts the SIP and verifies the fake checksum appears in the METS file

Finding the fake value in the METS file shows that the library uses the pre-calculated value rather than calculating its own.
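The verification step could be sketched as follows, assuming the METS content has already been read into a string. `MetsChecksumCheckSketch` is a hypothetical helper and is not the test code from the PR; it relies only on the standard METS namespace and the `CHECKSUM` attribute that METS defines on `file` elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical check: does any METS file element carry the expected checksum? */
final class MetsChecksumCheckSketch {
    static final String METS_NS = "http://www.loc.gov/METS/";

    static boolean metsContainsChecksum(String metsXml, String expected) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        NodeList files = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(metsXml)))
                .getElementsByTagNameNS(METS_NS, "file");
        for (int i = 0; i < files.getLength(); i++) {
            Element file = (Element) files.item(i);
            // getAttribute returns an empty string when the attribute is absent.
            if (expected.equals(file.getAttribute("CHECKSUM"))) {
                return true;
            }
        }
        return false;
    }
}
```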

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 18, 2025
dosubot bot commented Dec 18, 2025

Related Documentation

1 document(s) may need updating based on files changed in this PR:

RODA and DBPTK Space

@dosubot dosubot bot added feature java Pull requests that update java code labels Dec 18, 2025