Skip to content

Conversation

@austnwil
Copy link
Contributor

Description of changes:

This starts the implementation of the binary 1.1 array-backed bytecode generator, adding a basic refill implementation and support for reading integer, lob, text, and short timestamp references.

Reading decimal and long timestamp references is not yet implemented.

I pulled the test cases for the opcode handlers into shared test cases since they were also useful for testing the generator. A lot of files were changed but most of them are just refactoring the existing opcode handler tests to use the shared test cases.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
Copy link

codecov bot commented Oct 23, 2025

Codecov Report

❌ Patch coverage is 92.39766% with 13 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (ion11@1ce55a6). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...ion/bytecode/bin11/ByteArrayBytecodeGenerator11.kt 66.66% 7 Missing and 3 partials ⚠️
...n/ion/bytecode/bin11/bytearray/TimestampDecoder.kt 98.49% 1 Missing and 1 partial ⚠️
...n/ion/bytecode/bin11/bytearray/PrimitiveDecoder.kt 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             ion11    #1134   +/-   ##
========================================
  Coverage         ?   69.24%           
  Complexity       ?     6249           
========================================
  Files            ?      201           
  Lines            ?    25245           
  Branches         ?     4399           
========================================
  Hits             ?    17482           
  Misses           ?     6418           
  Partials         ?     1345           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@austnwil austnwil marked this pull request as ready for review October 23, 2025 21:36
@austnwil austnwil requested review from popematt and tgregg and removed request for tgregg October 23, 2025 21:36
Comment on lines 91 to 99
@JvmStatic
fun emitRefill(destination: BytecodeBuffer) {
destination.add(Instructions.I_REFILL)
}

@JvmStatic
fun emitEndOfInput(destination: BytecodeBuffer) {
destination.add(Instructions.I_END_OF_INPUT)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker.

We don't need to have helper functions for everything. In this case, it's worth asking what is this actually helping with? Some of the other helper functions in the class handle things like spreading the bits for a long value or converting from float to the int bits.

If you compare these two, I think the version with BytecodeHelper doesn't add value, and maybe even obscures what's going on.

destination.add(Instructions.I_REFILL)

BytecodeHelper.emitRefill(destination)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

}

if (currentPosition < source.size) {
BytecodeEmitter.emitRefill(destination)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted to work on writing tests for a different BytecodeGenerator implementation, and I found that it added a lot of noise to have the REFILL instruction added by the BytecodeGenerator. It's also something that's the same regardless of the implementation, and it's a concern of the reader rather than being part of the data, so I think it makes more sense for the BytecodeIonReader to handle adding in the REFILL instruction.

/**
* Helper class for decoding the various short timestamp encoding variants from a [ByteArray].
*/
internal object ShortTimestampDecoder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just TimestampDecoder? There's not really a need to have separate short and long timestamp decoder classes.

return Timestamp.forSecond(year + 1970, month, day, hour, minute, secondBigDecimal.add(fractionalSecondBigDecimal), (offset - 56) * 15)
}

fun readTimestamp(source: ByteArray, position: Int, precisionAndOffsetMode: Int): Timestamp {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is precisionAndOffsetMode the opcode?

Copy link
Contributor Author

@austnwil austnwil Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just the low nibble. Looks like the bytecode reference does call for the entire opcode, so I made that change

Comment on lines 216 to 218
// TODO: calling function references like this might be slower than just using a conditional or other solutions.
// Might be worth looking into.
val decoder = opcodeToDecoderFunctionTable[precisionAndOffsetMode]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it will be. A when on the opcode is probably going to be faster because it will be compiled as a tableswitch instruction in the java bytecode instead of having to load the array, and then load a value from the array (both of which require fetching objects from the heap), and then the JVM potentially still has to perform a vtable lookup to find the correct method implementation for the interface.

The extra indirection here also limits the JVM's ability to perform method inlining. For example, if an application only reads data that uses 0x85 opcode, the JVM could inline that particular method into this method, eliminating the overhead of the function call. However, when you're using method references like this, hinders that ability.

FYI, I think I wrote a note about that here: https://github.com/popematt/ion-java/blob/6f5274ecc5e5812f08884a6d1aa1c4d7546861fe/src/main/java/com/amazon/ion/v8/TimestampHelper.kt#L34-L38

So why are use using an array of function references for the opcode lookup? The main reasons are that (1) the opcode handler table is already going to be hard for the JVM to predict anyway because there's going to be lots of different opcodes in normal Ion data, (2) the when would make the method very long and hard to read, and (3) the compiled method is going to be quite large and unable to be inlined at its own call sites.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great explanation!

Comment on lines 59 to 68
val timestampReferenceBytes = encodedTimestampBytes.hexStringToByteArray()
val generator = ByteArrayBytecodeGenerator11(timestampReferenceBytes, 0)
val bytecode = BytecodeBuffer()
generator.refill(bytecode, ConstantPool(), intArrayOf(), intArrayOf(), arrayOf())

val timestampPrecisionAndOffsetMode = Instructions.getData(bytecode.get(0))
val timestampPosition = bytecode.get(1)
val expectedTimestamp = Timestamp.valueOf(expectedTimestampString)
val readTimestamp = generator.readShortTimestampReference(timestampPosition, timestampPrecisionAndOffsetMode)
assertEquals(expectedTimestamp, readTimestamp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize I said previously that we should use them how they're meant to be used, but I guess I didn't qualify that very well. Let's test the read*Reference() methods in isolation from the refill() method so that it's easier to track down a failure if something goes wrong.

E.g.:

val bytes = timestampReferenceBytes.hexStringToByteArray()
val generator = ByteArrayBytecodeGenerator11(bytes, 0)
val opcode = bytes[0].toInt().and(0xFF)
val readTimestamp = generator.readShortTimestampReference(1, opcode)
assertEquals(expectedTimestamp, readTimestamp)

fun readTimestampToMonth(source: ByteArray, position: Int): Timestamp {
val yearAndMonth = readFixedInt16(source, position).toInt()
val year = yearAndMonth.and(MASK_7)
val month = yearAndMonth.shr(7)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, com.amazon.ion.impl.bin.Ion_1_1_Constants has constants for all of these offsets so that we don't need to be using "magic" numbers here.

@austnwil austnwil marked this pull request as draft October 27, 2025 16:40
@austnwil
Copy link
Contributor Author

Thanks for the thorough review! I added some changes; I believe I got everything you mentioned. Here are the new updates:

  • Removed the superfluous emitRefill and emitEndOfInput helpers
  • Stopped emitting I_REFILL in the generator
  • Renamed ShortTimestampDecoder to TimestampDecoder
  • Changed the short timestamp-related logic to store and read the entire opcode in the bytecode rather than just the low nibble
  • Removed the function lookup table in the short timestamp decoder in favor of a when
  • Reverted the refactoring of the opcode handler tests that brought the test cases into OpcodeTestCases
  • Created a higher-level test for the bytecode generator instead of just retesting the handlers
  • Removed some unnecessary helper funcs
  • Applied some other small suggestions and fixes

@austnwil austnwil marked this pull request as ready for review October 30, 2025 00:09
"8A 35 7D CB EA 85 BC 01, 2023-10-15T11:22:33.444+01:15",
"8B 35 7D CB EA 85 8B C8 06, 2023-10-15T11:22:33.444555+01:15",
"8C 35 7D CB EA 85 92 61 7F 1A, 2023-10-15T11:22:33.444555666+01:15",
// TODO: add min/max values, other extremes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, we have a bunch of test cases here that you could add—just need to convert from binary to hex: https://github.com/popematt/ion-java/blob/6f5274ecc5e5812f08884a6d1aa1c4d7546861fe/src/test/java/com/amazon/ion/impl/bin/IonEncoder_1_1Test.java#L337-L390

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more test cases and they actually caught a bug

austnwil and others added 4 commits October 30, 2025 11:11
Co-authored-by: Matthew Pope <81593196+popematt@users.noreply.github.com>
Co-authored-by: Matthew Pope <81593196+popematt@users.noreply.github.com>
Co-authored-by: Matthew Pope <81593196+popematt@users.noreply.github.com>
@austnwil austnwil merged commit 4ead384 into amazon-ion:ion11 Nov 3, 2025
29 of 36 checks passed
@austnwil austnwil deleted the austnwil/ion11-bytearray-bytecode-generator branch November 3, 2025 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants