-
Notifications
You must be signed in to change notification settings - Fork 125
Add starting point for ByteArray-backed bytecode generator #1134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add starting point for ByteArray-backed bytecode generator #1134
Conversation
- Implement readTextReference - Implement readBytesReference - Implement basic refill
- Opcode handler tests now use the common test cases shared by the bytecode generator tests - Added test to bytecode generator that tests long inputs containing all supported opcodes
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## ion11 #1134 +/- ##
========================================
Coverage ? 69.24%
Complexity ? 6249
========================================
Files ? 201
Lines ? 25245
Branches ? 4399
========================================
Hits ? 17482
Misses ? 6418
Partials ? 1345 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| @JvmStatic | ||
| fun emitRefill(destination: BytecodeBuffer) { | ||
| destination.add(Instructions.I_REFILL) | ||
| } | ||
|
|
||
| @JvmStatic | ||
| fun emitEndOfInput(destination: BytecodeBuffer) { | ||
| destination.add(Instructions.I_END_OF_INPUT) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker.
We don't need to have helper functions for everything. In this case, it's worth asking what is this actually helping with? Some of the other helper functions in the class handle things like spreading the bits for a long value or converting from float to the int bits.
If you compare these two, I think the version with BytecodeHelper doesn't add value, and maybe even obscures what's going on.
destination.add(Instructions.I_REFILL)
BytecodeHelper.emitRefill(destination)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
| } | ||
|
|
||
| if (currentPosition < source.size) { | ||
| BytecodeEmitter.emitRefill(destination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attempted to work on writing tests for a different BytecodeGenerator implementation, and I found that it added a lot of noise to have the REFILL instruction added by the BytecodeGenerator. It's also something that's the same regardless of the implementation, and it's a concern of the reader rather than being part of the data, so I think it makes more sense for the BytecodeIonReader to handle adding in the REFILL instruction.
| /** | ||
| * Helper class for decoding the various short timestamp encoding variants from a [ByteArray]. | ||
| */ | ||
| internal object ShortTimestampDecoder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just TimestampDecoder? There's not really a need to have separate short and long timestamp decoder classes.
| return Timestamp.forSecond(year + 1970, month, day, hour, minute, secondBigDecimal.add(fractionalSecondBigDecimal), (offset - 56) * 15) | ||
| } | ||
|
|
||
| fun readTimestamp(source: ByteArray, position: Int, precisionAndOffsetMode: Int): Timestamp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is precisionAndOffsetMode the opcode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just the low nibble. Looks like the bytecode reference does call for the entire opcode, so I made that change
| // TODO: calling function references like this might be slower than just using a conditional or other solutions. | ||
| // Might be worth looking into. | ||
| val decoder = opcodeToDecoderFunctionTable[precisionAndOffsetMode] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it will be. A when on the opcode is probably going to be faster because it will be compiled as a tableswitch instruction in the java bytecode instead of having to load the array, and then load a value from the array (both of which require fetching objects from the heap), and then the JVM potentially still has to perform a vtable lookup to find the correct method implementation for the interface.
The extra indirection here also limits the JVM's ability to perform method inlining. For example, if an application only reads data that uses 0x85 opcode, the JVM could inline that particular method into this method, eliminating the overhead of the function call. However, when you're using method references like this, hinders that ability.
FYI, I think I wrote a note about that here: https://github.com/popematt/ion-java/blob/6f5274ecc5e5812f08884a6d1aa1c4d7546861fe/src/main/java/com/amazon/ion/v8/TimestampHelper.kt#L34-L38
So why are use using an array of function references for the opcode lookup? The main reasons are that (1) the opcode handler table is already going to be hard for the JVM to predict anyway because there's going to be lots of different opcodes in normal Ion data, (2) the when would make the method very long and hard to read, and (3) the compiled method is going to be quite large and unable to be inlined at its own call sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great explanation!
| val timestampReferenceBytes = encodedTimestampBytes.hexStringToByteArray() | ||
| val generator = ByteArrayBytecodeGenerator11(timestampReferenceBytes, 0) | ||
| val bytecode = BytecodeBuffer() | ||
| generator.refill(bytecode, ConstantPool(), intArrayOf(), intArrayOf(), arrayOf()) | ||
|
|
||
| val timestampPrecisionAndOffsetMode = Instructions.getData(bytecode.get(0)) | ||
| val timestampPosition = bytecode.get(1) | ||
| val expectedTimestamp = Timestamp.valueOf(expectedTimestampString) | ||
| val readTimestamp = generator.readShortTimestampReference(timestampPosition, timestampPrecisionAndOffsetMode) | ||
| assertEquals(expectedTimestamp, readTimestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize I said previously that we should use them how they're meant to be used, but I guess I didn't qualify that very well. Let's test the read*Reference() methods in isolation from the refill() method so that it's easier to track down a failure if something goes wrong.
E.g.:
val bytes = timestampReferenceBytes.hexStringToByteArray()
val generator = ByteArrayBytecodeGenerator11(bytes, 0)
val opcode = bytes[0].toInt().and(0xFF)
val readTimestamp = generator.readShortTimestampReference(1, opcode)
assertEquals(expectedTimestamp, readTimestamp)
src/test/java/com/amazon/ion/bytecode/bin11/bytearray/PrimitiveDecoderTest.kt
Show resolved
Hide resolved
src/main/java/com/amazon/ion/bytecode/bin11/bytearray/ShortTimestampDecoder.kt
Outdated
Show resolved
Hide resolved
| fun readTimestampToMonth(source: ByteArray, position: Int): Timestamp { | ||
| val yearAndMonth = readFixedInt16(source, position).toInt() | ||
| val year = yearAndMonth.and(MASK_7) | ||
| val month = yearAndMonth.shr(7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, com.amazon.ion.impl.bin.Ion_1_1_Constants has constants for all of these offsets so that we don't need to be using "magic" numbers here.
src/test/java/com/amazon/ion/bytecode/bin11/bytearray/ReferenceOpcodeHandlerTest.kt
Outdated
Show resolved
Hide resolved
Co-authored-by: Matthew Pope <81593196+popematt@users.noreply.github.com>
|
Thanks for the thorough review! I added some changes; I believe I got everything you mentioned. Here are the new updates:
|
src/main/java/com/amazon/ion/bytecode/bin11/ByteArrayBytecodeGenerator11.kt
Outdated
Show resolved
Hide resolved
src/main/java/com/amazon/ion/bytecode/bin11/ByteArrayBytecodeGenerator11.kt
Outdated
Show resolved
Hide resolved
src/test/java/com/amazon/ion/bytecode/bin11/ByteArrayBytecodeGenerator11Test.kt
Outdated
Show resolved
Hide resolved
| "8A 35 7D CB EA 85 BC 01, 2023-10-15T11:22:33.444+01:15", | ||
| "8B 35 7D CB EA 85 8B C8 06, 2023-10-15T11:22:33.444555+01:15", | ||
| "8C 35 7D CB EA 85 92 61 7F 1A, 2023-10-15T11:22:33.444555666+01:15", | ||
| // TODO: add min/max values, other extremes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, we have a bunch of test cases here that you could add—just need to convert from binary to hex: https://github.com/popematt/ion-java/blob/6f5274ecc5e5812f08884a6d1aa1c4d7546861fe/src/test/java/com/amazon/ion/impl/bin/IonEncoder_1_1Test.java#L337-L390
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more test cases and they actually caught a bug
Co-authored-by: Matthew Pope <81593196+popematt@users.noreply.github.com>
Co-authored-by: Matthew Pope <81593196+popematt@users.noreply.github.com>
Co-authored-by: Matthew Pope <81593196+popematt@users.noreply.github.com>
Description of changes:
This starts the implementation of the binary 1.1 array-backed bytecode generator, adding a basic refill implementation and support for reading integer, lob, text, and short timestamp references.
Reading decimal and long timestamp references is not yet implemented.
I pulled the test cases for the opcode handlers into shared test cases since they were also useful for testing the generator. A lot of files were changed but most of them are just refactoring the existing opcode handler tests to use the shared test cases.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.