Added discard remaining input option when deserializing#13
Added discard remaining input option when deserializing#13gfusee wants to merge 1 commit intozefchain:mainfrom
Conversation
| #[allow(unsafe_code)] | ||
| pub unsafe fn from_bytes_discarding_remaining_input<'a, T>(bytes: &'a [u8]) -> Result<T> |
There was a problem hiding this comment.
I wouldn't use unsafe for that. I believe what you're asking is a variant where we don't call end(). Or equivalently exposing the Deserializer struct. Historically, @bmwill and I have pushed back on this idea. Not sure if there is a stronger argument now.
There was a problem hiding this comment.
I was inspired by some std's methods that are marked as unsafe even they don't use any unsafe code, such as Vec::set_len
To not call end would indeed achieve what I'm asking. However, end is defined in a trait (BcsDeserializer), which implicitly tells that it will be called anyway at the... end of something? Even the comment states this: The Deserializer::end method should be called after a type has been fully deserialized. It would be unintuitive to not call it in my opinion.
Exposing the Deserializer is a solution, but it would oblige developers to write themself the code to bypass the end function. What do you think about a variant of this where:
- we remove the
endfunction to be a requirement of theBcsDeserializertrait, which has no side effect since it is never called in a generic context (at least I didn't find any generic usage, correct me if I'm wrong) - we rename the
Deserializerstruct toRawDeserializer, and we remove theendmethod - we create a struct called
DefaultDeserializerthat has the end method that ensures if there is no remaining input. It usesRawDeserializerlogic behind the scene - we use
DefaultDeserializerwhereDeserializerwas used - we expose both
DefaultDeserializerandRawDeserializer - no
from_bytes_discarding_remaining_input, nonew_discarding_remaining_inputand nodiscard_remaining_input
This way the developer would not notice any change, but someone wanting to not have the checks would just have to use RawDeserializer directly
Let me know if it looks good for you 🙏
Abstract
When trying to write objects with shared logic, developers might need to define generic objects. This is especially true in programming languages that don’t support interfaces such as Move.
These generic objects are easy to serialize, since the compiler or runtime always knows the exact object type at the time of calling the serialization function. However, this doesn’t apply to deserialization, where we might want to decode the common fields of a generic object without needing to know its generic parameters.
Let’s take an example using the Move language. A developer could declare the following object:
This object is serialized in BCS as <borrowed bytes><fees bytes><T bytes>
Let's assume we want to develop a backend around this that checks what account borrowed the most and the one having the highest pending fees. We only have to retrieve each PledgedLoan for all the accounts and deserialize them into the following Rust struct for each address:
This is impossible to do in the current implementation of this crate because the deserializer checks that all the bytes input has been used, which is not the case because we don't need the locked field.
Safety concerns
Removing these checks directly is not good practice. Indeed, we can imagine fields of a struct having an incorrect
Deserializeimplementation, causing them to consume more or fewer bytes than theirSerializecounterpart. In the best case, this would make deserialization fail, in the worst case, it could succeed with incorrect or undefined behavior.Solution implemented
I added a boolean field,
discard_remaining_input, to theDeserializerstruct. When true, it skips the check in theendmethod. This field is always set to false in all existing methods, so the current behavior of the crate remains unchanged.Next, I added two unsafe functions:
new_discarding_remaining_input: likenew, but setsdiscard_remaining_inputto true.from_bytes_discarding_remaining_input: likefrom_bytes, but usesnew_discarding_remaining_inputinstead of new.I marked them
unsafeto force developers to understand the risks, since misuse can have severe consequences.