Add defstruct macro#14

Merged
IGJoshua merged 81 commits intoIGJoshua:developfrom
rutenkolk:master
Mar 3, 2025

Conversation

@rutenkolk
Contributor

@rutenkolk rutenkolk commented Oct 13, 2024

[Note: I would consider this a draft PR, as I have not yet added tests (which could unearth some bugs and necessitate appropriate fixes).]

Hi, this PR contains the addition of a defstruct macro. It does the following:

  • It adds serialize and deserialize code to a serde "registry" for the new type (details below)
  • It generates a new type that has the specified members (details below)
  • It adds an implementation for c-layout
  • It adds inline implementations for both deserialize-from and serialize-into
  • It adds an implementation for clojure.pprint/simple-dispatch

serde registry

The "registry" is implemented via the multimethods generate-deserialize and generate-serialize, which produce code to de/serialize the respective types. This removes indirection in the de/serialize code for types that use other types. I think in the original discussion we were on the same page, but each of us thought the other meant something different. The defstruct macro adds implementations to the multimethods for the newly generated type.
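To illustrate the idea of a code-generating registry, here is a minimal sketch. It does not use coffi's actual signatures: the multimethod `gen-write`, the macro `write-as`, the type keywords `:int`, `:double` and `:point`, and the use of a `java.nio.ByteBuffer` as the target are all assumptions made for this example. The point it shows is that a composite type can splice its members' generated code inline, so no multimethod dispatch is left at runtime.

```clojure
;; Hypothetical registry: each method returns a *code form*, not a value.
(defmulti gen-write
  "Returns a form that writes `value-form` into the ByteBuffer named by
  `buf-sym` at byte `offset`. Dispatches on the type keyword."
  (fn [type _buf-sym _value-form _offset] type))

(defmethod gen-write :int [_ buf-sym value-form offset]
  `(.putInt ~buf-sym (int ~offset) (int ~value-form)))

(defmethod gen-write :double [_ buf-sym value-form offset]
  `(.putDouble ~buf-sym (int ~offset) (double ~value-form)))

;; A composite type registers itself by emitting its members' code inline.
;; Offsets assume a 4-byte int followed by an 8-byte-aligned double.
(defmethod gen-write :point [_ buf-sym value-form offset]
  (let [v (gensym "v")]
    `(let [~v ~value-form]
       ~(gen-write :int buf-sym `(:x ~v) offset)
       ~(gen-write :double buf-sym `(:y ~v) (+ offset 8)))))

;; The dispatch happens once, at macro-expansion time.
(defmacro write-as [type buf value]
  (let [b (with-meta (gensym "buf") {:tag 'java.nio.ByteBuffer})]
    `(let [~b ~buf]
       ~(gen-write type b value 0)
       ~b)))
```

With these definitions, `(write-as :point (java.nio.ByteBuffer/allocate 16) {:x 7 :y 2.5})` expands into two direct `put` calls on the buffer.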

the generated type

The new type is generated via deftype in the private function generate-struct-record. This is an attempt to strike a middle ground between the two positions of the original discussion, although the result might be a bit odd:

  • The type implements both IPersistentVector and IPersistentMap.
    • The basic idea is: if it is treated like a vector, it behaves like a vector. If it is treated like a map, it behaves like a map.
  • It therefore implements both vector-like methods like nth as well as map-like methods like without (for e.g. dissoc).
  • If there is an overlap in the map/vector interfaces, such as with assoc, it supports both paradigms of indices-as-keys and membernames-as-keys. Practically speaking, if you use something like assoc with a number as a key, it behaves like a vector (and will return a vector), otherwise like a map (and will return a map).
  • One notable exception here is foreach, which can't support both paradigms and is therefore implemented as if the type were a vector. The rationale here is that the value of the type is composed of the actual values of the members, not the associated names of the places of the values. If you map or reduce over an object of this type, you will do so over the values of the members.
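The dual vector/map behaviour described above can be sketched in a heavily simplified form. This is not the code produced by generate-struct-record: the `Point` type and the `assoc-member` helper are hypothetical, and only `Indexed` and `ILookup` are implemented here rather than the full IPersistentVector and IPersistentMap interfaces.

```clojure
;; Hypothetical two-member struct type with positional and named access.
(deftype Point [x y]
  clojure.lang.Indexed                 ; vector-like access (also covers count)
  (count [_] 2)
  (nth [_ i]
    (case (long i)
      0 x
      1 y
      (throw (IndexOutOfBoundsException. (str i)))))
  (nth [_ i not-found]
    (case (long i) 0 x 1 y not-found))

  clojure.lang.ILookup                 ; map-like access by member name or index
  (valAt [this k] (.valAt this k nil))
  (valAt [_ k not-found]
    (cond
      (or (= k :x) (= k 0)) x
      (or (= k :y) (= k 1)) y
      :else not-found)))

;; Stand-in for the assoc behaviour described in the list above:
;; a numeric key yields a vector, any other key yields a map.
(defn assoc-member [^Point p k v]
  (if (number? k)
    (assoc [(.-x p) (.-y p)] k v)
    (assoc {:x (.-x p) :y (.-y p)} k v)))
```

For `(def p (->Point 1 2))`, both `(nth p 0)` and `(:x p)` return `1`, while `(assoc-member p 0 9)` returns `[9 2]` and `(assoc-member p :x 9)` returns `{:x 9, :y 2}`.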

with-c-layout

There was one implementation problem. Since padding needs to be taken into account to allow for inline serdes, the new code for the macro needed to rely on with-c-layout. The problem here is that with-c-layout is in the layout namespace, which already depends on mem. As a stopgap solution I simply copied the function over as a private function. I would be in favor of actually deprecating the layout namespace. For backwards compatibility, the with-c-layout function in layout could depend on the one in mem. Not only is the layout namespace at this point somewhat anemic, it has also caused me trouble. I'm not sure if it's a bug, but due to with-c-layout being in layout, I ran into the problem that there are now two different :padding keywords, which I found confusing.
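To make concrete why padding matters for inline serdes, here is a small sketch of the usual C struct layout rule: each member starts at the next multiple of its alignment, and the total size is rounded up to the largest alignment. This is not coffi's with-c-layout; the function names and the `{:field :size :align}` input shape are assumptions for illustration.

```clojure
(defn align-up
  "Rounds `offset` up to the next multiple of `alignment`."
  [offset alignment]
  (let [r (rem offset alignment)]
    (if (zero? r)
      offset
      (+ offset (- alignment r)))))

(defn c-layout-offsets
  "Takes a sequence of {:field kw :size bytes :align bytes} maps, in declaration
  order, and returns the byte offset of each field plus the padded struct size."
  [fields]
  (let [max-align (reduce max 1 (map :align fields))
        [offsets end]
        (reduce (fn [[acc offset] {:keys [field size align]}]
                  (let [start (align-up offset align)]
                    [(assoc acc field start) (+ start size)]))
                [{} 0]
                fields)]
    {:offsets offsets
     :size    (align-up end max-align)}))

;; A struct like { char c; int i; short s; } with typical sizes and alignments:
(c-layout-offsets [{:field :c :size 1 :align 1}
                   {:field :i :size 4 :align 4}
                   {:field :s :size 2 :align 2}])
;; => {:offsets {:c 0, :i 4, :s 8}, :size 12}
```

Generated serialization code has to write each member at these offsets, which is why the macro cannot ignore the padding between members.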

tests & benchmarks

No tests or benchmarks exist right now. I don't expect the custom type to be slower than defrecord, but I want to test it.
Similarly, I do want to add a first set of tests for the de/serialization.

@rutenkolk rutenkolk marked this pull request as draft October 13, 2024 21:19
@IGJoshua
Owner

I absolutely love this. You've done a fantastic job! I look forward to seeing the tests that you add for this, and I'm also thinking ahead to reorganizing some of the existing serde code to use the new generate-serialize and generate-deserialize to add some inline arities to serialize and deserialize when the type argument is a constant. Don't worry about any of that in this PR, it's just something I want to use this for in the future.

@IGJoshua
Owner

IGJoshua commented Oct 14, 2024

So I like the way you've chosen to introduce the type registry. It integrates well with the existing tools, and provides a way to do inline arities for serialize and deserialize in the future. One hesitation I have at the moment looking over the code is the use of ::mem/array to refer to actual Java arrays.

Arrays

Currently in coffi the ::mem/array type serializes anything seqable, which does include arrays, and it deserializes to a vec.
I think that to support the direction you're going here, we should add optional kwargs to the array type. A :raw? true option would mean that it deserializes to a JVM array and assumes that the argument is an array. We would then add some conditionals to ensure that we have the fast path for array serialization.
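A rough sketch of what such a conditional fast path could look like is shown below. It deliberately swaps coffi's native memory segments for a plain `java.nio.IntBuffer` so that it stays self-contained; the function names `write-ints!` and `read-ints` are hypothetical, and only the `:raw?` option name comes from the comment above.

```clojure
(import 'java.nio.IntBuffer)

(defn write-ints!
  "Writes `xs` into `buf`. A primitive int array takes the bulk-copy path,
  any other seqable is written element by element."
  [^IntBuffer buf xs]
  (if (instance? (Class/forName "[I") xs)
    (.put buf ^ints xs)                  ; fast path: single bulk copy
    (doseq [x xs]
      (.put buf (int x))))               ; slow path: one put per element
  buf)

(defn read-ints
  "Reads `n` ints from `buf`. With `:raw? true` the primitive array is
  returned as-is, otherwise it is converted to a vector."
  [^IntBuffer buf n & {:keys [raw?]}]
  (let [arr (int-array n)]
    (.get buf ^ints arr)
    (if raw? arr (vec arr))))
```

In this sketch the caller opts in to raw arrays on the read side, while the write side picks the fast path whenever it is handed a primitive array.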

I also think that your compromise around using a record-like type with both map and array style accesses is appropriate and well-done. I might personally want to go the other direction with the foreach implementation though, making it act as if it's reducing over a sequence of map entries. Doing it this way allows adding a quick map val into the stack without too much performance overhead, and it avoids the need to figure out a zipmap with the keys and values separately. I don't have too strong an opinion on this one though as long as keys returns the keys in the same order as foreach yields the values.

serde registry

The serde registry as it stands, with generate-serialize and generate-deserialize, looks pretty good in terms of usage and follows about what I want it to do, but I want to note two things about it that I'm not sure how I feel about right now.

The first is just an observation and not a problem: these functions all generate the equivalent of a serialize-into or deserialize-from call, which I think is appropriate. I'm just thinking about what this might mean in terms of naming if the generate-x functions are going to become a part of the public API of coffi.

The second is that these functions as they stand are unhygienic macro helpers. I think it would be appropriate for the multimethod to take in the symbol which will be used to refer to the segment.
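The hygiene concern can be shown with a small, self-contained sketch. None of these names exist in coffi: `gen-read-unhygienic`, `gen-read` and `read-at` are hypothetical, and a plain vector with `nth` stands in for a memory segment.

```clojure
;; Unhygienic helper: the generated code refers to a symbol literally named
;; `segment`, so it only works if the expansion site happens to bind that name.
(defn gen-read-unhygienic [offset]
  `(nth ~'segment ~offset))

;; Hygienic variant: the caller decides which symbol names the segment.
(defn gen-read [segment-sym offset]
  `(nth ~segment-sym ~offset))

;; A macro using the hygienic helper binds the segment to a fresh gensym,
;; so it cannot clash with user code.
(defmacro read-at [seg offset]
  (let [s (gensym "segment")]
    `(let [~s ~seg]
       ~(gen-read s offset))))

(read-at [10 20 30] 1)
;; => 20
```

Passing the symbol in keeps the helper usable from any macro, regardless of how that macro names its locals.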

with-c-layout

For the with-c-layout problem, I think there are a couple of things to be done. To start with, we can make the private version in coffi.mem use :coffi.layout/padding explicitly, which doesn't require the namespace to be loaded and reduces it to just one padding key. For the rest, I'm a little undecided.

All the structs being passed over the C ABI will most certainly use the with-c-layout layout. However, the intention behind having the namespace in the first place was to allow easily serializing Clojure maps into e.g. std140 or std430 from the GLSL spec. I just haven't gotten around to implementing those yet, as I wanted to get a defstruct macro and some codegen for an OpenGL bindgen library first.

Specifically though, if we remove the coffi.layout namespace and just assume everything is the c-layout, that will then mean there's no way for a user of the library to reach lower in the abstractions to implement a different layout for their use case except to re-implement defstruct for their own layout.
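The following sketch shows the general Clojure behaviour that presumably sits behind the duplicate padding key: an auto-resolved keyword such as `::padding` takes the namespace of the file it is read in, while a fully qualified keyword literal names the same key everywhere and needs no `require`. The var names are made up for the example.

```clojure
;; Read in any namespace other than coffi.layout, `::padding` resolves to
;; that namespace, so it is a different key from the layout one.
(def local-padding ::padding)

;; A fully qualified literal is just data; coffi.layout need not be loaded.
(def layout-padding :coffi.layout/padding)

(namespace layout-padding)
;; => "coffi.layout"

(= layout-padding (keyword "coffi.layout" "padding"))
;; => true
```

Copying a function that uses `::padding` into another namespace therefore silently changes the key it produces, unless the keyword is written out in full.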

@IGJoshua IGJoshua mentioned this pull request Oct 14, 2024
Owner

@IGJoshua IGJoshua left a comment


I've got a few specific things in the review I'd like resolved or to discuss, and I've attached a few patches here that I'd like applied to the PR.

0004-Don-t-use-underscore-on-used-args.patch.txt
0003-Remove-duplicate-c-layout-implementation.patch.txt
0002-Fix-warning-about-defstruct-redefinition.patch.txt
0001-Use-a-once-only-impl-rather-than-with-typehints.patch.txt

Comment threads on src/clj/coffi/mem.clj: 8 (7 marked outdated)
IGJoshua and others added 11 commits January 2, 2025 23:32
Signed-off-by: Kristin Rutenkolk <kristin.rutenkolk@hhu.de>
Signed-off-by: Kristin Rutenkolk <kristin.rutenkolk@hhu.de>
Signed-off-by: Kristin Rutenkolk <kristin.rutenkolk@hhu.de>
Signed-off-by: Kristin Rutenkolk <kristin.rutenkolk@hhu.de>
…or strings

Co-authored-by: Joshua Suskalo <joshua@suskalo.org>
Co-authored-by: Joshua Suskalo <joshua@suskalo.org>
@rutenkolk rutenkolk changed the base branch from master to develop January 4, 2025 19:06
@rutenkolk
Contributor Author

I think work on this PR is nearing completion. As for the performance, here are some benchmark results:

Serializing a struct with n amount of ::mem/int members:

linear scale: [benchmark plot not preserved]

logarithmic scale: [benchmark plot not preserved]

Deserializing a struct with n amount of ::mem/int members:

linear scale: [benchmark plot not preserved]

logarithmic scale: [benchmark plot not preserved]

Serializing a struct with one ::mem/array of ::mem/ints of fixed size n:

all individual ways to serialize, logarithmic scale: [benchmark plot not preserved]

comparison defstruct with raw arrays and vectors for arrays vs. defalias, logarithmic scale: [benchmark plot not preserved]

Deserializing a struct with one ::mem/array of ::mem/ints of fixed size n:

all individual ways to deserialize, logarithmic scale: [benchmark plot not preserved]

comparison defstruct with raw arrays and vectors for arrays vs. defalias, logarithmic scale: [benchmark plot not preserved]

In all cases, a significant speedup has been achieved, often more than an order of magnitude, and in special cases more than two. For native arrays, raw Java arrays outperform vectors quite heavily. There may be room for improving the vector codepath specifically, but it is still a noticeable improvement. For some cases, like raw arrays, the runtime is in the realm of only a few nanoseconds and one of the biggest factors actually becomes the initial dispatch of the multimethod, so the actual time the de/serialization takes is probably hard to improve significantly further without changing aspects of how coffi itself operates.

@rutenkolk
Contributor Author

To document a last touch:

I moved with-c-layout to coffi.layout again and load-file'd it in coffi.mem right above the definition of defstruct (the latest possible point, so that it will hopefully cause as few problems as possible should coffi.layout be developed further).

defstruct can still be called from coffi.mem, and even after removing the dependency on coffi.layout in mem_test.clj no test fails, so I think this worked out just fine!

@rutenkolk
Contributor Author

I actually just pushed another fix here: the introduction of load-file in mem.clj caused the library to not find layout.clj when used as a dependency in other projects. The correct function to use is load. This was addressed in #f307d1e

@rutenkolk
Contributor Author

I retracted the commit implementing boolean support from this PR and moved it to a different (draft) PR, as requested by @IGJoshua.

@IGJoshua IGJoshua merged commit 00faaaa into IGJoshua:develop Mar 3, 2025
@rutenkolk
Contributor Author

🥳
