Hi, I'm not super familiar with HVM's architecture and particular design decisions, but it seems like right now you are doing some approximation of divergence tracking.
But, for example, if I have a shared list and one consumer only takes the head while the other takes the sum of the list, the current generic DUP logic creates a cascade that copies the whole list structure (or at least allocates the DUP nodes), even though the first consumer stops reading after index k (0 in the case of taking only the head).
I think we can walk the graph and identify the divergence point: you could then emit C code that keeps the prefix shared and only performs the split/copy at that specific depth, reducing total duplication overhead from O(n) to ~O(k). This is especially relevant for the CON-DUP case.
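To make the head-vs-sum example concrete, here's a toy Python model (not HVM internals; `Cons`, `dup_cascade`, and `dup_until` are all made-up names) comparing a generic per-cell DUP cascade against splitting only up to the divergence depth k:

```python
# Toy model: generic DUP splits every cell (O(n)); a divergence-aware
# DUP splits only the first k cells and keeps the suffix shared (O(k)).

class Cons:
    def __init__(self, head, tail):
        self.head, self.tail = head, tail

def from_list(xs):
    node = None
    for x in reversed(xs):
        node = Cons(x, node)
    return node

def dup_cascade(node, counter):
    """Generic DUP: allocate a split per cell, all the way down -- O(n)."""
    if node is None:
        return None, None
    counter[0] += 1
    t1, t2 = dup_cascade(node.tail, counter)
    return Cons(node.head, t1), Cons(node.head, t2)

def dup_until(node, k, counter):
    """Divergence-aware DUP: split the first k cells, keep the rest
    physically shared -- O(k)."""
    if node is None or k == 0:
        return node, node  # suffix stays shared by reference
    counter[0] += 1
    t1, t2 = dup_until(node.tail, k - 1, counter)
    return Cons(node.head, t1), Cons(node.head, t2)

xs = from_list(range(100))
c1 = [0]; dup_cascade(xs, c1)          # generic: one split per cell
c2 = [0]; a, b = dup_until(xs, 1, c2)  # head-only consumer: k = 1
print(c1[0], c2[0])                    # -> 100 1
print(a.tail is b.tail)                # -> True (suffix still shared)
```

Of course the hard part in HVM is knowing k ahead of time, which is exactly what the divergence analysis would have to provide.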
It translates to a pretty intuitive question: how long can two consumers of the same shared value keep sharing before they actually need to split? This seems strictly closer to the true goal than only using structural propagation + labels + incremental deduplication.
Concretely, it would reduce unnecessary duplications/rewrites/heap allocations. If it's done at comptime, it literally shrinks the number of nodes in the first place. It should also improve cache behavior, although that might heavily depend on your overall architecture.
The catch is we'd need an additional data structure to keep track of this. Originally I thought I could use a combined representation: graph + sparse tree. The graph is the compact representation that we actually use, but it forces us to do these checks in the first place. The tree makes those checks trivial (since children can't be connected), but it can grow exponentially (2^n), so we can't just unfold everything. Given these practical bottlenecks, you'd probably have to do most of these checks at the level of the compiler sitting on top of HVM.
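A toy illustration of the 2^n blowup (again a hypothetical model, not HVM's encoding): a chain of n shared pair nodes is a DAG with n + 1 nodes, but fully unfolding it into a tree yields 2^n leaves:

```python
# Graph-vs-tree tradeoff: sharing keeps the chain linear, the
# unfolded tree is exponential.

def shared_chain(n):
    node = "leaf"
    for _ in range(n):
        node = ("pair", node, node)  # both children reference ONE object
    return node

def graph_nodes(node, seen):
    """Count distinct objects, i.e. the size of the shared graph."""
    if id(node) in seen:
        return 0
    seen.add(id(node))
    if node == "leaf":
        return 1
    _, left, right = node
    return 1 + graph_nodes(left, seen) + graph_nodes(right, seen)

def unfolded_leaves(node):
    """Count leaves of the fully unfolded tree (exponential work)."""
    if node == "leaf":
        return 1
    _, left, right = node
    return unfolded_leaves(left) + unfolded_leaves(right)

g = shared_chain(16)
print(graph_nodes(g, set()))  # -> 17 (graph stays linear in n)
print(unfolded_leaves(g))     # -> 65536 (tree is 2^16)
```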
But even if nothing changes in HVM's runtime, I think this analysis can still be done at comptime in HVM specifically; it's strictly better in precision, of course only for things that can be predicted at comptime.
If you had some sparse side metadata for ambiguous shared producers, e.g. a side table keyed by shared dup/producer locations, or annotations emitted by the compiler/AOT, then the runtime could track a little more of this dynamic information without changing the main graph representation. That could be really beneficial for more "chaotic" cases, but it presumes extra lookups in the hot loop, unless you can encode it implicitly in the structure or still fit the lookup in one instruction/cache line.
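A minimal sketch of what that side table could look like, assuming a profile-style map from dup site to deepest index read (every name here, `DUP_SITE_DEPTH`, `record_read`, `split_depth`, the `"dup@..."` keys, is hypothetical, not an HVM API):

```python
# Side-table sketch: record how deep each consumer actually read at a
# dup site, so later reductions at the same site can split shallowly
# and keep the suffix shared.

DUP_SITE_DEPTH = {}  # dup-site id -> deepest index read so far

def record_read(site, depth):
    """Profile hook: remember the deepest read seen at this dup site."""
    if depth > DUP_SITE_DEPTH.get(site, 0):
        DUP_SITE_DEPTH[site] = depth

def split_depth(site, fallback):
    """How deep to split at this site; unprofiled sites fall back to
    the generic (full) cascade."""
    return DUP_SITE_DEPTH.get(site, fallback)

record_read("dup@take_head", 1)             # consumer only read the head
print(split_depth("dup@take_head", 10**6))  # -> 1
print(split_depth("dup@unseen", 42))        # -> 42 (generic fallback)
```

The hot-loop cost concern from above applies directly: this is one hash lookup per dup reduction unless the depth can be packed into the node itself.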
Maybe you've already thought of this and decided to discard it? But yeah, I think a language's compiler is going to have better access to the kind of info we need.
On a more interesting note: it sounds similar to partial evaluation / copy-on-write / longest common prefix, etc.
I think polynomial trees (Spivak) are pretty natural at representing these types of operations, so maybe there's some insight to be extracted