Investigate the effect of making SortedSet rebalance less frequently (or even moving to red-black trees rather than AVL trees).
Even when there is no need to rebalance, SortedSet sometimes copies more nodes than is necessary. Fix this.
SortedSet#delete_at traverses the tree twice. Add a method to AVLNode so it just needs to traverse the tree once.
When a SortedSet is created with a 1-param block (to derive sort keys), should the sort keys for each node be cached? (In some cases, the computation to derive the sort key might be expensive.)
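The caching idea above can be sketched in plain Ruby: store the derived key alongside each item so the (possibly expensive) 1-param block runs once per item rather than once per comparison. `Node` and `key_block` here are stand-ins, not Hamster's actual internals.

```ruby
# Sketch: cache the derived sort key in each node so the key block
# runs once per item, not once per comparison during rebalancing.
Node = Struct.new(:item, :sort_key)

key_block = ->(s) { s.length }  # stand-in for an expensive key derivation
items = %w[pear fig banana]
nodes = items.map { |i| Node.new(i, key_block.call(i)) }
sorted = nodes.sort_by(&:sort_key).map(&:item)
# sorted == ["fig", "pear", "banana"]
```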
In Set#union, check if other is a Hamster::Set which is larger than self. If so, flip around self and other. (It will be cheaper to build up the result using the larger trie.)
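The flip can be sketched with plain `::Set` standing in for `Hamster::Set`; `union_onto` is a hypothetical helper, and the point is only that the smaller set is the one iterated over.

```ruby
require 'set'

# Sketch of the proposed flip: build the union on top of whichever set
# is larger, so the smaller one supplies the insertions.
def union_onto(larger, smaller)
  smaller.reduce(larger) { |acc, item| acc | [item] }  # persistent-style adds
end

def union(a, b)
  a.size >= b.size ? union_onto(a, b) : union_onto(b, a)
end

union(Set[1, 2], Set[2, 3, 4, 5])  # small set folded onto the large one
```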
In Set#intersection, if other is significantly smaller than self, but is not a Hamster::Set, then create an empty Trie, iterate through other, and add all the elements which are also in self to the new Trie.
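A minimal sketch of that fast path, with `::Set` standing in for the empty Trie and an assumed threshold for "significantly smaller":

```ruby
require 'set'

# Sketch of the Set#intersection fast path: when other is much smaller,
# iterate over it and keep only the elements also present in myself.
def intersection(myself, other)
  if other.size * 2 < myself.size  # "significantly smaller" threshold (assumed)
    other.each_with_object(Set.new) do |item, result|
      result << item if myself.include?(item)
    end
  else
    myself & other
  end
end

intersection(Set.new(1..100), [50, 101, 3])  # => Set[50, 3]
```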
In Set#subset? and Set#proper_subset?, if other is not a ::Set or Hamster::Set (i.e. #include? may be slow), and if other is large, it may be faster to build a temporary ::Set out of other first, and use that for testing inclusion. (Don't bother using a Hamster::Set for this purpose, it will be slower.) Another option would be to iterate through other, count the number of elements which are in self, and see if the number is equal to self.size.
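The counting option can be sketched as follows (assuming `other` contains no duplicates, since a duplicate would be counted twice):

```ruby
require 'set'

# Sketch of the counting alternative for Set#subset?: walk other once,
# count how many of its elements are in myself, and compare to
# myself.size. This never calls a possibly-slow other.include?.
def subset_by_counting?(myself, other)
  other.count { |item| myself.include?(item) } == myself.size
end

subset_by_counting?(Set[1, 2], [2, 1, 3])  # => true
subset_by_counting?(Set[1, 9], [2, 1, 3])  # => false
```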
Likewise in Set#disjoint?: if self.size < other.size but other is not a ::Set or Hamster::Set, testing other.include?(item) may be slow (and may lead to O(n^2) run time). In this case, it may be better to build a temporary ::Set for testing inclusion, or just use the 2nd branch so the inclusion test is done on self, not other.
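The temporary-`::Set` mitigation is simple to sketch: pay O(n) once to build the set, then every membership probe is constant time instead of a linear scan.

```ruby
require 'set'

# Sketch of the Set#disjoint? mitigation: when other is not a ::Set
# (so other.include? may be a linear scan), build a temporary ::Set
# first so each membership test is O(1).
def disjoint?(myself, other)
  other = Set.new(other) unless other.is_a?(Set)
  myself.none? { |item| other.include?(item) }
end

disjoint?(Set[1, 2], [3, 4])  # => true
disjoint?(Set[1, 2], [2, 9])  # => false
```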
In Trie, add methods for bulk insertion and deletion (which copy fewer nodes than repeatedly adding/deleting single elements). Use these for operations like Set#union and so on.
Investigate the effect on Vector of using an implementation strategy more like the Persistent::Vector library. That is, all Vector tries would always be composed of full, 32-item nodes. Then, each Vector would have a "tail" array of 0-31 additional items at the end. Vector#push would only have to modify the trie once for every 32 pushes; the rest of the time, it would just be copying the "tail" array. Likewise for Vector#pop. If this proves worthwhile, consider having both "head" and "tail" arrays for odd numbers of items at the front and back. This could also make Vector#shift and #unshift fast. Retrieving an item from a Vector by index would require checking whether it falls within the "head", the "tail", or the vector trie itself.
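The three-region lookup described above can be sketched like this; a plain `Array` stands in for the trie of full 32-item nodes, and `Vec` is a hypothetical stand-in, not Hamster's `Vector`.

```ruby
# Sketch of index lookup under the proposed head/trie/tail layout:
# position i lives in the "head" array, the full-node trie, or the
# "tail" array, and a lookup first decides which region i falls in.
Vec = Struct.new(:head, :trie, :tail) do
  def get(i)
    return head[i] if i < head.size
    i -= head.size
    return trie[i] if i < trie.size
    tail[i - trie.size]
  end
end

v = Vec.new([:a], (1..32).to_a, [:z])  # trie holds only full 32-item nodes
# v.get(0) == :a, v.get(1) == 1, v.get(33) == :z
```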
If giving Vectors a "head" and "tail" proves to be a worthwhile and successful strategy, compare its performance as a stack/queue with Deque. If Vector can be made almost as efficient as Deque for that purpose, drop Deque and allow Vector to replace it.
Otherwise, if keeping Deque is necessary, try to mitigate the nasty worst-case scenario which happens when a large Deque is built up, and then items are alternately popped from the front and back. (It will keep reversing the list, and reversing it back again.) If, say, we are reversing @rear and moving the items to @front, I would suggest: copy the first 1/3 of the items and keep them on @rear. Reverse the remaining 2/3 and put them on @front.
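That 1/3-2/3 split can be sketched with plain arrays standing in for the cons lists Deque actually uses:

```ruby
# Sketch of the suggested rebalance: instead of moving ALL of @rear
# onto @front when @front runs dry, keep the first third on @rear and
# reverse only the remaining two thirds onto @front.
def rebalance(front, rear)
  return [front, rear] unless front.empty?
  keep = rear.size / 3
  [rear.drop(keep).reverse, rear.take(keep)]
end

rebalance([], [1, 2, 3, 4, 5, 6])  # => [[6, 5, 4, 3], [1, 2]]
```

Keeping a third behind means the next pop from the other end will not immediately trigger another full reversal.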
There is another, less deadly worst-case scenario when retrieving the #first or #last item from a Deque with all the items in a long @front or @rear. Perhaps each Deque can keep the last item in @front and @rear in instance variables, so #first and #last can always be returned quickly.
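A minimal sketch of that caching, assuming (as in a banker's deque) that @rear is stored reversed so its head is the back of the deque; `MiniDeque` is a hypothetical stand-in with arrays in place of the real cons lists.

```ruby
# Sketch of caching #first/#last: stash the boundary items in instance
# variables at construction, so neither accessor has to walk a long
# @front or @rear list.
class MiniDeque
  def initialize(front, rear)
    @front, @rear = front, rear
    @first = front.first || rear.last  # rear is stored reversed
    @last  = rear.first  || front.last
  end
  attr_reader :first, :last
end

d = MiniDeque.new([1, 2, 3], [9, 8])
# d.first == 1, d.last == 9
```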