Hashtable code improvements by Explorer09 · Pull Request #1842 · htop-dev/htop

Explorer09 · 2025-12-24T21:24:14Z

No description provided.

Hashtable.c

BenBE · 2025-12-25T00:06:55Z

Hashtable.c

-      if (SIZE_MAX / 2 < this->size)
-         CRT_fatalError("Hashtable: size overflow");
-
+   if (10 * this->items > 7 * this->size)


The only reason this is safe from overflows is that sizeof(HashtableItem) > 7 …
But I'm not sure the same can necessarily be said about sizeof(HashtableItem) > 10, so the multiplications here technically should be overflow-checked …

sizeof(HashtableItem) is currently 12 for 32-bit systems and 24 for 64-bit systems. Since htop doesn't support 16-bit systems, I really doubt there would be a case where sizeof(HashtableItem) can be less than 10.

I can add a sizeof(HashtableItem) > 10 assertion just to be safe.
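The overflow argument above can be sketched in C. This is only an illustration, not htop's code: HashtableItemSketch and needs_grow are hypothetical names, and the struct layout is an assumption chosen to reproduce the 12-byte and 24-byte sizes mentioned above. The idea is that a table whose buffer could be allocated at all satisfies size <= SIZE_MAX / sizeof(item), so an item size of at least 10 bytes keeps both multiplications in range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for htop's HashtableItem; the layout is an assumption. */
typedef struct {
   unsigned int key;
   size_t probe;
   void* value;
} HashtableItemSketch;

/* Compile-time guard: with sizeof(item) >= 10, any size that fits in memory
 * is at most SIZE_MAX / 10, so 10 * items and 7 * size cannot wrap around. */
_Static_assert(sizeof(HashtableItemSketch) >= 10,
               "load-factor arithmetic could overflow");

/* Returns true when the load factor exceeds 70%. */
static bool needs_grow(size_t items, size_t size) {
   assert(items <= size);
   assert(size <= SIZE_MAX / sizeof(HashtableItemSketch));
   return 10 * items > 7 * size;
}
```

For example, needs_grow(5, 7) is true because 50 > 49, while needs_grow(7, 10) is false because the load factor is exactly 70%.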

Explorer09 · 2025-12-25T10:03:50Z

Hashtable.c

+      if (sizeof(HashtableItem) < 10 && SIZE_MAX / 10 < this->size)
         CRT_fatalError("Hashtable: size overflow");

      Hashtable_setSize(this, 2 * this->size);


I am tempted to adjust this line so that it becomes Hashtable_setSize(this, this->size + 1);

Since Hashtable.size is supposed to be a prime number close to a power of 2, there can be a case where multiplying the size by 2 skips an order of magnitude on the buffer size allocation. Example: (2^14 - 3) * 2 = 2^15 - 6 > 2^15 - 19, thus (2^15 - 19) might be skipped in the buffer size allocation.

I'm just not sure whether this change is safe.

On first glance this should work. Not tested in detail though. But given that setSize rounds up to the next prime anyway this should likely work.
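The skipped-size example can be checked with a small sketch. It assumes, as stated above, that the size is rounded up to the next prime from a table of primes just below powers of 2. The table excerpt and the helper name round_up_to_prime are illustrative, not htop's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Largest prime not greater than 2^n, for n = 3..17 (illustrative excerpt). */
static const size_t primesSketch[] = {
   7, 13, 31, 61, 127, 251, 509, 1021, 2039, 4093,
   8191, 16381, 32749, 65521, 131071
};

/* Hypothetical helper: smallest table entry >= wanted, or 0 if none fits. */
static size_t round_up_to_prime(size_t wanted) {
   assert(wanted > 0);
   for (size_t i = 0; i < sizeof(primesSketch) / sizeof(primesSketch[0]); i++) {
      if (primesSketch[i] >= wanted)
         return primesSketch[i];
   }
   return 0;
}
```

Starting from 16381 (2^14 - 3), round_up_to_prime(2 * 16381) yields 65521 (2^16 - 15) and skips 32749, whereas round_up_to_prime(16381 + 1) yields 32749 (2^15 - 19).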

Explorer09 · 2025-12-25T17:08:27Z

Hashtable.c

Not sure if I can ask @cgzones a question:

Would it be acceptable to change the assertion on this line to assert(this->size >= this->items);?

In other words, what would happen if this->items == this->size?

There is a commit, b45eaf2, that changed the minimum size to 7, but the reason stated in that commit didn't fully make sense to me. Between 2 and 3, the grow factor ((3 - 2)/2 = 50%) is indeed less than 70%, but between 3 and 7 there would be no problem with the grow factor ((7 - 3)/3 = 133%). The cause of the assertion error was more of an off-by-one in the conditional 10 * this->items > 7 * this->size. It should be 10 * (this->items + 1) > 7 * this->size instead if we have to satisfy the assertion this->size > this->items.

I am reluctant to add a +1 to the this->items conditional above, as I guess that the whole Hashtable structure should work fine if we allow this->items == this->size.
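The off-by-one can be made concrete with a short simulation. It assumes the grow check 10 * items > 7 * size runs before each insert (an assumption about the call order), and max_fill_before_grow is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many items a table of fixed `size` holds at the moment the
 * grow condition first fires, assuming the check happens before each insert. */
static size_t max_fill_before_grow(size_t size) {
   assert(size > 0);
   size_t items = 0;
   /* While the check does not fire, the insert proceeds without growing. */
   while (10 * items <= 7 * size)
      items++;
   return items;
}
```

Under these assumptions, sizes 2 and 3 fill up completely (the result equals size), while size 7 triggers growth at 5 items, so items == size is only reachable for the very small sizes.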

Given size is the number of allocated entries and items the number of actually used entries, the >= should be fine.

The reason for avoiding hash tables below 7 is efficiency: It doesn't make sense to allocate smaller blocks as most of our hash tables are far larger anyway.

@BenBE I know efficiency is a good reason, but I just want to write a good technical reason in the code comments, especially since some numbers in the primeDiffs array (in my commit) will be intentionally unused.

Another (technical) reason is memory fragmentation. Allocating small blocks tends to fragment memory far more than using larger blocks (which for small collections might not even need resizing).

BenBE · 2025-12-26T20:06:37Z

How's the last commit related?

Explorer09 · 2025-12-26T21:11:43Z

> How's the last commit related?

My improvements to the Hashtable code are made in the hope that the ht_key_t type can be upgraded from unsigned int to size_t; otherwise it would make no sense to support a Hashtable size of more than 2^32 entries.

The last commit might look like a distraction from the code improvement commits. I apologize. If the last commit needs more review, I'm happy to move it to a separate pull request.

BenBE · 2025-12-26T22:34:51Z

NP with the commit itself. Just wondered if they are complete. Also, does the current set of changes in the first 3 commits work without the last one?

Explorer09 · 2025-12-27T00:55:33Z

> NP with the commit itself. Just wondered if they are complete. Also, does the current set of changes in the first 3 commits work without the last one?

I think the last commit needs some cleanup or discussion, but the first 3 commits are ready and can be cherry-picked to main early.

The last commit depends on the first 3 commits but the first 3 can work without the last.

BenBE · 2025-12-27T11:02:39Z

Can you split off the last commit into its own PR? TIA.

BenBE · 2025-12-27T22:51:08Z

Hashtable.c

+   if (this->items >= this->size * 7 / 10)
+      Hashtable_setSize(this, this->size + 1);


AFAICS the Hashtable_setSize should be called in either path to allocate new entries as needed.

What do you mean? This part of the code addresses expanding the buffer. The buckets buffer is allocated starting at Hashtable_new.

Yes, but once the items get near the size because it can't allocate any more buffer space, you could at some point reach items == size, and thus the next insert will fail due to no more space allocated.

Instead, when nearing the maximum capacity we should fall back to a more linear allocation regime …

@BenBE When it "can't allocate any more buffer space" htop will exit, because of the xCalloc call.

The case where items == size can only happen on small sizes such as 2 or 3 (the minimum size is 7 now, so the sizes of 2 and 3 are theoretical situations), but even when that happens, the next Hashtable_put call will always grow the buffer. Thus there's no problem here.

I was more thinking for very large allocations. Will have to take a closer look after New Year's …

Any harm in making the call to Hashtable_setSize unconditional? The above bounds check should be part of that function already.

I was thinking how that would affect the shrinking of the buffer, with respect to this code:

https://github.com/Explorer09/htop-1/blob/3dc65f62da56befae7ed7dcb6e66ca5bea856710/Hashtable.c#L292

I personally like the idea: by centralizing the conditionals that readjust the buffer size, we can save some sanity checks in the Hashtable_setSize function.

Update: It seems that there's a side effect if I try to merge the conditionals of expanding and shrinking the buffer in Hashtable_setSize, thus I have to give up on the idea.

When creating a Hashtable through Hashtable_new, it is allowed to specify a larger size for the initial allocation. During the initial population of the items, this avoids unnecessary expansion or relocation of the buffer. If I move the shrinking condition to Hashtable_setSize, then the buffer will shrink automatically when adding an element to it. This would remove the benefit of initializing a Hashtable with a larger size.

I think we arrived at this issue before … IIRC.

Maybe inhibit shrinking while we try to insert items …

> Maybe inhibit shrinking while we try to insert items …

Inhibiting shrinking means a flag argument in Hashtable_setSize. It seems like we cannot have fewer than 2 arguments for the setSize function. If we cannot reduce the number of arguments for it, then I'd like to keep the current function prototype, and use the size argument to determine whether the buffer should grow or shrink.

BenBE · 2026-02-20T18:40:33Z

@Explorer09 Any reason against pre-computing the table at compile-time?

#include <stdint.h>

#define B(i,a) (((uint64_t)1ull << (i)) - (a)),

#define A1(I, a0) \
    B((I)+0, a0)
#define A2(I, a0,a1) \
    A1((I)+0, a0) \
    A1((I)+1, a1)
#define A4(I, a0,a1,a2,a3) \
    A2((I)+0, a0,a1) \
    A2((I)+2, a2,a3)
#define A8(I, a0,a1,a2,a3,a4,a5,a6,a7) \
    A4((I)+0, a0,a1,a2,a3) \
    A4((I)+4, a4,a5,a6,a7)
#define A16(I, a0,a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15) \
    A8((I)+0, a0,a1,a2,a3,a4,a5,a6,a7) \
    A8((I)+8, a8,a9,a10,a11,a12,a13,a14,a15)

static const uint64_t primeDiffs[] = {
    A16( 0, 0,0,1,1,3,1,3,1,5,3,3,9,3,1,3,19)
#if SIZE_MAX > UINT16_MAX
    A16(16, 15,1,5,1,3,9,3,15,3,39,5,39,57,3,35,1)
# if SIZE_MAX > UINT32_MAX
    A16(32, 5,9,41,31,5,25,45,7,87,21,11,57,17,55,21,115)
    A16(48, 59,81,27,129,47,111,33,55,5,13,27,55,93,1,57,25)
# endif
#endif
};

#undef A16
#undef A8
#undef A4
#undef A2
#undef A1
#undef B

Saves you from calculating things at runtime on every access to that table and the memory size isn't of much concern here.

FWIW, the code path isn't hot enough to gamble on keeping a single dcache line for the table contents.

Explorer09 · 2026-02-20T22:59:20Z

> @Explorer09 Any reason against pre-computing the table at compile-time?

Why pre-compute then? It was because the code path isn't a hot one that I proposed reducing the table size in memory. It's quite trivial to compute ((1ull << n) - A[n]). You don't reallocate the memory of Hashtables often.

The shrunk lookup table could be just 64 bytes. If I expand and pre-compute it the way you do, it would become 512 bytes (64 * sizeof(uint64_t)), which is an 8-fold difference.
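A minimal sketch of the compact variant, for comparison with the pre-computed table quoted earlier in this thread. The names primeDiffsSketch and prime_near_pow2 are hypothetical. The difference values are copied from that quoted table, and the one-byte element type is an assumption that works because every difference fits in 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* primeDiffsSketch[n] = 2^n minus the largest prime not greater than 2^n,
 * for n >= 1. Entry 0 is a placeholder (2^0 = 1 is not prime). */
static const uint8_t primeDiffsSketch[64] = {
   0, 0, 1, 1, 3, 1, 3, 1, 5, 3, 3, 9, 3, 1, 3, 19,
   15, 1, 5, 1, 3, 9, 3, 15, 3, 39, 5, 39, 57, 3, 35, 1,
   5, 9, 41, 31, 5, 25, 45, 7, 87, 21, 11, 57, 17, 55, 21, 115,
   59, 81, 27, 129, 47, 111, 33, 55, 5, 13, 27, 55, 93, 1, 57, 25
};

/* Computes the table prime at runtime: one shift and one subtraction. */
static uint64_t prime_near_pow2(unsigned int n) {
   assert(n < 64);
   return (UINT64_C(1) << n) - primeDiffsSketch[n];
}
```

Here sizeof(primeDiffsSketch) is 64 bytes, against 64 * sizeof(uint64_t) = 512 bytes for a fully expanded table; the cost is the extra arithmetic on each lookup.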

BenBE · 2026-02-20T23:08:46Z

Multiple reasons:

* Reducing runtime code complexity (basically a follow-up to PR Allow for optimizing out bound check on 64 bit systems #1909)
* No necessity to save every last byte, in particular with static (constant) program memory.
* Single-byte access is somewhat less efficient here (additional movzx compared to plain memory read)

Explorer09 · 2026-02-22T09:48:06Z

> Multiple reasons:
>
> * Reducing runtime code complexity (basically a follow-up to PR Allow for optimizing out bound check on 64 bit systems #1909)
> * No necessity to save every last byte, in particular with static (constant) program memory.
> * Single-byte access is somewhat less efficient here (additional movzx compared to plain memory read)

I don't like the reasoning, although I think such debate is going to be like arguing the color of a bikeshed, and thus I'm not going to change the decisions of the maintainers. You have the choice. I've rebased this PR so you can leave out the lookup table shrink.

Anyway, here's my motivation: My vision of htop is that it should be a compact tool for process monitoring even though it comes with a fancy terminal UI. That means if there's any chance to reduce the memory footprint of the htop program itself, I would wish it could be done (unless reducing the code size comes with a larger performance tradeoff). The memory is better left for large server applications that need it. In my opinion, it's better for htop to be slightly slower (i.e. take slightly more CPU time) for certain tasks if doing them faster would consume memory that could be essential for server apps.

Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

* Move assertions about hash table sizes to Hashtable_isConsistent() so they can be checked in all Hashtable methods.
* Slightly improve conditionals of growing and shrinking the "buckets" buffer. Specifically the calculations are now less prone to arithmetic overflow and can work with Hashtable.size value up to (SIZE_MAX / 7). (Original limit was (SIZE_MAX / 10)).
* If `Hashtable.size > SIZE_MAX / sizeof(HashtableItem)`, allow the compiler to optimize out one conditional of checking overflow. (The buffer allocation would still fail at xCalloc() in that case.)
* Hashtable_setSize() is now a private method.

Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

The lookup table now encodes the difference between 2^n and the nearest prime not greater than 2^n (i.e. https://oeis.org/A013603 ). With the change of the lookup table, (2^64 - 59) has been removed. It is believed that such removal won't cause practical problems as the number is very close to SIZE_MAX and a system is unlikely to succeed in allocating a memory block _that_ huge.

Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

Explorer09 changed the title ~~Hashtable code shrink~~ Hashtable code improvements Dec 24, 2025

BenBE added enhancement Extension or improvement to existing feature code quality ♻️ Code quality enhancement labels Dec 24, 2025

BenBE requested changes Dec 25, 2025


Explorer09 force-pushed the hashtable-primes branch 3 times, most recently from fabbd8e to 3870132 Compare December 25, 2025 09:54

Explorer09 commented Dec 25, 2025


Explorer09 force-pushed the hashtable-primes branch from 3870132 to daf0292 Compare December 26, 2025 19:17

Explorer09 force-pushed the hashtable-primes branch 5 times, most recently from be3e2f7 to 493c3cb Compare December 26, 2025 21:05

Explorer09 force-pushed the hashtable-primes branch from 493c3cb to daf0292 Compare December 27, 2025 16:36

BenBE reviewed Dec 27, 2025


Explorer09 force-pushed the hashtable-primes branch 2 times, most recently from 7cf3e2a to 3dc65f6 Compare January 1, 2026 18:57

Explorer09 force-pushed the hashtable-primes branch 5 times, most recently from 80c8562 to ed56798 Compare January 12, 2026 04:16

Explorer09 force-pushed the hashtable-primes branch from ed56798 to 9d6d20a Compare January 28, 2026 06:29

Explorer09 force-pushed the hashtable-primes branch from 9d6d20a to 8f05dff Compare January 30, 2026 08:43

Explorer09 force-pushed the hashtable-primes branch 3 times, most recently from 96ce5c6 to 9f35f81 Compare February 20, 2026 13:25

Explorer09 mentioned this pull request Feb 20, 2026

Allow for optimizing out bound check on 64 bit systems #1909

Merged

Explorer09 force-pushed the hashtable-primes branch from 9f35f81 to 88db60c Compare February 22, 2026 09:24

Explorer09 force-pushed the hashtable-primes branch 2 times, most recently from 4e01e6e to e201bcb Compare February 28, 2026 18:20

Explorer09 force-pushed the hashtable-primes branch from e201bcb to 158bed2 Compare March 9, 2026 17:33

Explorer09 force-pushed the hashtable-primes branch from 158bed2 to 84d4b7b Compare March 18, 2026 11:13

Explorer09 force-pushed the hashtable-primes branch 2 times, most recently from 8b2ff8a to c322974 Compare April 1, 2026 16:28

Explorer09 force-pushed the hashtable-primes branch 2 times, most recently from 39545e1 to d03c955 Compare April 7, 2026 16:36

Explorer09 added 3 commits April 10, 2026 13:45

Hashtable: Extend OEISprimes[] to up to (2^64 - 59)

6e47d1c

Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

Explorer09 force-pushed the hashtable-primes branch from d03c955 to b46d41a Compare April 10, 2026 05:45

