Replies: 2 comments 9 replies
-
Advantages of ART over Hash IndexesRange query support: ART can run range queries, though not as fast as B+-Trees since it must perform tree traversals DuckDB, whereas hash tables cannot efficiently support range queries at all. This is crucial for DuckDB's analytical workload. Sorted order: ART maintains data in sorted order, which allows efficient range queries Database of Databases, while hash tables are inherently unordered. Cache locality with skewed workloads: When the access pattern is skewed, ART takes advantage of cache effects, whereas hash tables have no data locality when there is locality in the query set. With significant skew, ART performs nearly 50% faster than a hash table Database of Databases. Single structure for multiple needs: Using one structure for both constraint checks and range queries instead of two is more code efficient and maintainable DuckDB. Advantages of Hash Indexes over ARTBetter random access performance: For uniformly random lookups on sparse key sets, hash tables generally outperform ART due to their O(1) average-case complexity. Simpler implementation: Hash tables are conceptually simpler and require less complex memory management. Performance ConsiderationsART's lookup performance is comparable to that of hash tables, and in some cases can even beat them, particularly when keys are dense or access patterns are skewed. However, ART's L3 cache misses rapidly exceed those of hash tables as the index size increases DuckDB, because ART decomposes a key into bytes and traverses the tree, potentially incurring more cache misses. For DuckDB's analytical use case, the ability to support range queries and maintain sorted order makes ART the better choice despite hash tables potentially having an edge in pure point-lookup scenarios. |
Beta Was this translation helpful? Give feedback.
-
|
I think in terms of primary key indices, the hash index that is already present is without alternative; just for the reason that nodes must have unique identity, and that must be enforced, or you'll get problems with the rel tables. That said, ART indices make sense to me for secondary index support of a kind (DuckDB just labels these as primary index-like to my understanding, because in an analytical relational DB that expects to run lots of aggregations on tables, ART indexing best support point access patterns you'd use for something like a PK index) and would possibly enable some neat filtering patterns (such as efficiently finding a collection of nodes that match a certain range or property equality) that the rest of the database would profit from - assuming the query planner can find them. However in the direct terms, I suspect min-max statistics would have the most direct pay-off by letting the scan skip chunks that are irrelevant (and does not ask the query planner to identify the opportunity and optimize) and hence I put my vote behind that for now. Side remark: having a DuckDB-like storage structure for node tables (but with the hash index enforcing PK uniqueness) could be interesting insofar as that it might support having a common embedded storage layer that can still support aggregation and other more ""classical"" relational workloads, given Ladybug's inherited data model already being coached in very relational-like term. That said, I am worried about compatibility implications with other forks. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
The Node Table storage organization in LadybugDB is not very different from DuckDB and other columnar databases. The core concepts are the same (NodeGroup == RowGroup in DuckDB). There may be minor implementation differences.
But the primary key indexing strategy is very different. LadybugDB uses a hash index and DuckDB uses Adaptive Radix Tree (ART) index. Both of them use zonemaps. But DuckDB records min-max and handles range queries better. LadybugDB uses them only to record/skip NULLs.
LadybugDB is also very space inefficient with respect to index implementation, although in-theory a hash index should be more compact. Details in #97
Below is a poll on what we should do. Please vote.
8 votes ·
Beta Was this translation helpful? Give feedback.
All reactions