-
Notifications
You must be signed in to change notification settings - Fork 148
Description
Description
When trying to create an OnDiskGraphIndex from an OnHeapGraphIndex with more than ~350M entries, the write fails with the following error:
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalStateException: max capacity reached at size=348966081
at io.github.jbellis.jvector.graph.disk.AbstractGraphIndexWriter.sequentialRenumbering(AbstractGraphIndexWriter.java:157)
...
Caused by: java.lang.IllegalStateException: max capacity reached at size=348966081
at org.agrona.collections.Int2IntHashMap.capacity(Int2IntHashMap.java:1083)
To reproduce
- Create an
ImmutableGraphIndexwith at least 350M entries - Create an
OnDiskGraphIndexWriterfor that graph.- Do not manually specify an Ordinal Mapping
- Use this writer to write the graph to disk
- Observe the error when running the program
Expected behavior
The write should succeed as long as the total number of ordinals in the graph are within the integer limit, i.e. ~2B ordinals.
Analysis
When an OnDiskGraphIndex writer is created is created without specifying an ordinal mapping, one is created automatically using an instance of org.agrona.collections.Int2IntHashMap.
This map has some properties that are relevant to the current error:
- The map is internally backed by an
int[]array. - Both the key and value for each inserted element are stored in adjacent positions in the same
int[]array. - The map attempts to resize the underlying
int[]array to double the capacity each time - The capacity is resized whenever the ratio of capacity used to total capacity exceeds the load factor
- The load factor is 0.65 by default
Maximum size of an int[] array = 2^31 = ~2.15 B
The size required for the next resize operation to fail = ~2.15 B / 2 = ~1.07 B
The maximum occupied spaces = 1B * load factor = ~698 M
Since each element takes two slots, maximum number of elements = ~349 M
This number is very close to the maximum size stated in the error, which is 348,966,081
Workarounds
Pass your own implementation of OrdinalMapper when creating an OnDiskGraphIndexWriter.
More details
JVector version: 4.0.0-rc.7
Full stacktrace
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalStateException: max capacity reached at size=348966081
at io.github.jbellis.jvector.graph.disk.AbstractGraphIndexWriter.sequentialRenumbering(AbstractGraphIndexWriter.java:157)
at io.github.jbellis.jvector.graph.disk.AbstractGraphIndexWriter$Builder.build(AbstractGraphIndexWriter.java:400)
at local.diskord.App.main(App.java:114)
Caused by: java.lang.IllegalStateException: max capacity reached at size=348966081
at org.agrona.collections.Int2IntHashMap.capacity(Int2IntHashMap.java:1083)
at org.agrona.collections.Int2IntHashMap.rehash(Int2IntHashMap.java:314)
at org.agrona.collections.Int2IntHashMap.increaseCapacity(Int2IntHashMap.java:304)
at org.agrona.collections.Int2IntHashMap.put(Int2IntHashMap.java:252)
at io.github.jbellis.jvector.graph.disk.AbstractGraphIndexWriter.sequentialRenumbering(AbstractGraphIndexWriter.java:152)
... 2 more