Skip to content

Error creating an on-disk graph with more than ~350M entries #621

@ashkrisk

Description

@ashkrisk

Description

When trying to create an OnDiskGraphIndex from an OnHeapGraphIndex with more than ~350M entries, the write fails with the following error:

Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalStateException: max capacity reached at size=348966081
        at io.github.jbellis.jvector.graph.disk.AbstractGraphIndexWriter.sequentialRenumbering(AbstractGraphIndexWriter.java:157)
...
Caused by: java.lang.IllegalStateException: max capacity reached at size=348966081
        at org.agrona.collections.Int2IntHashMap.capacity(Int2IntHashMap.java:1083)

To reproduce

  1. Create an ImmutableGraphIndex with at least 350M entries
  2. Create an OnDiskGraphIndexWriter for that graph.
    • Do not manually specify an Ordinal Mapping
  3. Use this writer to write the graph to disk
  4. Observe the error when running the program

Expected behavior

The write should succeed as long as the total number of ordinals in the graph are within the integer limit, i.e. ~2B ordinals.

Analysis

When an OnDiskGraphIndex writer is created is created without specifying an ordinal mapping, one is created automatically using an instance of org.agrona.collections.Int2IntHashMap.

This map has some properties that are relevant to the current error:

  • The map is internally backed by an int[] array.
  • Both the key and value for each inserted element are stored in adjacent positions in the same int[] array.
  • The map attempts to resize the underlying int[] array to double the capacity each time
  • The capacity is resized whenever the ratio of capacity used to total capacity exceeds the load factor
  • The load factor is 0.65 by default

Maximum size of an int[] array = 2^31 = ~2.15 B
The size required for the next resize operation to fail = ~2.15 B / 2 = ~1.07 B
The maximum occupied spaces = 1B * load factor = ~698 M
Since each element takes two slots, maximum number of elements = ~349 M

This number is very close to the maximum size stated in the error, which is 348,966,081

Workarounds

Pass your own implementation of OrdinalMapper when creating an OnDiskGraphIndexWriter.

More details

JVector version: 4.0.0-rc.7

Full stacktrace

Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalStateException: max capacity reached at size=348966081
        at io.github.jbellis.jvector.graph.disk.AbstractGraphIndexWriter.sequentialRenumbering(AbstractGraphIndexWriter.java:157)
        at io.github.jbellis.jvector.graph.disk.AbstractGraphIndexWriter$Builder.build(AbstractGraphIndexWriter.java:400)
        at local.diskord.App.main(App.java:114)
Caused by: java.lang.IllegalStateException: max capacity reached at size=348966081
        at org.agrona.collections.Int2IntHashMap.capacity(Int2IntHashMap.java:1083)
        at org.agrona.collections.Int2IntHashMap.rehash(Int2IntHashMap.java:314)
        at org.agrona.collections.Int2IntHashMap.increaseCapacity(Int2IntHashMap.java:304)
        at org.agrona.collections.Int2IntHashMap.put(Int2IntHashMap.java:252)
        at io.github.jbellis.jvector.graph.disk.AbstractGraphIndexWriter.sequentialRenumbering(AbstractGraphIndexWriter.java:152)
        ... 2 more

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions