Bug: Large bulk ingest remains impractically slow on 0.16.1 for a 10M-node / 77.79M-edge graph #474

### Ladybug version

v0.16.1

### What operating system are you using?

Ubuntu 22.04

### What happened?

## Summary

Large bulk ingest is still very slow for our workload even when using bulk `COPY`-style loading through the Python API, rather than single-row transactional inserts.

I know there is already related context in:
- #302
- #386

But this report is specifically about large-scale bulk load performance, not per-row or non-batched insert overhead.

## Environment

- LadybugDB version: `0.16.1`
- Python version: `3.12.13`
- Platform: `Linux x86_64`
- Binding: Python
- Workload runner: custom benchmark harness using Ladybug as an on-disk graph store

## What we are doing

This is **not** single-row insert mode.

We:
1. Create schema up front.
2. Bulk load node tables from CSV using `COPY ... FROM`.
3. Bulk load relationship tables from CSV using `COPY ... FROM`.
4. Use an on-disk database.
5. Do not create extra secondary indexes for this benchmark.

In our harness, node tables are copied directly from CSV, and relationship CSVs are rewritten into chunks and then loaded via `COPY`.
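For readers who want to picture the chunk-and-load step, here is a minimal sketch of it, not our actual harness code. The helper names `split_csv` and `copy_statement`, the chunk file naming, and the `(HEADER=true)` option are assumptions for illustration; the exact `COPY` options should be checked against the Ladybug documentation.

```python
import csv
from pathlib import Path


def split_csv(src: Path, out_dir: Path, rows_per_chunk: int) -> list[Path]:
    """Split a CSV that has a header row into chunk files.

    Every chunk repeats the header so it can be loaded on its own.
    Assumes the source file has at least a header line.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    out = None
    writer = None
    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for i, row in enumerate(reader):
            if i % rows_per_chunk == 0:
                # Start a new chunk file every rows_per_chunk data rows.
                if out is not None:
                    out.close()
                path = out_dir / f"{src.stem}_{len(chunks):05d}.csv"
                out = path.open("w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                chunks.append(path)
            writer.writerow(row)
    if out is not None:
        out.close()
    return chunks


def copy_statement(table: str, csv_path: Path) -> str:
    """Build one COPY statement for a chunk (option syntax is an assumption)."""
    return f'COPY {table} FROM "{csv_path.as_posix()}" (HEADER=true)'
```

Each returned path would then be passed through `copy_statement` and executed on the connection, one `COPY` per chunk.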

## Dataset shape

Synthetic property graph:

- 10 node labels
- 10 relationship types
- 10,000,000 nodes
- 77,790,000 edges
- skewed degree profile
- node properties:
  - 8 extra text
  - 18 extra numeric
  - 8 extra boolean
- edge properties:
  - 4 extra text
  - 10 extra numeric
  - 4 extra boolean
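To make the property widths concrete, the sketch below generates DDL with the same column counts. It is illustrative only: the `id` primary key, the column names, and the choice of `STRING` / `DOUBLE` / `BOOLEAN` types are assumptions, not our real schema, and the `CREATE NODE TABLE` / `CREATE REL TABLE` forms should be verified against the Ladybug documentation.

```python
def property_columns(n_text: int, n_numeric: int, n_bool: int) -> list[str]:
    """Return hypothetical column definitions for the 'extra' properties."""
    cols = [f"text_{i} STRING" for i in range(n_text)]
    cols += [f"num_{i} DOUBLE" for i in range(n_numeric)]
    cols += [f"flag_{i} BOOLEAN" for i in range(n_bool)]
    return cols


def node_table_ddl(label: str) -> str:
    """Node table with 8 text, 18 numeric, and 8 boolean extra properties."""
    cols = ["id INT64"] + property_columns(8, 18, 8)
    return f"CREATE NODE TABLE {label}({', '.join(cols)}, PRIMARY KEY(id))"


def rel_table_ddl(rel_type: str, src: str, dst: str) -> str:
    """Relationship table with 4 text, 10 numeric, and 4 boolean extra properties."""
    cols = [f"FROM {src} TO {dst}"] + property_columns(4, 10, 4)
    return f"CREATE REL TABLE {rel_type}({', '.join(cols)})"
```

With these counts, every node row carries at least 34 property values and every edge row at least 18, in addition to keys and endpoints.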

## Observed behavior

Large ingest takes roughly **19 to 20 hours** before analytical queries even begin.

Representative ingest times from repeated runs:

- `68,619,567 ms` (~19.06 h)
- `69,380,471 ms` (~19.27 h)
- `72,656,042 ms` (~20.18 h)

So in practice, just preparing the large dataset is already prohibitively expensive for benchmarking and evaluation workflows.
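For scale, the timings above can be converted into an overall throughput figure. The short calculation below uses only the numbers from this report; treating nodes and edges as equally weighted "records" is a simplification.

```python
NODES = 10_000_000
EDGES = 77_790_000
INGEST_MS = [68_619_567, 69_380_471, 72_656_042]


def hours(ms: int) -> float:
    """Convert milliseconds to hours."""
    return ms / 3_600_000


def records_per_second(ms: int, records: int = NODES + EDGES) -> float:
    """Average ingest rate over the whole run, nodes and edges combined."""
    return records / (ms / 1000)


for ms in INGEST_MS:
    print(f"{ms:,} ms = {hours(ms):.2f} h, "
          f"about {records_per_second(ms):,.0f} records/s")
```

Across the three runs this works out to roughly 1,200 to 1,300 records per second averaged over the whole load.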

## Why this seems distinct from #302

Issue #302 discusses poor performance for non-batched / transactional single-row inserts.

Our case is different:
- we are not inserting row-by-row
- we are doing large CSV-based bulk load
- the database still takes around 19 to 20 hours to ingest this graph

That suggests there may still be a separate bottleneck in the current bulk-load path at larger scales.

## Minimal reproduction shape

The exact harness is custom, but the important part is:

- create 10 node tables and 10 relationship tables
- bulk import ~10M nodes and ~77.79M relationships from CSV
- use Python API with on-disk database
- large skewed graph with many properties per record

If useful, I can provide a more minimal standalone repro script that just focuses on schema creation + CSV `COPY` load without the rest of the benchmark machinery.
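As a starting point, such a script could time each statement individually so it is visible which table dominates the load. This is a rough sketch under assumptions: `conn` stands for any object exposing an `execute(str)` method (which I would expect the Ladybug Python connection to satisfy), and `time_statements` is a hypothetical helper name.

```python
import time


def time_statements(conn, statements):
    """Run each statement in order and record its wall-clock time in ms.

    `conn` is assumed to expose execute(statement: str).
    Returns a list of (statement, elapsed_ms) tuples.
    """
    timings = []
    for stmt in statements:
        start = time.perf_counter()
        conn.execute(stmt)
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings.append((stmt, elapsed_ms))
    return timings


def slowest(timings, n=5):
    """Return the n slowest statements, longest first."""
    return sorted(timings, key=lambda t: t[1], reverse=True)[:n]
```

Feeding it the schema DDL followed by one `COPY` statement per CSV file would give a per-table breakdown instead of a single end-to-end number.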

## Expected behavior

Bulk ingest should be substantially faster for this scale, or at least there should be a documented fast-ingest path that makes this practical.

## Actual behavior

Large bulk ingest completes, but takes around 19 to 20 hours, which is too slow for practical large-scale benchmark preparation.

## Additional note

I am intentionally keeping this issue focused on ingest time only.

In separate large runs, we also saw later query-execution instability / process kills during heavy OLAP traversal queries, but I do not want to mix that into this report unless you think the two are likely connected.

### Are there known steps to reproduce?

_No response_
