Improve performance of new transform for large tables#68

Merged
brycekbargar merged 18 commits into library-data-platform:release-v4.0.0 from
Five-Colleges-Incorporated:performance-testing
Mar 10, 2026
Conversation

@brycekbargar
Collaborator

@brycekbargar brycekbargar commented Mar 10, 2026

The biggest draw of refactoring the transformation logic to happen in Postgres instead of Python was a performance speedup. Unfortunately, as I had originally implemented it, Postgres ran out of memory and died on any table whose row × column count approached a million. This PR is the result of a lot of performance tuning and optimization to get memory usage down while staying fast. I've verified this on a table with 6 million rows and feel confident about the biggest tables (which will be tested soon).
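The PR description doesn't spell out the tuning itself. As a rough illustration of the general idea behind keeping memory bounded, one common approach is to stream rows in fixed-size batches rather than materializing a whole result set at once. A minimal, hypothetical sketch (the `batched` helper and the batch size are illustrative, not taken from this PR):

```python
from itertools import islice

def batched(rows, size):
    """Yield lists of at most `size` rows from any iterator,
    so only one batch is ever held in memory at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Simulate a 6-million-row table with a generator (nothing is materialized).
rows = (i for i in range(6_000_000))
total = 0
for batch in batched(rows, 10_000):
    total += len(batch)  # a real transform would process the batch here
```

The same shape applies on the database side, e.g. iterating a server-side cursor instead of fetching everything into one query result.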

During testing I realized I had forgotten the progress bars, which was really annoying because I had no idea whether the transformation was doing anything, so I added them back in this PR. I also realized that indexing did not work on tables with schemas and fixed that.
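The PR doesn't show how the restored progress bars are implemented (a library like tqdm is typical for this). As a self-contained stand-in, here is a hypothetical helper that renders a one-line progress readout from a running row count:

```python
def format_progress(done: int, total: int, width: int = 20) -> str:
    """Render a fixed-width text progress bar, e.g. '[#####...] 25%'."""
    done = min(done, total)
    filled = (done * width) // total
    pct = (done * 100) // total
    return "[" + "#" * filled + "." * (width - filled) + f"] {pct}%"

# During a long transform you would print this in place, e.g.:
#   print("\r" + format_progress(rows_done, row_count), end="")
line = format_progress(1_500_000, 6_000_000)
```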

Note: Postgres 14 is required for negative indexing on the table name in the tcatalog table.
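The note above most likely refers to `split_part`, which accepts negative field positions (counting from the end of the string) only as of PostgreSQL 14. A sketch of how a schema-qualified name could be reduced to its bare table name that way; the `tcatalog` column name here is an assumption, not taken from the PR:

```python
def bare_table_name_sql(column: str = "table_name") -> str:
    """Build the Postgres expression that extracts the last dot-separated
    part of a qualified name. split_part with a negative field position
    requires PostgreSQL 14+ (column name is hypothetical)."""
    return f"split_part({column}, '.', -1)"

def bare_table_name(qualified: str) -> str:
    """Pure-Python equivalent of what the SQL expression computes."""
    return qualified.split(".")[-1]

sql = f"SELECT {bare_table_name_sql()} FROM tcatalog"
name = bare_table_name("folio_users.users__t")
```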

@brycekbargar brycekbargar merged commit d33f3c4 into library-data-platform:release-v4.0.0 Mar 10, 2026
1 check passed
