The `commits` table has a primary key that is derived only from a sequence (i.e. an autoincrementing integer).
When facade's `analyze_commits_in_parallel` runs (to insert all new commit change information into the `commits` table), it does not use Augur's existing upsert logic (`bulk_insert_dicts` or similar). Instead it always performs a plain insert, so every row gets a newly generated ID and is added as a new row.
If the commit analysis is a rerun (i.e. the repo was previously fully collected, but an admin reset the last collection date to force recollection, so many of its commits are already in the table), this simply generates duplicate rows, adding to the growth of one of the largest tables in Augur.
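A minimal sketch of the duplication, using Python's built-in `sqlite3` as a stand-in for Augur's PostgreSQL database. The trimmed-down schema (including the `cmt_id` column name) and the `insert_always` helper are illustrative assumptions, not Augur's actual code; the point is only that a sequence-only key lets a rerun insert the same commit/file rows again.

```python
import sqlite3

# Hypothetical stand-in schema: the only key is a surrogate, sequence-generated ID.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE commits (
        cmt_id INTEGER PRIMARY KEY AUTOINCREMENT,
        repo_id INTEGER NOT NULL,
        cmt_commit_hash TEXT NOT NULL,
        cmt_filename TEXT NOT NULL
    )"""
)

def insert_always(conn, rows):
    """Hypothetical helper mimicking an insert-always write path (no upsert)."""
    conn.executemany(
        "INSERT INTO commits (repo_id, cmt_commit_hash, cmt_filename) VALUES (?, ?, ?)",
        rows,
    )

rows = [(1, "abc123", "README.md"), (1, "abc123", "setup.py")]
insert_always(conn, rows)  # first full collection of the repo
insert_always(conn, rows)  # forced recollection of the same repo

count = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
ids = [r[0] for r in conn.execute("SELECT cmt_id FROM commits ORDER BY cmt_id")]
print(count, ids)  # 4 [1, 2, 3, 4]: each change row now exists twice
```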
To use upserts for the `commits` table, we need a compound key based on the columns that actually identify a row, rather than on the sequence alone. Since this table is more accurately described as `commit_changes` (#3682), I propose this constraint: `UniqueConstraint("repo_id", "cmt_commit_hash", "cmt_filename", name="commit-changes-unique")`.
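A minimal sketch of how the proposed constraint enables an upsert, again using `sqlite3` as a stand-in for PostgreSQL (both support `INSERT ... ON CONFLICT (...) DO UPDATE`; SQLite needs version 3.24 or newer). The schema, including the `cmt_added` column used to show an update, is an assumed simplification of Augur's table, not its real definition.

```python
import sqlite3

# Assumed, simplified schema carrying the proposed natural-key constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE commits (
        cmt_id INTEGER PRIMARY KEY AUTOINCREMENT,
        repo_id INTEGER NOT NULL,
        cmt_commit_hash TEXT NOT NULL,
        cmt_filename TEXT NOT NULL,
        cmt_added INTEGER,
        CONSTRAINT "commit-changes-unique"
            UNIQUE (repo_id, cmt_commit_hash, cmt_filename)
    )"""
)

# Upsert keyed on the constraint's columns: a conflicting row is updated in place.
UPSERT = """
INSERT INTO commits (repo_id, cmt_commit_hash, cmt_filename, cmt_added)
VALUES (?, ?, ?, ?)
ON CONFLICT (repo_id, cmt_commit_hash, cmt_filename)
DO UPDATE SET cmt_added = excluded.cmt_added
"""

rows = [(1, "abc123", "README.md", 10), (1, "abc123", "setup.py", 3)]
conn.executemany(UPSERT, rows)  # first full collection
conn.executemany(UPSERT, rows)  # forced recollection: no new rows

count = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
ids = [r[0] for r in conn.execute("SELECT cmt_id FROM commits ORDER BY cmt_id")]
print(count, ids)  # 2 [1, 2]: existing rows keep their IDs
```

In Augur itself the equivalent would presumably go through the existing upsert path (e.g. SQLAlchemy's PostgreSQL `insert(...).on_conflict_do_update(index_elements=[...], set_=...)`), which requires a unique constraint or index on exactly those columns. Note that adding the constraint to an existing database will fail until the duplicate rows already in the table are removed.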