Skip to content

Conversation

@hladush
Copy link
Contributor

@hladush hladush commented Nov 4, 2025

Background

There is a need to have an ability to remove old jobs from the DB, it helps to have small, cheap databases.

Change

Introduce new property for setting finished job retention period.
If property is not set jobs will be stored permanently

Testing

Tested on local machine.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements automatic cleanup of finished jobs based on a configurable retention period. It adds a new cleanupFinishedJobs function that deletes jobs older than the specified retention days, along with their associated tags.

  • Added new FinishedJobRetentionDays configuration field to control job retention
  • Implemented cascading deletion of old jobs and related tag tables
  • Integrated telemetry logging for improved error handling

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/pkg/janitor/janitor.go Added FinishedJobRetentionDays configuration field and integrated new cleanup function into the janitor loop with telemetry error logging
internal/pkg/janitor/job.go Implemented cleanupFinishedJobs function with transactional deletion of old jobs and associated tags
internal/pkg/janitor/queries/old_jobs_tags_delete.sql SQL query to delete job tags for old jobs
internal/pkg/janitor/queries/old_jobs_delete.sql SQL query to delete old jobs from the jobs table
internal/pkg/janitor/queries/old_jobs_command_tags_delete.sql SQL query to delete job command tags for old jobs
internal/pkg/janitor/queries/old_jobs_cluster_tags_delete.sql SQL query to delete job cluster tags for old jobs
configs/local.yaml Added sample configuration for finished job retention (14 days)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if err := sess.Commit(); err != nil {
return fmt.Errorf("failed to commit cleanup transaction: %w", err)
}

Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace on line 131 should be removed to maintain consistent code formatting.

Suggested change

Copilot uses AI. Check for mistakes.
Comment on lines 111 to 113
defer func() {
_ = sess.Rollback()
}()
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deferred rollback at lines 111-113 will always execute after sess.Commit() on line 128, undoing the commit. This pattern is incorrect for transaction management. The deferred rollback should check if the transaction has been committed before attempting to rollback. Consider removing this deferred rollback or checking the session's committed state before rolling back.

Copilot uses AI. Check for mistakes.
alephys26

This comment was marked as outdated.

@hladush hladush force-pushed the ihladush/add_functionality_for_old_jobs_removal branch 2 times, most recently from 32a6f2e to 54b57fb Compare November 11, 2025 19:47
alephys26
alephys26 previously approved these changes Nov 12, 2025
Copy link
Contributor

@alephys26 alephys26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, I just had a small concern that do we have a reliable way of exporting the job history so that a proper trail can be maintained? This will be clearing the db of records of jobs after a fixed number of days.

@@ -0,0 +1 @@
SELECT MAX(system_job_id) FROM jobs WHERE updated_at < extract(epoch FROM now() - ($1 || ' days')::interval)::int; No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'll be easier to get the cutoff unixtime in golang and that just pass it as a parameter to the query: select system_job_id from jobs where updated_at < <cutoff timestamp> order by updated_at desc limit 1 we might want to do this read as a dirty read so it's non blocking... and might add an index in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this query will be faster specially in case of index.
Postgre doesn't support dirty read at all READ COMMITTED is the lowest level

@@ -0,0 +1,7 @@
DELETE FROM job_cluster_tags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we have the cutoff system_job_id... all these queries will become much simpler: delete from job_cluster_tags where system_job_id <= $1 limit 1000 and we keep running this query until changed rownum != 0.

PS the same comment on all other queries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we are not in the MySQL world :( Postgre doesn't support that
While there is no LIMIT clause for DELETE, it is possible to get a similar effect using the same method described in the documentation of UPDATE:

WITH delete_batch AS (
SELECT l.ctid FROM user_logs AS l
WHERE l.status = 'archived'
ORDER BY l.creation_date
FOR UPDATE
LIMIT 10000
)
DELETE FROM user_logs AS dl
USING delete_batch AS del
WHERE dl.ctid = del.ctid;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... this is sad :) CTE way looks a bit more readable than the subquery though :)

}

func (j *Janitor) cleanupFinishedJobs() error {
if j.FinishedJobRetentionDays == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if j.FinishedJobRetentionDays <= 0

Comment on lines 135 to 137
if _, err := sess.Exec(q, biggestID.Int64); err != nil {
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to see how many records were deleted here and iterate until rownum changed != 0

@hladush hladush force-pushed the ihladush/add_functionality_for_old_jobs_removal branch from 54b57fb to be2aa81 Compare November 13, 2025 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants