-
Notifications
You must be signed in to change notification settings - Fork 7
Add functionality for old jobs removal #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements automatic cleanup of finished jobs based on a configurable retention period. It adds a new cleanupFinishedJobs function that deletes jobs older than the specified retention days, along with their associated tags.
- Added new
FinishedJobRetentionDaysconfiguration field to control job retention - Implemented cascading deletion of old jobs and related tag tables
- Integrated telemetry logging for improved error handling
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/pkg/janitor/janitor.go | Added FinishedJobRetentionDays configuration field and integrated new cleanup function into the janitor loop with telemetry error logging |
| internal/pkg/janitor/job.go | Implemented cleanupFinishedJobs function with transactional deletion of old jobs and associated tags |
| internal/pkg/janitor/queries/old_jobs_tags_delete.sql | SQL query to delete job tags for old jobs |
| internal/pkg/janitor/queries/old_jobs_delete.sql | SQL query to delete old jobs from the jobs table |
| internal/pkg/janitor/queries/old_jobs_command_tags_delete.sql | SQL query to delete job command tags for old jobs |
| internal/pkg/janitor/queries/old_jobs_cluster_tags_delete.sql | SQL query to delete job cluster tags for old jobs |
| configs/local.yaml | Added sample configuration for finished job retention (14 days) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/pkg/janitor/job.go
Outdated
| if err := sess.Commit(); err != nil { | ||
| return fmt.Errorf("failed to commit cleanup transaction: %w", err) | ||
| } | ||
|
|
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace on line 131 should be removed to maintain consistent code formatting.
internal/pkg/janitor/job.go
Outdated
| defer func() { | ||
| _ = sess.Rollback() | ||
| }() |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deferred rollback at lines 111-113 will always execute after sess.Commit() on line 128, undoing the commit. This pattern is incorrect for transaction management. The deferred rollback should check if the transaction has been committed before attempting to rollback. Consider removing this deferred rollback or checking the session's committed state before rolling back.
32a6f2e to
54b57fb
Compare
alephys26
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good, I just had a small concern that do we have a reliable way of exporting the job history so that a proper trail can be maintained? This will be clearing the db of records of jobs after a fixed number of days.
| @@ -0,0 +1 @@ | |||
| SELECT MAX(system_job_id) FROM jobs WHERE updated_at < extract(epoch FROM now() - ($1 || ' days')::interval)::int; No newline at end of file | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'll be easier to get the cutoff unixtime in golang and that just pass it as a parameter to the query: select system_job_id from jobs where updated_at < <cutoff timestamp> order by updated_at desc limit 1 we might want to do this read as a dirty read so it's non blocking... and might add an index in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that this query will be faster specially in case of index.
Postgre doesn't support dirty read at all READ COMMITTED is the lowest level
| @@ -0,0 +1,7 @@ | |||
| DELETE FROM job_cluster_tags | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once we have the cutoff system_job_id... all these queries will become much simpler: delete from job_cluster_tags where system_job_id <= $1 limit 1000 and we keep running this query until changed rownum != 0.
PS the same comment on all other queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we are not in the MySQL world :( Postgre doesn't support that
While there is no LIMIT clause for DELETE, it is possible to get a similar effect using the same method described in the documentation of UPDATE:
WITH delete_batch AS (
SELECT l.ctid FROM user_logs AS l
WHERE l.status = 'archived'
ORDER BY l.creation_date
FOR UPDATE
LIMIT 10000
)
DELETE FROM user_logs AS dl
USING delete_batch AS del
WHERE dl.ctid = del.ctid;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh... this is sad :) CTE way looks a bit more readable than the subquery though :)
| } | ||
|
|
||
| func (j *Janitor) cleanupFinishedJobs() error { | ||
| if j.FinishedJobRetentionDays == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if j.FinishedJobRetentionDays <= 0
internal/pkg/janitor/job.go
Outdated
| if _, err := sess.Exec(q, biggestID.Int64); err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to see how many records were deleted here and iterate until rownum changed != 0
54b57fb to
be2aa81
Compare
Background
There is a need to have an ability to remove old jobs from the DB, it helps to have small, cheap databases.
Change
Introduce new property for setting finished job retention period.
If property is not set jobs will be stored permanently
Testing
Tested on local machine.