User Story
As a user of Sleeper, once a job is submitted I want the system to remember which table the job is for by its ID, so that when I rename the table the job still runs against that table.
Description / Background
Under epic:
When a bulk import job is received in the starter lambda, the lambda looks up the Sleeper table by the table name. When it submits the job to be run in Spark, it passes the job to Spark as it was when it received it, including the table name instead of the ID.
We'd like the job to be passed to Spark by the table ID instead of the name, so that if the table is renamed between the starter lambda and the Spark driver, the job will still run.
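A minimal sketch of the desired change in the starter lambda, assuming hypothetical names (BulkImportJob, the name-to-ID map, prepareForSpark) that stand in for the real Sleeper API: the lambda resolves the table name to an ID once, then passes the job on by ID only, so a later rename cannot change which table the job targets.

```java
import java.util.Map;

public class StarterLambdaSketch {
    // Illustrative stand-in for the job as received, carrying a table name.
    record BulkImportJob(String jobId, String tableName, String tableId) {
        // Returns a copy of the job identified by table ID instead of name.
        BulkImportJob withTableId(String id) {
            return new BulkImportJob(jobId, null, id);
        }
    }

    // Stand-in for the table index lookup the starter lambda already performs.
    static final Map<String, String> TABLE_NAME_TO_ID = Map.of("my-table", "table-1234");

    // Resolve the name to an ID once, then submit the job to Spark by ID only.
    static BulkImportJob prepareForSpark(BulkImportJob received) {
        String tableId = TABLE_NAME_TO_ID.get(received.tableName());
        return received.withTableId(tableId);
    }

    public static void main(String[] args) {
        BulkImportJob job = prepareForSpark(new BulkImportJob("job-1", "my-table", null));
        System.out.println(job.tableId()); // table-1234
    }
}
```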
Technical Notes / Implementation Details
The job is written to S3 with BulkImportExecutor.WriteJobToBucket. The job is read in Spark with BulkImportJobLoaderFromS3, in BulkImportJobDriver.start.
The Sleeper table is then looked up by its name in BulkImportJobDriver.run. Because the lookup is by name, it has to happen outside the try/catch/finally that reports failures to the job tracker, so a failed lookup is not tracked. Once we load the table properties by ID instead, the lookup can move inside the try/catch/finally and any failure will be recorded in the job tracker.
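The change to BulkImportJobDriver.run could look roughly like the sketch below. The names here (run, loadTablePropertiesById, the list standing in for the job tracker) are illustrative, not the actual Sleeper API; the point is that once the job carries a table ID, the table properties lookup can sit inside the try/catch/finally, so a missing table is reported to the tracker as a job failure rather than thrown untracked.

```java
import java.util.ArrayList;
import java.util.List;

public class DriverRunSketch {
    // Minimal stand-in for the job tracker.
    static final List<String> TRACKER = new ArrayList<>();

    static void run(String jobId, String tableId) {
        TRACKER.add("started " + jobId);
        try {
            // The lookup by ID now happens inside the try block, so a
            // missing table is tracked as a failure like any other error.
            loadTablePropertiesById(tableId);
            TRACKER.add("finished " + jobId);
        } catch (RuntimeException e) {
            TRACKER.add("failed " + jobId + ": " + e.getMessage());
        } finally {
            // Cleanup would go here in the real driver.
        }
    }

    // Stand-in for loading table properties by ID; fails if the ID is unknown.
    static void loadTablePropertiesById(String tableId) {
        if (!"table-1234".equals(tableId)) {
            throw new RuntimeException("table not found: " + tableId);
        }
    }

    public static void main(String[] args) {
        run("job-1", "no-such-id");
        System.out.println(TRACKER); // [started job-1, failed job-1: table not found: no-such-id]
    }
}
```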