TIMX 509 - run-timestamp argument for all transform commands#320
Merged
Why these changes are being introduced:

When run-id was added as an allowed payload attribute, it was passed along to the various command generators. The same is now needed for run-timestamp when generating transform commands (for Transmogrifier).

How this addresses that need:
* "run-timestamp" is an allowed input payload attribute
* if included, it is passed to transform commands
* if absent, a run-timestamp is minted by the lambda for all transform commands generated

The net effect is that the lambda will provide the *same* run-timestamp for all Transmogrifier commands it prepares, which ensures all writes for the run get the same timestamp. The only variation is whether the StepFunction passes the timestamp or the lambda mints it; both are supported.

Side effects of this change:
* All Transmogrifier commands will now include a --run-timestamp CLI argument

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-509
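The pass-through-or-mint behavior described in the commit message can be sketched as follows. This is a minimal illustration, not the lambda's actual code: the function names `resolve_run_timestamp` and `build_transform_commands` are hypothetical, and the minted timestamp format (UTC ISO 8601) is an assumption, since the exact format the lambda mints is not shown here.

```python
from datetime import datetime, timezone


def resolve_run_timestamp(payload: dict) -> str:
    """Return the payload's run-timestamp, or mint one if it is absent.

    Hypothetical helper; the minted format is an assumption.
    """
    provided = payload.get("run-timestamp")
    if provided:
        return provided
    return datetime.now(timezone.utc).isoformat()


def build_transform_commands(
    payload: dict, input_files: list[str], output_location: str
) -> list[list[str]]:
    """Build one argument list per input file, all sharing one timestamp."""
    # Resolve the timestamp once, outside the loop, so every command
    # generated for this run carries exactly the same value.
    run_timestamp = resolve_run_timestamp(payload)
    return [
        [
            f"--input-file={input_file}",
            f"--output-location={output_location}",
            f"--source={payload['source']}",
            f"--run-id={payload['run-id']}",
            f"--run-timestamp={run_timestamp}",
        ]
        for input_file in input_files
    ]
```

Resolving the value once per invocation, rather than once per command, is what keeps the timestamp identical across commands when the lambda has to mint it.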
ghukill
commented
Jun 24, 2025
Comment on lines -113 to +159
```bash
docker run -e TIMDEX_ALMA_EXPORT_BUCKET_ID=alma-bucket-name \
  -e TIMDEX_S3_EXTRACT_BUCKET_ID=timdex-bucket-name \
  -e WORKSPACE=dev \
  -p 9000:8080 timdex-pipeline-lambdas-dev:latest
```
- POST to the container

  Note: running this with next-step transform or load involves an actual S3 connection and is thus tricky to test locally. Better to push the image to Dev1 and test there.
```bash
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{
  "next-step": "extract",
  "run-date": "2022-03-10T16:30:23Z",
  "run-type": "daily",
  "source": "YOURSOURCE",
  "verbose": "true",
  "oai-pmh-host": "https://YOUR-OAI-SOURCE/oai",
  "oai-metadata-format": "oai_dc",
  "oai-set-spec": "YOUR-SET-SPEC"
}'
```
- Observe output

```json
{
  "run-date": "2022-03-10",
  "run-type": "daily",
  "source": "YOURSOURCE",
  "verbose": true,
  "next-step": "transform",
  "extract": {
    "extract-command": [
      "--host=https://YOUR-OAI-SOURCE/oai",
      "--output-file=s3://timdex-bucket-name/YOURSOURCE/YOURSOURCE-2022-03-09-daily-extracted-records-to-index.xml",
      "--verbose",
      "harvest",
      "--metadata-format=oai_dc",
      "--set-spec=YOUR-SET-SPEC",
      "--from-date=2022-03-09"
    ]
  }
}
```
ehanson8
approved these changes
Jun 24, 2025
Great work updating all the apps to include this!
NOTE: this PR builds upon PRs MITLibraries/timdex-dataset-api#149 and MITLibraries/transmogrifier#254.
Purpose and background context
This PR updates the TIMDEX lambda to include a --run-timestamp for all transform commands generated for Transmogrifier.

Two scenarios are supported:
* no run-timestamp is included in the input payload, in which case the lambda mints one for all transform commands generated
* run-timestamp is included as an input payload to the lambda, which is picked up and passed along

How can a reviewer manually see the effects of these changes?
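In addition to reading the output by eye, a reviewer could check it programmatically. The following is a minimal sketch and not part of this PR: the function names `collect_run_timestamps` and `has_single_run_timestamp` are hypothetical, and the response shape (`transform` → `files-to-transform` → `transform-command`) is assumed to match the example output in this description.

```python
def collect_run_timestamps(lambda_output: dict) -> set[str]:
    """Collect the distinct --run-timestamp values across transform commands.

    Assumes the response shape shown in this PR's example output.
    """
    prefix = "--run-timestamp="
    timestamps: set[str] = set()
    for entry in lambda_output["transform"]["files-to-transform"]:
        values = [
            arg[len(prefix):]
            for arg in entry["transform-command"]
            if arg.startswith(prefix)
        ]
        # Each command is expected to carry exactly one --run-timestamp.
        if len(values) != 1:
            raise ValueError(
                f"expected exactly one --run-timestamp, found {len(values)}"
            )
        timestamps.add(values[0])
    return timestamps


def has_single_run_timestamp(lambda_output: dict) -> bool:
    """True when every transform command shares the same run-timestamp."""
    return len(collect_run_timestamps(lambda_output)) == 1
```

The parsed JSON response from the local container (for example, via `json.loads`) can be passed straight to `has_single_run_timestamp`, whether the timestamp was supplied in the payload or minted by the lambda.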
NOTE: this is an example of something that moving to SAM will make simpler, but using the previous approach for now.
1- Build new docker image:
2- Run docker image:
3- From another terminal, make a curl request to generate Transmogrifier transform commands:

With formatted output like the following, noting the inclusion of --run-timestamp=2025-06-17T12:34:56.789000 for each:

    {
      "run-date": "2025-06-17",
      "run-type": "daily",
      "source": "libguides",
      "verbose": true,
      "next-step": "load",
      "transform": {
        "files-to-transform": [
          {
            "transform-command": [
              "--input-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2025-06-17-daily-extracted-records-to-index_01.xml",
              "--output-location=s3://timdex-extract-dev-222053980223/dataset",
              "--source=libguides",
              "--run-id=abc123",
              "--run-timestamp=2025-06-17T12:34:56.789000"    <---------
            ]
          },
          {
            "transform-command": [
              "--input-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2025-06-17-daily-extracted-records-to-index_02.xml",
              "--output-location=s3://timdex-extract-dev-222053980223/dataset",
              "--source=libguides",
              "--run-id=abc123",
              "--run-timestamp=2025-06-17T12:34:56.789000"    <---------
            ]
          },
          {
            "transform-command": [
              "--input-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2025-06-17-daily-extracted-records-to-index_03.xml",
              "--output-location=s3://timdex-extract-dev-222053980223/dataset",
              "--source=libguides",
              "--run-id=abc123",
              "--run-timestamp=2025-06-17T12:34:56.789000"    <---------
            ]
          }
        ]
      }
    }

Includes new or updated dependencies?
YES: dependencies updated
Changes expectations for external applications?
YES: Transmogrifier will now receive a --run-timestamp CLI argument

What are the relevant tickets?