Skip to content

Fix: Bug fixes for s3 path validation, mlflow app creation#5402

Merged
rsareddy0329 merged 17 commits into
aws:masterfrom
rsareddy0329:master-mc-bug-fixes
Dec 10, 2025
Merged

Fix: Bug fixes for s3 path validation, mlflow app creation#5402
rsareddy0329 merged 17 commits into
aws:masterfrom
rsareddy0329:master-mc-bug-fixes

Conversation

@rsareddy0329

Copy link
Copy Markdown
Contributor

Issue #, if available:

Description of changes:

  • Modified/Updated to better error handling for Mlflow app creation
  • S3 Output path validation - create if the prefix does not exist, instead of error.
  • Update notebooks to reflect recent code updates.

Testing:

  • Unit tests are successful
  • Submitted jobs to all finetuning technique flows on these changes

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

error_msg = f"MLflow app creation failed. Current status: {new_app.status}"
if hasattr(new_app, 'failure_reason') and new_app.failure_reason:
error_msg += f". Reason: {new_app.failure_reason}"
raise RuntimeError(error_msg)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to raise error in case of timeout exception or should just display error message ? If MLFlow app is required parameter for CTJ API then it's fine to throw error.

if "NoSuchBucket" in str(e) or "Not Found" in str(e):
# Create bucket
region = sagemaker_session.boto_region_name
if region == 'us-east-1':

@jam-jee jam-jee Dec 10, 2025

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this region specific check here ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is expected for s3

if "NoSuchBucket" in str(e) or "Not Found" in str(e):
# Create bucket
region = sagemaker_session.boto_region_name
if region == 'us-east-1':

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is expected for s3

@rsareddy0329 rsareddy0329 merged commit cd406fa into aws:master Dec 10, 2025
12 of 22 checks passed
Evan-W-ang added a commit to Evan-W-ang/sagemaker-python-sdk that referenced this pull request Jun 8, 2026
* fix: Fix the recipe selection for multiple recipe scenario

* fix: Fix the recipe selection for multiple recipe scenario

* fix: Hyperparameter issue fixes, validate s3 output path,additional unit tests

* Fix: Add validation to bedrock reward models

* Fix: Add validation to bedrock reward models

* Fix: Add allow list for bedrock eval models

* Fix: Add allow list for bedrock eval models

* Fix: Bug fixes for s3 path validation, mlflow app creation

* Fix: Update Legal verbiage, and allowed reward model ids based on region

* Fix: Update model_package_group_name to model_package_group in all trianers to maintain consistency

---------

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants