test: add nova tests#5933
Merged
Merged
Conversation
Add post-deploy invoke verification and make the Bedrock import-job lifecycle robust in test_model_customization_deployment.py. - Verify deployed endpoints by invoking them and validating the response structure (LORA uses the adapter IC name, otherwise the default base IC). - Replace unconditional stop-all cleanup with age-based (>24h) and status-aware cleanup: stop only InProgress/Pending jobs and delete completed imported models, with logging on failures. - Add a class-scoped autouse cleanup_import_jobs fixture to replace the zzz-prefixed ordering hack. - Bound the import-job wait loop with a 60-minute timeout and fail fast on Failed status; fix importedModelName -> importedModelArn. - Delete the imported model after tests via a yielding deployed_model_arn fixture. - Configure bedrock-runtime with standard retries (10 attempts) and add a slow-marked, retrying test_bedrock_model_invoke to tolerate "model not ready" exceptions. X-AI-Prompt: Write commit message for the us-west-2 model customization deployment test hardening changes X-AI-Tool: kiro-cli
…eMaker) Add a Nova counterpart to test_model_customization_deployment.py covering ModelBuilder deployment of fine-tuned Nova models to SageMaker endpoints, running against the Nova test account in us-east-1 (784379639078). - TestModelCustomizationFromTrainingJob: build, deploy + invoke (Nova messages format), and fetch_endpoint_names_for_base_model. - TestModelCustomizationFromModelPackage: build and deploy from a registered model package. - TestInstanceTypeAutoDetection: instance type auto-detection from recipe. - TestModelCustomizationDetection: customization detection and model package ARN fetch. - TestTrainerIntegration: SFT and RLVR trainer build (DPO replaced with RLVR since Nova has no DPO recipe in SageMakerPublicHub). - Model package is resolved dynamically from the sdk-test-finetuned-models group (latest Completed), mirroring test_benchmark_evaluation_nova_model; dependent tests skip when none exists. - All tests marked us_east_1 so they run in the PR check integ-tests-us-east-1 job (intentionally not gpu_intensive, so they do not run in the scheduled GPU workflow). - Register gpu_intensive and us_east_1 markers in sagemaker-serve/tox.ini. The Bedrock deployment suite is kept commented out for now; the Nova for Bedrock integ tests will be added in a follow-up. X-AI-Prompt: Write commit message for the Nova-for-SageMaker model customization deployment integ tests and marker registration X-AI-Tool: kiro-cli
…g tests Add TestNovaBedrockDeployment covering deployment of a fine-tuned Nova model to Amazon Bedrock via BedrockModelBuilder, complementing the existing Nova-for-SageMaker tests in the same file. - Deploy a Nova model package through BedrockModelBuilder.deploy(), which routes Nova models to create_custom_model + create_custom_model_deployment and polls each resource to Active (vs the create_model_import_job path used for open-weight models). - test_nova_bedrock_deployment_active asserts the deployment reaches Active. - test_nova_bedrock_invoke (slow) invokes the deployed model end-to-end via bedrock-runtime, with standard retries to tolerate the cold-start window. - Model package is resolved dynamically from sdk-test-finetuned-models (latest Completed); deployment fixture cleans up the deployment and custom model afterwards. Role is resolved via get_execution_role(). - Marked us_east_1 (Nova test account, us-east-1) to run in the PR check integ-tests-us-east-1 job; not gpu_intensive. - Replace the previously commented-out OSS-style Bedrock suite (it used the import-job API, which does not apply to Nova) and update the module docstring to describe both SageMaker and Bedrock deployment targets. X-AI-Prompt: Write commit message for the Nova-for-Bedrock model customization deployment integ tests X-AI-Tool: kiro-cli
- Nova deploy/Bedrock tests: build from the TrainingJob instead of a ModelPackage, since Nova escrow artifacts are only resolvable from the training job's manifest (deploying from a ModelPackage is unsupported). - Lake Formation tests: register the S3 location with an explicit role (use_service_linked_role=False) to avoid the WithFederation+SLR combination that Lake Formation rejects.
The training_job_name fixture hardcoded a reusable job whose output model package (sdk-test-nova-finetuned-models/1) was deleted, so every test that resolves the job's output model package failed with "ModelPackage ... does not exist". Discover the latest completed sft-nova-integ-* job at runtime (produced every few hours by the scheduled Nova SFT workflow) and verify its output model package still exists before using it; skip if none is found. This avoids depending on a hardcoded job name that goes stale once resource cleanup deletes its model package. X-AI-Prompt: Replace the hardcoded Nova training job fixture with runtime discovery of the latest completed sft-nova-integ job whose output model package still exists X-AI-Tool: kiro-cli
BedrockModelBuilder._get_checkpoint_uri_from_manifest located manifest.json via
self.model.model_artifacts.s3_model_artifacts. Nova fine-tuning jobs produced by
SFTTrainer/RLVRTrainer/DPOTrainer run serverless and do not populate
model_artifacts (it is Unassigned; there is no model.tar.gz), so deploying a Nova
TrainingJob to Bedrock failed with
"AttributeError: 'Unassigned' object has no attribute 's3_model_artifacts'".
Build the manifest path from output_data_config.s3_output_path and the training
job name instead. This aligns with the two other implementations that locate the
Nova manifest the same way:
- ModelBuilder._resolve_nova_escrow_uri (SageMaker deployment path), and
- the official Nova Studio notebook
(v3-examples/.../sm-studio-nova-training-job-sample-notebook.ipynb, which
derives the manifest from OutputDataConfig.S3OutputPath, not model_artifacts).
Verified the derived key is identical to the previous logic when model_artifacts
is present, and matches the real manifest location
({s3_output}/{job_name}/output/output/manifest.json) confirmed in the test
account.
Also update the TestGetCheckpointUri unit tests to mock output_data_config, and
keep the Nova Bedrock integ tests driving BedrockModelBuilder from the
TrainingJob.
X-AI-Prompt: Fix BedrockModelBuilder Nova manifest resolution to use output_data_config (matching ModelBuilder._resolve_nova_escrow_uri and the official Nova Studio notebook) and update unit tests
X-AI-Tool: kiro-cli
…y on capacity shortage - _resolve_nova_escrow_uri only accepted TrainingJob/ModelTrainer, so building a Nova model from an SFTTrainer/RLVRTrainer/DPOTrainer (BaseTrainer subclasses) failed with "Nova escrow URI resolution requires a TrainingJob or ModelTrainer". Resolve the underlying job via _latest_training_job for BaseTrainer, matching _is_model_customization and _fetch_model_package_arn. - Nova deploy integ tests could fail with InsufficientInstanceCapacity, a transient region-wide ml.g6.48xlarge availability issue. Add a _deploy_or_skip_on_capacity helper that skips (instead of failing) in that case, used by the training-job and model-package deploy tests. X-AI-Prompt: Support BaseTrainer in _resolve_nova_escrow_uri and skip Nova deploy tests on transient InsufficientInstanceCapacity X-AI-Tool: kiro-cli
Collaborator
Author
…sync FG deletion test_enable_lake_formation_fails_with_nonexistent_role asserted the registration error contains EntityNotFoundException, but under a least-privilege iam:PassRole policy the failure surfaces as an AccessDeniedException on iam:PassRole before Lake Formation is reached. Accept EntityNotFoundException, AccessDeniedException, or iam:PassRole as valid "role not usable" outcomes for this negative test. test_delete_feature_group used a fixed 2s sleep then a single get(), but FeatureGroup deletion is asynchronous and the group stays describable while in Deleting status, causing intermittent "DID NOT RAISE". Poll get() until it raises (group fully gone) or a 120s timeout. X-AI-Prompt: Fix LF nonexistent-role negative test assertion and poll for async feature group deletion X-AI-Tool: kiro-cli
aviruthen
previously approved these changes
Jun 8, 2026
test_nova_bedrock_invoke sent content items as {"type": "text", "text": ...},
which Bedrock rejected with "Malformed input request: #/messages/0/content/0:
extraneous key [type] is not permitted".
Use the Nova messages-v1 InvokeModel schema instead (content items are
{"text": ...} with no type key, plus schemaVersion and inferenceConfig),
matching the official Nova Studio notebook, and assert on the Nova response
shape output.message.content[0].text.
X-AI-Prompt: Fix the Nova Bedrock invoke payload to the messages-v1 schema (no type key) per the official Nova notebook and assert the Nova response structure
X-AI-Tool: kiro-cli
aviruthen
previously approved these changes
Jun 9, 2026
Collaborator
Author
…kage The training_job_name fixture required the job's output model package to still exist, but the resource cleaner keeps only the oldest and newest package in the group, so every job's package was deleted and all dependent tests skipped. Build/deploy resolve artifacts from the job manifest (not the model package), so just pick the latest completed sft-nova-integ job. X-AI-Prompt: Stop requiring the Nova SFT job's output model package to exist in the fixture so tests stop skipping X-AI-Tool: kiro-cli
ModelBuilder.build fetches the training job's output model package, so the package must exist. Resource cleanup keeps only the oldest and newest package in the group, so picking the latest job left it pointing at a deleted package and every build/deploy test failed. Instead, start from a model package that currently exists and resolve the training job that produced it (parsed from the package's escrow S3 URI), preferring an SFT job. The cleaner always retains the oldest package, so this reliably yields a job whose output package is present. X-AI-Prompt: Resolve the Nova training job by reverse-lookup from an existing model package's escrow S3 URI so build/deploy tests stop failing on deleted packages X-AI-Tool: kiro-cli
aviruthen
approved these changes
Jun 9, 2026
guanweim
pushed a commit
to guanweim/sagemaker-python-sdk
that referenced
this pull request
Jun 15, 2026
* test(serve): harden model customization deployment integ tests
Add post-deploy invoke verification and make the Bedrock import-job
lifecycle robust in test_model_customization_deployment.py.
- Verify deployed endpoints by invoking them and validating the
response structure (LORA uses the adapter IC name, otherwise the
default base IC).
- Replace unconditional stop-all cleanup with age-based (>24h) and
status-aware cleanup: stop only InProgress/Pending jobs and delete
completed imported models, with logging on failures.
- Add a class-scoped autouse cleanup_import_jobs fixture to replace the
zzz-prefixed ordering hack.
- Bound the import-job wait loop with a 60-minute timeout and fail fast
on Failed status; fix importedModelName -> importedModelArn.
- Delete the imported model after tests via a yielding deployed_model_arn
fixture.
- Configure bedrock-runtime with standard retries (10 attempts) and add a
slow-marked, retrying test_bedrock_model_invoke to tolerate
"model not ready" exceptions.
X-AI-Prompt: Write commit message for the us-west-2 model customization deployment test hardening changes
X-AI-Tool: kiro-cli
* test(serve): add Nova model customization deployment integ tests (SageMaker)
Add a Nova counterpart to test_model_customization_deployment.py covering
ModelBuilder deployment of fine-tuned Nova models to SageMaker endpoints,
running against the Nova test account in us-east-1 (784379639078).
- TestModelCustomizationFromTrainingJob: build, deploy + invoke (Nova
messages format), and fetch_endpoint_names_for_base_model.
- TestModelCustomizationFromModelPackage: build and deploy from a
registered model package.
- TestInstanceTypeAutoDetection: instance type auto-detection from recipe.
- TestModelCustomizationDetection: customization detection and model
package ARN fetch.
- TestTrainerIntegration: SFT and RLVR trainer build (DPO replaced with
RLVR since Nova has no DPO recipe in SageMakerPublicHub).
- Model package is resolved dynamically from the sdk-test-finetuned-models
group (latest Completed), mirroring test_benchmark_evaluation_nova_model;
dependent tests skip when none exists.
- All tests marked us_east_1 so they run in the PR check
integ-tests-us-east-1 job (intentionally not gpu_intensive, so they do
not run in the scheduled GPU workflow).
- Register gpu_intensive and us_east_1 markers in sagemaker-serve/tox.ini.
The Bedrock deployment suite is kept commented out for now; the Nova for
Bedrock integ tests will be added in a follow-up.
X-AI-Prompt: Write commit message for the Nova-for-SageMaker model customization deployment integ tests and marker registration
X-AI-Tool: kiro-cli
* test(serve): add Nova for Bedrock model customization deployment integ tests
Add TestNovaBedrockDeployment covering deployment of a fine-tuned Nova
model to Amazon Bedrock via BedrockModelBuilder, complementing the existing
Nova-for-SageMaker tests in the same file.
- Deploy a Nova model package through BedrockModelBuilder.deploy(), which
routes Nova models to create_custom_model + create_custom_model_deployment
and polls each resource to Active (vs the create_model_import_job path used
for open-weight models).
- test_nova_bedrock_deployment_active asserts the deployment reaches Active.
- test_nova_bedrock_invoke (slow) invokes the deployed model end-to-end via
bedrock-runtime, with standard retries to tolerate the cold-start window.
- Model package is resolved dynamically from sdk-test-finetuned-models
(latest Completed); deployment fixture cleans up the deployment and custom
model afterwards. Role is resolved via get_execution_role().
- Marked us_east_1 (Nova test account, us-east-1) to run in the PR check
integ-tests-us-east-1 job; not gpu_intensive.
- Replace the previously commented-out OSS-style Bedrock suite (it used the
import-job API, which does not apply to Nova) and update the module
docstring to describe both SageMaker and Bedrock deployment targets.
X-AI-Prompt: Write commit message for the Nova-for-Bedrock model customization deployment integ tests
X-AI-Tool: kiro-cli
* test: fix Nova deployment and Lake Formation integ tests
- Nova deploy/Bedrock tests: build from the TrainingJob instead of a
ModelPackage, since Nova escrow artifacts are only resolvable from the
training job's manifest (deploying from a ModelPackage is unsupported).
- Lake Formation tests: register the S3 location with an explicit role
(use_service_linked_role=False) to avoid the WithFederation+SLR
combination that Lake Formation rejects.
* test(serve): discover Nova SFT training job dynamically
The training_job_name fixture hardcoded a reusable job whose output model
package (sdk-test-nova-finetuned-models/1) was deleted, so every test that
resolves the job's output model package failed with "ModelPackage ... does not
exist".
Discover the latest completed sft-nova-integ-* job at runtime (produced every
few hours by the scheduled Nova SFT workflow) and verify its output model
package still exists before using it; skip if none is found. This avoids
depending on a hardcoded job name that goes stale once resource cleanup deletes
its model package.
X-AI-Prompt: Replace the hardcoded Nova training job fixture with runtime discovery of the latest completed sft-nova-integ job whose output model package still exists
X-AI-Tool: kiro-cli
* fix(serve): resolve Nova Bedrock manifest from output_data_config
BedrockModelBuilder._get_checkpoint_uri_from_manifest located manifest.json via
self.model.model_artifacts.s3_model_artifacts. Nova fine-tuning jobs produced by
SFTTrainer/RLVRTrainer/DPOTrainer run serverless and do not populate
model_artifacts (it is Unassigned; there is no model.tar.gz), so deploying a Nova
TrainingJob to Bedrock failed with
"AttributeError: 'Unassigned' object has no attribute 's3_model_artifacts'".
Build the manifest path from output_data_config.s3_output_path and the training
job name instead. This aligns with the two other implementations that locate the
Nova manifest the same way:
- ModelBuilder._resolve_nova_escrow_uri (SageMaker deployment path), and
- the official Nova Studio notebook
(v3-examples/.../sm-studio-nova-training-job-sample-notebook.ipynb, which
derives the manifest from OutputDataConfig.S3OutputPath, not model_artifacts).
Verified the derived key is identical to the previous logic when model_artifacts
is present, and matches the real manifest location
({s3_output}/{job_name}/output/output/manifest.json) confirmed in the test
account.
Also update the TestGetCheckpointUri unit tests to mock output_data_config, and
keep the Nova Bedrock integ tests driving BedrockModelBuilder from the
TrainingJob.
X-AI-Prompt: Fix BedrockModelBuilder Nova manifest resolution to use output_data_config (matching ModelBuilder._resolve_nova_escrow_uri and the official Nova Studio notebook) and update unit tests
X-AI-Tool: kiro-cli
* fix(serve): support BaseTrainer in Nova escrow resolution; skip deploy on capacity shortage
- _resolve_nova_escrow_uri only accepted TrainingJob/ModelTrainer, so building a
Nova model from an SFTTrainer/RLVRTrainer/DPOTrainer (BaseTrainer subclasses)
failed with "Nova escrow URI resolution requires a TrainingJob or
ModelTrainer". Resolve the underlying job via _latest_training_job for
BaseTrainer, matching _is_model_customization and _fetch_model_package_arn.
- Nova deploy integ tests could fail with InsufficientInstanceCapacity, a
transient region-wide ml.g6.48xlarge availability issue. Add a
_deploy_or_skip_on_capacity helper that skips (instead of failing) in that
case, used by the training-job and model-package deploy tests.
X-AI-Prompt: Support BaseTrainer in _resolve_nova_escrow_uri and skip Nova deploy tests on transient InsufficientInstanceCapacity
X-AI-Tool: kiro-cli
* Fix flaky feature store integ tests: LF negative-role assertion and async FG deletion
test_enable_lake_formation_fails_with_nonexistent_role asserted the registration
error contains EntityNotFoundException, but under a least-privilege iam:PassRole
policy the failure surfaces as an AccessDeniedException on iam:PassRole before
Lake Formation is reached. Accept EntityNotFoundException, AccessDeniedException,
or iam:PassRole as valid "role not usable" outcomes for this negative test.
test_delete_feature_group used a fixed 2s sleep then a single get(), but
FeatureGroup deletion is asynchronous and the group stays describable while in
Deleting status, causing intermittent "DID NOT RAISE". Poll get() until it
raises (group fully gone) or a 120s timeout.
X-AI-Prompt: Fix LF nonexistent-role negative test assertion and poll for async feature group deletion
X-AI-Tool: kiro-cli
* test(serve): use Nova messages-v1 schema for Bedrock invoke
test_nova_bedrock_invoke sent content items as {"type": "text", "text": ...},
which Bedrock rejected with "Malformed input request: #/messages/0/content/0:
extraneous key [type] is not permitted".
Use the Nova messages-v1 InvokeModel schema instead (content items are
{"text": ...} with no type key, plus schemaVersion and inferenceConfig),
matching the official Nova Studio notebook, and assert on the Nova response
shape output.message.content[0].text.
X-AI-Prompt: Fix the Nova Bedrock invoke payload to the messages-v1 schema (no type key) per the official Nova notebook and assert the Nova response structure
X-AI-Tool: kiro-cli
* chore(serve): trim verbose comments
* test(serve): pick latest Nova SFT job without requiring its model package
The training_job_name fixture required the job's output model package to still
exist, but the resource cleaner keeps only the oldest and newest package in the
group, so every job's package was deleted and all dependent tests skipped.
Build/deploy resolve artifacts from the job manifest (not the model package),
so just pick the latest completed sft-nova-integ job.
X-AI-Prompt: Stop requiring the Nova SFT job's output model package to exist in the fixture so tests stop skipping
X-AI-Tool: kiro-cli
* test(serve): resolve Nova training job from an existing model package
ModelBuilder.build fetches the training job's output model package, so the
package must exist. Resource cleanup keeps only the oldest and newest package
in the group, so picking the latest job left it pointing at a deleted package
and every build/deploy test failed.
Instead, start from a model package that currently exists and resolve the
training job that produced it (parsed from the package's escrow S3 URI),
preferring an SFT job. The cleaner always retains the oldest package, so this
reliably yields a job whose output package is present.
X-AI-Prompt: Resolve the Nova training job by reverse-lookup from an existing model package's escrow S3 URI so build/deploy tests stop failing on deleted packages
X-AI-Tool: kiro-cli
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
This PR adds Nova model-customization deployment integration tests (SageMaker
endpoints and Amazon Bedrock custom models) and fixes a number of pre-existing,
unrelated integ-test failures surfaced once these tests started running in the
integ-tests-us-east-1PR check.Two source files in
sagemaker-serveare modified. Both are genuine productbugs in Nova code paths, justified in detail in Part 1 below — they are not
worked around in the tests because doing so would hide functionality the SDK
publicly claims to support.
Part 1 — New Nova tests
New integ test file:
sagemaker-serve/tests/integ/test_nova_model_customization_deployment.py(the Nova counterpart of
test_model_customization_deployment.py). Tests aremarked
us_east_1so they run only in the us-east-1 (Nova) account via theinteg-tests-us-east-1PR check.New tests
TestModelCustomizationFromTrainingJob(test_build_from_training_job,test_deploy_from_training_job,test_fetch_endpoint_names_for_base_model)TestModelCustomizationFromModelPackage(test_build_from_model_package,test_deploy_from_model_package)TestInstanceTypeAutoDetection(test_instance_type_from_recipe)ml.g6.48xlarge)TestModelCustomizationDetection(test_is_model_customization_training_job,test_is_model_customization_model_package,test_fetch_model_package_arn)TestTrainerIntegration(test_sft_trainer_build,test_rlvr_trainer_build)ModelBuilderaccepts an SFT/RLVR trainer object and builds the Nova model (DPO replaced with RLVR — Nova has no DPO recipe inSageMakerPublicHub)TestNovaBedrockDeployment(test_nova_bedrock_deployment_active,test_nova_bedrock_invoke)create_custom_model+create_custom_model_deployment, polling to Active) and invoke itThe
training_job_namefixture discovers the latest completedsft-nova-integ-*job (produced every few hours by the scheduled Nova SFT workflow) whose output
model package still exists, rather than hardcoding a job name that goes stale
when resource cleanup deletes its model package.
Required source changes
sagemaker-serve/src/sagemaker/serve/bedrock_model_builder.py_get_checkpoint_uri_from_manifestlocatesmanifest.jsonfromoutput_data_config.s3_output_path+ training job name instead ofmodel_artifacts.s3_model_artifactsSFTTrainer/RLVRTrainer/DPOTrainer) run serverless and never populatemodel_artifacts(nomodel.tar.gz; the field isUnassigned), so the old code raisedAttributeError: 'Unassigned' object has no attribute 's3_model_artifacts'for any Nova job. This is corroborated by three independent sources that all locate the Nova manifest viaoutput_data_config: (a) the sibling methodModelBuilder._resolve_nova_escrow_uri, (b) the official Nova Studio notebooksm-studio-nova-training-job-sample-notebook.ipynb, and (c) the real manifest path verified in the test account.BedrockModelBuilderwas the only place usingmodel_artifactsfor Nova — an isolated inconsistency. It cannot be worked around in tests because no Nova job hasmodel_artifacts.sagemaker-serve/src/sagemaker/serve/model_builder.py_resolve_nova_escrow_uriresolves the underlying TrainingJob via_latest_training_jobforBaseTrainerinstancesModelBuilderpublicly supports a trainer object asmodel(type isUnion[..., ModelTrainer, BaseTrainer, TrainingJob, ModelPackage, ...]), andSFTTrainer/RLVRTrainer/DPOTrainerareBaseTrainersubclasses. The sibling methods_is_model_customizationand_fetch_model_package_arnalready handleBaseTrainer; only_resolve_nova_escrow_uriomitted it, so the sameModelBuilder(model=trainer)worked for detection/ARN-fetch but failed with "Nova escrow URI resolution requires a TrainingJob or ModelTrainer" on escrow resolution. This is an internal inconsistency; the fix aligns it with the other methods.Part 2 — Fixes to other (pre-existing) test failures
These failures were already present in the suite and were surfaced/fixed while
bringing up the new tests; some were addressed across earlier iterations.
sagemaker-serve/tests/integ/test_model_customization_deployment.py(OSS)test_deploy_from_training_joband the Bedrock import suitedeployed_model_arnfixture that deletes the imported modeltest_nova_model_customization_deployment.py(model-package & Bedrock paths)test_build/deploy_from_model_package,TestNovaBedrockDeploymentModelPackageis unsupported (escrow artifacts are only resolvable from the TrainingJob manifest; the package is non-RMP)test_nova_model_customization_deployment.py(instance type)test_instance_type_from_recipe,test_sft_trainer_build,test_rlvr_trainer_buildModelBuilderdefaulted toml.m5.large, which Nova rejectsml.g6.48xlarge; assert it is used (Nova has no instance-type auto-detection)test_nova_model_customization_deployment.py(capacity)test_deploy_from_training_job,test_deploy_from_model_packageInsufficientInstanceCapacityforml.g6.48xlarge(not a quota or code issue)_deploy_or_skip_on_capacityhelper that skips (rather than fails) on capacity shortagesagemaker-mlops/tests/integ/test_feature_store_lakeformation.pytest_create_feature_group_and_enable_lake_formation,test_create_feature_group_with_lake_formation_enabled,test_enable_lake_formation_full_flow_with_policy_output,test_enable_lake_formation_default_logs_recommended_policyenable_lake_formationdefaulted touse_service_linked_role=True, producingRegisterResourcewith bothWithFederation=TrueandUseServiceLinkedRole=True— a combination Lake Formation rejects (InvalidInputException: Unable to register the following path); all existing registrations in the account used an explicit roleuse_service_linked_role=False, registration_role_arn=role, matching the supported explicit-role pathsagemaker-mlops/tests/integ/test_feature_store_lakeformation.pytest_enable_lake_formation_fails_with_nonexistent_roleEntityNotFoundException, but under a least-privilegeiam:PassRolepolicy the failure surfaces asAccessDeniedExceptiononiam:PassRolebefore Lake Formation is reachedEntityNotFoundException,AccessDeniedException, oriam:PassRoleas valid "role not usable" outcomessagemaker-mlops/tests/integ/test_feature_store.pytest_delete_feature_groupget(); feature-group deletion is asynchronous and stays describable whileDeleting, causing intermittent "DID NOT RAISE"get()until it raises (group gone) or a 120s timeout