
Conversation

@dbczumar
Owner

No description provided.


@sueann left a comment


A few comments for you inline. Just to be clear, this is WIP (e.g. _get_assumed_role_arn is not yet used) with https://dbc-c0ef404d-7f59.cloud.databricks.com/?o=2527845275686535#notebook/4455645117498241/command/4455645117498246 as the main testing mechanism (not done yet), right?


-def deploy(app_name, model_path, execution_role_arn, bucket, run_id=None,
-           image="mlflow_sage", region_name="us-west-2"):
+def deploy(app_name, model_path, bucket=None, image_url=DEFAULT_IMAGE_URL, run_id=None,

not sure where we stand on breaking API changes for MLflow currently - would leave it as close to the original as possible but can ask in the PR against master


@dbczumar (Owner, Author)

Agreed. @tomasatdatabricks - what are your thoughts?


Hmm, I think it's early enough to get away with breaking API changes, although I would double-check with Matei on that. I agree with @sueann that we should also update the CLI. Not sure whether we want to pass the container URL or stick to using the name in the CLI.

We should also think through what happens if someone wants to build their own custom image and use it instead of the default one. E.g., should build-container return the URL now?


  1. we should change the argument ordering back to what it was
  2. decide whether to change the image name to a URL or not (what this allows is using another account's ECR repo, which might be useful - e.g. SageMaker has published TensorFlow images you can manually use from their account)
  3. see if we can infer the region name automatically (at least on Databricks) - I don't think you can have the cluster in one region and do all the operations in another?? (to be checked); see the sketch below
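For point 3, a minimal sketch of region inference, assuming boto3 can resolve the region from the runtime environment; _get_default_region is a hypothetical helper name, not code from the PR:

import boto3

def _get_default_region():
    # boto3 resolves the region from AWS_DEFAULT_REGION or ~/.aws/config;
    # fall back to the previous hard-coded default if nothing is set.
    return boto3.session.Session().region_name or "us-west-2"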


-DEFAULT_IMAGE_NAME = "mlflow_sage"
+PYFUNC_IMAGE_URL = "707343435239.dkr.ecr.us-west-2.amazonaws.com/mlflow-pyfunc-test:latest"

note: we want to change this to the databricks prod account when submitting the PR to mlflow master


I think we should tag the image with the current mlflow version instead of latest.


just to clarify, so for master, we'd make it the version of the next release, right? is that 0.2.2 now? @tomasatdatabricks


I think it would make the most sense for this to point to the current version of MLflow - i.e. mlflow.version.VERSION. That way we do not have to manage updating versions in multiple places.
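A minimal sketch of that versioned tag; ECR_REPOSITORY_URL is a hypothetical constant name, not necessarily what the PR uses:

import mlflow.version

# Hypothetical repository constant; would point at the Databricks prod
# account before the PR against mlflow master.
ECR_REPOSITORY_URL = "707343435239.dkr.ecr.us-west-2.amazonaws.com/mlflow-pyfunc"

# Tagging with the running library's version keeps the container and the
# client in lockstep, with no second place to bump on release.
DEFAULT_IMAGE_URL = "{repo}:{version}".format(
    repo=ECR_REPOSITORY_URL, version=mlflow.version.VERSION)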

def _get_default_s3_bucket(region_name):
# create bucket if it does not exist
account_id = _get_account_id()
bucket_name = "{pfx}-{aid}".format(pfx=DEFAULT_BUCKET_NAME_PREFIX, aid=account_id)

btw, is the idea here that multiple accounts could access the same s3 bucket and so we want to distinguish them by account name? what bad things would happen if we just let the bucket name be the default name? what kind of naming convention does sagemaker itself use (when it creates the bucket for you)?

@dbczumar (Owner, Author)

S3 bucket names are global, so we're forced to create a unique bucket name. Come to think of it, this naming scheme doesn't handle the case where that bucket name is taken by a different account, which would be unfortunate.

@dbczumar (Owner, Author)

I'm not familiar with the sagemaker default bucket naming scheme. That should be explored.

@dbczumar (Owner, Author)

From the sagemaker docs: When you make an Amazon SageMaker API call that accesses an S3 bucket location and one is not specified, the Session creates a default bucket based on a naming convention which includes the current AWS account ID.
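A sketch of a bucket-name helper along those lines; the helper name and prefix are illustrative, and SageMaker's own Session reportedly uses the convention "sagemaker-{region}-{account_id}":

import boto3

def _get_default_s3_bucket_name(region_name):
    # Folding the account ID into the (globally unique) bucket name keeps
    # it from colliding with other accounts' buckets in the common case.
    account_id = boto3.client("sts").get_caller_identity()["Account"]
    return "mlflow-sagemaker-{region}-{account}".format(
        region=region_name, account=account_id)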

@sueann (Jun 28, 2018)

ah, we used to have some sagemaker-auto-generated S3 buckets visible in the SageMaker Models UI (e.g. for your very first tensorflow one), which showed the naming convention. can't quite find them right now...

@tomasatdatabricks left a comment

Looks good overall, left a few comments.



-    prefix = run_id + "/" + prefix
-run_id = _check_compatible(model_path)
+    prefix = os.path.join(run_id, prefix)
+    run_id = _check_compatible(model_path)


Wrong indent here. The last line is not supposed to be part of the if block.
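The intended control flow, reconstructed as a sketch from the surrounding diff context (the if run_id: guard is an assumption; the helpers are the PR's own):

if run_id:
    model_path = _get_model_log_dir(model_path, run_id)
    prefix = os.path.join(run_id, prefix)
# Must run whether or not a run_id was supplied.
run_id = _check_compatible(model_path)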

@dbczumar (Owner, Author)

Fixed



 eprint('tag response', response)
-return '{}/{}/{}'.format(s3.meta.endpoint_url, bucket, key)
+return os.path.join(s3.meta.endpoint_url, bucket, key)


This is supposed to return a URL, so maybe don't use os.path.join (it joins with the OS path separator, which is a backslash on Windows).

@dbczumar (Owner, Author)

Got it. Replaced it with slightly-tweaked slash concatenation. Wanted to make sure that this would work if the arguments contained trailing slashes.
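One way to phrase that tweaked slash concatenation (a sketch; _make_url is a hypothetical name, not necessarily the code in the PR):

def _make_url(*parts):
    # Join with single slashes even when pieces carry their own
    # leading/trailing slashes, e.g. an endpoint URL ending in "/".
    return "/".join(str(part).strip("/") for part in parts)

# _make_url("https://s3.us-west-2.amazonaws.com/", "my-bucket", "/models/")
# -> "https://s3.us-west-2.amazonaws.com/my-bucket/models"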

 model_path = _get_model_log_dir(model_path, run_id)
 prefix = os.path.join(run_id, prefix)
-    run_id = _check_compatible(model_path)
+run_id = _check_compatible(model_path)

this changes the behavior? was it incorrect before?



ah sorry, I was viewing a subset of changes - this was changing it back to the original logic

print("activating custom environment")
env = conf[pyfunc.ENV]
env_path_dst = os.path.join("/opt/mlflow/", env)
env_path_dst_dir = os.path.dirname(env_path_dst)

@tomasatdatabricks do you know why this might have been added?

@dbczumar (Owner, Author)

Because we were attempting to copy files to a destination directory that may not exist; we ran into an error during copying because of this. There may be an easier way to force directory creation by adding a flag to the copy command.
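A sketch of the directory-creation approach (hypothetical helper name; exist_ok requires Python 3.2+):

import os
import shutil

def _copy_creating_parents(src, dst):
    # shutil.copy fails when the destination directory is missing,
    # so create the parent tree first.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy(src, dst)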


Yes, it looks reasonable to me. Btw, I fixed this in my pending PR too, so we might need to merge at some point.

dbczumar added a commit that referenced this pull request Aug 17, 2018
* Preliminary Java support

* Renames

* Package restructuring

* Add mleap python flavor - preliminary

* Implement javafunc with correct loader module

* Input parsing

* Add missing files

* Failed attempt at package installation

* Fix up naming semantics, loader module impl

* Server impl

* Fixes - serving works

* Benchmarking results

* Benchmarking script

* SparkML critical path lats

* add batching to bmark

* Remove runID from SageMakerServer

* ..

* Add method that allows deserialization of MLeap pipeline in native format

* Move JavaFunc into protected sagemaker package

* Modify pyfunc container to handle mleap

* Python API changes progress

* Use package protection to hide modules that we don't yet wish to expose to users

* Fixes

* Container mods

* Remove mlflow-java directory

* Serving is functional. Add sparkml module

* Exception message fixes

* Throw unsupportedop from attempting to load a model from runId

* Remove nonfunctional PackageInstaller

* Package restructuring, remove unnecessary lines

* Improved error handling

* Doc tweaks

* JavaFunc error coherency improvements

* Remove unused main method

* Fix more merge conflicts

* Remove invalid comments

* Attempt at MLeap serialization parameter validation

* Debugging

* Exception tweaks

* Add javafunc experimental comments

* Remove javafunc line

* Pom fixes

* More pom fixes, make container use jar, correct version

* Construct leapframe from pandas

* Add missing files

* Docs fixes

* Comment fix

* Docs fix

* pom fixes

* New tests

* Test improvements

* Fix print statement omission

* Docker build, spark lib fixes

* Renaming, new yaml/json utilities, prediction column selection from MLeap

* Container emits json-serialized predictions column

* lambda syntax fix

* MLeap predictor return json

* Merge docker tests

* Use

* Add test for correct handling of invalid inputs

* helper function documentation

* Pom changes, docs

* Revert MLeap, sagemaker container

* Remove leapframe, sagemaker java infra

* Revert tests

* Revert helper functions for tests

* Fix spacing

* revert setup py

* Remove commented pom

* Remove javafunc

* Remove py

* Add style checker to pom with correct configuration

* Docs

* Add tests - FOR REF LATER

* Remove tests

* Remove unused tracking utils

* Revert "Remove tests"

This reverts commit c88d6b2.

* Remove pipeline deserialization test

* Serialization utils docs

* Add unit tests

* Error message fix

* Revert py changes

* Revert py

* Remove unnecessary test resources

* Doc link fixes

* Change package name in java file

* Change package name in directory structure

* Pom updates

* Print stack trace on modeltest fail

* Remove trailing  from test case names

* Rename tests to comply with style guide, add class argument to serialization  utility

* Format code

* Format tests

* Rename  to

* Update travis yml with Java tests

* Remove superfluous files

* Test fix

* Remove swn files

* Remove swm
dbczumar pushed a commit that referenced this pull request Oct 3, 2018
* begin R service API refactoring

* fix tracking uri getting

* remove old mlflow_create_experiment()

* reorg files

* local server registration

* experiment adding

* mlflow client server url attribute

* delete/restore experiments

* get run

* log metric

* fix log metric docs

* set tag

* log param

* get param

* get metric

* get metric history

* list and get experiment

* set terminated

* reorg files

* refactor mlflow_log_artifact()

* reorg files

* remove unused mlflow_ensure_run_id

* refactor mlflow_end_run()

* delete old cold

* fix get experiment

* options for view_type for list experiments

* revise tests

* fix mlflow_get_run dispatch

* deregister local servers in tests

* fix tests

* pass client to rest calls

* more fixes

* get or start run for all fluent

* actually refresh state for tests

* remove mlflow_connect()

* rename mlflow_tracking_uri to mlflow_get_tracking_uri

* remove mlflow_disconnect()

* remove active connection code

* rename mlflow_connection class to mlflow_server

* remove unused mlflow_connection_url

* remove mentions of connection

* cosmetics

* typo

* remove dead code

* with.mlflow_active_run condition handling

* error message when client isn't specified for tracking service API

* active experiment id

* setting tracking uri shouldn't change other globals

* refactor entities run

* test server -> validate server

* reorg files

* reorg test files

* remove run_uuid parameter from mlflow_log_model

* clean up create/list experiment rest wrapper

* remove unnecessary assignments in rest wrappers

* set experiment id independent of run

* consistent naming getters and setters

* some r cmd check fixes

* fix r cmd check

* rename mlflow_get_active_run -> mlflow_active_run and make private mlflow_set_active_run

* unexport experiment id getting/setting

* add mlflow_get_experiment_by_name() and mlflow_set_experiment()

* typo

* with.mlflow_run

* refactor create experiment

* refactor list experiments

* refactor get run

* refactor create run

* refactor get experiment

* refactor get experiment by name

* refactor delete experiment

* refactor restore experiment

* log metric

* refactor client api

* fluent api refactor

* reorg files

* tracking client docs

* fluent docs

* rbuildignore internal

* typos

* default end_time of mlflow_client_set_terminated to current time

* fix mlflow_end_run()

* default user

* document tracking client api

* shorten function name

* add mlflow_client_list_artifacts()

* ignore noisy lintr warnings (#1)
@dbczumar mentioned this pull request May 19, 2020
dbczumar added a commit that referenced this pull request Aug 25, 2020
dbczumar added a commit that referenced this pull request Aug 27, 2020
dbczumar pushed a commit that referenced this pull request Nov 3, 2025
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
dbczumar pushed a commit that referenced this pull request Nov 3, 2025
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>