This Azure DevOps pipeline will help you operationalize the process to:
- Create or reuse a Databricks cluster to serve as a remote compute to Azure ML Service
- Attach this cluster as a Compute to Azure ML Service
- Execute the Azure ML pipeline code on this cluster
- Terminate the cluster after the job is done
```shell
az ad sp create-for-rbac \
  --name "<name-for-service-principal>" \
  --scopes "<Resource Group ID>"
```

Where:

- `<name-for-service-principal>` is any name with no spaces or special characters
- `<Resource Group ID>` can be retrieved from the Properties tab of the Resource Group blade that contains all your resources:
You will use `azdo_pipelines/build-train.yml` to create a new Azure
Pipeline that does all the work to train your model. All the steps are defined
in this file. To use this file to create the Train Pipeline on your account:
- On Azure DevOps, go to Pipelines > Build (`https://dev.azure.com/<azdo-tenant>/<team-project>/_build`), where:
  - `<azdo-tenant>` is your Azure DevOps tenant containing all the team projects
  - `<team-project>` is the Team Project you are using to run this sample
- Click New > New Build Pipeline (`https://dev.azure.com/<azdo-tenant>/<team-project>/_apps/hub/ms.vss-build-web.ci-designer-hub`), where:
  - `<azdo-tenant>` is your Azure DevOps organization
  - `<team-project>` is the Team Project within your AzDO organization
- Choose where your code is
  - For this sample, Azure DevOps Repos was selected
- Choose which repository contains your code
  - For this sample, the repo is called MLOpsDatabricks
- Select Existing Azure Pipelines YAML file
- Select the branch and the file, in this case `/azdo_pipelines/build-train.yml`, and click Continue
- The YAML file will be shown. No changes are required to the file at this moment.
- Click Run
As of today, a YAML Azure DevOps pipeline needs to have its first run before you can set other properties, like variables and triggers.
You can either wait for the run to complete (it will fail) or cancel it right after it is triggered. Just use the Cancel build button on the top right of the build page.
- After the build is stopped/finished, click on the ellipsis and then Edit pipeline:
- You will see the build pipeline as below. Click again on the ellipsis and then click on Variables
- On Pipeline variables, you must add the following variables and their values:
  - `DATABRICKS_DOMAIN`
  - `DATABRICKS_ACCESS_TOKEN` (Secret)
  - About cluster usage, set one of:
    - `DATABRICKS_CLUSTER_NAME_SUFFIX`, if you want to create a new cluster on each pipeline run
    - `DATABRICKS_CLUSTER_ID`, if you want to reuse an existing cluster (recommended)
  - `AML_WORKSPACE_NAME`
  - `RESOURCE_GROUP`
  - `SUBSCRIPTION_ID`
  - `TENANT_ID`
  - `SP_APP_ID`
  - `SP_APP_SECRET` (Secret)
  - `DATABRICKS_WORKSPACE_NAME`
Make sure to protect both `DATABRICKS_ACCESS_TOKEN` and `SP_APP_SECRET`
by marking them as secret.
The pipeline won't work if these variables aren't protected. On the other hand,
do not protect the other variables. If you do, the pipeline won't be able to
read them as environment variables.
Tip
It's strongly recommended that you add sensitive data to Azure Key Vault and consume the values from the pipeline. Refer to this documentation if you want to implement it.
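If you do go the Key Vault route, one way is to fetch the secrets at runtime with the `AzureKeyVault` task. The sketch below is illustrative only: the service connection name, vault name, and secret names are assumptions, not part of this sample.

```yaml
# Illustrative sketch: pull secrets from Azure Key Vault instead of
# pipeline variables. 'my-service-connection' and 'my-keyvault' are
# placeholder names you would replace with your own resources.
steps:
- task: AzureKeyVault@2
  inputs:
    azureSubscription: 'my-service-connection'
    KeyVaultName: 'my-keyvault'
    SecretsFilter: 'DATABRICKS-ACCESS-TOKEN,SP-APP-SECRET'
    RunAsPreJob: true   # make the secrets available to all later steps
```

Each fetched Key Vault secret then becomes available to subsequent steps as a secret pipeline variable named after the secret.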
Tip
Unlike normal variables, secret variables are not automatically decrypted into environment variables for scripts. You can explicitly map them in, though.
This mapping is already done in the `build-train.yml` file for this sample, but it's worth mentioning for learning purposes. You can read more here.
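As a sketch of what such a mapping looks like (the variable name mirrors this sample, but the script body is illustrative):

```yaml
steps:
- script: |
    # The secret is only visible here because it was mapped in below;
    # print its length rather than its value to avoid leaking it.
    echo "Token length: ${#DATABRICKS_ACCESS_TOKEN}"
  env:
    # Secret variables must be mapped into the environment explicitly
    DATABRICKS_ACCESS_TOKEN: $(DATABRICKS_ACCESS_TOKEN)
```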
There are 5 other variables that are already set with values in the YAML file:

- `SOURCES_DIR`
- `TRAIN_SCRIPT_PATH`
- `MODEL_DIR`
- `MODEL_NAME`
- `DATABRICKS_COMPUTE_NAME_AML`
You have two options to set different values to these variables:
This pipeline is set to override the values of these 5 variables if you create
override variables for them. To do so, just create the variable on the designer
with the `_OVERRIDE` suffix. For example:

`MODEL_NAME_OVERRIDE = my-custom-name`
You're also free to change the YAML file and modify the variable assignment. The piece of code that is responsible for assigning these variables in the YAML file is this one:
```yaml
variables:
  SOURCES_DIR: $[coalesce(variables['SOURCES_DIR_OVERRIDE'], '$(Build.SourcesDirectory)')]
  TRAIN_SCRIPT_PATH: $[coalesce(variables['TRAIN_SCRIPT_PATH_OVERRIDE'], 'src/train/train.py')]
  MODEL_DIR: $[coalesce(variables['MODEL_DIR_OVERRIDE'], '/dbfs/model')]
  MODEL_NAME: $[coalesce(variables['MODEL_NAME_OVERRIDE'], 'MLOps-model')]
  DATABRICKS_COMPUTE_NAME_AML: $[coalesce(variables['DATABRICKS_COMPUTE_NAME_AML_OVERRIDE'], 'ADB-Compute')]
```

For more information about coalesce, refer to
this page.
- After you have set all the variables, you will end up with something similar to the screen below:
You can now Save or Save & queue your pipeline.
After your pipeline runs, you should see a summary like this:
This is a successful pipeline run. Below are the details of the important tasks this pipeline runs, in case you need to troubleshoot any issue:
- Check code quality: runs `flake8` on the code and checks whether there are any coding style and standard issues, according to PEP 8.
- Publish Test Results: collects the `flake8` analysis results and publishes them as a test result, in case you need to see and troubleshoot any code analysis problem.
- Initialize Databricks Cluster: whether you chose to create a new cluster or to use an existing one, this task provisions and/or starts the cluster so the training process can occur.
- Login to ADB CLI and create DBFS model directory: takes care of installing and configuring `databricks-cli` on the agent, and also creates the `dbfs:/model` directory on the cluster.
- Train model using AML with Remote Compute: invokes `train_pipeline.py` and uses the environment variable values to train the model using the given Databricks resources, then publishes it to the Azure ML Service.
- Remove DBFS model directory: makes sure the cluster is clean after all the training work is done.
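To illustrate the shape of one of these tasks, the snippet below sketches what a "Login to ADB CLI and create DBFS model directory" step could look like. It is a hedged sketch, not the exact step from `build-train.yml`; in particular, how `DATABRICKS_DOMAIN` is combined into the host URL is an assumption.

```yaml
# Illustrative sketch of a step that installs the (legacy) databricks-cli
# and prepares the DBFS model directory.
- script: |
    pip install databricks-cli
    # Create the directory the training step writes the model to
    databricks fs mkdirs dbfs:/model
  env:
    # The legacy databricks-cli reads these variables for authentication;
    # prefixing DATABRICKS_DOMAIN with https:// is an assumption here.
    DATABRICKS_HOST: https://$(DATABRICKS_DOMAIN)
    DATABRICKS_TOKEN: $(DATABRICKS_ACCESS_TOKEN)
  displayName: Login to ADB CLI and create DBFS model directory
```

Note that `DATABRICKS_TOKEN` is mapped in explicitly because secret variables are not exposed to scripts automatically.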
If you want to access the experiment job on Azure ML Service, open the logs of the task #5 and look for something similar to below:
`To check details of the Pipeline run, go to [experiment-run-URL]`

If you open this URL, you will end up on the AML Experiment run summary:
Where:
- Is the Experiment Run status
- Is the Training step
- Is the Model Registration step
- Is the link to see details of each step run
- Is where you can check that both steps ran on Databricks






