curiouscurrent/MLOPS-END-TO-END-ML-PIPELINE


MLOPS-END-TO-END-ML-PIPELINE

Building the pipeline:

  1. Create a GitHub repo and clone it locally (add experiments)
  2. Add a src folder along with all the components (run them individually)
  3. Add the data, models, and reports directories to the .gitignore file
  4. Since changes were made, do git add, commit, push
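The directory setup in steps 2-3 can be sketched in the shell (assuming the repo is already cloned; directory names are taken from the steps above):

```shell
# Create the source and artifact directories
mkdir -p src data models reports

# Keep heavy artifacts out of git; DVC will track them instead
printf 'data/\nmodels/\nreports/\n' >> .gitignore
cat .gitignore
```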

Setting up the DVC pipeline (without params):

  1. Create a dvc.yaml file and add the stages to it.
  2. Run "dvc init", then "dvc repro" to test the pipeline automation (check "dvc dag")
  3. Now git add, commit, push
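A minimal dvc.yaml sketch for step 1 — the stage names, commands, and paths here are illustrative, not the repo's actual ones; adapt them to your components in src/:

```yaml
stages:
  data_ingestion:
    cmd: python src/data_ingestion.py
    deps:
      - src/data_ingestion.py
    outs:
      - data/raw
  model_building:
    cmd: python src/model_building.py
    deps:
      - src/model_building.py
      - data/raw
    outs:
      - models/model.pkl
```

"dvc repro" runs only the stages whose deps changed, and "dvc dag" renders how the stages connect.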

Setting up the DVC pipeline (with params):

  1. Add a params.yaml file
  2. Add the params setup (mentioned below)
  3. Run "dvc repro" again to test the pipeline along with the params
  4. Now git add, commit, push
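A params.yaml sketch matching the keys read in the params setup below (data_ingestion.test_size, feature_engineering.max_features, model_building); the values and the model_building entries are illustrative assumptions:

```yaml
data_ingestion:
  test_size: 0.2        # illustrative value
feature_engineering:
  max_features: 50      # illustrative value
model_building:
  n_estimators: 100     # illustrative hyperparameters
  random_state: 42
```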

Experiments with DVC:

  1. pip install dvclive
  2. Add the dvclive code block (mentioned below)
  3. Run "dvc exp run"; it will create a new dvc.yaml (if not already there) and a dvclive directory (each run is treated as an experiment by DVC)
  4. Run "dvc exp show" in the terminal to see the experiments, or use the DVC extension for VS Code
  5. Run "dvc exp remove {exp-name}" to remove an experiment (optional) | "dvc exp apply {exp-name}" to reproduce a previous experiment
  6. Change the params and re-run the code (this produces new experiments)
  7. Now git add, commit, push

Adding Remote S3 Storage to DVC

  1. Log in to the AWS console
  2. Create an IAM user
  3. Create an S3 bucket
  4. Install the DVC S3 plugin to connect DVC to S3:
     pip install "dvc[s3]"
  5. Install the AWS CLI to connect to AWS:
     pip install awscli
  6. Configure the IAM user credentials for the project:
     aws configure
  7. Set the remote storage address in .dvc/config:
     dvc remote add -d dvcstore s3://bucketname
  8. dvc commit and push the experiment outcome that you want to keep. DVC tracks the outcome of each component in the pipeline, so the data that belongs to the pipeline is tracked:
     dvc commit
     dvc push
  9. Finally, git add, commit, push
  10. To roll back to a previous code version, fetch the commit hash, check it out, and then run:
      dvc pull
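After the "dvc remote add -d dvcstore" step, .dvc/config should contain something like the following (bucketname is a placeholder for your actual bucket):

```
[core]
    remote = dvcstore
['remote "dvcstore"']
    url = s3://bucketname
```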

Logging setup

import logging
import os

# Ensure the "logs" directory exists
log_dir = 'logs'
os.makedirs(log_dir, exist_ok=True)

# logging configuration
logger = logging.getLogger('model_building')
logger.setLevel(logging.DEBUG)

console_handler = logging.StreamHandler()
console_handler.setLevel(logging.DEBUG)

log_file_path = os.path.join(log_dir, 'model_building.log')
file_handler = logging.FileHandler(log_file_path)
file_handler.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
console_handler.setFormatter(formatter)
file_handler.setFormatter(formatter)

logger.addHandler(console_handler)
logger.addHandler(file_handler)
logger.propagate = False
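The setup above can be wrapped in a small reusable helper so each pipeline stage (data_ingestion, model_building, ...) gets its own log file; get_logger is a hypothetical helper name, not code from this repo:

```python
import logging
import os

def get_logger(name: str, log_dir: str = 'logs') -> logging.Logger:
    """Build a console + file logger like the setup above (hypothetical helper)."""
    os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False

    if not logger.handlers:  # avoid duplicate handlers when called more than once
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

        console_handler = logging.StreamHandler()
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)

        file_handler = logging.FileHandler(os.path.join(log_dir, f'{name}.log'))
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger

logger = get_logger('model_building')
logger.debug('logger initialised')
```

The handlers check matters because logging.getLogger returns the same object for the same name, so calling the helper twice would otherwise log every message twice.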

Params setup

params.yaml setup:

  1. import yaml
  2. add func:
def load_params(params_path: str) -> dict:
    """Load parameters from a YAML file."""
    try:
        with open(params_path, 'r') as file:
            params = yaml.safe_load(file)
        logger.debug('Parameters retrieved from %s', params_path)
        return params
    except FileNotFoundError:
        logger.error('File not found: %s', params_path)
        raise
    except yaml.YAMLError as e:
        logger.error('YAML error: %s', e)
        raise
    except Exception as e:
        logger.error('Unexpected error: %s', e)
        raise
  3. Add to main():

# data_ingestion
params = load_params(params_path='params.yaml')
test_size = params['data_ingestion']['test_size']

# feature_engineering
params = load_params(params_path='params.yaml')
max_features = params['feature_engineering']['max_features']

# model_building
params = load_params('params.yaml')['model_building']
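A quick self-check of the lookups above, writing an illustrative params.yaml first (the values are assumptions, not the repo's):

```python
import yaml

# Write an illustrative params.yaml (values are assumptions)
with open('params.yaml', 'w') as f:
    yaml.safe_dump({
        'data_ingestion': {'test_size': 0.2},
        'feature_engineering': {'max_features': 50},
        'model_building': {'n_estimators': 100},
    }, f)

# Same pattern as load_params: yaml.safe_load on the opened file
with open('params.yaml', 'r') as f:
    params = yaml.safe_load(f)

test_size = params['data_ingestion']['test_size']
max_features = params['feature_engineering']['max_features']
model_params = params['model_building']
print(test_size, max_features, model_params)
```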

DVCLIVE code block (to be added to the model evaluation stage)

  1. Import dvclive and yaml:
from dvclive import Live
import yaml
  2. Add the load_params function and initialise a "params" var in main
  3. Add the below code block to main:
with Live(save_dvc_exp=True) as live:
    live.log_metric('accuracy', accuracy_score(y_test, y_pred))
    live.log_metric('precision', precision_score(y_test, y_pred))
    live.log_metric('recall', recall_score(y_test, y_pred))

    live.log_params(params)
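The logged metrics should compare the true labels with the model's predictions (y_test vs y_pred), not y_test with itself. A pure-Python sanity check of those three metrics on toy binary labels (the label values are illustrative):

```python
# Toy labels; in the pipeline these come from the test split and the model
y_test = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

tp = sum(1 for t, p in zip(y_test, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_test, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_test, y_pred) if t == 1 and p == 0)  # false negatives

accuracy = sum(t == p for t, p in zip(y_test, y_pred)) / len(y_test)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```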

Visualising the pipeline (how components are connected to each other)


Live Experiment tracking with DVCLIVE output


Pushed data to AWS S3 bucket


About

This repo covers the end-to-end implementation of an ML pipeline: building it, tracking experiments with DVC, and versioning data with AWS S3.
