---
title: Build ML Workflow Pipelines with Flyte and gRPC on Google Cloud C4A Axion processors

minutes_to_complete: 30

who_is_this_for: This is an introductory topic for developers, data engineers, and ML engineers who want to build scalable machine learning workflow pipelines on Arm64-based Google Cloud C4A Axion processors using Flyte workflow orchestration and gRPC-based microservices.

learning_objectives:
- Deploy Flyte workflow pipelines on Google Cloud C4A Axion processors
- Build distributed machine learning pipelines using Flyte tasks
- Implement gRPC-based services for feature engineering
- Integrate Flyte workflows with distributed services
- Run scalable ML pipelines on Arm-based cloud infrastructure

prerequisites:
- A [Google Cloud Platform (GCP)](https://cloud.google.com/free) account with billing enabled
- Basic familiarity with Python
- Basic understanding of machine learning pipelines
- Familiarity with Linux command-line operations

author: Pareena Verma

##### Tags
skilllevels: Introductory
subjects: ML
cloud_service_providers:
- Google Cloud

armips:
- Neoverse

tools_software_languages:
- Flyte
- Python
- gRPC

operatingsystems:
- Linux

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================

further_reading:
- resource:
title: Google Cloud documentation
link: https://cloud.google.com/docs
type: documentation

- resource:
title: Flyte documentation
link: https://docs.flyte.org/
type: documentation

- resource:
title: gRPC documentation
link: https://grpc.io/docs/
type: documentation

- resource:
title: Flyte GitHub repository
link: https://github.com/flyteorg/flyte
type: documentation

weight: 1
layout: "learningpathall"
learning_path_main_page: yes
---
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
---
title: ML Pipeline Architecture
weight: 8

### FIXED, DO NOT MODIFY
layout: learningpathall
---

# ML Pipeline Architecture

In this section, you explore the architecture behind the distributed machine learning pipeline built using Flyte and gRPC on Google Axion Arm-based infrastructure.

This architecture demonstrates how modern ML workflows are orchestrated using workflow engines while delegating specific tasks to distributed services.

Flyte manages the pipeline orchestration, while gRPC enables efficient communication between workflow tasks and external services.


## System architecture

The ML pipeline consists of several tasks executed sequentially within the Flyte workflow.

```text
Flyte Workflow Engine
Dataset Loader Task
Data Preprocessing Task
Feature Engineering Service (gRPC)
Model Training Task
Model Evaluation Task
Pipeline Result
```

Each component in the workflow performs a specific function within the machine learning pipeline.

## Components

### Flyte workflow engine
Flyte orchestrates the pipeline execution. It manages task dependencies, workflow execution, and data flow between tasks.

Key capabilities include:

- defining ML pipelines as Python workflows
- managing task dependencies
- enabling reproducible ML experiments
- scaling pipeline execution
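
Flyte expresses pipelines in Python with `@task` and `@workflow` decorators. The stand-in decorators below only mimic that shape so the sketch runs without flytekit installed; real code would use `from flytekit import task, workflow`:

```python
# Stand-in decorators that mimic flytekit's @task and @workflow shape;
# real code would import these from flytekit instead.
def task(fn):
    return fn

def workflow(fn):
    return fn

@task
def double(x: int) -> int:
    return x * 2

@task
def add_one(x: int) -> int:
    return x + 1

@workflow
def pipeline(x: int) -> int:
    # Flyte infers the dependency graph from data flow:
    # add_one runs after double because it consumes its output.
    return add_one(double(x))
```

With real flytekit, the same structure gives you typed task interfaces, caching, and remote execution for free.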

### Dataset loader
The dataset loader task simulates loading the training dataset that downstream tasks consume.

In real ML systems, this step might include:

- loading datasets from object storage
- retrieving data from data lakes
- accessing distributed datasets
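
As a framework-free illustration (the CSV content and helper name here are invented for the sketch), a loader task might parse records into typed tuples; in production the source would be object storage rather than an in-memory string:

```python
import csv
import io

# Hypothetical stand-in for the dataset loader task: the "file" is an
# in-memory CSV, where production code would read from Cloud Storage.
RAW = "feature,label\n1.5,0\n2.5,1\n"

def load_dataset():
    """Parse CSV rows into (feature, label) tuples."""
    reader = csv.DictReader(io.StringIO(RAW))
    return [(float(r["feature"]), int(r["label"])) for r in reader]
```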

### Data preprocessing
Data preprocessing transforms raw data into a format suitable for model training.

Typical preprocessing steps include:

- cleaning data
- normalizing values
- handling missing data
- encoding categorical variables
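
A minimal sketch of two of these steps, dropping missing values and min-max normalizing (the helper name is invented; production pipelines would typically use pandas or scikit-learn):

```python
def preprocess(rows):
    """Clean and normalize a list of numeric samples."""
    # Handle missing data: drop None entries.
    clean = [x for x in rows if x is not None]
    # Normalize values to the [0, 1] range (min-max scaling).
    lo, hi = min(clean), max(clean)
    span = (hi - lo) or 1.0  # avoid division by zero on constant data
    return [(x - lo) / span for x in clean]
```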

### Feature engineering service (gRPC)
Feature engineering is implemented as a gRPC microservice.

This design allows feature-generation logic to run independently of the workflow engine.

Benefits include:

- scalable feature generation
- reusable feature services
- independent scaling of compute resources
- low-latency communication using gRPC
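
The service contract for such a microservice can be sketched as a Protocol Buffers definition (the service, method, and message names here are hypothetical; the actual `.proto` used in this Learning Path may differ):

```protobuf
syntax = "proto3";

package features;

// Hypothetical contract for the feature engineering service.
service FeatureEngineering {
  // Transforms raw feature vectors into engineered features.
  rpc GenerateFeatures (FeatureRequest) returns (FeatureResponse);
}

message FeatureRequest {
  repeated double raw_values = 1;
}

message FeatureResponse {
  repeated double engineered_values = 1;
}
```

Compiling this definition with `protoc` generates client stubs that Flyte tasks can call and a server interface the service implements.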

### Model training
The training task uses generated features to train a machine learning model.

In production systems, this stage might include:

- training regression models
- training classification models
- training deep learning models

### Model evaluation
The evaluation step measures model performance.

Typical evaluation metrics include:

- accuracy
- precision
- recall
- F1 score

Based on the results, the workflow can determine whether to retrain the model.
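
A minimal sketch of these metrics for binary labels (the helper name is invented; production code would usually call scikit-learn's `sklearn.metrics` instead):

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```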

## Pipeline execution flow

The ML pipeline follows this execution sequence.

```text
Load Dataset
Preprocess Data
Feature Engineering (gRPC Service)
Model Training
Model Evaluation
Pipeline Result
```

Each task executes sequentially while Flyte manages the workflow orchestration.
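
The execution flow above can be sketched as a chain of plain Python functions (all data and helper names here are invented for illustration; in the real pipeline each function would be a flytekit `@task` and the feature step a remote gRPC call):

```python
def load_dataset():
    # Toy dataset of (value, label) pairs.
    return [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

def preprocess(rows):
    # Min-max normalize the values.
    xs = [x for x, _ in rows]
    lo, hi = min(xs), max(xs)
    return [((x - lo) / (hi - lo), y) for x, y in rows]

def engineer_features(rows):
    # In the full pipeline, this step is a call to the gRPC service.
    return [((x, x * x), y) for x, y in rows]

def train(rows):
    # Toy "model": a fixed decision threshold on the first feature.
    return 0.5

def evaluate(rows, threshold):
    preds = [1 if feats[0] >= threshold else 0 for feats, _ in rows]
    return sum(p == y for p, (_, y) in zip(preds, rows)) / len(rows)

def pipeline():
    data = engineer_features(preprocess(load_dataset()))
    return evaluate(data, train(data))
```

Each function consumes the previous function's output, which is exactly the dependency structure Flyte infers when the steps are declared as tasks.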

## Benefits of this architecture

This architecture provides several advantages:

- scalable ML pipeline orchestration
- distributed feature engineering services
- modular pipeline components
- efficient task communication using gRPC
- reproducible machine learning workflows

## Running on Axion
This example demonstrates how machine learning workflows can run efficiently on Google Axion Arm-based processors.

Benefits include:

- high performance per watt
- efficient execution of data pipelines
- scalable infrastructure for ML workloads
- optimized performance for modern cloud applications

## What you've learned

In this section, you explored the architecture behind the ML training pipeline.

You learned how:

- Flyte orchestrates ML workflows
- gRPC services enable distributed feature engineering
- pipeline tasks interact through workflow dependencies
- ML pipelines can scale across distributed infrastructure

This architecture underpins modern distributed machine learning systems running on Arm-based cloud infrastructure.
---
title: Get started with Flyte ML Workflow Pipelines with gRPC on Google Axion C4A
weight: 2

layout: "learningpathall"
---

## Explore Axion C4A Arm instances in Google Cloud

Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for data-intensive and analytics workloads such as big data processing, in-memory analytics, columnar data processing, and high-throughput data services.

The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability, SIMD acceleration, and memory bandwidth advantages of the Arm architecture in Google Cloud.

These characteristics make Axion C4A instances well-suited for modern analytics stacks that rely on columnar data formats and memory-efficient execution engines.

To learn more, see the Google blog [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu).

## Explore Flyte ML Workflow Pipelines with gRPC on Google Axion C4A (Arm Neoverse V2)

Flyte is an open-source workflow orchestration platform used to build scalable and reproducible data and machine learning pipelines. It allows developers to define workflows as Python tasks, simplifying the management of complex ML processes such as data preparation, feature engineering, and model training.

gRPC enables fast communication between distributed services within these pipelines. Running Flyte with gRPC on Google Axion C4A Arm-based processors provides efficient, scalable infrastructure for executing modern ML workflows and distributed data processing tasks.

To learn more, visit the [Flyte documentation](https://docs.flyte.org/) and explore the [gRPC documentation](https://grpc.io/docs/) to understand how distributed service communication enables scalable machine learning workflows.

## What you've learned and what's next

In this section, you learned about:

* Google Axion C4A Arm-based VMs and their performance characteristics
* Flyte as a workflow orchestration platform for machine learning pipelines
* gRPC as a communication layer for distributed services
* How Flyte and gRPC can be used together to build scalable ML training pipelines

Next, you will install the Flyte tools, create a gRPC-based feature engineering service, and build a distributed ML workflow pipeline that orchestrates data processing and model training tasks on Axion infrastructure.