Workflow Plan for Creating a DPH Data Product

This document outlines the end-to-end workflow for creating and publishing a new data product in IBM Data Product Hub (DPH) using DPH's Git-native capabilities with GitHub. AI coding agents draft and review the artifacts, and a CI/CD pipeline validates and publishes them. The remaining manual steps are writing the initial prompt and approving the pull request.

Step 1: User Prompt to AI Agent

  • User Input: Provide a detailed prompt to the AI agent (e.g., Claude) specifying the data product requirements.
    • Example:
      Create a new DPH data product called invoice_collection_risk_14d that estimates unpaid invoices due in the next 14 days. Owner is Finance Ops. Internal-only sharing.
      

Step 2: AI Agent Generates Artifacts

  • Artifacts Created:
    1. Contract File: Defines the metadata and policies for the data product.
      • Path: contracts/invoice_collection_risk_14d.yaml
      • Example Content:
        metric: invoice_collection_risk_14d
        owner: finance_ops
        description: value of unpaid invoices due within 14 days
        semantic_query: metrics.invoice_collection_risk_14d
        policy: internal_only
        endpoint: /metrics/invoice_collection_risk_14d
    2. SQL Definition: Implements the metric logic.
      • Path: metrics/invoice_collection_risk_14d.sql
      • Example Content:
        SELECT SUM(invoice_amount) AS invoice_collection_risk_14d
        FROM finance.invoices
        WHERE payment_status <> 'paid'
          AND due_date BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL '14' DAY;
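
The two artifacts follow a shared naming convention: the metric name determines both file paths, the semantic query name, and the endpoint. The sketch below shows how an agent could derive them from the prompt's fields. The function names are hypothetical and not part of DPH, and the YAML is emitted by plain string formatting, which only works for simple values; a real generator would use a YAML library so that values containing `:` or `#` are quoted correctly.

```python
# Hypothetical helpers: derive the Step 2 artifact paths and the contract text
# from the fields given in the user's prompt. The naming conventions mirror the
# example contract; the function names are illustrative, not part of DPH.

CONTRACT_KEYS = ("metric", "owner", "description", "semantic_query", "policy", "endpoint")


def artifact_paths(name: str) -> dict[str, str]:
    """Repository paths for a metric's contract and SQL definition."""
    return {"contract": f"contracts/{name}.yaml", "sql": f"metrics/{name}.sql"}


def render_contract(name: str, owner: str, description: str, policy: str) -> str:
    """Render the contract as flat YAML; query name and endpoint derive from the metric name."""
    fields = {
        "metric": name,
        "owner": owner,
        "description": description,
        "semantic_query": f"metrics.{name}",
        "policy": policy,
        "endpoint": f"/metrics/{name}",
    }
    # Emit keys in a fixed order so regenerated files produce stable diffs.
    return "".join(f"{key}: {fields[key]}\n" for key in CONTRACT_KEYS)


if __name__ == "__main__":
    name = "invoice_collection_risk_14d"
    print(artifact_paths(name)["contract"])
    print(render_contract(name, "finance_ops",
                          "value of unpaid invoices due within 14 days", "internal_only"))
```

Deriving every identifier from the single metric name keeps the contract, SQL file, and endpoint from drifting apart when the agent regenerates them.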

Step 3: AI Agent Opens Pull Request

  • Action: The AI agent commits the artifacts to the Git repository and opens a pull request (PR).
  • PR Content:
    • Contract file
    • SQL definition
    • Metadata
    • Generated API specification
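
Because each contract and its SQL definition share a name, a simple pre-merge check can confirm that the PR carries both halves of every data product. The following is a minimal sketch, assuming the `contracts/<name>.yaml` and `metrics/<name>.sql` layout from Step 2; the function name is hypothetical.

```python
from pathlib import PurePosixPath


def unpaired_artifacts(changed_files: list[str]) -> list[str]:
    """Return the counterpart files a PR is missing.

    Assumes the layout from Step 2: contracts/<name>.yaml pairs with metrics/<name>.sql.
    """
    contracts = {PurePosixPath(p).stem for p in changed_files
                 if p.startswith("contracts/") and p.endswith(".yaml")}
    queries = {PurePosixPath(p).stem for p in changed_files
               if p.startswith("metrics/") and p.endswith(".sql")}
    # A contract without SQL (or SQL without a contract) cannot be published as a unit.
    missing = [f"metrics/{name}.sql" for name in sorted(contracts - queries)]
    missing += [f"contracts/{name}.yaml" for name in sorted(queries - contracts)]
    return missing


if __name__ == "__main__":
    pr_files = ["contracts/invoice_collection_risk_14d.yaml",
                "metrics/invoice_collection_risk_14d.sql"]
    print(unpaired_artifacts(pr_files))  # an empty list means every artifact is paired
```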

Step 4: AI Code Review

  • Checks Performed:
    1. SQL correctness
    2. Contract schema validation
    3. Policy compliance
    4. Security risks
    5. Breaking changes
  • Example Review Comment:
    • ⚠ Query may expose customer identifiers. Recommendation: remove customer_id column or apply masking.
  • AI Agent Follow-up: The agent fixes the reported issues and pushes the changes to the same PR.
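
Two of the checks above, contract schema validation and identifier exposure, are mechanical enough to run as deterministic pre-checks alongside the AI review. The sketch below is an assumption, not how the AI reviewer works. The required keys come from the example contract. The list of identifier columns is hypothetical. The exposure check only inspects the outermost SELECT list, so subqueries and CTEs would need a real SQL parser.

```python
import re

REQUIRED_KEYS = ("metric", "owner", "description", "semantic_query", "policy", "endpoint")
# Hypothetical list of columns treated as customer identifiers.
IDENTIFIER_COLUMNS = ("customer_id", "customer_email", "tax_id")


def review(contract: dict[str, str], sql: str) -> list[str]:
    """Return review findings; an empty list means both checks passed."""
    findings = []
    # Contract schema validation: every key present and non-empty.
    for key in REQUIRED_KEYS:
        if not contract.get(key):
            findings.append(f"contract: missing required key '{key}'")
    # Naming consistency between the metric and its query/endpoint.
    name = contract.get("metric", "")
    if name:
        if contract.get("semantic_query") != f"metrics.{name}":
            findings.append("contract: semantic_query does not match metric name")
        if contract.get("endpoint") != f"/metrics/{name}":
            findings.append("contract: endpoint does not match metric name")
    # Security heuristic: look only at the columns returned, not at filters.
    select_clause = re.split(r"\bFROM\b", sql, maxsplit=1, flags=re.IGNORECASE)[0]
    for column in IDENTIFIER_COLUMNS:
        if re.search(rf"\b{column}\b", select_clause, flags=re.IGNORECASE):
            findings.append(f"security: query may expose {column}; remove the column or apply masking")
    return findings
```

Restricting the identifier check to the SELECT list means a query may still filter or join on `customer_id` without being flagged, since those columns never reach the API response.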

Step 5: Human Approval

  • Reviewer: Data steward (e.g., Finance Ops team).
  • Action: Approves the PR after verifying the metric definition.
  • Outcome: PR is merged into the main branch.
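
In GitHub this gate is normally enforced with branch protection rules, for example by requiring a review from Code Owners defined in a CODEOWNERS file. The function below is a hypothetical model of the same rule for illustration. It does not reproduce GitHub's exact algorithm. It only counts each reviewer's latest approving or change-requesting review, and it assumes the steward team's logins are known.

```python
from dataclasses import dataclass


@dataclass
class Review:
    reviewer: str
    state: str  # GitHub-style review state, e.g. "APPROVED", "CHANGES_REQUESTED", "COMMENTED"


def can_merge(reviews: list[Review], stewards: set[str],
              checks_passed: bool, open_findings: int) -> bool:
    """Hypothetical Step 5 merge rule.

    reviews must be in chronological order; only each reviewer's latest
    APPROVED or CHANGES_REQUESTED review counts.
    """
    latest: dict[str, str] = {}
    for r in reviews:
        if r.state in ("APPROVED", "CHANGES_REQUESTED"):
            latest[r.reviewer] = r.state
    # At least one data steward must approve; the AI reviewer alone is not enough.
    steward_approved = any(who in stewards and state == "APPROVED"
                           for who, state in latest.items())
    changes_requested = "CHANGES_REQUESTED" in latest.values()
    return checks_passed and open_findings == 0 and steward_approved and not changes_requested
```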

Step 6: CI/CD Pipeline Deployment

  • Pipeline Tasks:
    1. Validate contract
    2. Test semantic query
    3. Publish to DPH
  • Outcome: DPH registers the new data product.
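
The "test semantic query" stage can run the metric against a small set of fixture rows before anything is published. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine. The SQLite dialect, the fixture rows, and the function name are assumptions; the production warehouse and DPH's own test mechanism may differ. The query follows the prompt's "due in the next 14 days" wording, so overdue invoices are excluded.

```python
import sqlite3
from datetime import date

# SQLite adaptation of the metric (an assumption: the production dialect may
# differ). "today" is bound as a parameter so results are deterministic.
METRIC_SQL = """
SELECT COALESCE(SUM(invoice_amount), 0)
FROM finance.invoices
WHERE payment_status <> 'paid'
  AND due_date BETWEEN :today AND date(:today, '+14 days')
"""


def run_metric(rows: list[tuple[float, str, str]], today: date) -> float:
    """Load fixture rows into a throwaway database and evaluate the metric."""
    conn = sqlite3.connect(":memory:")
    try:
        # Attach a second in-memory database named "finance" so the
        # schema-qualified table name finance.invoices resolves unchanged.
        conn.execute("ATTACH DATABASE ':memory:' AS finance")
        conn.execute("CREATE TABLE finance.invoices "
                     "(invoice_amount REAL, payment_status TEXT, due_date TEXT)")
        conn.executemany("INSERT INTO finance.invoices VALUES (?, ?, ?)", rows)
        (total,) = conn.execute(METRIC_SQL, {"today": today.isoformat()}).fetchone()
        return float(total)
    finally:
        conn.close()


# Hypothetical fixture rows: (invoice_amount, payment_status, due_date).
FIXTURES = [
    (100.0, "open", "2024-01-05"),  # inside the window: counted
    (20.0, "open", "2024-01-15"),   # last day of the window: counted
    (50.0, "paid", "2024-01-05"),   # already paid: excluded
    (70.0, "open", "2024-01-20"),   # beyond 14 days: excluded
    (30.0, "open", "2023-12-20"),   # overdue: excluded by the lower bound
]

if __name__ == "__main__":
    total = run_metric(FIXTURES, today=date(2024, 1, 1))
    assert total == 120.0, total  # the CI stage fails if the metric logic regresses
    print(f"semantic query test passed: {total}")
```

`COALESCE` makes an empty result return 0 rather than NULL, which keeps the assertion meaningful even when no invoice falls inside the window.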

Step 7: API Generation and Consumption

  • DPH Action: Exposes a governed API for the data product.
    • Example Endpoint: GET /metrics/invoice_collection_risk_14d
    • An OpenAPI specification is generated automatically.
  • AI Agent Usage:
    • Example Query: "What is the invoice collection risk over the next 14 days?"
    • DPH resolves the query by looking up the contract, enforcing its policy, and executing the semantic query.
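
DPH generates the API specification itself. The sketch below only illustrates how the contract fields could map onto an OpenAPI 3.0 document. The response schema (`metric`, `value`), the `403` response, the version string, and the `x-dph-*` extension keys are hypothetical and are not DPH's actual output.

```python
import json


def openapi_for(contract: dict[str, str]) -> dict:
    """Map a data product contract onto a minimal OpenAPI 3.0 document (illustrative)."""
    name = contract["metric"]
    value_schema = {
        "type": "object",
        "properties": {"metric": {"type": "string"}, "value": {"type": "number"}},
        "required": ["metric", "value"],
    }
    operation = {
        "operationId": f"get_{name}",
        "summary": contract["description"],
        "responses": {
            "200": {
                "description": "Current metric value",
                "content": {"application/json": {"schema": value_schema}},
            },
            "403": {"description": "Caller is not permitted by the contract policy"},
        },
        # Hypothetical vendor extensions carrying governance metadata.
        "x-dph-owner": contract["owner"],
        "x-dph-policy": contract["policy"],
    }
    return {
        "openapi": "3.0.3",
        "info": {"title": name, "version": "1.0.0"},
        "paths": {contract["endpoint"]: {"get": operation}},
    }


if __name__ == "__main__":
    contract = {
        "metric": "invoice_collection_risk_14d",
        "owner": "finance_ops",
        "description": "value of unpaid invoices due within 14 days",
        "semantic_query": "metrics.invoice_collection_risk_14d",
        "policy": "internal_only",
        "endpoint": "/metrics/invoice_collection_risk_14d",
    }
    print(json.dumps(openapi_for(contract), indent=2))
```

Because the path, summary, and governance metadata all come from the contract, an agent reading the specification sees the same description and policy that the data steward approved.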

Result for the User

  • Achievements:
    • New governed metric
    • Reviewed SQL
    • Version-controlled contract
    • Deployed data product
    • API for agents
  • Manual Effort Eliminated:
    • Editing YAML
    • Opening PRs
    • Validating SQL
    • Publishing in UI

Architecture Overview

  1. AI Agent (Claude Code)
  2. GitHub Repository
  3. AI Code Review (Claude Code Review)
  4. CI/CD Validation
  5. DPH Runtime
  6. Governed Data APIs
  7. AI Agents and Applications