This document outlines the end-to-end workflow for creating and publishing a new data product in IBM Data Product Hub (DPH) using Git-native capabilities with GitHub. The workflow uses AI coding agents and CI/CD pipelines to automate artifact creation, review, validation, and publishing.
- User Input: Provide a detailed prompt to the AI agent (e.g., Claude) specifying the data product requirements.
- Example: "Create a new DPH data product called invoice_collection_risk_14d that estimates unpaid invoices due in the next 14 days. Owner is Finance Ops. Internal-only sharing."
- Artifacts Created:
- Contract File: Defines the metadata and policies for the data product.
- Path: contracts/invoice_collection_risk_14d.yaml
- Example Content:
    metric: invoice_collection_risk_14d
    owner: finance_ops
    description: value of unpaid invoices due within 14 days
    semantic_query: metrics.invoice_collection_risk_14d
    policy: internal_only
    endpoint: /metrics/invoice_collection_risk_14d
- SQL Definition: Implements the metric logic.
- Path: metrics/invoice_collection_risk_14d.sql
- Example Content:
    SELECT SUM(invoice_amount)
    FROM finance.invoices
    WHERE payment_status != 'paid'
      AND due_date <= CURRENT_DATE + 14;
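The two artifacts above share one naming pattern: the contract path, SQL path, semantic query, and endpoint all contain the metric name. The following is a minimal sketch of how an agent might model that, not DPH's actual contract schema. The class `DataProductContract` and the rule that every path is derived from the metric name are assumptions inferred from the single example in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductContract:
    """Hypothetical in-memory model of the contract fields shown above."""

    metric: str
    owner: str
    description: str
    policy: str

    # Assumption: the remaining fields and file paths follow the naming
    # pattern visible in the example and can be derived from the metric name.
    @property
    def semantic_query(self) -> str:
        return f"metrics.{self.metric}"

    @property
    def endpoint(self) -> str:
        return f"/metrics/{self.metric}"

    @property
    def contract_path(self) -> str:
        return f"contracts/{self.metric}.yaml"

    @property
    def sql_path(self) -> str:
        return f"metrics/{self.metric}.sql"

    def to_yaml(self) -> str:
        # The example contract is a flat key/value mapping, so plain string
        # formatting is enough here; nested contracts would need a YAML library.
        fields = {
            "metric": self.metric,
            "owner": self.owner,
            "description": self.description,
            "semantic_query": self.semantic_query,
            "policy": self.policy,
            "endpoint": self.endpoint,
        }
        return "\n".join(f"{key}: {value}" for key, value in fields.items()) + "\n"


contract = DataProductContract(
    metric="invoice_collection_risk_14d",
    owner="finance_ops",
    description="value of unpaid invoices due within 14 days",
    policy="internal_only",
)
print(contract.contract_path)
print(contract.sql_path)
print(contract.to_yaml())
```

Deriving the paths from one field keeps the contract, SQL file, and endpoint from drifting apart when a metric is renamed.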
- Action: The AI agent commits the artifacts to the Git repository and opens a pull request (PR).
- PR Content:
- Contract file
- SQL definition
- Metadata
- Generated API specification
- Checks Performed:
- SQL correctness
- Contract schema validation
- Policy compliance
- Security risks
- Breaking changes
- Example Review Comment:
- ⚠ Query may expose customer identifiers. Recommendation: remove the customer_id column or apply masking.
- AI Agent Updates: Fixes issues and updates the PR.
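The security check behind the example review comment can be approximated with a simple identifier scan. This is a rough sketch under stated assumptions, not how Claude Code Review works internally: the set `SENSITIVE_COLUMNS` and the functions `find_sensitive_columns` and `review_comments` are hypothetical, and a token scan like this also matches names inside string literals or comments.

```python
import re

# Assumed list of sensitive column names; a real policy would come from
# governance configuration rather than a hard-coded set.
SENSITIVE_COLUMNS = {"customer_id", "customer_email", "tax_id"}


def find_sensitive_columns(sql: str) -> list[str]:
    """Return the sensitive identifiers that appear in the SQL text, sorted."""
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql.lower()))
    return sorted(identifiers & SENSITIVE_COLUMNS)


def review_comments(sql: str) -> list[str]:
    """Produce one warning per sensitive column found in the query."""
    return [
        f"Query may expose customer identifiers. "
        f"Recommendation: remove the {column} column or apply masking."
        for column in find_sensitive_columns(sql)
    ]


risky_sql = (
    "SELECT customer_id, SUM(invoice_amount) "
    "FROM finance.invoices GROUP BY customer_id"
)
clean_sql = (
    "SELECT SUM(invoice_amount) FROM finance.invoices "
    "WHERE payment_status != 'paid' AND due_date <= CURRENT_DATE + 14"
)

for comment in review_comments(risky_sql):
    print(comment)
print(review_comments(clean_sql))
```

A production check would more likely parse the SQL and consult column-level classifications, but the scan shows the shape of the rule: flagged columns in, review comments out.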
- Reviewer: Data steward (e.g., Finance Ops team).
- Action: Approves the PR after verifying the metric definition.
- Outcome: PR is merged into the main branch.
- Pipeline Tasks:
- Validate contract
- Test semantic query
- Publish to DPH
- Outcome: DPH registers the new data product.
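The three pipeline tasks above run in order, and a failure in an earlier task should stop the later ones. The sketch below illustrates that control flow only. All function names are hypothetical, the semantic query check is a naming comparison standing in for a real test run, and `publish_to_dph` is a placeholder that makes no DPH API call.

```python
from typing import Callable

REQUIRED_FIELDS = (
    "metric", "owner", "description", "semantic_query", "policy", "endpoint",
)


def validate_contract(contract: dict) -> None:
    """Fail if any field from the example contract is missing or empty."""
    missing = [field for field in REQUIRED_FIELDS if not contract.get(field)]
    if missing:
        raise ValueError(f"contract is missing fields: {', '.join(missing)}")


def check_semantic_query(contract: dict) -> None:
    """Stand-in check: a real pipeline would execute the query in a test environment."""
    expected = f"metrics.{contract['metric']}"
    if contract["semantic_query"] != expected:
        raise ValueError(f"semantic_query should be {expected}")


def publish_to_dph(contract: dict) -> None:
    """Placeholder only: prints instead of calling any DPH API."""
    print(f"would publish {contract['metric']} to DPH")


PIPELINE: list[Callable[[dict], None]] = [
    validate_contract,
    check_semantic_query,
    publish_to_dph,
]


def run_pipeline(contract: dict, steps: list[Callable[[dict], None]]) -> list[str]:
    """Run steps in order; the first exception aborts the remaining steps."""
    completed = []
    for step in steps:
        step(contract)
        completed.append(step.__name__)
    return completed


contract = {
    "metric": "invoice_collection_risk_14d",
    "owner": "finance_ops",
    "description": "value of unpaid invoices due within 14 days",
    "semantic_query": "metrics.invoice_collection_risk_14d",
    "policy": "internal_only",
    "endpoint": "/metrics/invoice_collection_risk_14d",
}
print(run_pipeline(contract, PIPELINE))
```

Because publishing is the last step, a contract that fails validation never reaches DPH.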
- DPH Action: Exposes a governed API for the data product.
- Example Endpoint: GET /metrics/invoice_collection_risk_14d
- OpenAPI specification is generated automatically.
- AI Agent Usage:
- Example Query: "What is the invoice collection risk over the next 14 days?"
- DPH resolves the query through four stages: contract, semantic query, policy, and execution.
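An agent or application that has mapped a question to this data product would then call the governed endpoint. The sketch below only builds the request and does not send it. The host `dph.example.com` and the bearer-token header are assumptions, since this document specifies only the path `/metrics/invoice_collection_risk_14d`; the function `build_metric_request` is hypothetical.

```python
import urllib.request

# Hypothetical host; the source document gives only the endpoint path.
BASE_URL = "https://dph.example.com"


def build_metric_request(metric: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a governed metric endpoint."""
    url = f"{BASE_URL}/metrics/{metric}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            # Assumption: bearer-token auth; the real scheme may differ.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )


request = build_metric_request("invoice_collection_risk_14d", token="<access-token>")
print(request.get_method(), request.full_url)

# Sending the request would look like this (not executed here):
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

The generated OpenAPI specification would be the authoritative source for the actual host, authentication scheme, and response shape.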
- Achievements:
- New governed metric
- Reviewed SQL
- Version-controlled contract
- Deployed data product
- API for agents
- Manual Effort Eliminated:
- Editing YAML
- Opening PRs
- Validating SQL
- Publishing in UI
- AI Agent (Claude Code)
- GitHub Repository
- AI Code Review (Claude Code Review)
- CI/CD Validation
- DPH Runtime
- Governed Data APIs
- AI Agents and Applications