Add safe log-to-training demo pipeline #10

Open: aayushprsingh wants to merge 1 commit into OpenAgriNet:main from aayushprsingh:demo-safe-training-pipeline

@aayushprsingh

Summary

Adds a small, dependency-free demo pipeline for the DMP 2026 logs-to-training project.

The demo shows the expected end-to-end flow on toy agent logs:

  • ingest JSONL agent/Q&A sessions
  • redact basic PII and secrets
  • preserve tool calls + tool observations as trajectory data
  • emit SFT/LoRA-style JSONL
  • emit DPO-style JSONL
  • tag trajectory complexity
  • write a validation report with PII counts and schema errors
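
To make the redaction step concrete, here is a minimal sketch of regex-based redaction with per-category counts. It is not the code in scripts/log_to_training_data.js: the `PATTERNS` table, the `redact` function, the `sk-` key prefix, and the placeholder format are all assumptions chosen for illustration. Name redaction is left out because it generally needs more than a regular expression.

```javascript
// Hypothetical redaction rules; the patterns used by the actual demo may differ.
// Order matters: emails and keys are replaced before the looser phone pattern runs.
const PATTERNS = [
  ["email", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["api_key", /\bsk-[A-Za-z0-9]{16,}\b/g], // assumes keys that start with "sk-"
  ["phone", /\+?\d[\d\s-]{8,}\d/g],
];

// Replace each match with a placeholder such as [EMAIL] and tally hits per category.
// Passing the same `counts` object across calls accumulates totals for a whole run.
function redact(text, counts = {}) {
  let out = text;
  for (const [label, pattern] of PATTERNS) {
    out = out.replace(pattern, () => {
      counts[label] = (counts[label] || 0) + 1;
      return `[${label.toUpperCase()}]`;
    });
  }
  return { text: out, counts };
}

const sample = "Mail ravi@example.com or call +91 98765 43210, key sk-abcdefghijklmnop1234";
console.log(redact(sample));
```

Accumulating counts in a shared object is one way to produce the per-category totals that appear in the validation report; the real pipeline may track them differently.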

Files

  • scripts/log_to_training_data.js — demo pipeline
  • samples/agent_logs.jsonl — toy Q&A + agentic logs
  • schemas/training_records.schema.json — minimal schema contract
  • out/*.jsonl and out/validation_report.json — generated example artifacts
  • DEMO.md — run instructions and design notes
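
As an illustration of what a minimal schema contract check could look like without dependencies, the sketch below validates SFT-style JSONL line by line and collects errors into a report object. The record shape (`messages` with `role` and `content`), the allowed roles, and the function names `validateSftRecord` and `validateJsonl` are assumptions; the actual contract is whatever schemas/training_records.schema.json defines.

```javascript
// Hypothetical record shape: { "messages": [{ "role": "...", "content": "..." }] }
const ALLOWED_ROLES = new Set(["system", "user", "assistant", "tool"]);

// Return a list of human-readable problems for one parsed record (empty if valid).
function validateSftRecord(record) {
  const errors = [];
  if (!record || !Array.isArray(record.messages) || record.messages.length === 0) {
    errors.push("messages must be a non-empty array");
    return errors;
  }
  record.messages.forEach((m, i) => {
    if (!m || !ALLOWED_ROLES.has(m.role)) {
      errors.push(`messages[${i}].role is invalid`);
    }
    if (!m || typeof m.content !== "string") {
      errors.push(`messages[${i}].content must be a string`);
    }
  });
  return errors;
}

// Validate JSONL text: one JSON object per non-empty line, errors keyed by line number.
function validateJsonl(text) {
  const report = { records: 0, errors: [] };
  text.split("\n").forEach((line, idx) => {
    if (line.trim() === "") return;
    report.records += 1;
    let parsed;
    try {
      parsed = JSON.parse(line);
    } catch (err) {
      report.errors.push({ line: idx + 1, error: "invalid JSON" });
      return;
    }
    for (const error of validateSftRecord(parsed)) {
      report.errors.push({ line: idx + 1, error });
    }
  });
  return report;
}
```

A report with an empty `errors` array would correspond to the "zero validation errors" case; stricter validation could later swap this hand-written check for a full JSON Schema validator.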

Verification

Ran locally:

node scripts/log_to_training_data.js samples/agent_logs.jsonl out

Output:

Processed 2 sessions
PII redactions: {"email":1,"phone":1,"name":1,"api_key":1}
Wrote out/sft.jsonl, out/dpo.jsonl, out/validation_report.json

out/validation_report.json shows zero validation errors.

Note

This is intentionally small so mentors can review the direction quickly. If it aligns with the project direction, the next step would be to make schema validation stricter and to separate redaction, segmentation, and export into modules.
