Skip to content

[DMP 2026]: Logs-to-training pipeline for agentic setups #1

@Gautam-Rajeev

Description

@Gautam-Rajeev

Ticket Contents

Description

Build a pipeline that ingests production logs (question answering and agentic setup sessions), removes personally identifiable information and other sensitive content, and produces training datasets suitable for improving a language model used in that setup. Agentic portions must be treated as trajectory-oriented behavior cloning (state, tool actions, observations, multi-step recovery), not only single-turn instruction pairs. Exports must support supervised fine-tuning with LoRA and Direct Preference Optimization, and the data design should support eventually training a smaller model to replace a larger teacher model while preserving behavior. The work includes defining schemas, tagging compositional trajectory complexity, diversity and scheduling hooks for training, and validation so trajectories are consistent with tool results where possible.

Goals & Mid-Point Milestone

Goals

  • log event schema and parsers for Q&A and agentic traces (user, assistant, tool call, tool result, errors).
  • PII detection, redaction or placeholder replacement, and audit sampling workflow with documented residual risk.
  • Export paths for SFT (LoRA-ready) JSONL and DPO JSONL aligned to one chat template and inference format.
  • Trajectory tagging for compositional complexity (steps, tools, ambiguity, error recovery) and stratified sampling setup for diversity
  • Goals Achieved By Mid-point Milestone: working end-to-end prototype on a sampled log subset—PII-stripped JSONL for SFT and a small validated DPO pair set; documented schemas; gold-set alignment for one representative agent workflow; automated checks for schema, tool name validity, and basic trajectory consistency.

Setup/Installation

NA

Expected Outcome

A documented, repeatable pipeline runnable against designated log sources. It normalizes logs, segments Q&A versus agent trajectories, applies configurable PII rules and review hooks, and emits versioned datasets. SFT exports are valid multi-turn chat or completion records matching production templates including tool syntax. DPO exports provide shared prompts with chosen and rejected completions from governed sources (feedback, failure pairs, or approved synthetics). Each row or shard carries metadata for trajectory complexity and domain so trainers can apply staged mixtures or model-aware sampling without ad-hoc rewrites. Evaluation hooks or scripts exist to compare teacher and student checkpoints on held-out behavioral and tool-use checks. Student-training filters respect smaller context and tool sets.

Acceptance Criteria

  • No training artifact ships without passing the configured PII pipeline and a documented audit sample.
  • SFT JSONL validates against the chosen trainer dry run (LoRA) on toy and production-shaped samples without template mismatch.
  • DPO JSONL validates against a small DPO dry run with required prompt, chosen, and rejected fields.
  • Agent trajectories excluded or flagged when tool calls contradict observations or fail schema validation.
  • Documentation lists field definitions, split strategy (no near-duplicate leakage across train and preference sets), and how complexity tags map to recommended training schedules.
  • Clear criteria defined for when a smaller replacement model is acceptable relative to the teacher on the agreed eval set.

Implementation Details

Python-first CLI or service modules for ingest, transform, and export. Rule-based and model-assisted PII detection as appropriate; consistent synthetic placeholders. JSONL as primary interchange. Optional integration with Hugging Face datasets, PEFT, and TRL-style trainers for validation only unless scope expands. Tagging pipeline computes step counts, tool sets, recovery flags, and optional loss-based hardness when a reference model is available. No storage of raw secrets in config or outputs. Tests for redaction, schema validation, and split integrity.

Mockups/Wireframes

Not applicable.

Product Name

OpenAgriNet

Organisation Name

COSS

Domain

⁠Agriculture

Tech Skills Needed

Python

Mentor(s)

@Gautam-Rajeev

Category

Data Science

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions