Build a pipeline that ingests production logs (question answering and agentic setup sessions), removes personally identifiable information and other sensitive content, and produces training datasets suitable for improving a language model used in that setup. Agentic portions must be treated as trajectory-oriented behavior cloning (state, tool actions, observations, multi-step recovery), not just as single-turn instruction pairs. Exports must support supervised fine-tuning with LoRA and Direct Preference Optimization, and the data design should support eventually training a smaller model to replace a larger teacher model while preserving behavior. The work includes defining schemas, tagging compositional trajectory complexity, adding diversity and scheduling hooks for training, and validating that trajectories are consistent with tool results where possible.
Goals & Mid-Point Milestone
Goals
Log event schema and parsers for Q&A and agentic traces (user, assistant, tool call, tool result, errors).
PII detection, redaction or placeholder replacement, and audit sampling workflow with documented residual risk.
Export paths for SFT (LoRA-ready) JSONL and DPO JSONL aligned to one chat template and inference format.
Trajectory tagging for compositional complexity (steps, tools, ambiguity, error recovery) and stratified sampling setup for diversity.
Goals Achieved By Mid-point Milestone: a working end-to-end prototype on a sampled log subset, producing PII-stripped JSONL for SFT and a small validated DPO pair set; documented schemas; gold-set alignment for one representative agent workflow; and automated checks for schema, tool name validity, and basic trajectory consistency.
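A minimal sketch of what a normalized log event and its parser could look like, assuming raw logs arrive as one JSON object per line. The field names (`session_id`, `role`, `tool_name`, `call_id`) and the helpers `LogEvent`, `parse_line`, and `is_agentic` are hypothetical and are not taken from any existing OpenAgriNet schema.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical role names covering the event types listed in the goals.
ROLES = {"user", "assistant", "tool_call", "tool_result", "error"}


@dataclass
class LogEvent:
    session_id: str
    role: str
    content: str
    tool_name: Optional[str] = None  # set for tool_call / tool_result events
    call_id: Optional[str] = None    # links a tool_result to its tool_call


def parse_line(line: str) -> LogEvent:
    """Parse one raw JSON log line into a LogEvent, rejecting unknown roles."""
    raw = json.loads(line)
    role = raw["role"]
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if role in {"tool_call", "tool_result"} and not raw.get("tool_name"):
        raise ValueError(f"{role} event is missing tool_name")
    return LogEvent(
        session_id=raw["session_id"],
        role=role,
        content=raw.get("content", ""),
        tool_name=raw.get("tool_name"),
        call_id=raw.get("call_id"),
    )


def is_agentic(events: List[LogEvent]) -> bool:
    """Treat a session as an agent trajectory if it contains any tool call."""
    return any(e.role == "tool_call" for e in events)
```

Segmenting Q&A from agent trajectories on the presence of a tool call is only one possible rule; a real pipeline might also use session type fields if the logs carry them.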
Setup/Installation
Not applicable.
Expected Outcome
A documented, repeatable pipeline runnable against designated log sources. It normalizes logs, segments Q&A versus agent trajectories, applies configurable PII rules and review hooks, and emits versioned datasets. SFT exports are valid multi-turn chat or completion records matching production templates, including tool syntax. DPO exports provide shared prompts with chosen and rejected completions from governed sources (feedback, failure pairs, or approved synthetics). Each row or shard carries metadata for trajectory complexity and domain so trainers can apply staged mixtures or model-aware sampling without ad hoc rewrites. Evaluation hooks or scripts exist to compare teacher and student checkpoints on held-out behavioral and tool-use checks. Filters for student training respect the student's smaller context window and tool set.
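One way the complexity metadata could be attached to each exported row is sketched below. It assumes a trajectory is a list of dicts with `role`, `content`, and an optional `tool_name` key. The tag names, the `messages`/`meta` row layout, and the recovery heuristic (an assistant turn appearing after the first error) are illustrative assumptions, not the production chat template.

```python
import json
from typing import Dict, List

Event = Dict[str, str]


def tag_trajectory(events: List[Event]) -> Dict[str, object]:
    """Compute simple complexity tags for one trajectory."""
    tools = sorted(
        {e.get("tool_name", "unknown") for e in events if e["role"] == "tool_call"}
    )
    error_idx = [i for i, e in enumerate(events) if e["role"] == "error"]
    # Recovery heuristic (assumption): an assistant turn follows the first error.
    recovered = bool(error_idx) and any(
        e["role"] == "assistant" for e in events[error_idx[0] + 1:]
    )
    return {
        "num_steps": sum(e["role"] == "tool_call" for e in events),
        "tools": tools,
        "has_error": bool(error_idx),
        "recovered": recovered,
    }


def to_sft_row(events: List[Event], domain: str) -> str:
    """Serialize one trajectory as a JSONL line with messages plus metadata."""
    row = {
        "messages": [{"role": e["role"], "content": e["content"]} for e in events],
        "meta": {"domain": domain, **tag_trajectory(events)},
    }
    return json.dumps(row, ensure_ascii=False)
```

Keeping the tags under a separate `meta` key means a trainer can filter or stratify on them without touching the text that is fed to the model.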
Acceptance Criteria
No training artifact ships without passing the configured PII pipeline and a documented audit sample.
SFT JSONL validates against the chosen trainer dry run (LoRA) on toy and production-shaped samples without template mismatch.
DPO JSONL validates against a small DPO dry run with required prompt, chosen, and rejected fields.
Agent trajectories are excluded or flagged when tool calls contradict observations or fail schema validation.
Documentation lists field definitions, split strategy (no near-duplicate leakage across train and preference sets), and how complexity tags map to recommended training schedules.
Clear criteria are defined for when a smaller replacement model is acceptable relative to the teacher on the agreed eval set.
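The DPO field check and the trajectory consistency check from the criteria above might be automated along these lines. This is a sketch under stated assumptions: DPO rows are taken to be flat dicts with string-valued `prompt`, `chosen`, and `rejected` fields, and events are assumed to carry a hypothetical `call_id` that pairs each tool result with its call. It covers only structural consistency; detecting an assistant answer that contradicts an observation would need additional, domain-specific checks.

```python
from typing import Dict, List, Set

DPO_FIELDS = ("prompt", "chosen", "rejected")


def dpo_row_errors(row: Dict[str, object]) -> List[str]:
    """Return problems with one DPO row; an empty list means it passes."""
    errors = [
        f"missing or empty field: {k}"
        for k in DPO_FIELDS
        if not isinstance(row.get(k), str) or not row[k].strip()
    ]
    if not errors and row["chosen"] == row["rejected"]:
        errors.append("chosen and rejected are identical")
    return errors


def trajectory_errors(events: List[Dict[str, str]], known_tools: Set[str]) -> List[str]:
    """Check tool names and that calls and results pair up by call_id."""
    errors: List[str] = []
    open_calls: Set[str] = set()
    for e in events:
        if e["role"] == "tool_call":
            if e.get("tool_name") not in known_tools:
                errors.append(f"unknown tool: {e.get('tool_name')}")
            open_calls.add(e["call_id"])
        elif e["role"] == "tool_result":
            if e["call_id"] not in open_calls:
                errors.append(f"result without call: {e['call_id']}")
            open_calls.discard(e["call_id"])
    # Any call still open at the end never received an observation.
    errors.extend(f"call without result: {c}" for c in sorted(open_calls))
    return errors
```

Returning a list of messages rather than raising lets the pipeline choose between excluding a trajectory and flagging it for review.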
Implementation Details
Python-first CLI or service modules for ingest, transform, and export. Rule-based and model-assisted PII detection as appropriate; consistent synthetic placeholders. JSONL as primary interchange. Optional integration with Hugging Face datasets, PEFT, and TRL-style trainers for validation only unless scope expands. Tagging pipeline computes step counts, tool sets, recovery flags, and optional loss-based hardness when a reference model is available. No storage of raw secrets in config or outputs. Tests for redaction, schema validation, and split integrity.
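The "consistent synthetic placeholders" idea can be illustrated with a small rule-based redactor. The two regular expressions, the `<KIND_n>` placeholder format, and the `Redactor` class are assumptions made for this sketch; they are deliberately simple and would miss many real PII forms (names, addresses, national ID numbers), which is where model-assisted detection and audit sampling come in.

```python
import re
from typing import Dict

# Deliberately simple patterns for illustration; real coverage needs more rules.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


class Redactor:
    """Replace PII with placeholders that stay consistent within one session."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}    # raw value -> placeholder (in memory only)
        self._counts: Dict[str, int] = {}  # per-kind placeholder counter

    def _placeholder(self, kind: str, value: str) -> str:
        key = f"{kind}:{value}"
        if key not in self._seen:
            self._counts[kind] = self._counts.get(kind, 0) + 1
            self._seen[key] = f"<{kind}_{self._counts[kind]}>"
        return self._seen[key]

    def redact(self, text: str) -> str:
        """Apply every pattern in turn, reusing placeholders for repeated values."""
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, k=kind: self._placeholder(k, m.group()), text
            )
        return text
```

Creating one `Redactor` per session keeps the same email mapped to the same placeholder across turns, so the trajectory stays coherent after redaction. The internal mapping holds raw values and should be discarded rather than written to config or outputs.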
Ticket Contents
Mockups/Wireframes
Not applicable.
Product Name
OpenAgriNet
Organisation Name
COSS
Domain
Agriculture
Tech Skills Needed
Python
Mentor(s)
@Gautam-Rajeev
Category
Data Science