Add log-to-training data pipeline with quality filtering #3

Open
sharanyaa23 wants to merge 1 commit into OpenAgriNet:main from sharanyaa23:log-data-pipeline
Conversation

@sharanyaa23

Hi, I’ve added a small prototype for the log-to-training data pipeline.

The idea is to clean raw agent logs before they are used for training. The pipeline filters out incorrect or low-quality responses, removes PII, and flags inefficient multi-step behaviour such as unnecessary tool calls.

It then outputs a clean JSONL dataset that can be directly used for fine-tuning.
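
For readers who want a concrete picture, the stages described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the log schema (`query`, `response`, `success`, `tool_calls`), the regex-based PII patterns, the `min_chars` threshold, and every function name are assumptions made for the example.

```python
import json
import re
from collections import Counter

# Hypothetical log record shape (an assumption, not taken from the PR):
# {"query": str, "response": str, "success": bool,
#  "tool_calls": [{"name": str, "args": dict}]}

# Deliberately simple patterns; real PII removal needs broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def redact_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def is_low_quality(record, min_chars=20):
    """Treat failed runs and very short responses as low quality."""
    response = record.get("response", "").strip()
    return not record.get("success", False) or len(response) < min_chars


def has_redundant_tool_calls(record):
    """Flag records that repeat the same tool call with identical arguments."""
    calls = Counter(
        (call["name"], json.dumps(call.get("args", {}), sort_keys=True))
        for call in record.get("tool_calls", [])
    )
    return any(count > 1 for count in calls.values())


def build_dataset(records):
    """Yield cleaned training examples, skipping low-quality records."""
    for record in records:
        if is_low_quality(record):
            continue
        yield {
            "prompt": redact_pii(record["query"]),
            "completion": redact_pii(record["response"]),
            "inefficient": has_redundant_tool_calls(record),
        }


def write_jsonl(examples, path):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

In this sketch inefficient records are flagged rather than dropped, which matches the wording above; whether flagged examples should be excluded from fine-tuning is left to the consumer of the dataset.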

I’ve included more details about the pipeline in the README.

This is a simple working version focused on data quality and can be extended further based on requirements.
