@SeanClay10 (Collaborator):
Summary:
Implements a minimal LLM-based pipeline for extracting key metrics from preprocessed predator diet survey texts using an Ollama client.

Changes:

  • Added LLM script with structured extraction using Pydantic schemas (a rough sketch follows this list)
  • Extracts: species name, study location, study date, empty/non-empty stomach counts, and sample size
  • Includes post-processing validation to correct LLM arithmetic errors and compute the fraction of feeding predators
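For context, here is a minimal sketch of how the schema, structured extraction, and post-processing fit together, assuming a recent Ollama Python client with structured-output (`format=`) support. The field names, the `DietSurveyMetrics` class, the `llama3.1` model tag, and the helper functions are illustrative assumptions, not the actual identifiers in `src/llm/local_llm.py`:

```python
from ollama import chat
from pydantic import BaseModel


class DietSurveyMetrics(BaseModel):
    # Fields mirror the metrics listed above (names assumed)
    species_name: str
    study_location: str
    study_date: str
    empty_stomachs: int
    non_empty_stomachs: int
    sample_size: int


def extract_metrics(paper_text: str) -> DietSurveyMetrics:
    # Passing the Pydantic JSON schema via `format` constrains
    # the model's reply to valid, parseable JSON
    response = chat(
        model="llama3.1",  # assumed model tag
        messages=[{
            "role": "user",
            "content": f"Extract the diet survey metrics from this paper:\n\n{paper_text}",
        }],
        format=DietSurveyMetrics.model_json_schema(),
    )
    return DietSurveyMetrics.model_validate_json(response.message.content)


def validate_metrics(m: DietSurveyMetrics) -> DietSurveyMetrics:
    # Post-processing guard: the stomach counts should sum to the
    # sample size; trust the counts and recompute the total if the
    # LLM's arithmetic is off
    if m.empty_stomachs + m.non_empty_stomachs != m.sample_size:
        m.sample_size = m.empty_stomachs + m.non_empty_stomachs
    return m


def fraction_feeding(m: DietSurveyMetrics) -> float:
    # Fraction of predators with non-empty stomachs
    return m.non_empty_stomachs / m.sample_size if m.sample_size else 0.0
```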

Usage:
python src/llm/local_llm.py data/processed-text/paper.txt

Next Steps

This provides the foundational structure for LLM-based extraction. Response quality is currently limited by preprocessing difficulties and by the sheer number of tokens sent to the model from long papers, so the prompts and the way the preprocessed text is passed to the LLM both need further work.
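One option for the token problem is to split the preprocessed text into overlapping chunks and query the model per chunk, merging the results afterwards. A rough sketch of the splitting step (chunk size and overlap are arbitrary placeholders, and the merge step is left out):

```python
def chunk_text(text: str, max_chars: int = 8000, overlap: int = 500) -> list[str]:
    """Split a long paper into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so metrics spanning a boundary
        # appear whole in at least one chunk
        start = end - overlap
    return chunks
```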

@raymondcen (Collaborator) left a comment:
Everything looks good. The only thing I can really pick out is that the prompts could be better, but we can always change that down the road if we find a better model. We could also use AI to generate a better prompt. Besides that, it looks great.

@SeanClay10 merged commit 9ab3b1e into main on Jan 25, 2026
2 checks passed