query test sets #266

@ekmpa

We want predefined query test sets that we can use to assess the quality of our scores once they are integrated into RAG.

This includes:

  • Actual benchmarks as listed in credibility-augmented RAG #262
  • Sanity test sets for domains we know well ourselves:
    • Set of academic websites, e.g., our own and our colleagues' websites
    • Set of queries related to tools we use or 'niche' subjects where we know the system could be easily fooled
  • A new structure for test sets that focuses on testing the retrieval of new material (i.e., material that postdates the LLM's knowledge cutoff date), e.g.,
    • (A, B) pairs that we ask the LLM to compare ("A v. B?")
    • Where A is a library, tool, or news event from the past that is factually grounded
    • And B is a new version, library, or concept released after the cutoff date
    • Cross-domain: cover tech tools, debunked health claims, conspiracy theories, and political news.
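
To make the proposed pair structure concrete, here is a minimal Python sketch of how an "A v. B" test case could be represented and filtered by cutoff date. Everything in it is an assumption for illustration: the `ComparisonPair` class, its field names, the `build_query` and `select_post_cutoff` helpers, and the domain labels are hypothetical and not part of any existing code in the project.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ComparisonPair:
    """One hypothetical 'A v. B' test case."""
    domain: str    # assumed labels, e.g. "tech", "health", "conspiracy", "politics"
    a: str         # established, factually grounded item (library, tool, past event)
    b: str         # newer item (new version, library, or concept)
    b_date: date   # date B was released or reported


def build_query(pair: ComparisonPair) -> str:
    """Format the comparison question that is sent to the LLM."""
    return f"{pair.a} v. {pair.b}?"


def select_post_cutoff(pairs: List[ComparisonPair], cutoff: date) -> List[ComparisonPair]:
    """Keep only pairs whose B item is strictly newer than the model's cutoff date."""
    return [p for p in pairs if p.b_date > cutoff]
```

A test set would then be a list of such pairs, filtered per model with that model's own cutoff date, so that only cases where B can be answered through retrieval alone are kept.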
