Large LAnguage MOdels for Reference Extraction
A framework for extracting and evaluating scientific references and citations from free-form text and PDFs using LLMs and VLMs.
pip install llamore
A few things you can do with Llamore.
Define your extractor. The OpenaiExtractor works with most open model-serving frameworks, such as Ollama and vLLM.
from llamore import GeminiExtractor, OpenaiExtractor
extractor = GeminiExtractor(api_key="MY_GEMINI_API_KEY")
Extract references from a PDF or a raw input string.
references = extractor(pdf="path/to/my.pdf")
or
text = """4 I have explored the gendered nature of citizenship at greater length in two complementary
papers: ‘Embodying the Citizen’ in Public and Private: Feminist Legal Debates, ed. M.
Thornton (1995) and ‘Historicising Citizenship: Remembering Broken Promises’ (1996) 20
Melbourne University Law Rev. 1072."""
references = extractor(text=text)
Export the references to an XML file.
references.to_xml("./my_references.xml")
Evaluate the extracted references against gold references.
from llamore import F1
f1 = F1(levenshtein_distance=0.9)
f1.compute_macro_average(references, gold_references)
# or compute metrics per field
f1.compute_micro_average(references, gold_references)
You can also have a look at the quick start notebook.
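To give a feel for what a Levenshtein-thresholded F1 score measures, here is a minimal standalone sketch. It is not Llamore's implementation. It assumes the threshold is a minimum normalized similarity (1 minus edit distance divided by the longer string's length) and that a reference can be treated as a flat dict of string fields. The names `levenshtein`, `similarity`, and `field_f1` are hypothetical helpers written for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def field_f1(predicted: dict[str, str], gold: dict[str, str],
             threshold: float = 0.9) -> float:
    """F1 over fields; a field matches if its similarity reaches the threshold.

    Assumption: this threshold semantics is illustrative only.
    """
    matches = sum(
        1
        for key, value in predicted.items()
        if key in gold and similarity(value, gold[key]) >= threshold
    )
    if matches == 0:
        return 0.0
    precision = matches / len(predicted)
    recall = matches / len(gold)
    return 2 * precision * recall / (precision + recall)
```

With a threshold below 1.0, small spelling or OCR differences in a field still count as a match, while a field missing from the prediction lowers recall.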
Llamore internally defines a reference via a pydantic BaseModel in llamore.reference.Reference.
It is based on the TEI biblStruct model. Schema files are published on this repository's GitHub page:
