This project aims to facilitate the retrieval of Land Matrix data through natural language queries.
This repository provides several resources:
- An end-to-end Streamlit application with optimal configuration. Explanations.
- A pipeline to reproduce our benchmark of models and methods. See below.
- Educational notebooks that describe all the tasks needed for the entire pipeline. Explanations.
git clone https://github.com/tetis-nlp/landmatrix-graphql-python.git

- Installation of the Python environment

      conda create -n landmatrix python=3.9 pandas scikit-learn spacy streamlit
      conda activate landmatrix
      conda install -c conda-forge sentence-transformers
      pip install transformers faiss-cpu
      pip install ollama
      pip install langchain-openai
      pip install langchain-community
      pip install openpyxl

- Downloading the spaCy model

      python -m spacy download en_core_web_sm

- Installation and launch of Ollama

      curl -fsSL https://ollama.com/install.sh | sh
      ollama serve
      ollama pull llama3:8b

- Configure API keys (only compatible with chat ISDM): add your own ISDM API keys (without `"` quotes)

      cp credentials.ini.default credentials.ini
      vim credentials.ini

- Run the pipeline:

      python src/experiments.py

- Monitor your pipeline:

      tail -f logs/pipeline.log

- Stop the pipeline by killing all of its subprocesses:

      pkill -f src/

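Before launching the pipeline, it can help to confirm that the setup steps above took effect. The following is a minimal sketch of such a pre-flight check and is not part of the repository. It rests on several assumptions: the import names in `REQUIRED_MODULES` are the usual ones for the packages installed above, `credentials.ini` is a standard INI file with section headers, and Ollama listens on its default local port (11434). The function names are hypothetical.

```python
"""Pre-flight check for the setup steps (illustrative sketch, not repository code)."""
import configparser
import importlib.util
import json
import urllib.request

# Assumption: import names matching the conda/pip packages installed above.
REQUIRED_MODULES = [
    "pandas", "sklearn", "spacy", "streamlit", "sentence_transformers",
    "transformers", "faiss", "ollama", "langchain_openai",
    "langchain_community", "openpyxl", "en_core_web_sm",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def quoted_values(ini_path):
    """Return (section, option) pairs whose value is wrapped in quotes.

    Assumes a standard INI layout; the README asks for keys without quotes.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(ini_path)  # a missing file is silently ignored
    flagged = []
    for section in parser.sections():
        for option, value in parser.items(section):
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                flagged.append((section, option))
    return flagged


def ollama_models(host="http://localhost:11434", timeout=2.0):
    """Return model names known to the Ollama server, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        return None
    return [model.get("name") for model in payload.get("models", [])]


if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
    print("Quoted values:", quoted_values("credentials.ini") or "none")
    models = ollama_models()
    if models is None:
        print("Ollama server not reachable (is `ollama serve` running?)")
    else:
        print("llama3:8b pulled:", "llama3:8b" in models)
```

Run it from the repository root inside the `landmatrix` environment. Any module reported as missing points back to the corresponding install command, and a flagged credentials entry should have its surrounding quotes removed.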
| Publication Type | Language | Link |
|---|---|---|
| Preprint (arXiv) | English version | Link |
| EGC'2025 | French version | Link |
