This repository contains multiple research projects and tools for the Lanka Data Foundation.
- Legislation (`legislation/`): Acts Navigation & Analysis system.
- OCR (`hugging-face-deepseek-ocr/`): OCR Experiments and Tools.

Each project (e.g., legislation) is structured as its own module with its own:
- `environment.yml` or `requirements.txt`
- `DEVELOPER.md` (Project-specific setup)
- `tests/`

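The per-project layout above can be checked mechanically. The following is a minimal sketch, not part of this repository: the helper name `missing_project_files` is hypothetical, and it assumes the three layout entries are exactly the ones listed above.

```python
from pathlib import Path


def missing_project_files(project_dir):
    """Return the expected layout entries that are absent from project_dir.

    Hypothetical helper: the checked names mirror the per-project layout
    described in this file and are not enforced anywhere by the repository.
    """
    root = Path(project_dir)
    missing = []

    # Either dependency file is enough to describe the environment.
    dependency_files = ("environment.yml", "requirements.txt")
    if not any((root / name).is_file() for name in dependency_files):
        missing.append("environment.yml or requirements.txt")

    # Project-specific setup instructions.
    if not (root / "DEVELOPER.md").is_file():
        missing.append("DEVELOPER.md")

    # Each project is expected to carry its own tests directory.
    if not (root / "tests").is_dir():
        missing.append("tests/")

    return missing
```

Calling `missing_project_files("legislation")` from the repository root would return an empty list when the project follows the layout, and otherwise the entries that still need to be added.
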
- Root `DEVELOPER.md`: This file. High-level overview and links.
- Project `DEVELOPER.md`: Specific instructions for setting up and running a single project.

- Mamba/Conda-lock: Use strictly defined environments to avoid conflicts between research tools.
- Pre-commit: Ensure hooks are enabled to catch linting issues early.
- Git Submodules: Some external tools/data might be included as submodules; always run `git submodule update --init --recursive` after cloning.
- Data Separation: Big datasets should not be committed. Use `dvc` or specific `data/` directories ignored by git.
- Secrets: Never commit API keys. Use `.env` files and `python-dotenv`.

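As an illustration of the Data Separation rule, the sketch below lists files above a size threshold that sit outside the ignored directories, so they can be moved before committing. It is a hypothetical helper, not an existing tool in this repository: the name `find_large_files`, the 10 MB default, and the skipped directory names are all assumptions, and it does not read `.gitignore`.

```python
from pathlib import Path

# Assumption: 10 MB is an illustrative threshold, not a project rule.
DEFAULT_LIMIT_BYTES = 10 * 1024 * 1024


def find_large_files(root, limit_bytes=DEFAULT_LIMIT_BYTES, skip_dirs=(".git", "data")):
    """Return (relative_path, size_in_bytes) for files larger than limit_bytes.

    Files located under any directory named in skip_dirs are ignored, on the
    assumption that those directories are already excluded from git.
    """
    root = Path(root)
    found = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Only the parent directories are compared against skip_dirs.
        if any(part in skip_dirs for part in rel.parts[:-1]):
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                found.append((rel.as_posix(), size))
    return found
```

Running `find_large_files(".")` from a project directory returns the offending paths with their sizes; an empty list means nothing above the threshold sits outside the skipped directories.
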
- Clone this repository: `git clone <repo-url>`, then `cd research`.
- Navigate to your specific project of interest (e.g., `legislation`) and follow its `DEVELOPER.md`.