LLKMS - Language Learning Knowledge Management System

LLKMS processes and queries documents in several formats to support language learning and knowledge management. It integrates with Amazon S3 for cloud storage, uses LangChain and FAISS for Retrieval-Augmented Generation (RAG), and supports configurable language model providers such as OpenAI and DeepSeek. Whether you’re a learner, researcher, or knowledge enthusiast, LLKMS helps you manage and extract insights from your documents.

Key Features

Multi-format Support: Process .pdf, .txt, .png/.jpg/.jpeg (with OCR), .docx, and .html/.htm files.
Cloud Integration: Seamlessly connect to Amazon S3 for document storage and retrieval.
Smart Retrieval: Leverage RAG with FAISS for fast, context-aware answers (limited to three sentences).
Flexible Models: Use language models from OpenAI, DeepSeek, or others via a configurable ModelFactory.
Usage Tracking: Monitor token usage and API costs with a summary on exit.
Detailed Logging: Comprehensive logs for debugging and transparency (logs/llkms.log).
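
The multi-format support listed above implies some routing from file extension to an extraction step. A minimal sketch of such routing follows; the handler labels (`pdf`, `text`, `ocr`, `docx`, `html`) and the `pick_handler` function are hypothetical names chosen for illustration, not the loader names used inside LLKMS:

```python
from pathlib import Path
from typing import Optional

# Extensions listed under Key Features, mapped to a hypothetical handler label.
HANDLERS = {
    ".pdf": "pdf",
    ".txt": "text",
    ".png": "ocr",
    ".jpg": "ocr",
    ".jpeg": "ocr",
    ".docx": "docx",
    ".html": "html",
    ".htm": "html",
}


def pick_handler(filename: str) -> Optional[str]:
    """Return the handler label for a file, or None if the format is unsupported."""
    suffix = Path(filename).suffix.lower()  # match the extension case-insensitively
    return HANDLERS.get(suffix)
```

Returning `None` for unknown extensions lets the caller decide whether to skip the file or log a warning.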

Prerequisites

Python: 3.9 or higher
API Keys:
- OpenAI (OPENAI_API_KEY) or DeepSeek (DEEPSEEK_API_KEY)
- AWS (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) for S3
Tesseract OCR: For image processing (install separately)

Installation

Clone the Repository

git clone https://github.com/Butterski/llkms.git
cd llkms

Install Dependencies
```
pip install -r requirements.txt
```

Set Up Environment Variables

Copy the example .env file:
```
cp .env.example .env
```

Edit .env with your credentials. The AWS keys are needed for S3, and at least one of the two LLM provider keys must be set:

AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
OPENAI_API_KEY=your_openai_api_key        # Optional
DEEPSEEK_API_KEY=your_deepseek_api_key    # Optional
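
Before starting the application, it can help to confirm that the variables are visible to Python. The sketch below is not part of LLKMS; `missing_credentials` is a hypothetical helper that assumes both AWS keys are required and that either provider key is sufficient, as described under Prerequisites:

```python
import os
from typing import List, Mapping

AWS_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
LLM_KEYS = ("OPENAI_API_KEY", "DEEPSEEK_API_KEY")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """List problems with the credentials found in env (empty list means OK)."""
    problems = [f"{key} is not set" for key in AWS_KEYS if not env.get(key)]
    # Either provider key is enough, so only complain when both are absent.
    if not any(env.get(key) for key in LLM_KEYS):
        problems.append("set OPENAI_API_KEY or DEEPSEEK_API_KEY")
    return problems
```

Called with no arguments, it inspects the current process environment, so values that live only in the .env file are seen only after that file has been loaded into the environment.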

Install Tesseract OCR
- See the Tesseract installation guide for your OS.

Usage

Start the Application
```
python src/llkms/main.py
```
- Loads config.yaml, connects to S3, processes documents, and opens an interactive menu.

Query Your Documents
- Select "RAG Pipeline with S3" from the menu.
- Ask questions (e.g., "What’s in my documents?") and get concise answers.
- Optionally view retrieved documents.
- Type quit to exit and see usage stats.

Force Reindexing
- Rebuild the vector store (skips cache):
```
python src/llkms/main.py --reindex
```
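
To illustrate how a flag like `--reindex` can interact with a cached vector store, here is a rough sketch using `argparse`. The helper names `parse_args` and `should_rebuild` are hypothetical, and the rule "rebuild when asked or when no cache directory exists" is an assumption about the behavior, not taken from the LLKMS source:

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(prog="llkms")
    parser.add_argument(
        "--reindex",
        action="store_true",
        help="rebuild the vector store instead of loading the cache",
    )
    return parser.parse_args(argv)


def should_rebuild(reindex: bool, cache_dir: Path) -> bool:
    """Rebuild when explicitly requested, or when there is no cache to load."""
    return reindex or not cache_dir.exists()
```

With the default configuration, `cache_dir` would correspond to the vector_store_cache directory named in config.yaml.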

Configuration

Customize settings in config.yaml:

AWS: Bucket (eng-llkms), prefix (knowledge)
Model: Provider (deepseek/openai), model name, temperature, max tokens
App: Temp directory (temp), vector store cache (vector_store_cache)

Example snippet:

aws:
  bucket: eng-llkms
  prefix: knowledge
model:
  provider: deepseek
  model: deepseek-chat
  temperature: 0.7
  max_tokens: 1024
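
Once the YAML has been parsed into a dictionary, the `model` section can be validated before a provider is instantiated. The sketch below is illustrative only: `ModelSettings` and `model_settings` are hypothetical names, the defaults simply mirror the example snippet, and the list of accepted providers is limited to the two named in this README:

```python
from dataclasses import dataclass
from typing import Any, Mapping

SUPPORTED_PROVIDERS = ("deepseek", "openai")  # providers named in this README


@dataclass(frozen=True)
class ModelSettings:
    provider: str
    model: str
    temperature: float = 0.7  # assumed default, copied from the example snippet
    max_tokens: int = 1024    # assumed default, copied from the example snippet


def model_settings(config: Mapping[str, Any]) -> ModelSettings:
    """Build validated settings from the `model:` section of a parsed config."""
    section = dict(config.get("model", {}))
    # Raises TypeError if provider/model are missing or an unknown key is present.
    settings = ModelSettings(**section)
    if settings.provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {settings.provider!r}")
    return settings
```

Failing early on a mistyped provider name gives a clearer error than a failed API call later in the session.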

Usage and Cost Tracking

LLKMS tracks:

Total Tokens: All tokens used
Prompt/Completion Tokens: Detailed breakdown
Requests: Number of successful API calls
Cost: Estimated USD cost

View the summary when exiting the app.
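
The arithmetic behind such a summary is simple to sketch. The class below is a hypothetical stand-in, not the tracker shipped with LLKMS, and the per-token prices are placeholder values that should be replaced with your provider's current rates:

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    # Hypothetical USD prices per 1,000 tokens; not real provider rates.
    prompt_price_per_1k: float = 0.001
    completion_price_per_1k: float = 0.002
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add the token counts of one successful API call."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.requests += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def cost_usd(self) -> float:
        """Estimated cost: each token type is billed at its own rate."""
        return (
            self.prompt_tokens * self.prompt_price_per_1k
            + self.completion_tokens * self.completion_price_per_1k
        ) / 1000

    def summary(self) -> str:
        return (
            f"requests={self.requests} total_tokens={self.total_tokens} "
            f"(prompt={self.prompt_tokens}, completion={self.completion_tokens}) "
            f"cost=${self.cost_usd:.4f}"
        )
```

Prompt and completion tokens are counted separately because providers commonly price them differently.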

How It Works

Downloads documents from S3 to a temp directory.
Processes files into chunks using RecursiveCharacterTextSplitter.
Indexes chunks with FAISS for efficient retrieval.
Answers queries via a RAG pipeline with your chosen language model.
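
The chunk, index, and retrieve steps above can be illustrated with a dependency-free toy. This is deliberately not the LLKMS implementation: fixed-size character windows stand in for RecursiveCharacterTextSplitter, word-count cosine similarity stands in for embeddings with a FAISS index, and the prompt wording in `build_prompt` is an assumption. All function names are hypothetical:

```python
import math
import re
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Cut text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def _vector(text: str) -> Counter:
    """Bag-of-words vector: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 3) -> List[str]:
    """Return the k chunks most similar to the query (toy stand-in for a vector search)."""
    q = _vector(query)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _vector(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    joined = "\n---\n".join(context)
    return f"Answer in at most three sentences.\n\nContext:\n{joined}\n\nQuestion: {question}"
```

The overlap between neighboring chunks reduces the chance that a sentence relevant to a query is split across a chunk boundary and missed at retrieval time.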

Contributing

Fork the repo: https://github.com/Butterski/llkms
Create a branch: git checkout -b feature/your-feature
Commit changes: git commit -m "Add your feature"
Push: git push origin feature/your-feature
Submit a Pull Request.

Acknowledgments

LangChain: RAG and document processing framework
OpenAI: Optional LLM provider
DeepSeek: Default LLM provider
FAISS: Vector storage
Tesseract OCR: Image text extraction
Questionary: Interactive CLI
