This guide walks you through setting up the VCell AI Platform, whether for active development or simply to run it locally, including environment configuration, Langfuse integration, and local LLM setup.
Before you begin, ensure you have the following installed:
- Docker & Docker Compose - For running the full stack
- Node.js 18+ - For frontend development
- Python 3.12+ - For backend development
- Git - For version control
- Poetry - For Python dependency management (install via pip install poetry)
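Before going further, you can confirm that each prerequisite is actually on your PATH. This is a small optional sketch (not part of the repo): it prints the version of each required tool, or a warning if one is missing.

```shell
# Quick prerequisite check: one line per required tool.
for tool in docker node python3 git poetry; do
  if command -v "$tool" >/dev/null 2>&1; then
    printf '%-8s %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
  else
    printf '%-8s NOT FOUND - install it before continuing\n' "$tool"
  fi
done
```

If any line reports NOT FOUND, install that tool before moving on.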
Clone the repo with submodules (important for Langfuse):
```shell
git clone --recurse-submodules https://github.com/virtualcell/VCell-AI.git
cd VCell-AI
```
Langfuse is used for LLM observability, tracing, and analytics.
You can use either Langfuse Cloud or self-hosted:
- Option A – Langfuse Cloud: Sign up here
- Option B – Self-hosted Langfuse:

```shell
cd langfuse
docker compose up
```
- Go to your Langfuse Cloud project or your self-hosted instance (http://localhost:3000).
- Create a new project.
- Copy your API keys from Project Settings.
Example .env values:
```
LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
LANGFUSE_PUBLIC_KEY=pk-lf-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy
LANGFUSE_HOST="http://localhost:3000"
```
When these environment variables are set, the backend automatically integrates with Langfuse. It will track:
- LLM API calls and responses
- Token usage and costs
- Response times and quality metrics
- Tool usage patterns
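Before relying on tracing, you can check that your Langfuse instance is reachable from the machine running the backend. This is a hedged sketch: the /api/public/health path is assumed from Langfuse's public API and may differ between versions, so adjust it if your instance responds differently.

```shell
# Sanity-check connectivity to Langfuse before starting the backend.
# Uses LANGFUSE_HOST from your environment, falling back to the local default.
LANGFUSE_HOST="${LANGFUSE_HOST:-http://localhost:3000}"
curl -fsS "$LANGFUSE_HOST/api/public/health" && echo "Langfuse is reachable"
```

If the request fails, verify the container is up (`docker compose ps` in the langfuse directory) before debugging API keys.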
Open your Langfuse dashboard at your hosted URL or http://localhost:3000.
You can monitor:
- Traces: Individual LLM interactions
- Scores: Quality metrics and feedback
- Costs: Token usage and API expenses
- Analytics: Usage patterns and performance
Create and configure the backend environment file:
```shell
cd backend
cp .env.example .env
```
Edit backend/.env with your configuration.
Create and configure the frontend environment file:
```shell
cd frontend
cp .env.example .env.local
```
Edit frontend/.env.local with your configuration.
Ollama is the easiest way to run local LLMs on your machine. It exposes an OpenAI-compatible API that the backend can connect to.
macOS / Linux:
```shell
curl -fsSL https://ollama.ai/install.sh | sh
```
Windows: Download from Ollama Downloads.
You will need two models to run the app properly:
- A chat LLM (for conversations and reasoning)
- An embedding model (for knowledge base search and retrieval)
Examples:
```shell
# Pull a chat model (choose one depending on your system resources)
ollama pull deepseek-r1:8b
# or smaller / lighter
ollama pull deepseek-r1:1.5b

# Pull an embedding model
ollama pull nomic-embed-text
```
Start the Ollama background service:
```shell
ollama serve
```
Open a new terminal and run:
```shell
ollama run deepseek-r1:1.5b "Hello, how are you?"
```
If this works, Ollama is running correctly.
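You can also exercise the same OpenAI-compatible API surface the backend will use, rather than the interactive CLI. A minimal sketch, assuming you pulled deepseek-r1:1.5b and that `ollama serve` is running on the default port:

```shell
# Call Ollama's OpenAI-compatible chat endpoint directly.
# The "model" value must match a model you pulled earlier.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-r1:1.5b",
        "messages": [{"role": "user", "content": "Say hello in one word."}]
      }'
```

A JSON response containing a `choices` array confirms the HTTP API is working end to end.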
Edit backend/.env to point to your local models:
```
# Switch provider to local
PROVIDER=local

# Generic OpenAI-compatible settings
AZURE_API_KEY=ollama
AZURE_ENDPOINT=http://localhost:11434/v1
...

# Models: one LLM + one embedding model
AZURE_DEPLOYMENT_NAME=deepseek-r1:1.5b
AZURE_EMBEDDING_DEPLOYMENT_NAME=nomic-embed-text
```
- When PROVIDER=azure, the backend uses Azure OpenAI (default).
- When PROVIDER=local, the backend connects to the Ollama server and uses the models you specify in .env.
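Since retrieval depends on the embedding model, it is worth checking that one through the same OpenAI-compatible endpoint too. A hedged sketch, mirroring the endpoint and model name from backend/.env above:

```shell
# Verify the embedding model responds through the OpenAI-compatible API.
curl -s http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "VCell biomodel"}'
```

A response containing an `embedding` array of floats means knowledge-base indexing should work.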
Start Qdrant:
```shell
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
```
Start the backend:
```shell
cd backend
poetry install --no-root
poetry run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
Start the frontend:
```shell
cd frontend
npm install
npm run dev
# or
pnpm install
pnpm dev
```
Alternatively, run the full stack with Docker:
```shell
# From project root
docker compose up --build -d
```
Once everything is running:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Qdrant Dashboard: http://localhost:6333/dashboard
- Langfuse Dashboard: Check your Langfuse project URL
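The service URLs above can be smoke-tested in one pass. A small optional sketch (not part of the repo) that reports the HTTP status of each local endpoint:

```shell
# Check each locally hosted service URL and print its HTTP status code.
for url in \
  http://localhost:3000 \
  http://localhost:8000/docs \
  http://localhost:6333/dashboard
do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  echo "$url -> HTTP $code"
done
```

A 200 on each line means the stack is up; 000 means nothing is listening on that port yet.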
```shell
# Check what's using the port
lsof -i :8000
lsof -i :3000

# Kill the process
kill -9 <PID>
```
```shell
# Check if Ollama is running
curl http://localhost:11434/v1

# Restart Ollama
ollama serve
```
```shell
# Verify .env files exist
ls -la backend/.env
ls -la frontend/.env.local

# Check if variables are loaded
echo $PROVIDER
```
```shell
# Check if Qdrant is running
docker ps | grep qdrant

# Check Qdrant logs
docker logs qdrant

# Restart Qdrant
docker restart qdrant
```
- Use smaller models for development (e.g., deepseek-r1:1.5b instead of deepseek-r1:8b)
- Enable GPU acceleration if available (Ollama automatically detects CUDA)
- Monitor memory usage - local LLMs can be memory-intensive
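To act on the memory tip above, a quick snapshot of where memory is going while a model is loaded can be taken from the command line. A hedged sketch assuming Linux and recent Ollama/Docker CLIs:

```shell
# Overall system memory (Linux, procps)
free -h

# Models currently loaded by Ollama, with their size
ollama ps

# Per-container CPU/memory usage (Qdrant, Langfuse, ...)
docker stats --no-stream
```

If `ollama ps` shows a model far larger than your free memory, switch to a smaller variant.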
After successful setup:
- Test the API: Visit http://localhost:8000/docs
- Try the Chat: Navigate to the chat interface in the frontend
- Upload Documents: Test the knowledge base functionality
- Monitor with Langfuse: Check your Langfuse dashboard for traces
- Explore Biomodels: Use the search functionality to find VCell models
Happy coding! 🚀