Galileo Python SDK (galileo on PyPI) - the official Python client library for the Galileo AI platform. It enables logging and tracing of LLM calls, experiments, datasets, prompt management, and more.
Key characteristics:
- Public SDK published to PyPI (external contributors welcome)
- Depends on galileo-core for shared schemas and infrastructure (see Known Issues)
- Uses auto-generated API client from OpenAPI specification
- Supports multiple LLM frameworks: OpenAI, LangChain, CrewAI, OpenAI Agents SDK
# Install dependencies (requires poetry)
poetry install --all-extras --no-root
# Full setup (install + pre-commit hooks)
inv setup
# Run all tests (parallel by default)
poetry run pytest
# Run single test file
poetry run pytest tests/test_decorator.py
# Run single test
poetry run pytest tests/test_decorator.py::test_function_name -v
# Run tests with coverage
inv test
# Type checking
inv type-check
# Linting (via pre-commit)
poetry run ruff check --fix src/
poetry run ruff format src/

src/galileo/
├── __future__/            # New object-centric API (WIP)
│   ├── project.py         # Project domain object
│   ├── dataset.py         # Dataset domain object
│   ├── experiment.py      # Experiment domain object
│   ├── prompt.py          # Prompt domain object
│   ├── log_stream.py      # LogStream domain object
│   ├── configuration.py   # Configuration management
│   └── shared/            # Shared utilities (filters, sorting, base classes)
├── logger/                # Core logging functionality
│   └── logger.py          # GalileoLogger - central trace/span management
├── handlers/              # Framework-specific integrations
│   ├── langchain/         # LangChain callback handler (GalileoCallback)
│   ├── crewai/            # CrewAI event listener
│   └── openai_agents/     # OpenAI Agents SDK integration
├── openai/                # Drop-in OpenAI client wrapper (auto-logging)
├── resources/             # Auto-generated API client (DO NOT EDIT)
├── schema/                # Pydantic models for SDK-specific types
├── utils/                 # Utility functions and helpers
├── datasets.py            # Dataset service (current API)
├── experiments.py         # Experiment service (current API)
├── prompts.py             # Prompt service (current API)
├── projects.py            # Project service (current API)
├── log_streams.py         # LogStream service (current API)
├── decorator.py           # @log decorator and galileo_context
└── config.py              # GalileoPythonConfig configuration
GalileoLogger (src/galileo/logger/logger.py): Central class for uploading traces to Galileo. Supports batch and streaming modes. Manages traces, spans (LLM, retriever, tool, workflow, agent), and sessions.
Decorators (src/galileo/decorator.py): The @log decorator and galileo_context context manager for automatic function tracing. Uses ContextVars for thread-safe nested span tracking.
Handlers (src/galileo/handlers/): Framework-specific integrations:
- langchain/ - LangChain callback handler (GalileoCallback)
- crewai/ - CrewAI handler (uses lazy imports to avoid side effects)
- openai_agents/ - OpenAI Agents SDK integration
OpenAI Wrapper (src/galileo/openai/): Drop-in replacement for OpenAI client that auto-logs calls.
__future__ Package (src/galileo/__future__/): New object-centric API implementing the "Golden Flow" patterns. Provides intuitive, Pythonic interfaces for domain objects (Project, Dataset, Prompt, Experiment, LogStream). Released incrementally as stable.
Resources (src/galileo/resources/): Auto-generated API client from OpenAPI spec. Excluded from linting/type-checking. Never edit manually.
# Regenerate API client
./scripts/import-openapi-yaml.sh https://api.galileo.ai/client
./scripts/auto-generate-api-client.sh

Important: The OpenAPI spec comes from the Client API (/client), not the main API (/docs). The Client API is a curated subset designed specifically for SDK consumption.
The SDK depends on galileo-core for shared schemas, helpers, and base classes:
- galileo_core.schemas.logging.* - Span types (LlmSpan, ToolSpan, etc.), Trace, Session
- galileo_core.helpers.* - API key management, execution utilities
- galileo_core.schemas.protect.* - Protection/guardrails schemas
Note: There is ongoing work to reduce/eliminate this dependency. See Known Issues section.
Domain objects follow consistent patterns:
from galileo.__future__ import Project, Dataset
# Factory methods (class-level)
project = Project.get(name="my-project") # Retrieve existing
projects = Project.list() # List all
# Instance creation with lifecycle
project = Project(name="new-project") # LOCAL_ONLY state
project.create() # → SYNCED state
# Fluent creation
project = Project(name="new-project").create() # 2-in-1
# Relationship methods
log_streams = project.list_log_streams()
dataset = project.create_dataset(name="test-data", content=[...])
# Child → Parent navigation
dataset.project # Returns parent Project object

Objects have explicit sync states: LOCAL_ONLY, SYNCED, DIRTY, FAILED_SYNC, DELETED
project = Project(name="test") # LOCAL_ONLY
project.create() # → SYNCED
project.name = "renamed" # → DIRTY
project.save() # → SYNCED
project.delete() # → DELETED

Services provide functional interfaces for those who prefer procedural style:
from galileo.datasets import create_dataset, get_dataset, list_datasets
from galileo.experiments import run_experiment
from galileo.prompts import get_prompt  # needed below; module path assumed from prompts.py in the layout

dataset = create_dataset(name="test", content=[...])
results = run_experiment(
    experiment_name="eval-1",
    dataset=dataset,
    prompt_template=get_prompt(name="my-prompt"),
    metrics=["correctness"],
    project="my-project",
)

from galileo import log, galileo_context
# Auto-trace function calls
@log
def my_workflow():
    call_llm()
    call_llm()

# Explicit span types
@log(span_type="retriever")
def retrieve_docs(query: str):
    return ["doc1", "doc2"]

# Context manager for explicit control
with galileo_context(project="my-project", log_stream="prod"):
    my_workflow()

# LangChain
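from langchain_openai import ChatOpenAI  # assumed import; the snippet otherwise leaves ChatOpenAI undefined
from galileo.handlers.langchain import GalileoCallback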
callback = GalileoCallback()
llm = ChatOpenAI(callbacks=[callback])
# CrewAI
from galileo.handlers.crewai import CrewAIEventListener
listener = CrewAIEventListener(project="my-project")
# Listener auto-registers; use auto_setup_listeners=False in tests
# OpenAI (drop-in wrapper)
from galileo.openai import openai
client = openai.OpenAI() # Auto-logs all calls

Tests use pytest with these key fixtures from tests/conftest.py:
- mock_request - HTTP request mocking (from galileo_core[testing])
- mock_healthcheck, mock_login_api_key, mock_get_current_user - Common API mocks
Environment variables are set in conftest.py for pytest-xdist compatibility:
GALILEO_CONSOLE_URL=http://localtest:8088
GALILEO_API_KEY=api-1234567890
GALILEO_PROJECT=test-project
GALILEO_LOG_STREAM=test-log-stream

Tests run with --disable-socket to prevent real network calls.

from galileo.datasets import create_dataset

def test_example(mock_request, mock_healthcheck, mock_login_api_key):
    # Mock API responses
    mock_request.post("/datasets").respond(json={"id": "123", "name": "test"})
    # Test SDK functionality
    dataset = create_dataset(name="test", content=[...])
    assert dataset.id == "123"

CrewAI imports have global side effects. Use lazy imports and auto_setup_listeners=False:
def test_crewai_handler():
    # Import inside test, not at module level
    from galileo.handlers.crewai import CrewAIEventListener
    listener = CrewAIEventListener(
        project="test",
        auto_setup_listeners=False,  # Prevents import side effects
    )

Use behavioral testing comments to structure tests clearly. Add inline comments before each section:
- # Given: <description> - Before setup/arrangement code. Describe the preconditions.
- # When: <description> - Before the action being tested. Describe what action is performed.
- # Then: <description> - Before assertions. Describe the expected outcome.
Important rules:
- Comments must include a human-readable description after the colon - never leave them empty
- Use sentence case for descriptions (e.g., "a user with admin permissions", not "A User With Admin Permissions")
- Keep descriptions concise but meaningful
- For tests where the action raises an exception, use # When/Then: <description> combined

from galileo.__future__ import Project

def test_create_project_success(mock_request, mock_healthcheck, mock_login_api_key):
    # Given: a valid project name and mocked API response
    mock_request.post("/projects").respond(json={"id": "123", "name": "test"})
    # When: creating a new project
    project = Project(name="test").create()
    # Then: the project is created with the expected ID
    assert project.id == "123"
    assert project.name == "test"

- Line length: 120 characters
- Linting: ruff (replaces flake8, isort, etc.)
- Type annotations: Required for public functions (mypy)
- Docstrings: numpy convention
- Pre-commit hooks: Run ruff and mypy on commit
- Use standard Python logging: import logging; logger = logging.getLogger(__name__)
- Duration variables must be suffixed with units: timeout_seconds, delay_ms
- Commit messages: type(scope): description (conventional commits)
- Imports at top of file: Always place imports at the module level, not inside functions
  - Exception: Lazy imports for optional dependencies (e.g., crewai); document why
- Use from __future__ import annotations for forward references
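As an illustration of how these conventions combine, the following sketch shows one small function with type annotations, a numpy-style docstring, unit-suffixed duration parameters, and a module-level logger. The function wait_until_ready and its parameters are hypothetical and do not exist in the SDK.

```python
from __future__ import annotations

import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def wait_until_ready(
    is_ready: Callable[[], bool],
    poll_interval_seconds: float = 0.01,
    timeout_seconds: float = 1.0,
) -> bool:
    """
    Poll a readiness check until it passes or the timeout expires.

    Parameters
    ----------
    is_ready : Callable[[], bool]
        Zero-argument callable that returns True once the resource is ready.
    poll_interval_seconds : float
        Delay between two consecutive checks, in seconds.
    timeout_seconds : float
        Maximum total time to wait, in seconds.

    Returns
    -------
    bool
        True if the check passed before the timeout, False otherwise.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_interval_seconds)
    logger.warning("wait_until_ready: timed out after %.2f seconds", timeout_seconds)
    return False
```

Carrying the unit in the parameter name means a call such as wait_until_ready(check, timeout_seconds=5) is unambiguous at the call site.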
The SDK distinguishes between two types of operations with different error handling needs:
Resource Management Operations (raise exceptions):
- Operations where users explicitly request an action and expect feedback
- Examples: create_project(), get_dataset(), delete_log_stream(), list_projects()
- These operations should raise exceptions on failure for clear user feedback
Telemetry/Ingestion Operations (resilient):
- Background operations that observe user code without interfering
- Examples: ingest_traces(), ingest_spans(), flush()
- These operations swallow infrastructure errors gracefully
- Principle: Observability code should observe, not interfere
┌─────────────────────────────────────────────────────────────────┐
│                        User Application                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
            ┌────────────────────┴────────────────────┐
            │                                         │
            ▼                                         ▼
┌───────────────────────┐                 ┌───────────────────────┐
│ Resource Mgmt         │                 │ Telemetry/Ingestion   │
│ (Raises on Error)     │                 │ (Resilient)           │
├───────────────────────┤                 ├───────────────────────┤
│ Projects              │                 │ Traces.ingest_*()     │
│ Datasets              │                 │ Traces.update_*()     │
│ LogStreams            │                 │ Logger streaming      │
│ Stages                │                 │ @warn_catch_exception │
└───────────────────────┘                 └───────────────────────┘
            │                                         │
            ▼                                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Generated API Client                       │
│                 (Always raises HTTP exceptions)                 │
└─────────────────────────────────────────────────────────────────┘
HTTP-Specific Exceptions:
| Status Code | Exception | Meaning |
|---|---|---|
| 400 | BadRequestError | Invalid request parameters |
| 401 | AuthenticationError | Invalid or expired API key |
| 403 | ForbiddenError | Insufficient permissions |
| 404 | NotFoundError | Resource doesn't exist |
| 409 | ConflictError | Resource already exists |
| 422 | HTTPValidationError | Request body/params failed Pydantic validation |
| 429 | RateLimitError | Too many requests |
| 5xx | ServerError | Server-side error |
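The status-to-exception mapping in this table can be sketched as a lookup. The classes below are stand-ins that only reuse the names from the table; the common APIError base class and the exception_for_status helper are assumptions made for illustration, not the SDK's actual class hierarchy or API.

```python
from __future__ import annotations


class APIError(Exception):
    """Hypothetical base class for this sketch."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


class BadRequestError(APIError): ...
class AuthenticationError(APIError): ...
class ForbiddenError(APIError): ...
class NotFoundError(APIError): ...
class ConflictError(APIError): ...
class HTTPValidationError(APIError): ...
class RateLimitError(APIError): ...
class ServerError(APIError): ...


# Exact status codes from the table; 5xx is handled as a range below.
STATUS_TO_EXCEPTION: dict[int, type[APIError]] = {
    400: BadRequestError,
    401: AuthenticationError,
    403: ForbiddenError,
    404: NotFoundError,
    409: ConflictError,
    422: HTTPValidationError,
    429: RateLimitError,
}


def exception_for_status(status_code: int, message: str = "") -> APIError:
    """Build the exception that corresponds to an HTTP status code."""
    if 500 <= status_code <= 599:
        return ServerError(status_code, message)
    # Unlisted codes fall back to the generic base class.
    exc_type = STATUS_TO_EXCEPTION.get(status_code, APIError)
    return exc_type(status_code, message)
```

A caller would write raise exception_for_status(response_status, detail), so that user code can catch a specific class such as NotFoundError rather than inspect status codes.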
Infrastructure Exceptions (caught only in telemetry operations):
import httpx

INFRASTRUCTURE_EXCEPTIONS = (
    httpx.HTTPError,
    httpx.TimeoutException,
    httpx.ConnectError,
    ConnectionError,
    TimeoutError,
    OSError,
)

User errors like TypeError, ValueError, and ValidationError are never caught - they propagate immediately.
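The split between swallowed infrastructure errors and propagating user errors can be sketched with a small decorator. This is an illustration under stated assumptions, not the SDK's @warn_catch_exception: the names swallow_infrastructure_errors, INFRASTRUCTURE_EXCEPTIONS_SKETCH, and ingest_traces_sketch are hypothetical, and the httpx entries are omitted so the example needs only the standard library.

```python
from __future__ import annotations

import functools
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Standard-library subset of the infrastructure exceptions (httpx entries left out).
INFRASTRUCTURE_EXCEPTIONS_SKETCH = (ConnectionError, TimeoutError, OSError)


def swallow_infrastructure_errors(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log and ignore infrastructure failures; let every other exception propagate."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except INFRASTRUCTURE_EXCEPTIONS_SKETCH as error:
            logger.warning("%s: infrastructure error ignored: %s", func.__name__, error)
            return None

    return wrapper


@swallow_infrastructure_errors
def ingest_traces_sketch(payload: dict[str, Any]) -> str:
    """Hypothetical telemetry call used only to exercise the decorator."""
    if not isinstance(payload, dict):
        # User error: not in the tuple above, so it is not caught.
        raise TypeError("payload must be a dict")
    if payload.get("simulate_outage"):
        raise ConnectionError("backend unreachable")
    return "ok"
```

Calling ingest_traces_sketch({"simulate_outage": True}) returns None and logs a warning, while ingest_traces_sketch("not a dict") still raises TypeError, so a bug in the caller's code stays visible.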
import logging
logger = logging.getLogger(__name__)
# Log lifecycle events with context
logger.info("Project.create: name=%s - started", name)
logger.info("Project.create: id=%s - completed", project_id)
logger.error("Project.update: id=%s - failed: %s", project_id, error)
# Never log sensitive data (tokens, API keys, PII)

When to Add Logging:
- Service methods that perform writes (create, update, delete)
- Error conditions with full context when catching exceptions
- Long-running operations (start/completion with duration)
What NOT to Log:
- Sensitive data: Passwords, API keys, tokens, PII
- Large payloads: Don't log entire request/response bodies
- High-frequency loops: Use sampling or aggregate metrics
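One way to apply these guidelines is a small context manager that logs the start, completion, and failure of an operation with its duration, and redacts sensitive keys before they reach the log. This is a hedged sketch: log_operation, format_context, and the SENSITIVE_KEYS set are hypothetical helpers, not part of the SDK.

```python
from __future__ import annotations

import logging
import time
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger(__name__)

# Hypothetical list of context keys whose values must never be logged.
SENSITIVE_KEYS = frozenset({"api_key", "token", "password"})


def format_context(**context: object) -> str:
    """Render key=value pairs, masking values of sensitive keys."""
    parts = []
    for key, value in context.items():
        shown = "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        parts.append(f"{key}={shown}")
    return " ".join(parts)


@contextmanager
def log_operation(operation: str, **context: object) -> Iterator[None]:
    """Log start, completion, and failure of an operation, including its duration."""
    details = format_context(**context)
    started_at = time.monotonic()
    logger.info("%s: %s - started", operation, details)
    try:
        yield
    except Exception as error:
        duration_seconds = time.monotonic() - started_at
        logger.error("%s: %s - failed after %.3fs: %s", operation, details, duration_seconds, error)
        raise
    else:
        duration_seconds = time.monotonic() - started_at
        logger.info("%s: %s - completed in %.3fs", operation, details, duration_seconds)
```

A write operation would then be wrapped as with log_operation("Project.create", name="demo"): ..., which emits one start line and one completion or failure line without logging the payload itself.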
The SDK has a deep dependency on galileo-core (private repository). This creates:
- Contributor friction (requires private repo access)
- Contract split between core and SDK
- Inheritance of internal complexity
Mitigation in progress: Gradual migration to OpenAPI-generated types and SDK-owned abstractions.
Configuration exists in three places:
- Configuration class attributes (new __future__ API)
- os.environ (synced by Configuration)
- GalileoPythonConfig._instance (actual authenticated state from core)
Known issue: connect() must be called explicitly; lazy initialization is incomplete.
Prompt.create_version() creates a NEW prompt (name with timestamp suffix), not a new version of the same template. True version management requires API alignment.
API uses 1-based version indexing, not 0-based:
# Correct: first version is index 1
version_content = dataset.get_version_content(index=1)
# Wrong: index 0 doesn't exist
version_content = dataset.get_version_content(index=0) # Raises ValueError

The SDK's Experiment class conflates two distinct API concepts:
- Playground: Interactive workspace for prompt iteration
- Experiment: Immutable logged run with recorded results
Version specification for datasets and prompts is implicit (uses "current" version).
SDK converts all metadata values to strings. API behavior varies:
- Trace API: Skips None values and non-primitives
- Dataset API: Keeps None as null, JSON-encodes nested dicts
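The difference between the two behaviours can be made concrete with a sketch that applies each rule to the same input. These functions are illustrative only: trace_style_metadata and dataset_style_metadata are hypothetical names, and treating str, int, float, and bool as the "primitive" types is an assumption.

```python
from __future__ import annotations

import json
from typing import Any

# Assumed definition of "primitive" for this sketch.
PRIMITIVES = (str, int, float, bool)


def trace_style_metadata(metadata: dict[str, Any]) -> dict[str, str]:
    """Drop None and non-primitive values, then stringify what remains."""
    return {key: str(value) for key, value in metadata.items() if isinstance(value, PRIMITIVES)}


def dataset_style_metadata(metadata: dict[str, Any]) -> dict[str, str | None]:
    """Keep None as-is, JSON-encode nested dicts, and stringify everything else."""
    result: dict[str, str | None] = {}
    for key, value in metadata.items():
        if value is None:
            result[key] = None
        elif isinstance(value, dict):
            result[key] = json.dumps(value)
        else:
            result[key] = str(value)
    return result


metadata = {"user": "ada", "attempt": 2, "tags": {"env": "prod"}, "note": None}
trace_view = trace_style_metadata(metadata)
dataset_view = dataset_style_metadata(metadata)
```

With this input, trace_view keeps only user and attempt, while dataset_view keeps all four keys, with tags as a JSON string and note as None. Code that round-trips metadata through both APIs should not assume the same keys come back.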
Releases use python-semantic-release with conventional commits:
# Patch release triggers
fix:, perf:, chore:, docs:, style:, refactor:
# Version is managed in:
# - src/galileo/__init__.py:__version__
# - pyproject.toml:project.version

| Variable | Required | Description |
|---|---|---|
| GALILEO_CONSOLE_URL | Yes* | Galileo console URL (default: app.galileo.ai) |
| GALILEO_API_KEY | Yes | API key for authentication |
| GALILEO_PROJECT | No | Default project name |
| GALILEO_LOG_STREAM | No | Default log stream name |
| GALILEO_LOGGING_DISABLED | No | Disable trace collection |
*Required for non-production environments
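A minimal sketch of reading these variables follows. It is not how GalileoPythonConfig works internally: EnvSettings and load_env_settings are hypothetical, and the set of strings treated as "true" for GALILEO_LOGGING_DISABLED is an assumption.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container for the variables in the table above."""

    console_url: str
    api_key: str
    project: str | None
    log_stream: str | None
    logging_disabled: bool


def load_env_settings(env: Mapping[str, str] | None = None) -> EnvSettings:
    """Read settings from the given mapping, or from os.environ when none is passed."""
    source = os.environ if env is None else env
    api_key = source.get("GALILEO_API_KEY")
    if not api_key:
        raise ValueError("GALILEO_API_KEY is required")
    # Assumed truthy spellings; the SDK may accept a different set.
    disabled_flag = source.get("GALILEO_LOGGING_DISABLED", "").strip().lower()
    return EnvSettings(
        console_url=source.get("GALILEO_CONSOLE_URL", "app.galileo.ai"),
        api_key=api_key,
        project=source.get("GALILEO_PROJECT"),
        log_stream=source.get("GALILEO_LOG_STREAM"),
        logging_disabled=disabled_flag in {"1", "true", "yes"},
    )


settings = load_env_settings({"GALILEO_API_KEY": "api-1234567890", "GALILEO_LOGGING_DISABLED": "true"})
```

Passing an explicit mapping, as in the last line, keeps the function testable without mutating the process environment.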
- PyPI: https://pypi.org/project/galileo/
- GitHub: https://github.com/rungalileo/galileo-python
- API Docs: https://docs.galileo.ai
- OpenAPI Spec: openapi.yaml (generated from Client API)