Firestore Schema and Relationships Visualization

Extract the schema of a Firestore database, identify relationships between collections, and generate visual representations of the schema and relationships using PlantUML and pydot.

Note: this is an exploratory project and is not intended for production use.

```mermaid
graph LR
    A[(Firestore DB)] --> B[Schema Extraction]
    B --> |"field types\nsubcollections"| C{Reference fields?}
    C --> |yes| D[Known Relationships]
    C --> |no| E[LLM Detection]
    D --> F[Merge Relationships]
    E --> F
    F --> G[PlantUML Diagram]
    F --> H[pydot Graph]
    B --> |--skip-llm| I[JSON Export]
    I --> J[AI Coding Assistant]

    style A fill:#f9a825,stroke:#f57f17,color:#000
    style E fill:#90caf9,stroke:#1565c0,color:#000
    style G fill:#a5d6a7,stroke:#2e7d32,color:#000
    style H fill:#a5d6a7,stroke:#2e7d32,color:#000
    style J fill:#ce93d8,stroke:#6a1b9a,color:#000
```

Features

Extract Firestore Schema: Retrieve the schema of a Firestore database, including collection names, field names, and inferred field types (string, number, boolean, timestamp, reference, etc.).
Subcollection Discovery: Recursively discover and include subcollections (e.g., users.posts.comments).
Identify Relationships: Detect foreign key relationships between collections using two methods (see How relationship detection works).
Generate Schema Graph: Create a visual representation of the Firestore schema and relationships using pydot.
Generate PlantUML Diagram: Generate PlantUML class diagrams with typed fields and relationship arrows.
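
As an illustration of the type inference described above, here is a minimal sketch that labels sampled document values and collects the labels seen per field. The function names `infer_type` and `infer_collection_schema` are hypothetical and not taken from this repository. The sketch assumes timestamps arrive as `datetime` objects (or a subclass) and treats any object exposing both `path` and `id` as a reference. The tool's actual logic may differ.

```python
from collections import defaultdict
from datetime import datetime


def infer_type(value):
    """Map a Python value to a schema type label (hypothetical helper)."""
    if value is None:
        return "null"
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, datetime):
        return "timestamp"
    if isinstance(value, dict):
        return "map"
    if isinstance(value, (list, tuple)):
        return "array"
    # Assumption: anything with both `path` and `id` is a document reference.
    if hasattr(value, "path") and hasattr(value, "id"):
        return "reference"
    return "unknown"


def infer_collection_schema(documents):
    """Collect the type labels observed for each field across sampled documents."""
    observed = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            observed[field].add(infer_type(value))
    # Sort the labels so the result is deterministic.
    return {field: sorted(types) for field, types in observed.items()}


sample_docs = [
    {"name": "Ada", "age": 36, "active": True, "created": datetime(2024, 1, 1)},
    {"name": "Bob", "age": None, "tags": ["a", "b"]},
]
schema = infer_collection_schema(sample_docs)
```

Keeping a set of labels per field, rather than a single label, makes mixed or missing values visible: a field that is sometimes a number and sometimes null shows both.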

Installation

Clone the repository.

Create a virtual environment and activate it:

```
python -m venv venv
source venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Set up environment variables (only needed if using built-in LLM relationship detection):
```
export OPENAI_API_KEY='your-api-key'      # for --llm-provider openai (default)
export ANTHROPIC_API_KEY='your-api-key'    # for --llm-provider anthropic
```
If you use an AI coding assistant like Claude Code, you can skip this - see Using with an AI coding assistant.

Usage

```
# Full run with defaults
python main.py

# Quick run - fewer samples, no subcollections, skip LLM
python main.py --sample-size 10 --max-depth 0 --skip-llm

# Only top-level collections with PlantUML output
python main.py --max-depth 0 --format plantuml

# Deeper subcollection discovery with smaller sample
python main.py --max-depth 5 --sample-size 20

# Generate diagrams for specific collections only
python main.py --collections users,posts,comments

# Use Anthropic Claude instead of OpenAI for relationship detection
python main.py --llm-provider anthropic
```
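
The `--max-depth` flag bounds how far subcollections are followed. The sketch below shows depth-limited discovery over an in-memory stand-in for a database: a nested dictionary in which each key is a collection name and each value holds its subcollections. The function name `discover` and the dictionary layout are assumptions for illustration only. Against a real Firestore client, subcollections hang off individual documents (for example via `DocumentReference.collections()`), so the actual traversal is likely more involved.

```python
def discover(tree, max_depth, prefix="", depth=0):
    """Return dotted collection paths, following subcollections up to max_depth levels."""
    paths = []
    for name, subcollections in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        paths.append(path)
        # Descend only while the depth budget allows; max_depth=0 stops at the top level.
        if depth < max_depth:
            paths.extend(discover(subcollections, max_depth, path, depth + 1))
    return paths


# Hypothetical layout: users has posts, and posts has comments.
tree = {"users": {"posts": {"comments": {}}}, "products": {}}

top_level_only = discover(tree, max_depth=0)
full = discover(tree, max_depth=3)
```

With `max_depth=0` only `users` and `products` are returned, which matches the "0 to skip subcollections" behavior described for the flag.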

CLI Options

| Flag | Default | Description |
| --- | --- | --- |
| `--sample-size N` | 50 | Number of documents to sample per collection |
| `--max-depth N` | 3 | Maximum subcollection nesting depth (0 to skip subcollections) |
| `--skip-llm` | off | Skip LLM relationship detection (only use reference-type fields) |
| `--llm-provider` | openai | LLM provider for relationship detection: `openai` or `anthropic` |
| `--format` | all | Output format: `all`, `plantuml`, or `pydot` |
| `--no-export-json` | off | Skip exporting schema to a JSON file |
| `--collections` | all | Comma-separated list of collections to scan (subcollections included automatically) |
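
For readers who want to see how these flags fit together, here is a sketch of an equivalent `argparse` setup. It mirrors the flags and defaults listed above, but `build_parser` and `parse_collections` are hypothetical names and the real `main.py` may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Firestore schema visualization")
    parser.add_argument("--sample-size", type=int, default=50, metavar="N",
                        help="Number of documents to sample per collection")
    parser.add_argument("--max-depth", type=int, default=3, metavar="N",
                        help="Maximum subcollection nesting depth (0 to skip)")
    parser.add_argument("--skip-llm", action="store_true",
                        help="Only use reference-type fields for relationships")
    parser.add_argument("--llm-provider", choices=["openai", "anthropic"],
                        default="openai", help="LLM provider")
    parser.add_argument("--format", choices=["all", "plantuml", "pydot"],
                        default="all", help="Output format")
    parser.add_argument("--no-export-json", action="store_true",
                        help="Skip exporting the schema to a JSON file")
    parser.add_argument("--collections", default=None,
                        help="Comma-separated list of collections to scan")
    return parser


def parse_collections(raw):
    """Split a comma-separated value into names; None means scan everything."""
    if raw is None:
        return None
    return [name.strip() for name in raw.split(",") if name.strip()]


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["--sample-size", "10", "--max-depth", "0", "--skip-llm"])
```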

By default, the schema is exported to a timestamped JSON file (e.g., firestore_schema_20260401120000.json) immediately after extraction, before relationship detection or diagram rendering. This ensures you have the schema saved even if later steps fail.
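
The filename in the example follows a `YYYYMMDDHHMMSS` pattern. A minimal sketch of such an export step is shown below. The helper names are hypothetical, and the use of local time and `default=str` for values that JSON cannot encode are assumptions, not confirmed details of this tool.

```python
import json
from datetime import datetime
from pathlib import Path


def schema_filename(now=None):
    """Build a name such as firestore_schema_20260401120000.json."""
    now = now or datetime.now()  # assumption: local time
    return f"firestore_schema_{now.strftime('%Y%m%d%H%M%S')}.json"


def export_schema(schema, directory=".", now=None):
    """Write the schema to a timestamped JSON file and return its path."""
    path = Path(directory) / schema_filename(now)
    # default=str keeps the export from failing on values such as datetimes.
    path.write_text(json.dumps(schema, indent=2, default=str), encoding="utf-8")
    return path
```

Writing the file before any LLM call or rendering step is what makes the export useful as a checkpoint: a later failure leaves the extracted schema on disk.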

Quick mode

For a fast overview without LLM costs or subcollection crawling:

```
python main.py --sample-size 10 --max-depth 0 --skip-llm
```

How relationship detection works

Relationships between collections are detected in two layers:

1. Reference fields (automatic, no LLM): During schema extraction, Firestore DocumentReference fields are detected directly. These are actual pointers to other documents, so the target collection is known with certainty. This adds no cost because it happens as part of schema extraction.
2. Name-based inference (LLM): Many relationships are stored as plain string or number fields (e.g., user_id, author_email) rather than native references. An LLM examines the full schema and field names to infer which fields likely refer to other collections. This requires an OpenAI or Anthropic API key and makes one API call per collection.
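
The two layers can be sketched as follows. The schema shape (a `type` and optional `target` per field), the tuple format, and both function names are assumptions made for this example. The `inferred` list stands in for whatever the LLM returns, and no API call is made here.

```python
def reference_relationships(schema):
    """Layer 1: read relationships directly from reference-typed fields."""
    found = []
    for collection, fields in schema.items():
        for field, info in fields.items():
            if info.get("type") == "reference" and info.get("target"):
                found.append((collection, field, info["target"]))
    return found


def merge_relationships(known, inferred):
    """Combine both layers; a layer-1 entry wins when both describe the same field."""
    merged = {}
    for source, field, target in inferred:
        merged[(source, field)] = (source, field, target, "inferred")
    # Applied second so that certain reference entries overwrite inferred ones.
    for source, field, target in known:
        merged[(source, field)] = (source, field, target, "reference")
    return sorted(merged.values())


# Hypothetical schema: posts.author is a native reference, posts.user_id is a string.
schema = {
    "posts": {
        "author": {"type": "reference", "target": "users"},
        "user_id": {"type": "string"},
    },
    "users": {"name": {"type": "string"}},
}
known = reference_relationships(schema)
# Stand-in for LLM output, including one guess that conflicts with layer 1.
inferred = [("posts", "user_id", "users"), ("posts", "author", "accounts")]
merged = merge_relationships(known, inferred)
```

Tagging each entry with its origin keeps certain relationships distinguishable from inferred ones in later output.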

What `--skip-llm` does

With --skip-llm, only layer 1 runs. You get relationships for DocumentReference fields but miss name-based ones. For example, if a posts collection has a user_id string field pointing to users, that relationship won't be detected.
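
To make the gap concrete, the sketch below uses a naive suffix heuristic as a stand-in for name-based inference. This is not what the tool does (it uses an LLM for this step) and `guess_target` is a hypothetical helper. It only shows the kind of link that is invisible when fields are plain strings, and why simple rules fall short for names like `author_email`.

```python
def guess_target(field, collections):
    """Guess the collection a field such as user_id points to, or return None."""
    if not field.endswith("_id"):
        return None
    stem = field[: -len("_id")]
    # Try the bare stem first, then a simple plural.
    for candidate in (stem, stem + "s"):
        if candidate in collections:
            return candidate
    return None


collections = {"users", "posts"}

user_link = guess_target("user_id", collections)        # matches "users"
email_link = guess_target("author_email", collections)  # no match: not an _id field
title_link = guess_target("title", collections)         # no match
```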

Use --skip-llm when you want a quick overview, want to avoid API costs, don't have an API key, or plan to analyze the schema yourself using an AI coding assistant (see below). The schema extraction itself (field names, types, subcollections) is unaffected.

Using with an AI coding assistant

You don't need an LLM API key to get value from this tool. If you use an AI coding assistant like Claude Code, Cursor, GitHub Copilot, or similar, you can extract the schema and let your assistant handle the relationship analysis.

Step 1: Extract the schema without LLM:

```
python main.py --skip-llm
```

This produces a timestamped JSON file (e.g., firestore_schema_20260401120000.json) containing the full schema with field names, types, and any DocumentReference relationships.
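
Because each run writes a new timestamped file, it can help to pick up the most recent one programmatically. The sketch below assumes the files sit in one directory and follow the `firestore_schema_<timestamp>.json` naming shown above. The helper names are hypothetical.

```python
import json
from pathlib import Path


def latest_schema_file(directory="."):
    """Return the newest firestore_schema_*.json in a directory, or None."""
    # The timestamp is zero-padded and ordered from year to second,
    # so sorting by name also sorts by time.
    candidates = sorted(Path(directory).glob("firestore_schema_*.json"))
    return candidates[-1] if candidates else None


def load_schema(path):
    """Read an exported schema file into a Python object."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `latest_schema_file()` in the project directory returns the path to hand to your assistant, or `None` if no export exists yet.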

Step 2: Ask your assistant to analyze it. For example, in Claude Code:

Look at firestore_schema_20260401120000.json. Identify foreign key
relationships between collections - fields that store IDs referencing
other collections (e.g., user_id -> users). Then generate a PlantUML
class diagram showing the schema with these relationships.

This approach gives you full control over the analysis, works with any LLM, and costs nothing beyond what you already pay for your coding assistant.
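
For orientation, the kind of PlantUML class diagram the prompt asks for can be produced from a schema dictionary roughly as follows. The input shapes (field name to type label, and `(source, field, target)` tuples) and the function names are assumptions for this example. The tool's own generator and JSON layout may differ.

```python
import re


def alias(name):
    """Turn a collection path such as users.posts into a safe PlantUML identifier."""
    return re.sub(r"\W", "_", name)


def to_plantuml(schema, relationships):
    """Render collections as classes and relationships as labeled arrows."""
    lines = ["@startuml"]
    for collection in sorted(schema):
        # Quote the display name and use an alias, since dots are special in PlantUML.
        lines.append(f'class "{collection}" as {alias(collection)} {{')
        for field, type_name in sorted(schema[collection].items()):
            lines.append(f"  {field}: {type_name}")
        lines.append("}")
    for source, field, target in relationships:
        lines.append(f"{alias(source)} --> {alias(target)} : {field}")
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical input for illustration.
schema = {
    "users": {"name": "string"},
    "posts": {"title": "string", "user_id": "string"},
}
relationships = [("posts", "user_id", "users")]
diagram = to_plantuml(schema, relationships)
```

Comparing an assistant's output against a simple rendering like this is one way to check that no collection or relationship was dropped.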

Tests

```
python -m pytest tests/ -v
```

License

MIT License
