Utilities for analyzing and managing local Datastore/Firestore (Datastore mode) data. Works with both the Datastore Emulator and GCP using Application Default Credentials.
pip install local-storage-utils
This installs the lsu CLI.
If you want to test publishing to TestPyPI and install the package from the test index, do that inside a virtual environment. This avoids the "externally-managed-environment" (PEP 668) error that pip raises when you try to install system-wide on Debian/Ubuntu.
Recommended steps:
# create and activate a virtualenv
python3 -m venv .venv
source .venv/bin/activate
# upgrade pip in the venv
python -m pip install --upgrade pip
# install from TestPyPI; use --extra-index-url so runtime dependencies are resolved from the real PyPI
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple local-storage-utils
Notes:
- The "externally-managed-environment" error happens when pip is blocked from modifying a system Python managed by the OS (PEP 668). The recommended fix is to use a virtual environment or pipx; do not use --break-system-packages unless you understand the risks.
- If you prefer pipx for isolated CLI installs, use pipx install inside a separate environment or consult pipx docs for installing from alternate indexes.
git clone <this-repo-url>
cd local-storage-utils
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
pip install -e .
- If you see "Command 'python' not found", use python3 -m venv .venv (above). Inside the venv, python will point to Python 3.
- If you see "externally-managed-environment", you are attempting a system-wide install. Always install into a virtual environment:
  - Create a venv: python3 -m venv .venv && source .venv/bin/activate
  - Then use the venv pip: python -m pip install -U pip && pip install -e .
macOS:
# Install Python 3 using Homebrew
brew install python@3.12
Linux (Debian/Ubuntu):
# Install Python 3 and venv support
sudo apt-get update && sudo apt-get install -y python3 python3-venv python3-pip
Linux (Fedora/RHEL):
# Install Python 3 and venv support
sudo dnf install python3 python3-pip
Linux (using Homebrew):
# Install Homebrew first (if not already installed): https://brew.sh
# Then install Python 3
brew install python@3.12
Create an optional config.yaml in your working directory to customize behavior. By default, all commands iterate over all namespaces and all kinds unless you specify filters.
# Optional: specify project and emulator
project_id: "my-project"
emulator_host: "localhost:8010"
# Optional filters (omit to process all namespaces and kinds)
namespaces: ["custom-ns"] # List specific namespaces, or omit to process all
kinds: ["MyKind"] # List specific kinds, or omit to process all
# Cleanup settings
ttl_field: "expireAt" # Field name containing expiry timestamp
batch_size: 500 # Delete batch size
# Analysis settings
sample_size: 500 # Max entities to sample per analysis (0 = no limit)
Notes:
- CLI flags always override config values
- If no config is provided, sensible defaults are used
- Environment variables DATASTORE_PROJECT_ID and DATASTORE_EMULATOR_HOST are also supported
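The precedence described in these notes can be sketched as a small merge function. This is an illustration only, not the package's actual code: resolve_settings, DEFAULTS, and ENV_KEYS are hypothetical names, the default values are placeholders, and treating environment variables as a fallback below config.yaml is an assumption (the notes only say they are "also supported").

```python
import os

# Hypothetical defaults for illustration; the tool's real defaults may differ.
DEFAULTS = {
    "project_id": None,
    "emulator_host": None,
    "batch_size": 500,
    "sample_size": 500,
}

# Environment variables named in the notes, mapped to config keys.
ENV_KEYS = {
    "project_id": "DATASTORE_PROJECT_ID",
    "emulator_host": "DATASTORE_EMULATOR_HOST",
}


def resolve_settings(cli_flags, config, environ=None):
    """Merge settings so that CLI flags override config values.

    Assumption: environment variables rank below config.yaml.
    A value of None means "not provided" at that level.
    """
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    for key, env_name in ENV_KEYS.items():
        if environ.get(env_name):
            settings[key] = environ[env_name]
    settings.update({k: v for k, v in config.items() if v is not None})
    settings.update({k: v for k, v in cli_flags.items() if v is not None})
    return settings


resolved = resolve_settings(
    cli_flags={"batch_size": 100, "sample_size": None},
    config={"project_id": "my-project", "batch_size": 500},
    environ={"DATASTORE_EMULATOR_HOST": "localhost:8010"},
)
print(resolved)
```

Here batch_size comes from the CLI flag, project_id from the config, emulator_host from the environment, and sample_size falls back to the placeholder default.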
Lightweight utilities for analyzing and cleaning Datastore (Firestore in Datastore mode). Works with the Datastore emulator for local integration testing or GCP when using Application Default Credentials.
- CLI: run commands via python3 cli.py <command> (or install the package and use the entrypoint).
- Makefile: convenience targets are provided to create a venv, install deps, and run tests locally.
git clone <this-repo-url>
cd local-storage-utils
# create a venv and install the package in editable mode
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .
- make venv: create .venv and install package in editable mode
- make unit: run fast unit tests
- make integration: run integration tests (starts/seeds emulator when configured)
Use these targets to get a working dev environment quickly.
# Analyze all kinds in all namespaces (default behavior)
lsu analyze-kinds
# Analyze specific kind across all namespaces
lsu analyze-fields --kind MyKind
# Analyze with grouping
lsu analyze-fields --kind MyKind --group-by batchId
# Dry-run cleanup for all kinds and namespaces
lsu cleanup --dry-run
# Filter to specific namespace and kind
lsu cleanup --kind MyKind --namespace custom-ns --dry-run
- Start & seed emulator locally:
  - ./scripts/run_emulator_local.sh (prefers .venv/bin/python to run seeder)
  - ./scripts/run_emulator_local.sh --no-seed to skip seeding
- The seeder accepts SEED_COUNT and SEED_NS_COUNT env vars to increase dataset size for perf tests.
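For scripted perf runs, the seeder variables can be set from Python before launching the emulator script. The sketch below is a hedged illustration: seeder_env and run_seeded_emulator are hypothetical helpers, the values 5000 and 10 are arbitrary, and the exact meaning of each variable and whether the script stays in the foreground are not specified here.

```python
import os
import subprocess


def seeder_env(seed_count, seed_ns_count, base_env=None):
    """Return a copy of the environment with the seeder variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SEED_COUNT"] = str(seed_count)  # environment values must be strings
    env["SEED_NS_COUNT"] = str(seed_ns_count)
    return env


def run_seeded_emulator(seed_count, seed_ns_count):
    """Run the emulator script from the repo root with a larger seed dataset."""
    return subprocess.run(
        ["./scripts/run_emulator_local.sh"],
        env=seeder_env(seed_count, seed_ns_count),
        check=True,  # raise if the script exits with a non-zero status
    )


# Build the environment only; call run_seeded_emulator(...) to actually start it.
example_env = seeder_env(5000, 10, base_env={})
print(example_env)
```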
Run integration tests:
# create venv and install deps (see Quickstart), then:
make integration
- Run unit tests: make unit (fast)
- Run full test suite locally: make integration
This project uses the release workflow to publish releases to PyPI. Follow the packaging tutorial for a complete guide on packaging and publishing: https://packaging.python.org/en/latest/tutorials/packaging-projects/
We support publishing to either TestPyPI (for dry runs) or the real PyPI. The workflow can be triggered automatically on pushes to main or manually via the Actions UI (use the "Run workflow" button). When you run it manually you can set the publish_target input to testpypi to publish to TestPyPI instead of PyPI.
- For production publishing to the real PyPI, set the repository secret named PYPI_API_TOKEN with a PyPI API token.
- For test publishing to TestPyPI, set the repository secret named TEST_PYPI_API_TOKEN with a TestPyPI API token.
The release workflow selects the appropriate token based on the publish_target input. Use TestPyPI first to validate packaging and metadata before publishing to the real index.
- By default, all commands iterate over all namespaces and all kinds unless you specify filters via config or CLI flags
- sample_size bounds per-kind analysis to avoid scanning entire large datasets (set to 0 to remove the limit)
- Multi-threaded processing is enabled by default for better performance
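The effect of sample_size can be illustrated with a bounded iterator. This is a minimal sketch rather than the tool's implementation: sample_entities and count_fields are hypothetical names, and plain dicts stand in for Datastore entities.

```python
from itertools import islice


def sample_entities(entities, sample_size):
    """Return an iterator over at most sample_size entities; 0 means no limit."""
    if sample_size < 0:
        raise ValueError("sample_size must be >= 0")
    limit = sample_size or None  # islice treats a stop of None as unbounded
    return islice(entities, limit)


def count_fields(entities, sample_size=500):
    """Count how many sampled entities contain each field name."""
    counts = {}
    for entity in sample_entities(entities, sample_size):
        for field in entity:
            counts[field] = counts.get(field, 0) + 1
    return counts


# A lazy stream of fake entities; only the first 4 are ever consumed.
rows = ({"id": i, "expireAt": None} if i % 2 else {"id": i} for i in range(10_000))
limited = count_fields(rows, sample_size=4)
print(limited)
```

Because the input is consumed lazily, a bounded sample stops reading after sample_size entities instead of scanning the whole kind.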