Draft
45 commits
64313d9
first pass at refactor with claude
geospatial-jeff Apr 25, 2026
3add2dc
add ci workflow
geospatial-jeff Apr 26, 2026
9e9ae08
imports top of file
geospatial-jeff Apr 26, 2026
f2556cf
simplify __init__
geospatial-jeff Apr 26, 2026
39979a5
add matryoshka flag
geospatial-jeff Apr 26, 2026
5c7ae6a
rename factory to embedding, rename backbone to layers, move finetune…
geospatial-jeff Apr 26, 2026
36781af
dont lint notebooks
geospatial-jeff Apr 26, 2026
d624e67
move ClayDataModule to train, add finetune extra
geospatial-jeff Apr 26, 2026
897ec55
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 26, 2026
c4f6e85
move ClayDataModule to train, add finetune extra
geospatial-jeff Apr 26, 2026
998a800
Merge branch 'refactor' of https://github.com/Clay-foundation/model i…
geospatial-jeff Apr 26, 2026
eef6ee5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 26, 2026
57688ca
improve unit test coverage
geospatial-jeff Apr 26, 2026
b2676bf
build/test with hatch
geospatial-jeff Apr 26, 2026
9983897
Potential fix for pull request finding 'CodeQL / Workflow does not co…
geospatial-jeff Apr 26, 2026
00e6f09
add type hints
geospatial-jeff Apr 26, 2026
b5107ef
add pyarrow dep
geospatial-jeff Apr 26, 2026
364f69a
Merge branch 'refactor' of https://github.com/Clay-foundation/model i…
geospatial-jeff Apr 26, 2026
1f2af1f
remove top level utils
geospatial-jeff Apr 26, 2026
bfc9aae
remove dead code, fix doc paths, fix datamodule predict, remove print…
geospatial-jeff Apr 26, 2026
4c3047a
pass custom config file
geospatial-jeff Apr 26, 2026
014931c
add pydantic model for config.yaml
geospatial-jeff Apr 26, 2026
4b5dfd5
add pydantic model for config.yaml
geospatial-jeff Apr 26, 2026
1b125b4
cleanup stale environment setup, define config in one place
geospatial-jeff Apr 26, 2026
968d66f
add mypy
geospatial-jeff Apr 26, 2026
609bf1d
separate training logic from model forward pass
geospatial-jeff Apr 26, 2026
df51f94
add integration tests
geospatial-jeff Apr 26, 2026
b38cb6c
test py3.13
geospatial-jeff Apr 26, 2026
3eb69c1
build: adopt uv ruff and ty tooling (#383)
isaaccorley May 9, 2026
5ff069c
correct learning rate in docs to match code, confirmed through commit…
geospatial-jeff May 9, 2026
e86e80e
remove duplicate variable
geospatial-jeff May 9, 2026
fbd40b1
skip sklearn in test without train extra
geospatial-jeff May 9, 2026
f267669
remove optional dependencies like geopandas/rasterio, will add this b…
geospatial-jeff May 9, 2026
4cbb4e6
clean up docs a bit
geospatial-jeff May 9, 2026
b773a71
fix relative metadata path when installed as package
geospatial-jeff May 9, 2026
4b6bb0e
remove RegressionEncoder and SegmentEncoder, these were legitimately …
geospatial-jeff May 9, 2026
3e63bf3
remove elle, we'll bring it back later. bunch of other small cleanups
geospatial-jeff May 9, 2026
4145511
cleanup more comments, a few lines of dead code
geospatial-jeff May 9, 2026
f1789d2
cleanup more comments, a few lines of dead code
geospatial-jeff May 9, 2026
1ec422e
move configure_training_defaults to train
geospatial-jeff May 9, 2026
798d5b6
add Datacube typedict for static type checking, validate datacube bef…
geospatial-jeff May 9, 2026
fa67083
BREAKING move teacher to train to avoid downloading teacher on inference
geospatial-jeff May 9, 2026
dd7dea2
remove factory functions from public api
geospatial-jeff May 9, 2026
0926539
add s2 inference pipeline
geospatial-jeff May 11, 2026
5a5e8a1
clean up inference pipeline
geospatial-jeff May 11, 2026
36 changes: 0 additions & 36 deletions .binder/environment.yml

This file was deleted.

38 changes: 38 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,38 @@
name: CI

on:
  push:
    branches: [main, refactor]
  pull_request:
    branches: [main, refactor]

permissions:
  contents: read

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v8.1.0

[CodeQL warning, Medium] Unpinned tag for a non-immutable Action in workflow: step uses 'astral-sh/setup-uv' with ref 'v8.1.0', not a pinned commit hash.

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: uv sync --locked --all-extras --group dev
      - run: uv run ruff check . --exclude "*.ipynb"
      - run: uv run ruff format --check . --exclude "*.ipynb"
      - run: uv run ty check claymodel tests --exclude "*.ipynb"

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v8.1.0

[CodeQL warning, Medium] Unpinned tag for a non-immutable Action in workflow: step uses 'astral-sh/setup-uv' with ref 'v8.1.0', not a pinned commit hash.

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: uv sync --locked --all-extras --group dev
      - run: uv run pytest tests/ -v --cov=claymodel --cov-report=term-missing --cov-fail-under=90
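
The CodeQL warnings in this workflow flag `astral-sh/setup-uv@v8.1.0` because a tag can be moved to a different commit, whereas a full commit hash cannot. A minimal sketch of that check, for illustration only: `find_unpinned_actions` is a hypothetical helper (not part of this PR or of CodeQL), it assumes a ref counts as pinned only when it is a 40-character hex SHA, and it assumes actions owned by `actions` or `github` are treated as trusted, which matches the fact that only `setup-uv` was flagged here.

```python
import re

# Matches "uses: owner/repo@ref", with or without a leading list dash.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([\w.-]+)/([\w./-]+)@(\S+)")
# A full-length commit SHA: exactly 40 lowercase hex characters.
SHA_RE = re.compile(r"[0-9a-f]{40}")


def find_unpinned_actions(workflow_text, trusted_owners=("actions", "github")):
    """Return (line_number, action, ref) for third-party actions not pinned to a SHA."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match is None:
            continue
        owner, repo, ref = match.groups()
        if owner in trusted_owners:
            # Assumption: first-party actions are exempt from the check.
            continue
        if SHA_RE.fullmatch(ref) is None:
            findings.append((lineno, f"{owner}/{repo}", ref))
    return findings


workflow = """\
steps:
  - uses: actions/checkout@v4
  - uses: astral-sh/setup-uv@v8.1.0
"""
findings = find_unpinned_actions(workflow)
print(findings)
```

Resolving the warning would mean replacing `v8.1.0` with the full commit hash of that release; the hash itself is not shown in this PR, so it is left out here.
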
57 changes: 35 additions & 22 deletions .pre-commit-config.yaml
@@ -1,26 +1,39 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: check-added-large-files
args: [ '--maxkb=512', '--enforce-all' ]
exclude: '^docs/tutorials/.*\.ipynb$'
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.13.3
hooks:
- id: ruff # Run the linter
args: [ --fix ]
types_or: [ python, pyi ]
- id: ruff # Run the linter for Jupyter notebooks with the PLR0913 rule ignored
args: [ --fix, --ignore=PLR0913 ]
types: [ jupyter ]
- id: ruff-format # Run the formatter
types_or: [ python, pyi, jupyter ]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: check-added-large-files
args: ["--maxkb=512", "--enforce-all"]
exclude: "^(docs/tutorials/.*\\.ipynb|uv\\.lock)$"
- id: check-yaml
- id: end-of-file-fixer
exclude: "\\.ipynb$"
- id: trailing-whitespace
exclude: "\\.ipynb$"
- id: check-merge-conflict
- id: debug-statements

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.11
hooks:
- id: ruff
args: ["--fix", "--show-fixes"]
types_or: [python, pyi]
- id: ruff-format
types_or: [python, pyi]

- repo: local
hooks:
- id: ty-check
name: ty check
language: system
entry: uv run ty check claymodel tests --exclude "*.ipynb"
pass_filenames: false
- id: uv-lock
name: Lock dependencies with uv
language: system
entry: uv lock
pass_filenames: false

# https://pre-commit.ci/#configuration
ci:
22 changes: 0 additions & 22 deletions .ruff.toml

This file was deleted.

135 changes: 135 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,135 @@
# Contributing to Clay Foundation Model

Thank you for your interest in contributing to Clay! This guide covers how to set up a development environment, run tests, and submit changes.

## Development Setup

### Prerequisites

- Python 3.11 or later
- Git
- (Optional) CUDA-capable GPU for model training/inference

### Installation

```bash
# Clone the repository
git clone https://github.com/Clay-foundation/model.git
cd model

# Install in editable (development) mode with the dev extra
uv pip install -e ".[dev]"
```

### Verify Installation

```bash
# Run the test suite
uv run pytest tests/ -v

# Check linting
uv run ruff check claymodel/ tests/

# Check formatting
uv run ruff format --check claymodel/ tests/
```

## Project Structure

```
claymodel/
__init__.py # Package exports
api.py # High-level API: embed(), load_model(), normalize()
cli.py # `clay` commands: info, benchmark
model.py # Core model: Encoder, Decoder, ClayMAE, factory functions
module.py # Lightning module: ClayMAEModule
utils.py # Utilities: position embeddings, weight loading
metadata.py # PlatformMetadata Pydantic model, YAML loading
configs/
metadata.yaml # Bundled sensor metadata (wavelengths, normalization stats)
inference/
deterministic.py # DeterministicInference context manager
elle.py # ELLE quality scoring probe
masking.py # PatchAnalyzer for chip quality filtering
training/ # Training data loading, callbacks
finetune/ # Downstream task examples
tests/ # Test suite
docs/ # Documentation (Jupyter Book)
configs/ # Training configs
```

## Running Tests

```bash
# Run all tests
uv run pytest tests/ -v

# Run a specific test file
uv run pytest tests/test_model.py -v

# Run with coverage
uv run pytest tests/ --cov=claymodel --cov-report=term-missing
```

Tests use a tiny encoder (dim=192, random weights) so they run in seconds without a GPU or checkpoint.

## Code Style

We use [ruff](https://docs.astral.sh/ruff/) for linting and formatting.

```bash
# Check for lint errors
uv run ruff check claymodel/ tests/

# Auto-fix fixable errors
uv run ruff check claymodel/ tests/ --fix

# Check formatting
uv run ruff format --check claymodel/ tests/

# Auto-format
uv run ruff format claymodel/ tests/
```

Configuration is in `pyproject.toml`. Key rules:
- Max line length: 100
- Max function arguments: 6 (with exceptions for model constructors)
- Import sorting enforced (isort-compatible)

## Making Changes

### Before You Start

1. Check [existing issues](https://github.com/Clay-foundation/model/issues) for related work
2. For significant changes, open an issue first to discuss the approach

### Workflow

1. Create a branch from `main`
2. Make your changes
3. Run `ruff check` and `ruff format`
4. Run `pytest tests/`
5. Submit a pull request

### Key Principles

- **No changes to model computation**: Clay v1.5 must produce identical embeddings. Any refactoring must be verified with before/after numerical comparison.
- **Test new functionality**: Add tests for new features. Tests should be fast (<30s total).
- **Follow existing patterns**: Look at how similar features are implemented before adding new ones.
- **Sensor metadata in metadata.yaml**: When adding new sensor support, add entries to `claymodel/configs/metadata.yaml`.
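
The before/after numerical comparison in the first principle could be sketched as follows. This is an illustration under stated assumptions, not the project's verification procedure: `max_abs_diff` and `assert_embeddings_unchanged` are hypothetical helpers, embeddings are represented as flat lists of floats, and the default tolerance of `0.0` reflects the "identical embeddings" requirement.

```python
import math


def max_abs_diff(before, after):
    """Largest element-wise absolute difference between two flat embeddings."""
    if len(before) != len(after):
        raise ValueError("embeddings differ in length")
    diffs = [abs(b - a) for b, a in zip(before, after)]
    if any(math.isnan(d) for d in diffs):
        return math.nan
    return max(diffs, default=0.0)


def assert_embeddings_unchanged(before, after, atol=0.0):
    """Raise AssertionError if any element moved by more than `atol`."""
    diff = max_abs_diff(before, after)
    # `not (diff <= atol)` also fails on NaN, which compares false to everything.
    if not (diff <= atol):
        raise AssertionError(f"embeddings changed: max abs diff {diff} > atol {atol}")


# Hypothetical values standing in for embeddings saved before the refactor.
reference = [0.125, -0.5, 2.0]
assert_embeddings_unchanged(reference, list(reference))
```

With array-based embeddings, `numpy.testing.assert_array_equal` (exact) or `numpy.testing.assert_allclose` (with a tolerance) performs the same comparison.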

## Adding New Sensors

To add support for a new satellite sensor:

1. Compute normalization statistics (mean/std per band) from a representative sample
2. Find the central wavelength of each band in micrometers
3. Add an entry to `claymodel/configs/metadata.yaml`
4. Test with `clay info --sensor your-sensor` and `normalize(pixels, "your-sensor")`
5. Submit a PR with the metadata and a brief description of the instrument
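
Steps 1 and 2 can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names, the band names, the sample values, and the `mean`/`std` dictionary layout are all hypothetical and do not describe the schema of `metadata.yaml` or the behavior of `normalize()`. It uses the population standard deviation and assumes wavelengths are published in nanometers and need converting to micrometers.

```python
import statistics


def band_statistics(samples_by_band):
    """Per-band mean and population standard deviation from sampled pixel values."""
    return {
        band: {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}
        for band, values in samples_by_band.items()
    }


def normalize_band(values, mean, std):
    """Standardize one band: subtract the mean, divide by the standard deviation."""
    return [(v - mean) / std for v in values]


def nm_to_um(wavelength_nm):
    """Convert a central wavelength from nanometers to micrometers."""
    return wavelength_nm / 1000.0


# Hypothetical pixel samples for one band of a new sensor.
samples = {"red": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]}
stats = band_statistics(samples)
normalized = normalize_band(samples["red"], **stats["red"])
print(stats, nm_to_um(665))
```

In practice the sample should cover enough scenes, seasons, and land-cover types to be representative, since these statistics are applied to every chip from that sensor.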

## Questions?

- Open an [issue](https://github.com/Clay-foundation/model/issues)
- Start a [discussion](https://github.com/Clay-foundation/model/discussions)
- Email: hello@madewithclay.org
50 changes: 19 additions & 31 deletions README.md
@@ -21,54 +21,42 @@ Launch into a [JupyterLab](https://jupyterlab.readthedocs.io) environment on

## Installation

### Pip Installation (Recommended)
### uv Installation (Recommended)

The easiest way to install Clay Foundation Model is via pip:
The easiest way to install Clay Foundation Model is via `uv`:

pip install git+https://github.com/Clay-foundation/model.git
uv pip install git+https://github.com/Clay-foundation/model.git

This will install the `claymodel` package and all its dependencies. You can then import and use it in your Python code:

```python
from claymodel.datamodule import ClayDataModule
from claymodel.module import ClayMAEModule
from claymodel import load_model, embed
```

### Development Installation

For development or advanced usage, you can set up the full development environment:

To help out with development, start by cloning this [repo-url](/../../)
For development or advanced usage, clone the repository and install with dev extras:

git clone <repo-url>
cd model

Then we recommend [using mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html)
to install the dependencies. A virtual environment will also be created with Python and
[JupyterLab](https://github.com/jupyterlab/jupyterlab) installed.

mamba env create --file environment.yml

> [!NOTE]
> The command above has been tested on Linux devices with CUDA GPUs.

Activate the virtual environment first.

mamba activate claymodel
uv pip install -e ".[dev]"

Finally, double-check that the libraries have been installed.

mamba list
uv run pytest tests/test_imports.py -q


## Usage

### Running jupyter lab

mamba activate claymodel
python -m ipykernel install --user --name claymodel # to install virtual env properly
jupyter kernelspec list --json # see if kernel is installed
jupyter lab &
uv run jupyter lab

### Using the clay CLI

clay info
clay info --sensor sentinel-2-l2a
clay benchmark


### Running the model
@@ -77,22 +65,22 @@ The neural network model can be run via
[LightningCLI v2](https://pytorch-lightning.medium.com/introducing-lightningcli-v2supercharge-your-training-c070d43c7dd6).

> [!NOTE]
> If you installed via pip, you'll need to clone the repository to access the trainer script and config files.
> If you installed from the package, you'll need to clone the repository to access the trainer script and config files.

To check out the different options available, and look at the hyperparameter
configurations, run:

python trainer.py --help
uv run python trainer.py --help

To quickly test the model on one batch in the validation set:

python trainer.py fit --model ClayMAEModule --data ClayDataModule --config configs/config.yaml --trainer.fast_dev_run=True
uv run python trainer.py fit --model ClayMAEModule --data ClayDataModule --config configs/config.yaml --trainer.fast_dev_run=True

To train the model:

python trainer.py fit --model ClayMAEModule --data ClayDataModule --config configs/config.yaml
uv run python trainer.py fit --model ClayMAEModule --data ClayDataModule --config configs/config.yaml

More options can be found using `python trainer.py fit --help`, or at the
More options can be found using `uv run python trainer.py fit --help`, or at the
[LightningCLI docs](https://lightning.ai/docs/pytorch/2.1.0/cli/lightning_cli.html).

## Contributing