
Add monthly embedding experiment#279

Open
favyen2 wants to merge 2 commits into master from favyen/20260223-monthly-embedding

Conversation

Collaborator

favyen2 commented Feb 23, 2026

This shows how to save the per-timestep embeddings when running OlmoEarth on 12 30-day periods.

Collaborator Author

favyen2 commented Mar 2, 2026

@claude can you review this PR?


claude bot commented Mar 2, 2026

Claude finished @favyen2's task in 4m 23s


PR Review: Add monthly embedding experiment


Overall this is a clean proof-of-concept for computing per-timestep embeddings with OlmoEarth. The three files are well-organized and the README provides a clear workflow. A few issues worth addressing:

config.json — Missing min_matches

The worldcover config.json (2026_02_23_worldcover_embedding/config.json) specifies both max_matches: 12 and min_matches: 12 to guarantee exactly 12 timesteps per window. This config only has max_matches: 12 (line 18).

"query_config": {
  "max_matches": 12,
  // missing: "min_matches": 12
  "period_duration": "30d",
  "space_mode": "PER_PERIOD_MOSAIC"
}

Without min_matches: 12, a window with insufficient Sentinel-2 coverage could yield fewer than 12 timesteps, which would cause a shape mismatch in the post-processing script's rearrange call (it hardcodes t=12).
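The failure mode can be sketched with plain numpy. The shapes below are assumptions taken from the values quoted in this review (12 timesteps, 3 bandsets, 768-dim embeddings); the actual script uses an einops rearrange, but a reshape fails the same way:

```python
import numpy as np

# Hypothetical post-processing step: split the stacked band axis back into
# (timesteps, bandsets, embedding_dim). t=12, 3 bandsets, and dim=768 come
# from the config discussed above; the spatial extent is kept small here.
t, num_bandsets, dim = 12, 3, 768
h = w = 8

full = np.zeros((t * num_bandsets * dim, h, w), dtype=np.float32)
out = full.reshape(t, num_bandsets, dim, h, w)  # 27648 bands: works
assert out.shape == (12, 3, 768, 8, 8)

# A window that matched only 10 timesteps yields 23040 bands, and the same
# reshape with a hardcoded t=12 raises a ValueError.
short = np.zeros((10 * num_bandsets * dim, h, w), dtype=np.float32)
try:
    short.reshape(t, num_bandsets, dim, h, w)
except ValueError as e:
    print("shape mismatch:", e)
```

Pinning min_matches alongside max_matches makes the t=12 assumption safe instead of data-dependent.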

README.md — num_bandsets = 3 is unexplained

The post-processing script (line 37) hardcodes num_bandsets = 3 but doesn't explain what this represents. It would help to add a comment like:

# OlmoEarth processes Sentinel-2's 12 bands in 3 groups of 4 bands each.
num_bandsets = 3

This also helps users know what to change if they use a different sensor with a different number of band groups.

model.yaml — optimizer is unused for inference

The optimizer block (line 17) has no configuration parameters. Since rslearn model predict doesn't use the optimizer, this is harmless, but it's a bit misleading. It could either be removed or kept as a minimal placeholder.

Minor: rslearn model predict vs python -m rslp.rslearn_main model predict

The worldcover README uses python -m rslp.rslearn_main model predict for the prediction step, while this README uses rslearn model predict (line 25). Both may work, but it's worth verifying which is the intended entry point for this repo (especially for WEKA/Beaker environments where rslp is expected).

FYI: Large uncompressed output

With "compress": null, the embeddings GeoTIFF will be 27648 bands × float32 × 1024×1024 pixels ≈ 116 GB (108 GiB) per window. This is likely intentional for write/read performance on a proof-of-concept, but worth being aware of.
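The arithmetic behind that estimate can be checked directly (the 1024×1024 window size is an assumption from this review, not something verified against the config):

```python
# Back-of-envelope size of one uncompressed embedding GeoTIFF.
bands = 12 * 3 * 768          # 27648 per-timestep embedding bands
bytes_per_pixel = 4           # float32
pixels = 1024 * 1024          # assumed window size in pixels
size_bytes = bands * bytes_per_pixel * pixels
print(f"{size_bytes / 1e9:.1f} GB ({size_bytes / 2**30:.1f} GiB)")
# → 116.0 GB (108.0 GiB)
```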


The 27648 = 12 × 3 × 768 band count is correctly consistent across config.json (line 27) and the post-processing script. The downsample_factor: 4 in the RasterMerger correctly aligns with patch_size: 4.

Collaborator Author

favyen2 commented Mar 2, 2026

@APatrickJ are we still interested in saving the per-timestep embeddings from embedding inference with multiple timesteps? Currently this PR and allenai/rslearn#524 would add that functionality (maybe allenai/rslearn#524 is better implemented with an additional token_feature_maps_to_feature_maps component, instead of changing EmbeddingHead?).

If not then might make sense to close these two PRs for now.

@APatrickJ
Collaborator

> @APatrickJ are we still interested in saving the per-timestep embeddings from embedding inference with multiple timesteps? Currently this PR and allenai/rslearn#524 would add that functionality (maybe allenai/rslearn#524 is better implemented with an additional token_feature_maps_to_feature_maps component, instead of changing EmbeddingHead?).
>
> If not then might make sense to close these two PRs for now.

@favyen2 is this question related to generating sub-annual (e.g. monthly, seasonal) periods for embeddings? If so, then we're evaluating the use cases for that still.
