Include 90 parenthesized DEM files in catalog #18

Merged: NewGraphEnvironment merged 1 commit into main from 8-investigate-parenthesized-files on Feb 18, 2026

@NewGraphEnvironment (Owner)

Summary

  • 90 DEM files with parentheses in filenames (e.g., (2).tif) have been excluded since Phase 1 on the assumption that they "all fail validation"
  • Investigation found the files are valid COGs — the space in the filename just needed URL encoding (%20) for GDAL's /vsicurl/
  • File sizes differ from non-parenthesized counterparts, confirming these are separate acquisitions, not duplicates
  • Added encode_url_for_gdal() helper and applied it across the validation and item creation pipelines
  • Removed parentheses filter from urls_fetch.R; refreshed the URL list to 60,126 URLs
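
For reference, the encoding step could look roughly like the sketch below. Only the name encode_url_for_gdal() comes from this PR; the function body, the vsicurl_path() wrapper, and the example.com URL are assumptions for illustration and are not the code in stac_utils.py. The sketch percent-encodes only the URL path, leaves parentheses and existing % escapes untouched, and so should be safe to apply more than once.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_for_gdal(url: str) -> str:
    """Percent-encode the path of a URL (space -> %20).

    Hypothetical body: the real helper in stac_utils.py may differ.
    """
    parts = urlsplit(url)
    # Keep "/" and "()" literal, and keep "%" so already-encoded
    # URLs are not double-encoded (%20 must not become %2520).
    encoded_path = quote(parts.path, safe="/()%")
    return urlunsplit(parts._replace(path=encoded_path))


def vsicurl_path(url: str) -> str:
    """Build a GDAL /vsicurl/ path from a plain HTTP(S) URL (hypothetical wrapper)."""
    return "/vsicurl/" + encode_url_for_gdal(url)


# Hypothetical URL, for illustration only.
print(vsicurl_path("https://example.com/dem/tile (2).tif"))
```

Keeping "%" in the safe set is a design choice that makes the helper idempotent, at the cost of not escaping a literal percent sign in a filename.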

Test plan

  • Validated sample parenthesized files as valid COGs with URL encoding
  • 355 items created successfully (including parenthesized) via item_create.py --test
  • Verified item JSON output: unique IDs, correct asset hrefs, spatial metadata intact
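
The item JSON check in the last bullet could be scripted along the lines below. This is a minimal sketch, not the verification actually run for this PR: find_item_problems() is a hypothetical name, and it assumes each item is a plain dict with the standard STAC item fields id, assets (each with an href), bbox, and geometry.

```python
from collections import Counter


def find_item_problems(items):
    """Return a list of human-readable problems found in STAC-like item dicts."""
    problems = []

    # IDs must be unique across the whole batch, including parenthesized files.
    id_counts = Counter(item.get("id") for item in items)
    for item_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate id {item_id!r} used by {count} items")

    for item in items:
        item_id = item.get("id")
        assets = item.get("assets") or {}
        if not assets:
            problems.append(f"{item_id!r}: no assets")
        for key, asset in assets.items():
            if not asset.get("href"):
                problems.append(f"{item_id!r}: asset {key!r} has no href")
        # Spatial metadata should survive item creation.
        for field in ("bbox", "geometry"):
            if item.get(field) is None:
                problems.append(f"{item_id!r}: missing {field}")

    return problems
```

An empty return value means none of these checks failed; it does not confirm that the hrefs resolve or that the geometry is correct.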

Closes #8

Relates to NewGraphEnvironment/sred-2025-2026#3

Root cause: spaces in filenames caused CURL errors with /vsicurl/. Files
are valid COGs — they just need URL encoding (space → %20) for GDAL.

- Add encode_url_for_gdal() helper to stac_utils.py
- Apply encoding in check_geotiff_cog(), item_create.py, item_reprocess.py
- Remove parentheses filter from urls_fetch.R
- Refresh urls_list.txt (now 60,126 URLs including 90 parenthesized)
- Tested: 355 items created successfully including parenthesized files

Closes #8

Relates to NewGraphEnvironment/sred#3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@NewGraphEnvironment merged commit c7d3e34 into main on Feb 18, 2026
1 of 2 checks passed
@NewGraphEnvironment deleted the 8-investigate-parenthesized-files branch on February 18, 2026 18:07

Development

Successfully merging this pull request may close these issues.

Investigate files with parentheses in filename (90 files excluded)
