diff --git a/README.md b/README.md index 23a9c4d..cbe18f4 100644 --- a/README.md +++ b/README.md @@ -1,159 +1,28 @@ - ### Automated Coastline Extraction for Erosion Modeling in Alaska +# Automated Coastline Extraction for Erosion Modeling in Alaska -The primary goal of this project is to enhance the accuracy of coastline extraction, particularly for erosion modeling in Deering, Alaska, using high-resolution Planet imagery with a 3-meter resolution. The project focuses on creating reliable ground truth data and labels that will be used to train the [DeepWaterMap algorithm](https://github.com/isikdogan/deepwatermap), a deep convolutional neural network designed to segment surface water on multispectral imagery. Originally trained on 30-meter resolution Landsat data, DeepWaterMap will be adapted to work with higher-resolution data in this project. +**Google Summer of Code (GSoC) 2026 Project** -One of the key challenges in coastline extraction is the application of the Normalized Difference Water Index (NDWI), a widely used remote sensing index for identifying water bodies. However, using a single threshold across an entire image often results in suboptimal accuracy. To address this, I implemented a sliding window approach combined with Otsu thresholding, which dynamically adjusts thresholds over localized regions of the image. This method has shown promising improvements in accuracy. +A comprehensive deep learning pipeline for extracting accurate coastlines from high-resolution PlanetLabs satellite imagery, specifically designed for coastal erosion monitoring in Arctic Alaska communities. -The newly generated labeled data, derived from this approach, will be used to retrain the [DeepWaterMap algorithm](https://github.com/isikdogan/deepwatermap), replacing the original Global Surface Water data. 
This project aims to produce a more accurate and reliable tool for coastline detection, which is crucial for monitoring and mitigating coastal erosion in vulnerable areas like Alaska. +## Overview -## Installation +The rapidly warming Arctic is leading to increased rates of coastal erosion, placing hundreds of Alaska communities at the frontline of climate change. This project provides an automated solution for extracting vectorized coastlines from 3-meter resolution PlanetLabs imagery to support erosion modeling and forecasting. -### Prerequisites +### Key Features -Before installing this project, ensure you have the following requirements: +- **UDM-Based Cloud Masking**: Integrated QA60/UDM2 quality band processing for reliable cloud and shadow removal +- **Sliding Window NDWI**: Localized Otsu thresholding with majority voting for adaptive water detection +- **DEM Integration**: Terrain analysis for improved cliff/steep slope segmentation +- **Quality Flag System**: 8-bit comprehensive pixel-level quality assessment +- **DeepWaterMap Training**: U-Net based deep learning model adapted for 4-band PlanetLabs imagery +- **Data Expansion Pipeline**: Automated ingestion of imagery from 2016-2026 with seasonal filtering +- **Validation Framework**: Transect-based RMSE and raster-based accuracy metrics -- **Python** - - **Project version**: Tested and developed with **Python 3.13.5** - - **Conda environment**: Recommended to use **Python 3.10** for best compatibility with dependencies +### Project Links -- **Miniconda** (for managing conda environments) - -- **Git** (for cloning the repository) - -- **GDAL** (install via `conda-forge` for easier setup) - -- **Rasterio 1.4.3+** (for geospatial data processing) - ---- - -### Clone the Repository - -Clone the project using the `dev` branch (this branch contains the latest development features): - -```bash -git clone -b dev https://github.com/your-username/coastline-extraction.git -cd coastline-extraction -``` - ---- - 
-### Environment Setup
-
-1. **Create a virtual environment**:
-
-```bash
-python -m venv coastline_env
-source coastline_env/bin/activate  # On Windows: coastline_env\Scripts\activate
-```
-
-2. **Install required dependencies**:
-
-```bash
-# Core deep learning libraries
-pip install torch torchvision
-
-# Geospatial data processing
-pip install rasterio gdal
-
-# Data manipulation and visualization
-pip install numpy pandas matplotlib
-
-# Image processing
-pip install scikit-image opencv-python
-
-# Utilities
-pip install tqdm pillow
-
-# Additional dependencies for data preprocessing
-pip install shapely fiona geopandas
-```
+- **Source Code**: https://github.com/fwitmer/CoastlineExtraction
+- **My Fork**: https://github.com/janvis11/CoastlineExtraction
+- **Discussion Forum**: https://github.com/fwitmer/CoastlineExtraction/discussions
+- **Mentors**: Frank Witmer (fwitmer@alaska.edu), Ritika Kumari (rkjane333@gmail.com)
 ---
-
-## Configuration
-
-This project uses a centralized configuration system to manage file paths and parameters.
-Configuration is handled through `config_template.json` and the `load_config.py` module.
-
-### Setting Up Configuration
-
-1. **Copy the template**:
-
-```bash
-cp config_template.json config.json
-```
-
-2. 
**Edit the configuration**: Open `config.json` and modify the paths according to your setup: - -```json -{ - "data_dir": "data", - "image_folder": "sample_data/PlanetLabs", - "raw_data_folder": "raw_data", - "shapefile_folder": "USGS_Coastlines", - "ground_truth_folder": "ground_truth", - "processed_data_folder": "processed_data", - "training": { - "model_save_path": "training_pipeline/unet_model.pth", - "batch_size": 8, - "epochs": 30, - "learning_rate": 1e-4, - "image_size": [256, 256], - "train_split": 0.8, - "device": "auto" - } -} -``` - -### Using the Configuration System - -The `load_config.py` module provides convenient functions to access your data files: - -```python -from load_config import load_config, get_image_path, get_shapefile_path - -# Load configuration -config = load_config() - -# Get specific file paths -image_path = get_image_path(config, 0) # First image file -shapefile_path = get_shapefile_path(config, 0) # First shapefile -``` ---- - - -## Contributing - -### Working with the Dev Branch - -This project uses the `dev` branch for active development. When contributing: - -1. **Fork the repository on GitHub** - -2. **Clone your fork using the `dev` branch**: - -```bash -git clone -b dev https://github.com/your-username/coastline-extraction.git -cd coastline-extraction -``` - -3. **Create a feature branch from `dev`**: - -```bash -git checkout -b feature/your-feature-name -``` - -4. **Make your changes and commit them**: - -```bash -git add . -git commit -m "Add your feature description" -``` - -5. **Push to your fork**: - -```bash -git push origin feature/your-feature-name -``` - -6. 
**Create a Pull Request** targeting the `dev` branch (not `main`) diff --git a/data_preprocessing/ndwi_with_udm.py b/data_preprocessing/ndwi_with_udm.py new file mode 100644 index 0000000..a91d7dd --- /dev/null +++ b/data_preprocessing/ndwi_with_udm.py @@ -0,0 +1,523 @@ +""" +Module: ndwi_with_udm.py + +Description: Integrated NDWI computation with UDM-based cloud masking. + This module combines cloud masking preprocessing with the + existing NDWI sliding window approach for improved label quality. + + Workflow: + 1. Apply UDM/QA60 cloud masking to input image + 2. Compute NDWI on cloud-masked reflectance values + 3. Use sliding window + Otsu thresholding for water classification + 4. Generate vectorized coastline shapefiles + +Author: Janvis11 (GSoC 2026 Contributor) +Date: 2026-03-28 + +Issue: #100 - Improving NDWI Label Quality via UDM-Based Masking +""" + +import os +import numpy as np +import rasterio as rio +from rasterio.mask import mask +from rasterio.io import MemoryFile +import cv2 +from matplotlib import pyplot as plt +import geopandas as gpd +from shapely.geometry import box + +# Local imports +from load_config import load_config, get_image_path, get_shapefile_path +from utils.check_crs import check_crs, crs_match +from utils.spatial_analysis import log_spatial_info +from utils.shapefile_generator import save_concatenated_ndwi_with_shapefile + +# Import UDM masking functions +from data_preprocessing.udm_masking import ( + apply_udm_mask, + read_qa60_band, + create_cloud_mask_from_qa60, + create_cloud_mask_from_udm2 +) + + +# ============================================================================= +# Configuration +# ============================================================================= + +# Gaussian blur parameters (same as ndwi_labels.py) +KSIZE_BLUR = (9, 9) +SIGMA_X = 6 +SIGMA_Y = 6 + +# Majority voting threshold +MAJORITY_THRESHOLD = 0.55 + +# UDM masking options +UDM_CONFIG = { + 'mask_cirrus': True, # Mask cirrus clouds (Sentinel-2 
QA60 bit 11) + 'mask_shadow': True, # Mask cloud shadows (Planet UDM2) + 'mask_snow': False, # Mask snow/ice (set True for winter images) + 'min_clear_percentage': 50 # Minimum clear pixels required (%) +} + + +# ============================================================================= +# Main NDWI with UDM Function +# ============================================================================= + +def get_ndwi_label_with_udm(image_path, points_path, ksize=100, blurring=True, + apply_cloud_mask=True, output_dir=None): + """ + Perform NDWI classification with optional UDM-based cloud masking. + + This function extends the standard NDWI workflow by adding a cloud masking + preprocessing step. Clouds and shadows are filtered before NDWI computation, + reducing noise and invalid classifications in the final labels. + + Args: + image_path (str): Path to input multispectral GeoTIFF + points_path (str): Path to transect points shapefile + ksize (int, optional): Sliding window size in pixels. Defaults to 100. + blurring (bool, optional): Apply Gaussian blur. Defaults to True. + apply_cloud_mask (bool, optional): Apply UDM cloud masking. Defaults to True. + output_dir (str, optional): Output directory. 
Defaults to 'result_ndwi_udm' + + Returns: + dict: Results containing NDWI arrays, cloud mask, and output paths + """ + if output_dir is None: + output_dir = "result_ndwi_udm" + + os.makedirs(output_dir, exist_ok=True) + + # Validate inputs + check_crs(image_path, verbose=True) + check_crs(points_path, verbose=True) + + # ========================================================================= + # Step 1: Apply UDM Cloud Masking (if enabled) + # ========================================================================= + + cloud_mask = None + spectral_data = None + + if apply_cloud_mask: + print("=" * 60) + print("STEP 1: Applying UDM-based cloud masking") + print("=" * 60) + + udm_result = apply_udm_mask( + image_path, + output_dir=output_dir, + mask_cirrus=UDM_CONFIG['mask_cirrus'], + mask_shadow=UDM_CONFIG['mask_shadow'], + mask_snow=UDM_CONFIG['mask_snow'], + visualize=True + ) + + if udm_result: + cloud_mask = udm_result['cloud_mask'] + spectral_data = udm_result['masked_image'] + + # Check if enough clear pixels + if udm_result['clear_percentage'] < UDM_CONFIG['min_clear_percentage']: + print(f"WARNING: Only {udm_result['clear_percentage']:.1f}% clear pixels.") + print(" Results may be unreliable due to cloud cover.") + + # Use masked image path for subsequent processing + processed_image_path = udm_result['masked_tiff'] + else: + print("Cloud masking skipped (no QA60/UDM band found)") + processed_image_path = image_path + else: + processed_image_path = image_path + + # ========================================================================= + # Step 2: Read spectral bands and compute NDWI + # ========================================================================= + + print("\n" + "=" * 60) + print("STEP 2: Computing NDWI") + print("=" * 60) + + with rio.open(processed_image_path) as src: + # Determine green and NIR band indices + # For 4-band (RGBN): Green=2, NIR=4 + # For 5-band (RGBN+mask): Green=2, NIR=4 or 5 + + green_band = 2 + nir_band = src.count 
if src.count <= 4 else 4  # NIR is band 4; band 5+ holds the UDM mask
+
+        # Read bands (use pre-masked data if available)
+        if spectral_data is not None:
+            green = spectral_data[green_band - 1]
+            nir = spectral_data[nir_band - 1]
+        else:
+            green = src.read(green_band).astype(np.float32)
+            nir = src.read(min(nir_band, src.count)).astype(np.float32)
+
+        # Compute NDWI
+        np.seterr(divide='ignore', invalid='ignore')
+        ndwi = (green - nir) / (green + nir)
+        ndwi[np.isnan(ndwi)] = 0
+
+        ndwi_profile = src.profile.copy()
+
+        # Apply Gaussian blur (sigmaY must be passed by keyword:
+        # the fourth positional argument of cv2.GaussianBlur is dst)
+        if blurring:
+            print("Applying Gaussian blur...")
+            ndwi = cv2.GaussianBlur(ndwi, KSIZE_BLUR, SIGMA_X, sigmaY=SIGMA_Y)
+
+        # Initialize output arrays
+        label = np.zeros((src.height, src.width), dtype=np.uint8)
+        buffer_numbers = np.zeros((src.height, src.width), dtype=np.uint8)
+        water_count = np.zeros((src.height, src.width), dtype=np.uint8)
+
+        src_crs = src.crs
+        pixel_size = abs(src.transform[0])
+        raster_bounds = src.bounds
+
+        print(f"NDWI range: [{ndwi.min():.3f}, {ndwi.max():.3f}]")
+
+    # =========================================================================
+    # Step 3: Sliding Window Analysis
+    # =========================================================================
+
+    print("\n" + "=" * 60)
+    print("STEP 3: Sliding window classification")
+    print("=" * 60)
+
+    # Prepare points
+    points_shp = gpd.read_file(points_path)
+    points_geom = points_shp.geometry
+
+    if points_shp.crs != src_crs:
+        print(f"Reprojecting points from {points_shp.crs} to {src_crs}")
+        points_geom = points_geom.to_crs(src_crs)
+
+    # Save reprojected points
+    reprojected_points_path = os.path.join(output_dir, "reprojected_points.shp")
+    gpd.GeoDataFrame(geometry=points_geom).to_file(reprojected_points_path)
+
+    # Validate CRS
+    if not crs_match(processed_image_path, reprojected_points_path):
+        raise ValueError("CRS mismatch after reprojection!")
+
+    # Log spatial info
+    log_spatial_info(raster_bounds, points_geom)
+
+    # Process each point
+    otsu_thresholds = []
+    skipped = 0
+
+    for multipoint 
in points_geom:
+        for point in multipoint.geoms:
+            buffer = point.buffer(ksize * pixel_size, cap_style=3)
+
+            ndwi_profile.update(count=1, nodata=0, dtype=rio.float32)
+
+            with MemoryFile() as memfile:
+                with memfile.open(**ndwi_profile) as mem_data:
+                    mem_data.write_band(1, ndwi)
+
+                with memfile.open() as dataset:
+                    raster_bounds_geom = box(*dataset.bounds)
+
+                    if not buffer.intersects(raster_bounds_geom):
+                        skipped += 1
+                        continue
+
+                    out_image, out_transform = mask(
+                        dataset, shapes=[buffer], nodata=-1, crop=False
+                    )
+                    out_image = out_image[0]
+
+                    out_image_clipped, _ = mask(
+                        dataset, shapes=[buffer], nodata=-1, crop=True
+                    )
+                    out_image_clipped = out_image_clipped[0]
+
+                    # Skip small windows
+                    if out_image_clipped.shape[0] < 200 or out_image_clipped.shape[1] < 200:
+                        skipped += 1
+                        continue
+
+                    # Convert to 8-bit for Otsu
+                    out_image_8bit = ((out_image * 127) + 128).astype(np.uint8)
+                    out_image_clipped_8bit = ((out_image_clipped * 127) + 128).astype(np.uint8)
+
+                    # Otsu thresholding
+                    threshold, _ = cv2.threshold(
+                        out_image_clipped_8bit, 0, 1,
+                        cv2.THRESH_BINARY + cv2.THRESH_OTSU
+                    )
+
+                    # Validate threshold
+                    if threshold == 0.0 or threshold == 1.0:
+                        skipped += 1
+                        continue
+
+                    otsu_thresholds.append(threshold)
+
+                    # Apply threshold to full window
+                    threshold_window = np.where(out_image_8bit >= threshold, 1, 0).astype(np.uint8)
+                    label = label | threshold_window
+                    water_count = water_count + threshold_window
+
+                    # Update buffer count (nodata is -1 in the float window,
+                    # never in the unsigned 8-bit copy)
+                    mask_array = np.where(out_image != -1, 1, 0).astype(np.uint8)
+                    buffer_numbers = buffer_numbers + mask_array
+
+    print(f"Valid windows processed: {len(otsu_thresholds)}")
+    print(f"Windows skipped: {skipped}")
+
+    # =========================================================================
+    # Step 4: Combine Results
+    # =========================================================================
+
+    print("\n" + "=" * 60)
+    print("STEP 4: Combining classification results")
+    print("=" * 60)
+
+    # Majority voting
+    
label_majority = np.where( + water_count > (buffer_numbers * MAJORITY_THRESHOLD), 1, 0 + ) + + # Global threshold + if otsu_thresholds: + mean_threshold = np.mean(otsu_thresholds) + 10 + else: + mean_threshold = 128 + + ndwi_8bit = ((ndwi * 127) + 128).astype(np.uint8) + ndwi_classified = np.where(ndwi_8bit >= mean_threshold, 1, 0) + + # Combine local and global classification + ndwi_concatenated = ndwi_classified.copy() + sliding_windows = np.where(buffer_numbers > 0, 1, 0) + water_areas = np.where(label == 1, 1, 0) + ndwi_concatenated = np.where( + (sliding_windows == 1) & (water_areas == 1), 1, ndwi_concatenated + ) + + # Print statistics + print(f"\nClassification Statistics:") + print(f" Mean threshold (8-bit): {mean_threshold:.1f}") + print(f" Water pixels (NDWI classified): {np.sum(ndwi_classified):,}") + print(f" Water pixels (majority voting): {np.sum(label_majority):,}") + print(f" Water pixels (concatenated): {np.sum(ndwi_concatenated):,}") + + if apply_cloud_mask and cloud_mask is not None: + cloudy_water = np.sum(ndwi_classified & ~cloud_mask) + print(f" Water pixels in cloudy areas: {cloudy_water:,}") + + # ========================================================================= + # Step 5: Save Outputs + # ========================================================================= + + print("\n" + "=" * 60) + print("STEP 5: Saving outputs") + print("=" * 60) + + # Save shapefile + try: + save_concatenated_ndwi_with_shapefile( + ndwi_concatenated, ndwi_profile, processed_image_path, output_dir + ) + except Exception as e: + print(f"Warning: Shapefile generation failed: {e}") + + # Save GeoTIFF + base_name = os.path.splitext(os.path.basename(processed_image_path))[0] + tiff_path = os.path.join(output_dir, f"{base_name}_ndwi_udm.tif") + + ndwi_profile.update(count=1, dtype='uint8', nodata=255) + with rio.open(tiff_path, 'w', **ndwi_profile) as dst: + dst.write(ndwi_concatenated.astype(np.uint8), 1) + + print(f"Saved: {tiff_path}") + + # Save plots + 
save_comparison_plots( + ndwi, ndwi_classified, label_majority, ndwi_concatenated, + cloud_mask, output_dir, base_name + ) + + return { + 'ndwi': ndwi, + 'ndwi_classified': ndwi_classified, + 'label_majority': label_majority, + 'ndwi_concatenated': ndwi_concatenated, + 'cloud_mask': cloud_mask, + 'output_tiff': tiff_path, + 'thresholds': otsu_thresholds + } + + +def save_comparison_plots(ndwi, ndwi_classified, label_majority, + ndwi_concatenated, cloud_mask, output_dir, base_name): + """ + Save comparison plots showing the effect of UDM masking. + + Args: + ndwi (np.ndarray): NDWI values + ndwi_classified (np.ndarray): Binary classification + label_majority (np.ndarray): Majority voting result + ndwi_concatenated (np.ndarray): Combined result + cloud_mask (np.ndarray): Cloud mask (None if not applied) + output_dir (str): Output directory + base_name (str): Base name for files + """ + if cloud_mask is not None: + fig, axes = plt.subplots(2, 3, figsize=(20, 12)) + + # Row 1 + axes[0, 0].imshow(ndwi, cmap='RdYlBu') + axes[0, 0].set_title('NDWI Values') + axes[0, 0].axis('off') + + axes[0, 1].imshow(cloud_mask, cmap='gray') + axes[0, 1].set_title('Cloud Mask (White=Clear)') + axes[0, 1].axis('off') + + axes[0, 2].imshow(ndwi_classified) + axes[0, 2].set_title('NDWI Classified (Global Threshold)') + axes[0, 2].axis('off') + + # Row 2 + axes[1, 0].imshow(label_majority) + axes[1, 0].set_title('Majority Voting') + axes[1, 0].axis('off') + + axes[1, 1].imshow(ndwi_concatenated) + axes[1, 1].set_title('Final Combined Result') + axes[1, 1].axis('off') + + # Difference plot + diff = np.abs(ndwi_classified.astype(float) - label_majority.astype(float)) + im = axes[1, 2].imshow(diff, cmap='Reds') + axes[1, 2].set_title('Difference (Local vs Global)') + axes[1, 2].axis('off') + else: + fig, axes = plt.subplots(2, 2, figsize=(16, 12)) + + axes[0, 0].imshow(ndwi, cmap='RdYlBu') + axes[0, 0].set_title('NDWI Values') + axes[0, 0].axis('off') + + axes[0, 1].imshow(ndwi_classified) + 
axes[0, 1].set_title('NDWI Classified') + axes[0, 1].axis('off') + + axes[1, 0].imshow(label_majority) + axes[1, 0].set_title('Majority Voting') + axes[1, 0].axis('off') + + axes[1, 1].imshow(ndwi_concatenated) + axes[1, 1].set_title('Final Result') + axes[1, 1].axis('off') + + plt.tight_layout() + fig.savefig(os.path.join(output_dir, f"{base_name}_ndwi_udm_comparison.png"), dpi=150) + plt.close() + + print(f"Saved visualization: {os.path.join(output_dir, f'{base_name}_ndwi_udm_comparison.png')}") + + +# ============================================================================= +# Comparison Function: With vs Without UDM +# ============================================================================= + +def compare_ndwi_with_and_without_udm(image_path, points_path, ksize=100): + """ + Run NDWI classification with and without UDM masking for comparison. + + This function helps evaluate the improvement from UDM-based cloud masking + by running both versions and comparing the results. + + Args: + image_path (str): Input image path + points_path (str): Transect points shapefile + ksize (int): Sliding window size + + Returns: + dict: Comparison results + """ + print("\n" + "=" * 70) + print("COMPARISON: NDWI with UDM vs NDWI without UDM") + print("=" * 70) + + # Without UDM + print("\n--- Running NDWI WITHOUT cloud masking ---") + result_no_udm = get_ndwi_label_with_udm( + image_path, points_path, ksize=ksize, + apply_cloud_mask=False, + output_dir="result_ndwi_comparison/no_udm" + ) + + # With UDM + print("\n--- Running NDWI WITH cloud masking ---") + result_with_udm = get_ndwi_label_with_udm( + image_path, points_path, ksize=ksize, + apply_cloud_mask=True, + output_dir="result_ndwi_comparison/with_udm" + ) + + # Compare statistics + print("\n" + "=" * 70) + print("COMPARISON RESULTS") + print("=" * 70) + + if result_no_udm and result_with_udm: + print(f"\nWater pixels detected:") + print(f" Without UDM: {np.sum(result_no_udm['ndwi_concatenated']):,}") + print(f" 
With UDM: {np.sum(result_with_udm['ndwi_concatenated']):,}") + + print(f"\nOtsu thresholds (mean):") + if result_no_udm['thresholds']: + print(f" Without UDM: {np.mean(result_no_udm['thresholds']):.1f}") + if result_with_udm['thresholds']: + print(f" With UDM: {np.mean(result_with_udm['thresholds']):.1f}") + + if result_with_udm['cloud_mask'] is not None: + clear_pct = (np.sum(result_with_udm['cloud_mask']) / + result_with_udm['cloud_mask'].size) * 100 + print(f"\nCloud cover: {100-clear_pct:.1f}%") + + return { + 'without_udm': result_no_udm, + 'with_udm': result_with_udm + } + + +# ============================================================================= +# Main Execution +# ============================================================================= + +if __name__ == "__main__": + config = load_config() + + # Get paths from config + try: + image_path = get_image_path(config, 0) + points_path = get_shapefile_path(config, 0) + + print(f"Image: {image_path}") + print(f"Points: {points_path}") + + # Run with UDM masking + result = get_ndwi_label_with_udm( + image_path, points_path, + apply_cloud_mask=True, + output_dir="result_ndwi_udm" + ) + + print("\nProcessing complete!") + print(f"Output: {result['output_tiff']}") + + except Exception as e: + print(f"Error: {e}") + print("Ensure config.json is properly configured") diff --git a/data_preprocessing/udm_masking.py b/data_preprocessing/udm_masking.py new file mode 100644 index 0000000..3d2b370 --- /dev/null +++ b/data_preprocessing/udm_masking.py @@ -0,0 +1,360 @@ +""" +Module: udm_masking.py + +Description: Apply UDM (User Data Mask) / QA60-based cloud masking for Sentinel-2 + and Planet Labs imagery before NDWI computation. This preprocessing + step reduces noise and invalid regions in water classification. 
+ +Author: Janvis11 (GSoC 2026 Contributor) +Date: 2026-03-28 + +References: +- Sentinel-2 QA60 band: Cloud and snow masking +- Planet Labs UDM2: Cloud, shadow, and snow detection +""" + +import os +import numpy as np +import rasterio as rio +from rasterio.mask import mask +from rasterio.io import MemoryFile +import cv2 +from matplotlib import pyplot as plt + +# Local imports +from load_config import load_config, get_image_path +from utils.check_crs import check_crs + + +# ============================================================================= +# UDM/QA60 Bit Flag Definitions +# ============================================================================= + +# Sentinel-2 QA60 bit flags +QA60_CLOUD_BIT = 10 # Bit 10: Opaque clouds +QA60_CIRRUS_BIT = 11 # Bit 11: Cirrus clouds + +# Planet Labs UDM2 bit flags (if available) +UDM2_CLOUD_BIT = 0 # Cloud pixels +UDM2_SHADOW_BIT = 1 # Cloud shadow pixels +UDM2_SNOW_BIT = 2 # Snow/ice pixels +UDM2_WATER_BIT = 3 # Water pixels + + +def read_qa60_band(image_path): + """ + Read the QA60 band from a Sentinel-2 or Planet Labs image. + + For Sentinel-2, QA60 is typically band 10 (quality control band). + For Planet Labs with UDM2, it may be included as an additional band. 
+
+    Args:
+        image_path (str): Path to the input GeoTIFF file
+
+    Returns:
+        tuple: (qa60_data, profile) - QA60 band array and raster profile
+               Returns (None, None) if QA60 band not found
+    """
+    with rio.open(image_path) as src:
+        # Check if QA60 band exists (usually last band or band 10)
+        qa60_band_idx = None
+
+        # Try to find QA60 by name in band descriptions
+        for i, desc in enumerate(src.descriptions):
+            if desc and ('QA60' in desc or 'qa60' in desc or 'UDM' in desc or 'udm' in desc):
+                qa60_band_idx = i + 1
+                break
+
+        # If not found by name, try common positions
+        if qa60_band_idx is None:
+            # For Sentinel-2, QA60 is often band 10
+            if src.count >= 10:
+                qa60_band_idx = 10
+            # For Planet Labs with UDM2, it might be the last band
+            elif src.count >= 5:
+                qa60_band_idx = src.count
+
+        if qa60_band_idx is None or qa60_band_idx > src.count:
+            print(f"QA60/UDM band not found in {image_path}")
+            return None, None
+
+        # Keep the native dtype: QA60 is uint16, UDM2 is uint8, and callers
+        # use the dtype to decide which mask-building function to apply
+        qa60 = src.read(qa60_band_idx)
+        profile = src.profile.copy()
+
+    return qa60, profile
+
+
+def create_cloud_mask_from_qa60(qa60_data, mask_cirrus=True):
+    """
+    Create a binary cloud mask from QA60 bit flags.
+
+    Args:
+        qa60_data (np.ndarray): QA60 band data as uint16 array
+        mask_cirrus (bool, optional): Also mask cirrus clouds. Defaults to True.
+
+    Returns:
+        np.ndarray: Boolean mask where True = clear pixels, False = cloudy pixels
+    """
+    # Create mask for opaque clouds (bit 10)
+    cloud_mask = (qa60_data & (1 << QA60_CLOUD_BIT)) == 0
+
+    # Optionally mask cirrus clouds (bit 11)
+    if mask_cirrus:
+        cirrus_mask = (qa60_data & (1 << QA60_CIRRUS_BIT)) == 0
+        cloud_mask = cloud_mask & cirrus_mask
+
+    return cloud_mask
+
+
+def create_cloud_mask_from_udm2(udm2_data, mask_shadow=True, mask_snow=False):
+    """
+    Create a cloud mask from Planet Labs UDM2 data.
+
+    Args:
+        udm2_data (np.ndarray): UDM2 band data as uint8 array
+        mask_shadow (bool, optional): Also mask cloud shadows. Defaults to True.
+ mask_snow (bool, optional): Also mask snow/ice. Defaults to False. + + Returns: + np.ndarray: Boolean mask where True = valid pixels, False = masked pixels + """ + # Start with all pixels valid + valid_mask = np.ones_like(udm2_data, dtype=bool) + + # Mask cloud pixels (bit 0) + cloud_mask = (udm2_data & (1 << UDM2_CLOUD_BIT)) == 0 + valid_mask = valid_mask & cloud_mask + + # Optionally mask shadows + if mask_shadow: + shadow_mask = (udm2_data & (1 << UDM2_SHADOW_BIT)) == 0 + valid_mask = valid_mask & shadow_mask + + # Optionally mask snow + if mask_snow: + snow_mask = (udm2_data & (1 << UDM2_SNOW_BIT)) == 0 + valid_mask = valid_mask & snow_mask + + return valid_mask + + +def apply_udm_mask(image_path, output_dir=None, mask_cirrus=True, + mask_shadow=True, mask_snow=False, visualize=True): + """ + Apply UDM/QA60-based cloud masking to an image. + + This function reads the QA60/UDM2 band from the input image, creates a cloud + mask, and applies it to all spectral bands. The masked image can then be + used for NDWI computation with reduced cloud contamination. + + Args: + image_path (str): Path to input GeoTIFF with QA60/UDM2 band + output_dir (str, optional): Directory for output files. Defaults to 'result_udm_masking' + mask_cirrus (bool, optional): Mask cirrus clouds. Defaults to True. + mask_shadow (bool, optional): Mask cloud shadows. Defaults to True. + mask_snow (bool, optional): Mask snow/ice. Defaults to False. + visualize (bool, optional): Create visualization plots. Defaults to True. + + Returns: + dict: Contains masked_image, cloud_mask, and output paths + """ + if output_dir is None: + output_dir = "result_udm_masking" + + os.makedirs(output_dir, exist_ok=True) + + # Check CRS + check_crs(image_path, verbose=False) + + # Read the QA60/UDM band + print(f"Reading QA60/UDM band from {image_path}") + qa60_data, profile = read_qa60_band(image_path) + + if qa60_data is None: + print("No QA60/UDM band found. 
Skipping cloud masking.")
+        return None
+
+    # Read all spectral bands (exclude QA60/UDM band)
+    with rio.open(image_path) as src:
+        # Determine which bands are spectral (not QA60); band descriptions
+        # may be None, so guard before substring checks
+        spectral_bands = []
+        for i in range(1, src.count + 1):
+            desc = src.descriptions[i - 1] if src.descriptions else None
+            if desc and ('QA60' in desc or 'UDM' in desc):
+                continue
+            spectral_bands.append(i)
+
+        spectral_data = src.read(spectral_bands)
+        src_profile = src.profile.copy()
+        src_transform = src.transform
+        src_crs = src.crs
+
+    # Create cloud mask
+    print("Creating cloud mask...")
+    if qa60_data.dtype == np.uint16:
+        # Sentinel-2 QA60
+        cloud_mask = create_cloud_mask_from_qa60(qa60_data, mask_cirrus=mask_cirrus)
+    else:
+        # Planet Labs UDM2
+        cloud_mask = create_cloud_mask_from_udm2(qa60_data,
+                                                 mask_shadow=mask_shadow,
+                                                 mask_snow=mask_snow)
+
+    # Apply mask to spectral data
+    print("Applying cloud mask to spectral bands...")
+    masked_spectral = spectral_data.copy().astype(np.float32)
+
+    # Set cloudy pixels to nodata (will be handled in NDWI calculation)
+    for i in range(masked_spectral.shape[0]):
+        masked_spectral[i, ~cloud_mask] = np.nan
+
+    # Calculate statistics
+    total_pixels = cloud_mask.size
+    cloudy_pixels = np.sum(~cloud_mask)
+    clear_percentage = (np.sum(cloud_mask) / total_pixels) * 100
+
+    print(f"Cloud masking complete: {clear_percentage:.1f}% clear pixels")
+    print(f"  Cloudy pixels: {cloudy_pixels:,} ({100-clear_percentage:.1f}%)")
+    print(f"  Clear pixels: {np.sum(cloud_mask):,}")
+
+    # Save masked image
+    base_name = os.path.splitext(os.path.basename(image_path))[0]
+    output_tiff = os.path.join(output_dir, f"{base_name}_cloud_masked.tif")
+
+    # Update profile for output
+    src_profile.update({
+        'count': spectral_data.shape[0],
+        'dtype': 'float32',
+        'nodata': np.nan
+    })
+
+    with rio.open(output_tiff, 'w', **src_profile) as dst:
+        dst.write(masked_spectral)
+
+    print(f"Saved masked image: {output_tiff}")
+
+    # Save cloud mask as GeoTIFF
+    mask_tiff = 
os.path.join(output_dir, f"{base_name}_cloud_mask.tif")
+    mask_profile = src_profile.copy()
+    # No nodata value: 255 marks clear pixels, 0 marks cloudy pixels, and
+    # declaring nodata=255 would flag every clear pixel as missing data
+    mask_profile.update(count=1, dtype='uint8', nodata=None)
+
+    with rio.open(mask_tiff, 'w', **mask_profile) as dst:
+        dst.write(cloud_mask.astype(np.uint8) * 255, 1)
+
+    print(f"Saved cloud mask: {mask_tiff}")
+
+    # Create visualization
+    if visualize:
+        save_udm_visualization(cloud_mask, masked_spectral, spectral_data,
+                               output_dir, base_name)
+
+    return {
+        'masked_image': masked_spectral,
+        'cloud_mask': cloud_mask,
+        'masked_tiff': output_tiff,
+        'mask_tiff': mask_tiff,
+        'clear_percentage': clear_percentage
+    }
+
+
+def save_udm_visualization(cloud_mask, masked_spectral, original_spectral,
+                           output_dir, base_name):
+    """
+    Save visualization plots comparing original and masked images.
+
+    Args:
+        cloud_mask (np.ndarray): Boolean cloud mask
+        masked_spectral (np.ndarray): Cloud-masked spectral data
+        original_spectral (np.ndarray): Original spectral data
+        output_dir (str): Output directory
+        base_name (str): Base name for output files
+    """
+    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
+
+    # Plot 1: Cloud mask
+    axes[0].imshow(cloud_mask, cmap='gray')
+    axes[0].set_title('Cloud Mask (White = Clear)')
+    axes[0].axis('off')
+
+    # Plot 2: Original RGB (bands 3, 2, 1 or 4, 3, 2)
+    if original_spectral.shape[0] >= 3:
+        rgb_idx = [2, 1, 0]  # Use first 3 bands
+    else:
+        rgb_idx = [0, 0, 0]  # Fallback
+
+    rgb_original = np.stack([original_spectral[i] for i in rgb_idx], axis=-1)
+    rgb_original = (rgb_original - rgb_original.min()) / (rgb_original.max() - rgb_original.min() + 1e-10)
+    axes[1].imshow(rgb_original)
+    axes[1].set_title('Original Image')
+    axes[1].axis('off')
+
+    # Plot 3: Masked RGB
+    if masked_spectral.shape[0] >= 3:
+        rgb_masked = np.stack([masked_spectral[i] for i in rgb_idx], axis=-1)
+        rgb_masked = np.nan_to_num(rgb_masked, nan=0)
+        rgb_masked = (rgb_masked - rgb_masked.min()) / (rgb_masked.max() - rgb_masked.min() + 1e-10)
+        axes[2].imshow(rgb_masked)
+        
axes[2].set_title('Cloud-Masked Image') + axes[2].axis('off') + + plt.tight_layout() + fig.savefig(os.path.join(output_dir, f"{base_name}_udm_masking.png"), dpi=150) + plt.close() + + print(f"Saved visualization: {os.path.join(output_dir, f'{base_name}_udm_masking.png')}") + + +def compare_ndwi_with_udm(image_path, points_path, use_udm=True): + """ + Compare NDWI results with and without UDM masking. + + This function computes NDWI both with and without cloud masking to + demonstrate the improvement in label quality. + + Args: + image_path (str): Path to input image + points_path (str): Path to transect points shapefile + use_udm (bool): Whether to apply UDM masking + + Returns: + dict: NDWI comparison results + """ + from ndwi_labels import get_ndwi_label + + # Apply UDM masking first + if use_udm: + udm_result = apply_udm_mask(image_path) + if udm_result: + masked_image_path = udm_result['masked_tiff'] + # Run NDWI on masked image + # Note: This would need modification to return NDWI data + print(f"NDWI with UDM masking: {masked_image_path}") + + return None + + +# ============================================================================= +# Main execution for testing +# ============================================================================= + +if __name__ == "__main__": + # Example usage + config = load_config() + + # Get first image from config + try: + image_path = get_image_path(config, 0) + print(f"Processing: {image_path}") + + # Apply UDM masking + result = apply_udm_mask(image_path, visualize=True) + + if result: + print(f"\nResults:") + print(f" Clear pixels: {result['clear_percentage']:.1f}%") + print(f" Output: {result['masked_tiff']}") + print(f" Mask: {result['mask_tiff']}") + + except Exception as e: + print(f"Error: {e}") + print("Make sure config.json is set up correctly")
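The QA60/UDM2 bit tests above all follow the pattern `(qa & (1 << BIT)) == 0`, which can be sanity-checked in isolation. A minimal sketch with made-up pixel values; the bit positions mirror the `QA60_CLOUD_BIT`/`QA60_CIRRUS_BIT` constants defined in `udm_masking.py`:

```python
import numpy as np

QA60_CLOUD_BIT = 10   # bit 10: opaque clouds
QA60_CIRRUS_BIT = 11  # bit 11: cirrus clouds

# Synthetic QA60 values: clear, opaque cloud, cirrus, both flags set
qa60 = np.array([0,
                 1 << QA60_CLOUD_BIT,
                 1 << QA60_CIRRUS_BIT,
                 (1 << QA60_CLOUD_BIT) | (1 << QA60_CIRRUS_BIT)],
                dtype=np.uint16)

# A pixel is clear only when neither cloud bit is set
clear = ((qa60 & (1 << QA60_CLOUD_BIT)) == 0) & \
        ((qa60 & (1 << QA60_CIRRUS_BIT)) == 0)
```

Only the first synthetic pixel comes out clear. The same pattern carries over to the UDM2 flags, with cloud, shadow, and snow at bits 0 through 2.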
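The NDWI computation and the 8-bit rescale applied before Otsu thresholding (`(ndwi * 127) + 128`, mapping the [-1, 1] range onto roughly [1, 255]) can be illustrated on two hand-picked reflectance values (synthetic numbers, not taken from any real scene):

```python
import numpy as np

# Synthetic green/NIR reflectances: a water-like pixel and a land-like pixel
green = np.array([0.10, 0.30], dtype=np.float32)
nir = np.array([0.02, 0.40], dtype=np.float32)

# Water reflects green and absorbs NIR, so NDWI is positive over water
# and negative over land
ndwi = (green - nir) / (green + nir)

# Rescale [-1, 1] -> approximately [1, 255] for cv2's 8-bit Otsu input
ndwi_8bit = ((ndwi * 127) + 128).astype(np.uint8)
```

Note that the nodata value -1 used when windowing maps to 1 under this rescale, which is why the sliding-window code treats an Otsu threshold of 1.0 as degenerate.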
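Step 4's majority vote combines the per-window Otsu decisions: a pixel is labeled water only when strictly more than `MAJORITY_THRESHOLD` (55%) of the sliding windows covering it voted water. A sketch of that combine on tiny synthetic count arrays:

```python
import numpy as np

MAJORITY_THRESHOLD = 0.55  # fraction of covering windows that must vote water

# buffer_numbers: how many windows covered each pixel
# water_count:    how many of those windows classified it as water
buffer_numbers = np.array([[4, 4, 4],
                           [2, 2, 0]], dtype=np.uint8)
water_count = np.array([[3, 2, 0],
                        [2, 1, 0]], dtype=np.uint8)

# Strictly more than 55% of the covering windows must agree
label_majority = np.where(water_count > buffer_numbers * MAJORITY_THRESHOLD, 1, 0)
```

Pixels never touched by a window (`buffer_numbers == 0`) fail the strict inequality and stay land, which is why the final result also folds in the global-threshold classification outside the windowed area.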