Terasaki Hart et al., 2025, Nature (DOI: 10.1038/s41586-025-09410-3)
This repo contains all the code used to produce the analysis for this paper and for its precursor, chapter 3 of my PhD dissertation.
The study does the following main things:
- presents a simple method for near-global, consistent estimation of characteristic, annual land surface phenology (plain English: presents a technique to estimate, nearly everywhere, the average timing of seasonal plant growth cycles)
- analyzes the patterns of global diversity and regional heterogeneity that result from applying that method to 20 years of MODIS near-infrared reflectance of vegetation ($NIR_{V}$) data (plain English: applies that technique to map plant seasonality nearly everywhere, using pictures of Earth collected from satellites, in space!)
- maps spatial asynchrony of LSP and assesses its potential drivers (plain English: makes a map showing places where seasonal timing is very different in some places than in other not-too-far-away places, and another map trying to figure out why that happens in the places where it does)
- describes a latitudinal gradient in isoclimatic phenological asynchrony (plain English: finds that sometimes, mainly in the tropics, the seasonal timing can be pretty different between not-too-far-away sites even if they have similar overall climates --- for example, if they're both the same type of rainforest)
- uses a number of ground datasets to demonstrate that remotely sensed LSP maps can predict patterns of phenological asynchrony and genetic divergence across a range of species and regions (plain English: shows that maps of plant seasonality based on satellite pictures can be used to predict how the timing of plant and animal reproduction differs between different places, and thus how related the plants and animal are between those places)
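For readers unfamiliar with the $NIR_{V}$ index used throughout: it is simply the product of NDVI and NIR reflectance (Badgley et al. 2017). A minimal illustration (the reflectance values here are made up for demonstration; the actual preprocessing of the MODIS NBAR bands lives in the GEE scripts):

```python
def nirv(nir, red):
    """NIRv = NDVI * NIR reflectance, where NDVI = (NIR - red) / (NIR + red)."""
    ndvi = (nir - red) / (nir + red)
    return ndvi * nir

# illustrative reflectances: dense canopy (high NIR, low red) vs. sparse cover
print(nirv(0.45, 0.05))  # dense canopy -> ~0.36
print(nirv(0.20, 0.15))  # sparse cover -> ~0.03
```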
Code was written by me (Drew), with contributions from Lauren Di Maggio and Thao-Nguyen Bui.
Code is made freely available under the MIT License, to be distributed and/or modified with due attribution.
Data are archived here (10.5281/zenodo.15654956), and the main mapping results are also made publicly available as a Google Earth Engine asset ('projects/lyrical-ring-231401/assets/Terasaki_Hart_2025_LSP_main_map_results').
That same asset can also be explored through the Google Earth Engine app:
Any questions, concerns, interests, or requests can be directed to: drew dot terasaki dot hart at gmail dot com.
All analysis code is located in src. Analyses were run in two major stages, so the two major subdirectories of src are organized to reflect this:
- `src/phen/`: calculation of global maps of characteristic seasonal phenology, and associated analysis
- `src/asynch/`: calculation of global maps of phenological asynchrony, and associated analysis
Additional src subdirectories include:
- `src/data_prep`: code used to download the SIF dataset, convert it to GeoTIFF, and upload it to Google Earth Engine (GEE)
- `src/etc`: includes a script of commonly used helper functions, a Python port of [MMRR](http://onlinelibrary.wiley.com/doi/10.1111/evo.12134), and other odds and ends
Other directories include:
- `data`: ancillary data (not our main datasets) that lives in this repo and is used in analysis
- `res`: results (organized into subdirectories for figures and tables)
Each step of the following workflow was executed in the environment indicated in italics (either on GEE, on the UC Berkeley Savio cluster, or on my laptop). See working environments, below, for the software specifications of each of these environments.
- On laptop, download the SIF dataset.
- On laptop, run `bash src/data_prep/sif_gee_ul/convert_SIF_OCO2_ANN_NetCDF_to_GTiff.sh` to translate the SIF data from NetCDF to GeoTIFF.
- On laptop, run `python src/data_prep/sif_gee_ul/prep_SIF_OCO2_ANN_upload_metadata.py` to create a CSV of SIF metadata.
- On laptop, run `python src/data_prep/sif_gee_ul/upload_SIF_OCO2_ANN_data_to_GEE_collections.py` to upload the SIF data to GEE as individual Images.
- On laptop, run `bash src/data_prep/sif_gee_ul/make_image_collection.sh` to combine the GEE SIF data into an ImageCollection.
- On laptop, manually download gridded TROPOMI data.
- On laptop, run `python src/phen/eval/orbital_gaps/evaluate_ANN-gridded_OCO2_orbital_gaps.py` to check that the seasonality of the ANN-interpolated OCO2-SIF data products within OCO2 orbital gaps compares favorably to another gridded SIF dataset, for three regions across the tropics.
- On laptop, manually download and unzip into the 'data' subdirectory the global level-1 administrative boundaries Shapefile at 1:10m scale from Natural Earth.
- On laptop, manually download and unzip into the 'data' subdirectory the level-1 jurisdictional boundaries GeoJSON datasets for select large-area nations, mostly $>2.5\times10^6\ km^2$ in land area (Canada, USA, Mexico, Brazil, Argentina, Kazakhstan, India, Russia, China, Australia), from GADM.
- On laptop, run `python src/data_prep/juris_bounds/simplify_global_adm1_bounds.py` to transform those files into three files containing global level-0 bounds, global level-1 bounds, and level-1 bounds for just the large-area nations (1 task, <1m runtime).
- On GEE, establish all desired parameter values for data reading and masking, harmonic regression fitting, significance calculation, and data exports in `src/phen/calc/GEE/params.js`. (Note: To produce all needed outputs, this file will be resaved with altered commenting a few times during the following steps.)
- On GEE, run `src/phen/calc/GEE/masking/calc_month_data_avail_props.js` to produce a set of 12 global Image assets, each indicating the average proportion of monthly MODIS NBAR data availability at each pixel, to be used for masking based on proportional data availability and to derive the monthly data-availability evenness mask. (12 tasks, each ~20-30m runtime, but note that a few may fail the first time and need to be rerun until successful)
- On GEE, run `src/phen/calc/GEE/masking/calc_month_evenness.js` to produce a global map of monthly Pielou's evenness of the MODIS NBAR data, as a GEE asset, derived from the outputs of `src/phen/calc/GEE/masking/calc_month_data_avail_props.js`. (2 tasks, both <10m runtime)
- On GEE, run `src/phen/calc/GEE/masking/calc_land_cover_mask.js` to produce a global map indicating all pixels that are valid (i.e., not water, urban, barren, or permanent ice/snow) and that are always 'natural' across our time series (i.e., forest, savanna, shrub, grass, wetland) and/or always agricultural. (All of those pixels will be used in the LSP analyses, but only the 'always natural' pixels will be used in the asynchrony analyses, to try to avoid places subject to anthropogenically created spatial asynchrony.) (2 tasks, both <10m runtime)
- On GEE, run `src/phen/calc/GEE/masking/calc_ts_pct_data_availability.js` to produce a map of overall data availability for the entire 20-year archive. (1 task, ~4h runtime)
- On GEE, run `src/phen/calc/GEE/masking/calc_permutation_fail_count_maps.js` to produce GEE assets representing, in total, 20 permutations of the seasonal phenology-fitting code for the NIRv data (unreasonable to produce more because of compute limitations). (10 tasks, each ~6-20h runtime)
- On GEE, run `src/phen/calc/GEE/masking/calc_permutation_signif_mask.js` to combine the outputs of the previous step into the overall significance masks, to be used in the masking procedure for the final phenology maps. (1 task, <10m runtime)
- On GEE, run `src/phen/calc/GEE/masking/create_overall_mask.js` twice, once with 'maskingMode' set to 'default' in `src/phen/calc/GEE/params.js` and once with it set to 'strict', to combine all previously created masks into a pair of default (used for LSP analyses) and strict (used for asynchrony analyses) mask assets. (7 tasks total, <10m runtime)
- On GEE, run `src/phen/calc/GEE/io/calc_min_pos_NIRv.js` to calculate a global map asset of each pixel's minimum positive NIRv value (to be used to backfill the negative NIRv values that occur, mainly in places and times with solid snow cover). (1 task, ~8h runtime)
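A note on the Pielou's evenness used in the monthly-evenness masking step above: it is just the Shannon entropy of the 12 monthly data-availability proportions, normalized by its maximum, $\ln(12)$. A quick Python illustration of the quantity (the actual implementation is the GEE script itself):

```python
import math

def pielou_evenness(props):
    """Pielou's J = H / ln(S): Shannon entropy of the monthly
    data-availability proportions, normalized by its maximum."""
    total = sum(props)
    p = [x / total for x in props if x > 0]
    H = sum(-pi * math.log(pi) for pi in p)
    return H / math.log(len(props))

print(pielou_evenness([1 / 12] * 12))   # ~1.0: data spread evenly over months
print(pielou_evenness([1] + [0] * 11))  # 0.0: all data in a single month
```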
- On GEE, run `src/phen/calc/GEE/fft/create_NIRv_harmonic_regression_asset.js` to run the main harmonic regression analysis on the MODIS NIRv dataset. (This is by far the most computationally intensive step, so it's best to run it just once and save the result as an asset, which is then loaded, masked, and exported to Drive by `src/phen/calc/GEE/main.js`.) (1 task, ~20h runtime)
- On GEE, run `src/phen/calc/GEE/main.js` to launch a GEE task that will save tiled results to Google Drive. NOTE: Results will be formatted as overlapping TFRecord tiles. NOTE: In order to produce all results, this script must be manually called once for each combination of NIRv or SIF and default or strict masking, as well as once per climate variable (TerraClimate minimum temperature, precipitation, and climatic water deficit, as well as MODIS cloud cover). To do this, the `datasetName`, `climateVar`, and `maskingMode` variables must be manually swapped in this script. NOTE: To execute the full phenology-mapping workflow, main.js will in turn call a variety of other scripts located in `src/phen/calc`. (9 tasks total, ranging from tens of minutes for the NIRv, SIF, and TerraClimate tasks to a handful of hours for the MODIS cloud task)
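For readers unfamiliar with harmonic regression, the core of the fitting step above is an ordinary least-squares fit of sine/cosine pairs at the annual frequency and its harmonics. The sketch below illustrates the general technique with numpy on a toy series (variable names, parameter choices, and data are illustrative only; the actual implementation, including masking and significance testing, is in the GEE scripts):

```python
import numpy as np

def fit_harmonics(t, y, period=365.0, n_harmonics=2):
    """OLS fit of y on an intercept plus sin/cos pairs at the annual
    frequency and its first n_harmonics multiples; returns the fitted
    coefficients and a function evaluating the fitted seasonal curve."""
    def design(tt):
        cols = [np.ones_like(tt)]
        for k in range(1, n_harmonics + 1):
            w = 2 * np.pi * k * tt / period
            cols += [np.sin(w), np.cos(w)]
        return np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(design(t), y, rcond=None)
    return beta, lambda tt: design(tt) @ beta

# toy example: three years of noisy ~8-day composites of an annual cycle
rng = np.random.default_rng(0)
t = np.arange(0, 365 * 3, 8.0)
y = 0.3 + 0.2 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 0.02, t.size)
beta, predict = fit_harmonics(t, y)
print(beta[:3])  # ~ [0.3, 0.2, 0.0]: the intercept and annual sin/cos terms
```

Evaluating `predict` over one period (e.g., `predict(np.arange(365.0))`) gives the characteristic fitted phenocycle for the pixel.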
- On both laptop and Savio, navigate to the 'GEE_outputs' directory where all fitted LSP and seasonality files from GEE should be stored, then run `bash src/phen/calc/dl_and_organize_GEE_data.sh` to a.) download all results into the parent directory (using `rclone`), both on UC Berkeley's Savio cluster and locally on an external hard drive, then b.) move each run's results into its corresponding child directory.
- On Savio, run `bash src/asynch/calc/run_all_asynch_jobs.sh` to feed all three job scripts, one per asynchrony neighborhood radius, to slurm's `sbatch` command, calculating each neighborhood radius's set of asynchrony maps for all fitted phenological and climatic seasonality datasets (3 slurm jobs total, runtimes of ~13h for the 50 km neighborhood job, ~30h for 100 km, and ~46h for 150 km).
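To give a rough sense of what each asynchrony job computes per pixel, here is a minimal, hypothetical sketch of a neighborhood-regression formulation consistent with the coefficient, $R^2$, and sample-size outputs handled later in the workflow: regress pairwise distance between fitted seasonal curves on geographic distance among a focal pixel's neighbors, and take the slope. The exact metric is defined in the paper; the function, variable names, and toy data here are illustrative only.

```python
import numpy as np

def neighborhood_asynchrony(curves, coords, focal, radius):
    """Hypothetical sketch: regress the distance between fitted seasonal
    curves on geographic distance, over all pixels within `radius` of the
    focal pixel, and return the slope plus the neighborhood sample size.
    curves: (n_pixels, 365) fitted phenocycles; coords: (n_pixels, 2)."""
    geo = np.linalg.norm(coords - coords[focal], axis=1)
    neigh = np.flatnonzero((geo > 0) & (geo <= radius))
    phen = np.linalg.norm(curves[neigh] - curves[focal], axis=1)
    slope = np.polyfit(geo[neigh], phen, 1)[0]
    return slope, neigh.size

# toy example: the phase of the annual cycle drifts with x-position,
# so curves farther apart in space are more out of sync
t = np.arange(365.0)
coords = np.column_stack([np.arange(20.0), np.zeros(20)])
curves = np.array([np.sin(2 * np.pi * t / 365 + 0.1 * x)
                   for x in coords[:, 0]])
slope, n = neighborhood_asynchrony(curves, coords, focal=10, radius=5)
print(slope > 0, n)  # True 10
```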
- On Savio, run `sbatch src/asynch/calc/mosaic_job.sh` to mosaic the regression coefficient, regression $R^2$, and asynchrony-result files for all LSP and climate variables and for all three asynchrony neighborhoods (50 km, 100 km, 150 km), producing a set of GeoTIFF outputs for downstream plotting and analysis (1 job, ~30m runtime).
- On Savio, run `bash src/asynch/calc/ul_mosaicked_results_from_savio_to_bdrive.sh` to copy all mosaicked results from Savio back up to Google Drive (1 task, ~15m runtime).
- On laptop, run `bash src/asynch/calc/dl_mosaicked_results_from_bdrive.sh` to then also copy those results down to their intended location on the laptop's external hard drive (1 task, ~1h runtime).
- On laptop, run `bash src/phen/calc/masking/dl_GEE_masks.sh` to download all 6 mask GeoTIFFs output by GEE into the appropriate directory on the external hard drive (1 task, <5m runtime).
- On laptop, run `python src/phen/calc/masking/make_masking_maps_supp.py` to produce the supplemental figure showing all masks used on the LSP datasets (1 task, <5m runtime).
- On laptop, run `python src/phen/calc/R2/make_R2_maps_supp.py` to produce the supplemental figure showing the $R^2$s of the harmonic regressions fitted to all LSP and climate datasets (1 task, <5m runtime).
- On Savio, run `sbatch src/phen/anal/div/LSP_EOF_clust_job.sh` to calculate the global NIRv LSP EOF map and save results (1 job, <1h total runtime).
- On Savio, run `bash src/phen/anal/div/ul_EOF_results_to_bdrive.sh` to push the EOF results back to BDrive (1 job, <5m runtime).
- On laptop, run `bash src/phen/anal/div/dl_LSP_EOFs.sh` to download the EOF results to the local hard drive (1 job, <5m runtime).
- On laptop, download the ancillary cheatgrass data from Maestas et al. (to be used in a statistical test embedded in `src/phen/anal/div/plot_EOF_and_RGB_results.py`).
- On laptop, run `bash src/phen/anal/div/aggregate_great_basin_cheatgrass_data.sh` to aggregate that dataset to our analysis resolution of $0.05^{\circ}$.
- On laptop, work through the manual steps listed in the notes at the top of `src/phen/anal/div/compose_ITCZ_shapefile.py` to digitize the Dec-Jan-Feb and Jun-Jul-Aug mean ITCZ locations delineated by Zhisheng et al. 2015, save the output CSVs to the 'data/' subdirectory of the local clone of this repo, then run `python src/phen/anal/div/compose_ITCZ_shapefile.py` to produce a Shapefile of those digitized lines. Then also copy the Shapefile output to the project's parent scratch directory on Savio.
- On laptop, run `bash src/phen/anal/div/make_EOF_and_RGB_map_figs.sh` to produce the main figures showing the global EOF RGB map and the focal-region EOF RGB maps, and the supplemental figures showing the raw EOF maps and the unfolded EOF RGB maps (1 task, <30m runtime).
- On laptop, manually assemble each focal-region map figure's .odg file (for both the main regional figure and the supplementals) in LibreOffice Draw, using the outputs of the previous step, then save a static image (by selecting all, clicking "File" > "Export...", checking the 'Selection' checkbox in the bottom left corner, selecting .png as the file type, clicking "Save", clicking the "Modify resolution" radio button in the pop-up and setting the resolution to 500 pixels/inch, then clicking the "Modify dimensions" radio button, setting the width to 7 inches and letting the height auto-adjust, then clicking "OK").
- On GEE, run `src/phen/calc/GEE/io/extract_raw_NIRv_at_CA_flux_sites.js` to extract the raw, 20-year NIRv time series at a transect of 3 California FLUXNET sites with divergent phenologies (1 task, 1m runtime).
- On GEE, run `src/phen/calc/GEE/io/extract_raw_NIRv_at_other_example_sites.js` to extract the raw, 20-year NIRv time series at the points used in the other raw-versus-fitted supplemental figures (1 task, 1m runtime).
- On laptop, run `bash src/phen/viz/model_fitting/plot_all_LSP_fitting_examples.sh` to produce the four supplemental figures demonstrating the LSP phenocycles fitted to the raw NIRv data at sites across California's Mediterranean climates, in arid Australian outback climates, in Colombian tropical montane climates, and in different land cover types within boreal Saskatchewan (1 task, <5m runtime).
- On laptop, run `python src/phen/eval/modality/map_ann_sem_LSP_modality.py` to calculate the global map of LSP modality, by searching for up to 2 peaks in each pixel's fitted, min-max-scaled LSP time series, then calculating the absolute difference between the 2 peaks (or assigning a difference of 1 if only a single peak exists), such that 0s refer to places with 2 exactly equal-height fitted LSP peaks per year, 1s refer to places with only a single peak, and values in between have 2 peaks varying in relative height (1 task, ~1h runtime).
- On laptop, run `python src/phen/viz/video/make_LSP_global_video.py` to produce an mp4 video of one year of global, min-max-scaled LSP variability (1 task, ~3h runtime).
- On Savio, run `sbatch src/phen/eval/NIRv_SIF_comp/ch3_phen_comparison_job.sh` to calculate a global map of $R^2$ values between the fitted annual NIRv-based and SIF-based LSP patterns (1 task, ~12h runtime).
- On Savio, run `bash src/phen/eval/NIRv_SIF_comp/ul_NIRv_SIF_phen_comparison_results_to_bdrive.sh` to upload the resulting GeoTIFF to Google Drive (1 task, ~1m runtime).
- On laptop, run `bash src/phen/eval/NIRv_SIF_comp/dl_NIRv_SIF_phen_comparison_results_from_bdrive.sh` to download the GeoTIFF to the necessary local directory (1 task, ~1m runtime).
- On laptop, run `python src/phen/eval/NIRv_SIF_comp/map_NIRv_SIF_LSP_comparison.py` to map the $R^2$ values between the fitted NIRv and SIF LSP datasets (1 task, <1m runtime).
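A minimal sketch of the kind of per-pixel $R^2$ computed in this comparison. This assumes $R^2$ here means the squared Pearson correlation between the two fitted curves (which equals the $R^2$ of a simple linear regression with an intercept); the actual implementation is in the scripts above, and the array shapes and toy curves are illustrative:

```python
import numpy as np

def pixelwise_r2(a, b):
    """Per-pixel R^2 between two stacks of fitted seasonal curves,
    a and b, each of shape (n_pixels, n_timesteps). Computed as the
    squared Pearson correlation of each pair of curves."""
    am = a - a.mean(axis=1, keepdims=True)
    bm = b - b.mean(axis=1, keepdims=True)
    num = (am * bm).sum(axis=1) ** 2
    den = (am ** 2).sum(axis=1) * (bm ** 2).sum(axis=1)
    return num / den

# toy example: identical curves vs. curves a quarter-cycle out of phase
t = np.linspace(0, 2 * np.pi, 365, endpoint=False)
a = np.vstack([np.sin(t), np.sin(t)])
b = np.vstack([np.sin(t), np.cos(t)])
print(np.round(pixelwise_r2(a, b), 3))  # [1. 0.]
```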
- On laptop, run `Rscript --vanilla src/phen/eval/phenocam/get_all_phenocam_ROIs.r` to download a table of all PhenoCam regions of interest (ROIs) across all PhenoCam cameras (1 task, ~1m runtime).
- On laptop, run `python src/phen/eval/phenocam/get_all_phenocam_NDVI_data_and_site_metadata.py` to download all available PhenoCam NDVI 3-day summary datasets and to compile a table of all PhenoCam site metadata (1 task, <2h runtime).
- On laptop, run `bash src/phen/eval/phenocam/run_phenocam_evaluations.sh` to run the intercomparison with PhenoCam for both the NIRv and SIF datasets (1 task, <10m runtime).
- On laptop, run `python src/phen/eval/phenocam/plot_phenocam_eval_results.py` to generate the PhenoCam intercomparison results supplemental figure and the combined supplemental table of both NIRv and SIF results (1 task, <1m runtime).
- On laptop, manually download all subset data products (using DownThemAll!) from the FLUXNET network's download page (1 task, runs roughly overnight).
- On laptop, run `bash src/phen/eval/fluxnet/run_flux_evaluations.sh` to run the flux-tower GPP intercomparison, at all usable FLUXNET2015 sites, for both the fitted NIRv and SIF LSP results (1 task, ~15m runtime).
- On laptop, run `python src/phen/eval/fluxnet/combine_fluxnet_val_outputs.py` to combine the flux-tower evaluation results for all FLUXNET sites into a single supplemental table (1 task, <10s runtime).
- On laptop, run `python src/phen/eval/fluxnet/plot_flux_eval_results.py` to generate the flux-tower intercomparison results supplemental figure (1 task, <1m runtime).
- On laptop, run `python src/asynch/eval/neigh_rad_comp/calc_asynch_r2s_btwn_neighborhood_radii.py` to produce Table S2, containing $R^2$s for all neighborhood radius comparisons and for all variables for which we produced asynchrony maps (1 task, <1m runtime).
- On laptop, run `python src/asynch/eval/NIRv_SIF_comp/compare_SIF_and_NIRv_asynch.py` to compare the two datasets' phenological asynchrony maps across all three neighborhood radii (50 km, 100 km, 150 km) (1 task, <5m runtime).
- On laptop, run `python src/asynch/viz/make_conceptual_asynch_fig.py` to create the asynchrony-calculation conceptual figure (1 task, <1m runtime).
- On laptop, run `python src/asynch/viz/plot_all_asynch_maps.py` to create each LSP and climate variable's supplemental figure, displaying the asynchrony maps for all three neighborhood radii (50, 100, and 150 km) (1 task, <30m runtime).
- On laptop, run `python src/asynch/viz/make_triangular_asynh_correlation_plots.py` to create a column of triangular plots depicting inter-neighborhood correlations for all variables' asynchrony maps (1 task, <1m runtime).
- On GEE, run `src/asynch/anal/drivers/prep_data/calc_veg_entropy.js` to produce the vegetation cover entropy map that will be used as a covariate in the phenological asynchrony predictive model. (1 task, <10m runtime)
- On GEE, run `src/asynch/anal/drivers/prep_data/calc_lulcc_percent.js` twice, first to aggregate the Hansen et al. 2022 global land use and land cover dataset to 1 km, then to further aggregate and summarize it to produce a map of each pixel's neighborhood mean of the target-resolution pixels' percent of subpixels exhibiting LULCC, to be used as a covariate in the phenological asynchrony predictive model. (2 tasks, ~40m and <10m runtimes)
- On GEE, run `src/asynch/anal/drivers/prep_data/calc_fire_frequency.js` to produce a map of neighborhood mean fire frequency, to be used as a covariate in the phenological asynchrony predictive model. (1 task, ~10m runtime)
- On laptop, manually download the GMTED-derived, 100-km median-aggregated vector ruggedness metric (VRM) dataset from EarthEnv (file 'vrm_100KMmd_GMTEDmd.tif'), then upload it to the 'LSP_ancillary_datasets' folder on Google Drive (1 task, <10s runtime).
- On Savio, run `bash src/asynch/anal/drivers/prep_data/dl_asynch_drivers_files_from_bdrive.sh` to download these covariate maps into the correct directory (1 task, <1m runtime).
- On Savio, run `python src/asynch/anal/drivers/prep_data/flip_lulcc_raster.py` to flip the y-dimension CRS metadata on the neighborhood-mean LULCC raster (written out by GEE upside-down, perhaps because it inherits the projection of the original asset it reads) (1 task, <1m runtime).
- On Savio, run `bash src/asynch/anal/drivers/prep_data/cp_asynch_drivers_files_to_rf_data.sh` to copy all asynch maps into the random forest-modeling data directory (1 task, <1m runtime).
- On Savio, run `bash src/asynch/anal/drivers/prep_data/prep_NIRv_100km_rf_data_for_model_tuning.sh` to prep data for the random forest analysis using the 100 km-neighborhood NIRv asynchrony dataset (1 task, <5m runtime).
- On Savio, run `conda activate asynch` to use the R v4.0.3 environment we've been using since the start, run `srun ...` to create an interactive job on the savio3 partition, then start an R session and interactively run the code in `src/asynch/anal/drivers/run_rf/run_phen_asynch_rf.r`, with `asynch.var` set to 'NIRv', `neigh.rad` set to '100', and `coords.as.covars` set to 'y' (by switching the commenting on the corresponding lines at the top of the file), to execute the random forest analysis on the main phenological asynchrony dataset (NIRv-based LSP asynchrony calculated within 100 km radial neighborhoods) and save hyperparameter tuning results. (Be sure to interactively execute the code blocks captured by the `if (F){ ... }` blocks, which will save the hyperparameter-tuning results as CSVs and the Boruta feature selection results as a boxplot PNG.) Manually inspect the results, then use them to set the hyperparameters and variable inclusion to be used in all 12 random forest models (2 LSP datasets × 3 neighborhood radii × 2 options for including/excluding geographic coordinates as covariates) (set these in the code blocks starting at lines 410 and 486) (1 task, ~3h interactive runtime).
- On Savio, run `conda activate asynch` to load the conda environment that has R v4.0.3 and the necessary packages, then run `sbatch src/asynch/anal/drivers/run_rf/ch3_rf_job.sh` to loop over variables (NIRv, SIF) and neighborhood radii, each time prepping data layers, running the random forest analysis, and generating the same set of result files (1 task, ~27h runtime).
- On Savio, run `sbatch src/asynch/anal/drivers/summ_results/ch3_rasterize_SHAP_job.sh` to convert the output CSVs of global SHAP values to GeoTIFFs (1 task, ~9h runtime).
- On Savio, run `sbatch src/asynch/anal/drivers/summ_results/ch3_rasterize_err_job.sh` to convert the output CSVs of global RF prediction errors to GeoTIFFs (can run concurrently with the previous step) (1 task, ~1h runtime).
- On Savio, run `bash src/asynch/anal/drivers/summ_results/ul_asynch_drivers_results_to_bdrive.sh` to upload all drivers-modeling results to Google Drive (1 task, <1h runtime).
- On laptop, run `bash src/asynch/anal/drivers/summ_results/dl_asynch_drivers_results_from_bdrive.sh` to download all drivers-modeling results into the appropriate directory on the external hard drive (1 task, <1h runtime).
- On laptop, run `python src/asynch/anal/drivers/summ_results/tabulate_model_summaries.py` to combine all permutation-based and SHAP-based importance values and model $R^2$s and MSEs into a single output table, for supplemental materials (1 task, <1m runtime).
- On laptop, manually copy the contents of each of the three sheets in both of the .xlsx files created by the last step into their corresponding sections in `res/tabs/TAB_model_summary_table_full.xlsx` using LibreOffice Calc, save the file, then save a static image of the table (by manually selecting the entire tan boxed area, clicking "File" > "Export...", checking the 'Selection' checkbox in the bottom left corner, selecting .png as the file type, clicking "Save", clicking the "Modify resolution" radio button in the pop-up and setting the resolution to 500 pixels/inch, then clicking the "Modify dimensions" radio button, setting the width to 10 inches and letting the height auto-adjust, then clicking "OK").
- On laptop, run `bash src/asynch/anal/drivers/make_figs/make_all_asynch_analysis_figs.sh` to produce the main asynch figure (global map, as well as a map summarizing the predominance of the two top-importance covariates), as well as the supplemental figures showing the random forest error map and the map of SHAP-value predominance across all random forest covariates (1 task, <10m runtime).
- On laptop, run `python src/asynch/anal/isoclim/compare_phen_clim_geog_dist.py` to run all iterations of the analysis of the latitudinal trend in the phenological distance ~ climatic distance relationship and produce the analysis summary figure (NOTE: the analysis saves results for each parameter combination as it runs, so it can be stopped and restarted if necessary and will pick up where it left off) (1 task, ~26h runtime).
- On laptop, run `python src/asynch/anal/phen/get_all_inat_plant_phen_taxa.py` to save a table of the counts of all phenology-annotated and valid (i.e., native, non-captive, research-grade, with positional accuracy ≤ 1000 m) observations for all iNaturalist taxa with at least one observation (~5 min runtime; last run 2024-06-05T11:24:00UTC).
- On laptop, run `python src/asynch/anal/phen/get_all_inat_phen_obs_and_fit_npeaks.py` to download phenology observation histograms and raw observations for all iNat taxa with at least 50 phenology-annotated and otherwise valid observations, fit alpha hulls to the first ≤ 5000 observations (α = 0.75, the mid-value from the climate-dependence analysis above), and estimate the number of peaks (0, 1, or 2) in their flowering-week histograms (using a KDE with bandwidth = 5, a peak-detection algorithm detecting all peaks ≥ 60% of the max histogram height, and significance of the number of peaks estimated using a 100-permutation test). (~4 day runtime; the run started about 2024-06-05T23:00:00UTC and ran until about 2024-06-09T00:00:00UTC, with a couple overnight stops to avoid excessive obstacles from API rate-limiting; the script has try/except blocks that handle API rate limiting in an unsophisticated but technically passable way, but it still works best if it is stopped for some hours when 429 errors become too common, which is fine because it stashes results along the way and picks up wherever it left off)
- On laptop, run `python src/asynch/anal/phen/fit_inat_flow_phen_LSP_MMRR_models.py` to run an MMRR, predicting flowering observation-time distance as a function of geographic and LSP distances, for all iNat taxa with non-unimodal flowering-week histograms (i.e., 0 or 2 peaks) (1 task, ~10h runtime).
- NOTE: The results of the previous step will be visualized by the last step in the section 'run genetic analyses', below.
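Since MMRR comes up here and again in the genetic analyses (and `src/etc` contains our Python port), here is a minimal sketch of the core algorithm (Wang 2013, cited above): regress the unfolded response distance matrix on the unfolded predictor distance matrices, and assess significance by jointly permuting the rows and columns of the response matrix. This is a simplified illustration with made-up toy data, not the port itself:

```python
import numpy as np

def mmrr(Y, Xs, n_perm=999, seed=0):
    """Minimal MMRR sketch: OLS on unfolded (upper-triangle), standardized
    distance matrices, with p-values from joint row/column permutations
    of the response matrix Y. Xs is a list of predictor matrices."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    iu = np.triu_indices(n, k=1)

    def unfold(M):
        v = M[iu]
        return (v - v.mean()) / v.std()

    X = np.column_stack([np.ones(iu[0].size)] + [unfold(M) for M in Xs])

    def fit(yv):
        beta, *_ = np.linalg.lstsq(X, yv, rcond=None)
        return beta

    beta_obs = fit(unfold(Y))
    count = np.zeros_like(beta_obs)
    for _ in range(n_perm):
        p = rng.permutation(n)
        count += np.abs(fit(unfold(Y[np.ix_(p, p)]))) >= np.abs(beta_obs)
    pvals = (count + 1) / (n_perm + 1)
    return beta_obs, pvals

# toy check: distances among random points; response = geography + noise
rng = np.random.default_rng(1)
pts = rng.random((15, 2))
G = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
noise = rng.normal(0, 0.05, G.shape)
noise = (noise + noise.T) / 2
np.fill_diagonal(noise, 0)
beta, pvals = mmrr(G + noise, [G], n_perm=199)
print(beta[1] > 0.8, pvals[1] < 0.05)  # True True
```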
- On laptop, work through the steps at the top of `src/asynch/anal/phen/comparar_LSP_y_cosecha_cafetera_colombiana.py` to digitize a set of sampling points for each of the four harvest-region colors in the map in Figure 2 of Mujeres en la caficultura tradicional colombiana, 1910-1970, and download the resulting CSV files.
- NOTE: The analysis in `src/asynch/anal/phen/comparar_LSP_y_cosecha_cafetera_colombiana.py` will be executed by the last step in the section 'run genetic analyses', below.
- On laptop, download the supplemental data in Dryad from Thomé et al. 2021, the only genomic test of the asynchrony of seasons hypothesis (ASH).
- On laptop, run `Rscript --vanilla src/asynch/anal/gen/rhinella/convert_STRUCTURE_to_gendist_mat.r` to produce a genetic distance matrix for the Rhinella granulosa genetic data (i.e., the Thomé et al. supplemental data).
- On laptop, manually compile the sample locations (from the supplemental data in Zenodo) and FASTA-format sequences (from NCBI, based on the sample voucher IDs in the supplemental data) for all samples of Xiphorhynchus fuscus, the only species in Quintero et al. 2014 (a multi-species test of the ASH using archived microsatellite data) that is sympatric with Rhinella granulosa, the eastern Brazilian toad studied by Thomé et al.
- On laptop, run `bash src/asynch/anal/gen/xiphorhynchus/align_xiphorhynchus_fuscus.sh` to use MAFFT v7.520 to align all raw X. fuscus sequence data.
- On laptop, run `Rscript --vanilla src/asynch/anal/gen/xiphorhynchus/calc_gen_dist_mat_xiphorhynchus_fuscus.r` to produce a genetic distance matrix from the aligned Xiphorhynchus fuscus sequences.
- On laptop, run `bash src/asynch/anal/plot_flowphen_landgen_cafe_results.sh` to execute both of the landscape genetic analyses (for R. granulosa and X. fuscus) and visualize the results, as well as running and visualizing the Colombian coffee-harvest analysis and visualizing the results from the iNaturalist flowering phenology analysis (1 task, ~30m runtime).
- On laptop, run `python src/data_prep/data_viewer/prep_data_for_GEE_data_viewer.py` to produce a single GeoTIFF containing all of the main mapping results (coefficients, in bands 1-5; EOFs, in bands 6-9; folded EOFs, for visualization only, in bands 10-12; 100-km asynchrony, in band 13; and the asynch P-value, $R^2$, and sample-size maps, in bands 14-16) (1 task, <5m runtime).
- On GEE, upload the resulting GeoTIFF as a new asset, setting the Asset ID to 'Terasaki_Hart_2024_LSP_main_map_results' in the cloud project 'lyrical-ring-231401', setting the pyramiding policy to MEAN (works fine for every layer except the last one, which is not of major interest), and setting the masking mode to None (because missing values are already NaN within the file) (1 task, ~30m runtime).
- On GEE, once the asset is uploaded, make it publicly accessible.
- On GEE, use the 'Get Link' button to create a URL pointing to the script saved at 'src/phen/viz/data_viewer/viz_results_GEE_APP.js' (saved at a different path within my GEE cloud project's repo), then copy that URL to the appropriate place at the top of this README.
- On GEE, interested individuals can now use the link at the top of this README to navigate to a very basic data viewer for our main mapping results.
- Compose multipanel items (all items except 3, 8, and 10) in Google Drawings.
- Download as PDF at appropriate pixel dimensions.
- Use `pdftoppm` to convert to 300-DPI PNGs.
- laptop:
- Pop!_OS 22.04 LTS (except Ubuntu 18 was used for SIF data prep and evaluation)
- Bash 5.1.16 (except Bash 4 was used for SIF data prep and evaluation)
- GDAL 3.3.1
- Python 3.9 (except Python 3.7 was used for SIF data prep and evaluation)
- numpy 1.22.4
- rasterio 1.2.10
- xarray 2022.3.0
- rioxarray 0.15.0
- pandas 2.2.1
- shapely 2.0.3
- geopandas 0.14.3
- zipfile36 0.1.3
- tensorflow 2.4.1
- pyproj 3.6.1
- cartopy 0.20.2
- geopy 1.13.0
- scipy 1.13.0
- sklearn 1.0.2
- alphashape 1.3.1
- rasterstats 0.16.0
- statsmodels 0.13.2
- seaborn 0.11.2
- fuzzywuzzy 0.18.0
- Bio 1.79
- pyinaturalist 0.19.0
- json 2.0.9
- nlmpy
- matplotlib 3.7.0
- h3 3.7.4
- descartes
- contextily 1.2.0
- xyzservices 2022.4.0
- palettable 3.3.0
- cmocean 2.0
- cmcrameri 1.8
- colormap 1.0.4
- imageio 2.19.0
- cv2 4.9.0
- R 4.0.5
- adegenet 2.1.5
- ape 5.6.2
- phenocamapi 0.1.5
- MAFFT 7.520
- LibreOffice 7.3.7.2 30(Build:2)
- UC Berkeley Savio Cluster (savio3 partition):
- Scientific Linux 7.9
- Bash 4.2.26
- GDAL 2.2.3
- Python 3.7
- numpy 1.21.5
- rasterio 1.1.5
- xarray 0.20.2
- rioxarray 0.9.1
- pandas 1.3.5
- geopandas 0.8.1
- json 2.0.9
- tensorflow 2.3.1
- affine 2.3.0
- haversine
- eofs 1.4.0
- scipy 1.4.1
- sklearn 0.21.3
- matplotlib 3.1.1
- R 4.0.3
- dplyr 1.0.8
- rgdal 1.5.18
- sp 1.4.6
- sf 0.9.7
- raster 3.4.5
- maps 3.4.0
- rsample 0.1.1
- ranger 0.13.1
- Boruta 7.0.0
- fastshap 0.0.7
- vip 0.3.2
- pdp 0.7.0
- ggplot2 3.3.5
- ggthemes 4.2.4
- grid 4.0.3
- RColorBrewer 1.1.2
- Julia 1.4.1
- Distributed
- OrderedCollections 1.4.1
- StaticArrays 1.3.5
- Glob 1.3.0
- TFRecord 0.1.0
- JSON 0.21.3
- ArchGDAL 0.7.4
- Distances 0.10.7
- NearestNeighbors 0.4.9
- Statistics
- StatsBase 0.33.16
- GLM 1.6.1
- Colors 0.12.8
- GEE Javascript API (browser-hosted):
- GEE API ≥0.1.404
