Skip to content

kpd19/Two_Pathogen_Evolution

Repository files navigation

Summary

This library contains code used in the manuscript titled Synthesizing Selection Mosaic Theory and Host-Pathogen Theory to Explain Large-Scale Pathogen Coexistence. This paper uses data collected from field experiments and observational studies to explore the mechanisms supporting the coexistence of two viral morphotypes across populations of the Douglas-fir tussock moth. The model and analysis are implemented in R and Julia.

Requirements and Setup

The code was built using R version 4.3.2 and Julia version 1.7.2. R can be downloaded here and Julia can be downloaded here. The code requires several packages that are not part of the base R installations. After installing R, navigate to the main repository directory and run the installation.R script to install necessary packages. After installing Julia, run the installation.jl script to install necessary packages. The project also uses the Stan programming language, Stan version 2.32.2, to estimate transmission parameters. Stan can be downloaded here.

Neonate Dose Response Experiment

In the NeonateDoseResponse directory, the neonate_dose.R script analyzes the two dose response experiments performed before the transmission experiment. Data from the experiment performed in 2020 is found in dose_response_2020_wide.csv, data from 2021 is found in dose_response_2021_wide.csv. The number of insects for each dose in the experiment for 2020 is found in starting_values.csv. ${\color{DarkOrchid}{\textbf{Figures S7 and S8 in SI}}}$

Fitting Bayesian model to field transmission experiment results

In the TransmissionExperiment directory, the script fit_transmission_models.R uses the Stan programming language to fit Bayesian hierarchical models to the data from our field experiments, contained in STAN_input_data.csv. Examples of the output can be found in output and figures. Each Stan model should take approximately 1 minute to run. The script has plots for analyzing the trace, R_hat, and posterior distributions, which are within the functions.R script. The fitting script also calculates the leave-one-out cross-validation (LOO-CV) criterion and sum of squared errors (SSE) for each model. The script also plots the model and the data for the best model determined by LOO-CV and SSEs at the morphotype and isolate level. ${\color{DarkOrchid}{\textbf{Figure 3A, Figures S9, S10 and S11 in SI, Table S6 and S7 in SI}}}$

The script thinned_posterior.R draws 225 parameter sets from the posterior distribution for the best model. The script noevo_posteriors.jl calculates the fraction infected at the end of a single epizootic over a range of initial host densities for the single-pathogen model without host evolution. The fraction infected results are analyzed in thinned_posterior.R. ${\color{DarkOrchid}{\textbf{Figure 3}}}$

Morphotype frequency and forest composition from Forest Inventory Analysis

In the MorphotypeFrequency directory, the spatial dataset on morphotype frequency data from the literature and our field collections was combined in the aggregating_morphotype_data.R script in R. We tabulated the total number of isolates identified as the multi-capsid morphotype or the single-capsid morphotype as a result of our PCR analyses. The code includes analysis of coinfections. ${\color{DarkOrchid}{\textbf{Table S3 in SI, Figure S1, S2, S3, and S4 in SI}}}$

To calculate the percent of the forest that was made up of each tree species, we used the National Forest Type Dataset credated by the Forest Service Inventory and Analysis (FIA) program. The raster data can be downloaded here. In the script forest_composition.R, for the United States sites, we calculated the percent of each tree species in a 5 kilometer radius around each field site. For sites that had no trees identified in the FIA dataset, we extended the radius to 10 kilometers. The script then bins the percent that is composed of Douglas-fir to the size of spatial grid used in our mechanistic model, which has 37 patches.

The script ConceptualMNPV.R plots the morphotype frequency binned by percent Douglas-fir for the conceptual graphs in ${\color{DarkOrchid}{\textbf{Figure 1}}}$ .

The script morph_douglas_glm.R compares a generalized linear model there the percent of the multi-capsid morphotype is a function of percent Douglas-fir at each site to a model where there is no relationship to forest composition using AIC analysis. ${\color{DarkOrchid}{\textbf{Figure S6 in SI}}}$

The script map_figure.R visualizes the morphotype distribution on a map with distributions of Douglas-fir and Abies species using shapefiles. ${\color{DarkOrchid}{\textbf{Figure 2, Figure S5 in SI}}}$

Running the parallelized fitting routine with Message Passing

The ModelFitting directory contains the scripts for running the line search routine and increased realizations for the evolution model with a selection mosaic, the model without the selection mosaic, and the model without host evolution. These scripts are highly parallelized using the Message Passing Interface (MPI) and are only designed to be executed on a computing cluster. The realization scripts have been adapted to run only one value for percent Douglas-fir. To parallelize and run as a batch on slurm, the line that says idx = 18 can be changed to idx = parse(Int64, ENV["SLURM_ARRAY_TASK_ID"]). The scripts have also been adapted to run only one parameter set for 10 realizations as an example, instead of the top 30 parameter sets for 2000 realizations. Examples of the line search output for evolution model with a selection mosaic are in linesearch_op/op1: the output for all values tried are in tried_morep1.csv and the best recorded likelihood score as each round progresses is best_morep1.csv. Examples for 10 realizations from the realization output are in realization_op/op1/: the likelihood values for all the data points that correspond to forests of that percent Douglas-fir are ll/ndoug19_p1_sig0.5.csv and the population values over time are in all/ndoug19_p1sig0.5.csv. The naming convention for this example indicates this is the first parameter set for the top 30 parameter sets from the line search, epsilon = 0.5 for environmental stochasitcity, and the model is performed on a hexagonal grid where is 19 out of 37 trees are Douglas-fir. The distances for all the grid coordinates for hexagonal grids with a radius of 2-8 are included in data. The hexagonal grids are made with the hexagonal_grid_making.R script. Examples for the evolution model without a selection mosaic are in realization_op/op_nt1/ and examples for the model without host evolution are in realization_op/ne1/ The top 30 parameter sets for each model as a result of the line search routines are in the data directory.

This directory also contains a version of the non-spatial evolutionary model in the non_spatial.jl script. This script runs a one patch version of the model on either Douglas-fir or grand fir. The time series output of the model, which is plotted in plot_non_spatial.R, can be found in the figures directory. The time series are the fluctuations in host and pathogen populations, mean infectiousness, and fraction infected over 200 generations of the host-pathogen model.

Model comparison

The ModelComparison directory contains the script analze_realizations.R, which reads in the line search results, selects the top 30 parameter sets from each model type, and then calculates the likelihood and percent multi-capsid morphotype for the round of increased realizations for each model type. The script model_comparison.R, which compares the results of all models tested using AIC analysis, which are reported in data/aic_table.csv. The likelihood values for the parameter sets for each model are contained in the data directory. The data directory also contains the results of the fraction infected by the multi-capsid morphotype for each parameter set for each model. The script plots the percent multi-capsid morphotype as a function of percent Douglas-fir, which are shown in figures directory. ${\color{DarkOrchid}{\textbf{Figure 4 in main text, Figure S14 in SI}}}$

Analyzing simulations for best parameter set

The script simulations_over_pdoug.R in the AnalyzeSimulations directory analyzes the population dynamics for the best parameter set from the AIC analysis for a single example stochastic realization. The figure shows the host and pathogen population dynamics, the fluctuating transmission risk, the fraction infected, the summary of fraction infected by each morphotype for high infection years, and phase diagrams over time. For the phase diagrams, we also include a deterministic realization to show the difference in dynamics between including stochasticity and not. The realization data is included in the data directory as t6s_4k.csv for 4000 years of a stochastic simulation and t6s_2k_deterministic.csv for 2000 years of a deterministic simulation. ${\color{DarkOrchid}{\textbf{Figure 5}}}$

The script single_pathogen.R script analyzes the transmission risk and host population dynamics over time for the best population dynames over three cases: both pathogens present, single-capsid morphotype only present, and multi-capsid morphotype only present. The script uses a function to find the maxmimum host outbreak size and the package WaveletComp to find the periods between the host outbreaks. The script plots an example of the fluctuations in average transmission risk over time and the distribution in host outbreak sizes for each case.

The script non_evolution_simulations.R plots the host and two pathogen dynamics for simulations without eco-evolutionary dynamics for the best parameter set over varying amounts of environmental stochasticity. The file simulation_no_evo.csv contains the simulated data. ${\color{DarkOrchid}{\textbf{Figure S12 in SI}}}$

The script vary_phi.R plots the dynamics and variation in morphotype frequency when the overwintering rate of the pathogen (φ) is varied. The file phi_pmnpv.csv is the frequency of the multi-capsid morphotype over varying Douglas-fir frequencies and the file phi_dynamics.csv are example dynamics across the range of φ. ${\color{DarkOrchid}{\textbf{Figures S15 and S16 in SI}}}$

Biocontrol

The Biocontrol directory contains scripts to analyze the effects of using virus as a biopesticide on host population and transmission risk. The script biocontrol.jl creates simulations for a range of biopesticide levels at different probabilities of spraying and thresholds for cases with varying ratios of each morphotype. The script threshold_and_amount.R analyzes the different levels of biopesticide applied and host density thresholds that trigger a spray. The script biocontrol_comparison.R . The script biocontrol_freq.R analzes the effects on transmission risk and host population sizes when the likelihood of spraying biopesticide is varied. ${\color{DarkOrchid}{\textbf{Figure 6 in main text, Figures S20, S21, and S22 in SI}}}$

Host Mobility

To explore the effects of dispersal of first instars on the eco-evolutionary model, we built a two patch model parameterized with our best parameter estimates and varied the distance between the two populations. The script hexagonal_grid_huge.R creates the distance matrix for a system of two patches at varying distances. The script host_mobility.jl runs the model, which is very similar to the regular model with the exception of using the two patch distances and writing a .csv for the proportion dispersing between all populations in the closed system. The script proportion_dispersing.R calculates the proportion dispersing between the two populations over each patch distance. The script host_mobility.R analyzes the output of the model. We compared the effects of two extremes, 5% Douglas-fir and 95% Douglas-fir, to see how host mixing affected the average infectiousness for both morphotypes. Although some of our populations sampled in our dataset are within a few kilometers of each other, nearby populations have very similar forest compositions, so we do not believe including dispersal between populations is warranted because Douglas-fir tussock moths are dispersal limited for distances above 200m. We include these results to provide more information about the model, which could be relevant in selection mosaics with very different selection environments close to each other. Additionally, our model could be adjusted to explore systems with higher host mixing. ${\color{DarkOrchid}{\textbf{Figures S13, S17 and S18 in SI}}}$

Generalist Predator

The GeneralistPredator directory includes the script to run the model with the best parameter set with the inclusion of a generalist predator, ode_rlzns_predator.jl. The script generalist_predators.R analyzes the example simulation data for high and low attack rates and computes the % multi-capsid morphotype for varying rates of maximum fraction killed (a) and the hsot population density at which the fraction killed is maximized (ω). ${\color{DarkOrchid}{\textbf{Figure S19 in SI}}}$

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors