This repository contains all data (except raw sequencing files) and scripts associated with the study:
Rosenbaum, Torabi-Marashi et al. 2025. Evolution of sociability: Genome scans and gene validation. Evolution (accepted).
Raw sequence data for this study are available from the NCBI SRA under BioProject PRJNA1311514.
A static version of this repository (frozen at the time of final acceptance) is available on Figshare. Please note: Figshare has a complete copy of all of the .sync files (the files are too large for GitHub).
Please note: If you wish to use this pipeline for genomic analysis and you are new to such pipelines, then before using the scripts below you will likely find the README for the data and scripts associated with Audet et al. 2024 a simple overview of the pipeline. The scripts below will then likely be more useful for replicating the specific analyses from this study.
JR_SociabilityScores_AllCrosses_formatted.csv - Behavioural data (sociability) for the RNAi experiments testing candidate genes. These data are analyzed in the script JR_SA_SociabilityRNAiCrosses_March2025_ID.Rmd. The columns are:
Day: Day of experiment within an experimental block
Time: Time of phenotyping, in 15-minute intervals. A value of 1 corresponds to the first measurement, 15 minutes into the behavioural assessment; 2 corresponds to 30 minutes, and so on.
Arena: The arena (unit of sampling) unique to a trial within a day.
Treatment: RNAi or control treatment
Soc_index: Measure of social aggregation
ExperimentalBlock: Blocking variable for the entire experiment (across all RNAi strains)
StrainID: Bloomington Drosophila Stock Center identifier
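As a rough illustration of the Time coding described above, the following Python sketch converts the interval index into elapsed minutes while reading rows with the listed column names. The example rows are made up for illustration (they are not values from the study), the `Minutes` field is a hypothetical derived column, and the way values such as Treatment are encoded in the real file is an assumption.

```python
import csv
import io

# From the column description: Time 1 = 15 minutes, Time 2 = 30 minutes, and so on.
MINUTES_PER_INTERVAL = 15


def minutes_elapsed(time_index):
    """Convert the Time interval index into minutes since the assay began."""
    return int(time_index) * MINUTES_PER_INTERVAL


def read_sociability_rows(handle):
    """Read rows from a CSV handle and add a hypothetical 'Minutes' field."""
    rows = []
    for row in csv.DictReader(handle):
        row["Minutes"] = minutes_elapsed(row["Time"])
        row["Soc_index"] = float(row["Soc_index"])
        rows.append(row)
    return rows


# Made-up rows for illustration only; the value encodings are assumptions.
EXAMPLE_CSV = """Day,Time,Arena,Treatment,Soc_index,ExperimentalBlock,StrainID
1,1,A1,control,0.10,1,00000
1,2,A1,control,0.20,1,00000
"""

rows = read_sociability_rows(io.StringIO(EXAMPLE_CSV))
for row in rows:
    print(row["Arena"], row["Time"], row["Minutes"])
```

To read the actual file, pass an open file handle for JR_SociabilityScores_AllCrosses_formatted.csv in place of the in-memory example.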
Additional metadata for the experiment are in the file Construct_meta_data_JR_correct_EXPblocks.csv. The columns are:
stock_number: Bloomington Drosophila Stock Center identifier
TestDates: Beginning date for experiment (corresponding to Day 1)
TRiP: Whether the RNAi strain was from the TRiP or VDRC collection (for this study all were TRiP).
vector: Which vector was used to make this particular strain
genes: gene name
FBgn: Flybase gene identifier
Insertion location: Location of transgene insertion
Insertion Chromosom: Chromosomal location of transgene insertion
full_strain_genotype: Formal genotypic information
Notes: Any other notes about the strains either from Bloomington or our own work.
.sync files of allele counts and coverage for each sample. Because of file size limitations, only the .sync files for the X chromosome are part of this repo. Sync files for the autosomes need to be regenerated from the analysis pipeline. See below for more information on how they are generated (mostly via the shell scripts) and on the R scripts used for analysis. Alternatively, go to the static version of the repository on Figshare, which has all of the sync files.
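For readers new to the format, here is a minimal Python sketch of how one line of a sync file can be read. It assumes the standard PoPoolation2 layout: chromosome, position, reference base, then one `A:T:C:G:N:del` count field per sample. The function names and the example line are hypothetical and are not taken from the repository's scripts.

```python
# Order of counts within each per-sample field of a sync file.
COUNT_LABELS = ("A", "T", "C", "G", "N", "del")


def parse_sync_line(line):
    """Split one sync line into chromosome, position, reference base and per-sample counts."""
    # str.split() with no argument handles both tab- and space-delimited files.
    fields = line.split()
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    samples = [
        dict(zip(COUNT_LABELS, (int(n) for n in field.split(":"))))
        for field in fields[3:]
    ]
    return chrom, pos, ref, samples


def allele_frequency(counts, allele):
    """Frequency of one base among the A/T/C/G reads of a sample (None if no coverage)."""
    coverage = sum(counts[base] for base in "ATCG")
    if coverage == 0:
        return None
    return counts[allele] / coverage


# Made-up line with two samples, for illustration only.
example = "X\t1000\tA\t30:0:10:0:0:0\t20:0:20:0:0:0"
chrom, pos, ref, samples = parse_sync_line(example)
print(chrom, pos, ref, [allele_frequency(s, ref) for s in samples])
```

Ignoring the N and deletion counts when computing coverage is a choice made for this sketch; the analysis scripts may handle them differently.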
All .xlsx files beginning with "JR_" as well as rg_JRR_5122025_data.xls are the output from the qPCR runs. These are analyzed in the script SociabilityPopGen_qPCR_results_JRR.R
SnpEff outputs and bed files from CMH and FST outputs (to be used for SnpEff) in all contrasts.
Plotting text file used to pull relevant gene names. Note that wherever the R scripts use alleleF_plotting.txt, that file is simply snp_eff.txt for a given contrast with the SnpEff header removed.
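One way to produce such a header-free file is sketched below in Python. It assumes the header consists of the leading lines that begin with `#`, as in VCF-style output; that is an assumption about these files, not something stated in this README, and `strip_header` is a hypothetical helper rather than part of the repository.

```python
def strip_header(lines, header_prefix="#"):
    """Drop the leading run of lines that start with header_prefix; keep the rest unchanged."""
    lines = list(lines)
    first_data = 0
    while first_data < len(lines) and lines[first_data].startswith(header_prefix):
        first_data += 1
    return lines[first_data:]


def write_without_header(in_path, out_path, header_prefix="#"):
    """Copy in_path to out_path, omitting the leading header lines."""
    with open(in_path) as src:
        body = strip_header(src, header_prefix)
    with open(out_path, "w") as dst:
        dst.writelines(body)
```

For example, `write_without_header("snp_eff.txt", "alleleF_plotting.txt")` would write the data lines only. If the column-name line should be kept, passing `header_prefix="##"` removes only the lines that start with two hash characters.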
Gene lists from all analyses, as well as their overlaps with the gene lists found in Torabi-Marashi et al. 2025
JR_SA_SociabilityRNAiCrosses_March2025_ID.Rmd - Analysis of behavioural data for RNAi experiments
alleleFrequencyPlottingLoop__Sept8_2025.Rmd - Allele frequency plotting loop that plots the identified SNPs into a single PDF
cmh_updated.R - CMH test using the ACER package in R (split between autosomes and X chromosome)
dokb_snps_AF_plotting.Rmd - alternate allele frequency plotting for SNPs in the dokb gene
Figure1_allele_frequency_plotting.R - alternate code used to help generate the allele frequency plot in Figure 1
fst_functions.R - helper functions to plot FST
gene_curations.R - overlaps with Torabi-Marashi et al 2025 and between FST and CMH
geneCurations_popGen_PostSubmit.R - overlapping genes with other studies
overlapSimulation.R - simulation to test how likely the observed number of overlapping genes is to arise by chance
read_sync_SpaceDelimited.R - helper function to read sync files with space delimitation
revisions_paper_FST_Plot.R - Plotting FST
updated_all_points_dokb_snps_AF_plotting.Rmd - Allele Frequency plotting for SNPs in dokb gene
updated_All_points_Figure1_allele_frequency_plotting.R - Code used to help generate allele frequency plot in figure 1
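To give a sense of what an overlap simulation of this kind can look like, here is a minimal Python sketch. It is not the code in overlapSimulation.R. It assumes both gene lists are drawn uniformly at random from one shared background set of genes, and every number passed to it below is made up for illustration.

```python
import random


def simulate_overlap_p_value(n_background, n_list_a, n_list_b,
                             observed_overlap, n_sims=10000, seed=1):
    """Estimate how often two random gene lists overlap at least as much as observed.

    Assumption: each list is a uniform random sample, without replacement,
    from the same background of n_background genes.
    """
    rng = random.Random(seed)
    background = range(n_background)
    hits = 0
    for _ in range(n_sims):
        list_a = set(rng.sample(background, n_list_a))
        list_b = set(rng.sample(background, n_list_b))
        if len(list_a & list_b) >= observed_overlap:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)


# Hypothetical list sizes and overlap, for illustration only.
p_value = simulate_overlap_p_value(
    n_background=1000, n_list_a=50, n_list_b=40, observed_overlap=6, n_sims=2000
)
print(p_value)
```

Uniform sampling ignores properties such as gene length and local SNP density, which can make some genes more likely than others to appear in a list, so an estimate from this sketch should be read as a rough guide only.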
Unix shell scripts that go from BAM files to sync files. All scripts are short, and their names describe their function.
snp_eff_cmh_fst_overlaps - SnpEff outputs and bed files from CMH and FST outputs (to be used for SnpEff) in all contrasts
Unix shell scripts for the genomic analysis, going from raw .fastq files to BAM files to sync files. The script names describe their function.
The relative order of the scripts (from .fastq to .bam to .sync) is:
fastqc_dna.sh
trim_rename.sh
trim_fastqc.sh
index_bwa.sh
bwa_map.sh
sam_to_bam.sh
filter.sh
bam_fastqc.sh
bam_quality_check.sh
extractCoreGenomes.sh
sort_byName.sh
sort_byCoord.sh
fixmated.sh
markdup.sh
picard_RG.sh
gatk_mark_indels.sh
gatk_realign.sh
ANC_merge.sh
U_merge.sh
C_merge.sh
D_merge.sh
make_mpileup.sh
run_PoolSNP.sh
repeat_masker.sh
detectIndels_Kapun.sh
mp2sync.sh
subset_syncByVCF.sh
run_fst.sh
local_snp_eff.sh
snpEff.sh
subset_vcf_byBed.sh
cmh_updated.R - CMH test using the ACER package in R (split between autosomes and X chromosome)
revisions_paper_FST_Plot.R - Plotting FST
fst_functions.R - helper functions to plot FST
alleleFrequencyPlottingLoop__Sept8_2025.Rmd - Allele frequency plotting loop that plots the identified SNPs into a single PDF
gene_curations.R - overlaps with Torabi-Marashi et al 2025 and between FST and CMH
geneCurations_popGen_PostSubmit.R - overlapping genes with other studies
overlapSimulation.R - simulation to test how likely the observed number of overlapping genes is to arise by chance
read_sync_SpaceDelimited.R - helper function to read sync files with space delimitation
updated_all_points_dokb_snps_AF_plotting.Rmd - Allele Frequency plotting for SNPs in dokb gene
updated_All_points_Figure1_allele_frequency_plotting.R - Code used to help generate allele frequency plot in figure 1
dokb_snps_AF_plotting.Rmd - alternate allele frequency plotting for SNPs in the dokb gene
Figure1_allele_frequency_plotting.R - alternate code used to help generate the allele frequency plot in Figure 1