
Simulation methods usage #1

@salsalsal97

Hello,

I have sent an email about this, but thought it would make sense to post a GitHub issue as well.

By way of introduction, I am an MPhil student in the neuroscience department (Faculty of Medicine) at Imperial College London. My project is funded by the UKDRI, and I am working under Professor Nathan Skene (https://www.neurogenomics.co.uk/, our lab's page). The main goal of my project is power analysis for single-cell RNA sequencing data, specifically for pseudobulk-based differential expression analysis. Because most existing power calculators estimate power for a dataset by simulation, I have been looking at different simulation methods. Ultimately, I aim to develop a novel power calculation tool for such datasets.

I am writing because I recently read your paper and found it very interesting. Your conclusions are very useful for my project: at its current stage, I am exploring different simulation methods, generating datasets with them, and obtaining power estimates in each case. My questions are as follows:

  • Essentially, I want to use the top simulators for my purpose: generating synthetic scRNA-seq data, running DGE analysis, and estimating power. Your paper quantified and ranked simulators in two ways: by how much of the information in the reference (experimental) dataset they recapitulate, and by how they perform when used to benchmark various methods. Since I ultimately want to say something about the power of real scRNA-seq datasets, I believe I need the top-ranked simulators in the former category. I was thinking of selecting about 6 to 8 of the top ones; according to your paper, the top six were ZINB-WaVE, scDesign2, muscat, SCRIP, SPsimSeq, and scDesign. Would you say it is reasonable to use these six for my purposes, or would you suggest others as well?
  • The main thing I am unsure about is how exactly you used these simulators to generate your datasets. I know that all your relevant scripts are on GitHub (https://github.com/HelenaLC/simulation-comparison), but I am having difficulty understanding how to run them. For example, for ZINB-WaVE, I believe the relevant scripts are https://github.com/HelenaLC/simulation-comparison/blob/master/code/03-est_pars-ZINB-WaVE.R and https://github.com/HelenaLC/simulation-comparison/blob/master/code/04-sim_data-ZINB-WaVE.R, and that they are meant to be run sequentially (one after the other), but I am not sure exactly how to run them or what input they take. Could you possibly guide me through this? For the six methods I mentioned, could you please explain the steps you took to generate the synthetic data? It would be amazing if you could also share any scripts you used that are not already on GitHub.
  • Finally, you also benchmarked methods for differential expression analysis. For this purpose, did you use edgeR, DESeq2, or a different package? And is the code for this available anywhere? I couldn't locate it on GitHub.

I would really appreciate any help with these queries, as it would be extremely beneficial for my project!

Many thanks,
Salman
