Recommended deployment strategy? #1

@nrminor

Description

Hello,

I've been reviewing WEPP since I came across it on the SPHERES Slack last week. I think the idea of calling haplotypes against an UShER MAT is really exciting, and I'd like to try it on some of our own wastewater datasets.

That said, I'm a bit puzzled by the expected deployment strategy. It looks like you generally recommend running Snakemake with 32 cores, and from what I can tell, the C++ code brings entire alignment map datasets into memory. To my mind, this is intensive enough that I'd want to put it on our HPC cluster. But then Snakemake itself launches the dashboard frontend using nginx on a privileged port, a persistent Node.js server, and direct web browser access to localhost, all of which implies that WEPP should instead be run locally.
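For context, the kind of HPC deployment I had in mind is something like a Snakemake cluster profile, so the heavy compute steps run as batch jobs rather than on whatever machine hosts the dashboard. This is only a sketch of what I'd try; I'm assuming a Snakemake 8+ SLURM setup, and none of the resource values below come from the WEPP docs:

```yaml
# Hypothetical ~/.config/snakemake/wepp-slurm/config.yaml
# (illustrative values only -- not from the WEPP documentation).
executor: slurm
jobs: 32
default-resources:
  mem_mb: 32000        # roughly the in-memory alignment footprint I'd expect
  runtime: 240         # minutes
  slurm_partition: general
```

Invoked with something like `snakemake --profile wepp-slurm`, this would keep the 32-core recommendation while pushing the memory-heavy C++ steps onto compute nodes.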

Given that, and given how tightly coupled the C++ build, the data processing, and the dashboard are, would you recommend that users only run WEPP on high-end personal workstations (32+ cores, 32GB+ RAM, sudo access)? I would try running on my HPC with DASHBOARD_ENABLED=False and then copying all the Snakemake outputs to a local clone of WEPP, but then I'd need to compile WEPP a second time for a different target, and because build/wepp is an input dependency for multiple Snakemake rules, I worry that much of the workflow would re-run locally--all just to serve the dashboard!
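The workaround I was imagining for the re-run problem relies on Snakemake's `--touch` flag to mark the copied outputs as up to date, so the local rebuild of build/wepp doesn't invalidate everything upstream of the dashboard. Again, just a sketch; I'm assuming DASHBOARD_ENABLED is settable via `--config`, which may not match how WEPP actually reads it:

```shell
# On the local clone, after copying the HPC results into place and
# recompiling build/wepp for the local target:

# Mark all existing outputs as newer than their inputs so Snakemake
# does not re-run the heavy rules just because build/wepp changed.
snakemake --touch --cores 1

# Then run only what remains (ideally just the dashboard-facing rules).
# Assumes DASHBOARD_ENABLED is a Snakemake config key -- unverified.
snakemake --cores 4 --config DASHBOARD_ENABLED=True
```

Even so, this feels fragile, which is why I'm asking what deployment you actually intend.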

Hopefully I'm just overthinking this and there's a better way. Thanks again for your work on this; I'm happy to clarify anything further.

--Nick
