Matteo Turilli edited this page Oct 22, 2018 · 8 revisions

Experiments

Resource: Titan

Queue policies in #cores instead of #nodes:

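| Bin | Min Cores | Max Cores | Max Walltime (Hours) | Aging Boost (Days) |
|-----|-----------|-----------|----------------------|--------------------|
| 1 | 180000 | - | 24 | 15 |
| 2 | 60000 | 179984 | 24 | 5 |
| 3 | 5008 | 59984 | 12 | 0 |
| 4 | 1016 | 4992 | 6 | 0 |
| 5 | 16 | 1000 | 2 | 0 |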
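
As a rough aid for picking a pilot size, the bin lookup can be sketched in Python. This is only an illustration: the bin limits are copied as-is from the table above, `TITAN_BINS` and `find_bin` are hypothetical names, and the missing upper bound of bin 1 is treated as "no limit".

```python
# Queue bins copied from the table above, expressed in cores.
# Tuple layout: (bin, min_cores, max_cores, max_walltime_hours, aging_boost_days).
# max_cores is None for bin 1 because the table lists no upper bound (assumption:
# treat it as unbounded here).
TITAN_BINS = [
    (1, 180000, None, 24, 15),
    (2, 60000, 179984, 24, 5),
    (3, 5008, 59984, 12, 0),
    (4, 1016, 4992, 6, 0),
    (5, 16, 1000, 2, 0),
]


def find_bin(cores):
    """Return (bin, max_walltime_hours, aging_boost_days) for a pilot size in cores."""
    for bin_id, min_cores, max_cores, walltime, aging in TITAN_BINS:
        if cores >= min_cores and (max_cores is None or cores <= max_cores):
            return bin_id, walltime, aging
    # Core counts that fall between two bins or below the smallest bin end up here.
    raise ValueError("no queue bin covers %d cores" % cores)


if __name__ == "__main__":
    for cores in (8192, 65536, 262144):
        bin_id, walltime, aging = find_bin(cores)
        print(cores, "cores -> bin", bin_id, "max walltime", walltime, "h, aging boost", aging, "d")
```

For example, `find_bin(262144)` returns `(1, 24, 15)`, which matches the 24h Max Walltime used for the largest pilots.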

Workload:

  • Executable: GROMACS compiled against OpenMPI v??
  • Executable runtime: ~15 minutes
  • Physical system: ??
  • Input files: ??

Software:

Runs:

  • #units: >= 65,536
  • #cores: <= 299,008
  • walltime: 1 hour total

Weak Scalability

Definitions:

  • Weak scalability: how the solution time varies with the number of processors for a fixed problem size per processor
  • Ideal Runtime: the time taken by the workload to run on the given resources without any overhead
  • Walltime: the walltime asked for the pilot
  • Max Walltime: the maximum walltime that can be asked for the given size of the pilot
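
| N runs | N tasks | N cores/task | N generations | N pilots | N cores/pilot | Ideal Runtime | Walltime | Max Walltime | Resource |
|--------|---------|--------------|---------------|----------|---------------|---------------|----------|--------------|----------|
| 2 | 32 | 32 | 1 | 1 | 1024 | 15m | 1h | 1h | Titan |
| 2 | 64 | 32 | 1 | 1 | 2048 | 15m | 1h | 6h | Titan |
| 2 | 128 | 32 | 1 | 1 | 4096 | 15m | 1h | 6h | Titan |
| 2 | 256 | 32 | 1 | 1 | 8192 | 15m | 1h | 12h | Titan |
| 2 | 512 | 32 | 1 | 1 | 16384 | 15m | 1h | 12h | Titan |
| 2 | 1024 | 32 | 1 | 1 | 32768 | 15m | 1h | 12h | Titan |
| 2 | 2048 | 32 | 1 | 1 | 65536 | 15m | 1h | 24h | Titan |
| 2 | 4096 | 32 | 1 | 1 | 131072 | 15m | 1h | 24h | Titan |
| 2 | 8192 | 32 | 1 | 1 | 262144 | 15m | 1h | 24h | Titan |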
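
One way to read the results of these runs is to compare each measured runtime against the 15m ideal runtime and against the smallest pilot. The sketch below assumes that approach; `weak_scaling_report` is a hypothetical helper, and the runtimes in the example are placeholders, not measurements from these experiments.

```python
IDEAL_RUNTIME_S = 15 * 60  # ideal runtime of a single generation, in seconds


def weak_scaling_report(measured):
    """Summarize weak scaling from measured runtimes.

    `measured` maps cores per pilot to the measured runtime in seconds.
    Returns one (cores, overhead_s, efficiency) tuple per pilot size, sorted
    by cores. Overhead is measured minus ideal runtime; efficiency is the
    runtime of the smallest pilot divided by the runtime of this pilot, so
    1.0 means perfect weak scaling.
    """
    sizes = sorted(measured)
    baseline = measured[sizes[0]]
    return [(c, measured[c] - IDEAL_RUNTIME_S, baseline / measured[c]) for c in sizes]


if __name__ == "__main__":
    # Placeholder runtimes for illustration only; NOT results of the runs above.
    placeholder = {1024: 1000.0, 4096: 1250.0}
    for cores, overhead, efficiency in weak_scaling_report(placeholder):
        print(cores, "cores: overhead", overhead, "s, efficiency", round(efficiency, 2))
```

Because the problem size per core is fixed, any drop in efficiency below 1.0 is attributable to overhead that grows with the pilot size rather than to the workload itself.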

Weak Scalability with BoT Scheduler (Sep 2018)

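| N runs | N tasks | N cores/task | N generations | N pilots | N cores/pilot | Ideal Runtime | Walltime | Max Walltime | Resource |
|--------|---------|--------------|---------------|----------|---------------|---------------|----------|--------------|----------|
| 2 | 256 | 32 | 1 | 1 | 8192 | 15m | 1h | 12h | Titan |
| 2 | 2048 | 32 | 1 | 1 | 65536 | 15m | 1h | 24h | Titan |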

Strong Scalability

Definitions:

  • Strong scalability: how the solution time varies with the number of processors for a fixed total problem size
  • Ideal Runtime: the time taken by the workload to run on the given resources without any overhead
  • Walltime: the walltime asked for the pilot
  • Max Walltime: the maximum walltime that can be asked for the given size of the pilot
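
| N runs | N tasks | N cores/task | N generations | N pilots | N cores/pilot | Ideal Runtime | Walltime | Max Walltime | Resource |
|--------|---------|--------------|---------------|----------|---------------|---------------|----------|--------------|----------|
| 2 | 16384 | 32 | 32 | 1 | 16384 | 480m (8h) | 12h | 12h | Titan |
| 2 | 16384 | 32 | 16 | 1 | 32768 | 240m (4h) | 6h | 12h | Titan |
| 2 | 16384 | 32 | 8 | 1 | 65536 | 120m (2h) | 3h | 24h | Titan |
| 2 | 16384 | 32 | 4 | 1 | 131072 | 60m (1h) | 2h | 24h | Titan |
| 2 | 16384 | 32 | 2 | 1 | 262144 | 30m (0.5h) | 2h | 24h | Titan |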
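
The pilot size and ideal runtime of each strong scaling run follow from the number of generations: the 16384 tasks are split evenly across generations, each generation runs concurrently, and generations execute one after another. A minimal sketch of that arithmetic, assuming the ~15 minute task runtime from the Workload section (`strong_scaling_row` is a hypothetical helper name):

```python
N_TASKS = 16384         # fixed total problem size
CORES_PER_TASK = 32
TASK_RUNTIME_MIN = 15   # approximate runtime of one task, in minutes


def strong_scaling_row(n_generations):
    """Return (cores_per_pilot, ideal_runtime_minutes) for a number of generations."""
    if N_TASKS % n_generations != 0:
        raise ValueError("tasks must divide evenly across generations")
    concurrent_tasks = N_TASKS // n_generations      # tasks running at the same time
    cores_per_pilot = concurrent_tasks * CORES_PER_TASK
    ideal_runtime = n_generations * TASK_RUNTIME_MIN  # generations run back to back
    return cores_per_pilot, ideal_runtime


if __name__ == "__main__":
    for generations in (32, 16, 8, 4, 2):
        cores, ideal = strong_scaling_row(generations)
        print(generations, "generations:", cores, "cores/pilot, ideal runtime", ideal, "m")
```

Halving the number of generations doubles the pilot size and halves the ideal runtime, so ideal strong scaling corresponds to a measured runtime that halves with each step down the table.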

Actual Workload

The runs above were performed with Synapse as a drop-in replacement for the executable. To close out, the largest weak and strong scaling runs are verified with an actual GROMACS workload:

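| N runs | N tasks | N cores/task | N generations | N pilots | N cores/pilot | Ideal Runtime | Walltime | Max Walltime | Resource |
|--------|---------|--------------|---------------|----------|---------------|---------------|----------|--------------|----------|
| 2 | 16384 | 32 | 2 | 1 | 262144 | 30m (0.5h) | 2h | 24h | Titan |
| 2 | 8192 | 32 | 1 | 1 | 262144 | 15m | 1h | 24h | Titan |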