Home
Matteo Turilli edited this page Oct 22, 2018
Resource: Titan
- Cores: 299,008
- Nodes: 18,688 (16 cores per node)
- Queues and queue policies
- Login node: userid@titan.ccs.ornl.gov
- DTN node: userid@dtn.ccs.ornl.gov
Queue policies, expressed in number of cores instead of number of nodes:
| Bin | Min Cores | Max Cores | Max Walltime (Hours) | Aging Boost (Days) |
|---|---|---|---|---|
| 1 | 180000 | — | 24 | 15 |
| 2 | 60000 | 179984 | 24 | 5 |
| 3 | 5008 | 59984 | 12 | 0 |
| 4 | 2016 | 4992 | 6 | 0 |
| 5 | 16 | 2000 | 2 | 0 |
Workload:
- Executable: GROMACS compiled against OpenMPI v??
- Executable runtime: ~15 minutes
- Physical system: ??
- Input files: ??
Software:
Runs:
- #units: >= 65,536
- #cores: <= 299,008
- walltime: 1 hour total
Definitions:
- Weak scalability: how the solution time varies with the number of processors for a fixed problem size per processor
- Ideal Runtime: the time the workload takes to run on the given resources without any overhead
- Walltime: the walltime requested for the pilot
- Max Walltime: the maximum walltime that can be requested for a pilot of the given size
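As a rough illustration of how these definitions relate to each other, the sketch below computes the overhead of a run as the measured runtime minus the Ideal Runtime, and checks that a requested Walltime covers the Ideal Runtime without exceeding the Max Walltime. The function names and the measured runtime of 21 minutes are hypothetical and used only for illustration; they are not results from these experiments.

```python
def overhead_minutes(measured_runtime_min, ideal_runtime_min):
    """Overhead is whatever the run took beyond the Ideal Runtime."""
    return measured_runtime_min - ideal_runtime_min


def walltime_is_valid(walltime_min, max_walltime_min, ideal_runtime_min):
    """The requested walltime must cover the ideal runtime
    and must not exceed the maximum the queue allows."""
    return ideal_runtime_min <= walltime_min <= max_walltime_min


# Hypothetical run: 15m ideal runtime, 21m measured (made-up value),
# 1h requested walltime, 6h maximum walltime.
print(overhead_minutes(21, 15))
print(walltime_is_valid(60, 360, 15))
```

Treating overhead as a simple difference is an assumption of this sketch; a real analysis would likely break the overhead down by component.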
| N runs | N tasks | N cores/task | N generations | N pilots | N cores/pilot | Ideal Runtime | Walltime | Max Walltime | Resource |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 32 | 32 | 1 | 1 | 1024 | 15m | 1h | 1h | Titan |
| 2 | 64 | 32 | 1 | 1 | 2048 | 15m | 1h | 6h | Titan |
| 2 | 128 | 32 | 1 | 1 | 4096 | 15m | 1h | 6h | Titan |
| 2 | 256 | 32 | 1 | 1 | 8192 | 15m | 1h | 12h | Titan |
| 2 | 512 | 32 | 1 | 1 | 16384 | 15m | 1h | 12h | Titan |
| 2 | 1024 | 32 | 1 | 1 | 32768 | 15m | 1h | 12h | Titan |
| 2 | 2048 | 32 | 1 | 1 | 65536 | 15m | 1h | 24h | Titan |
| 2 | 4096 | 32 | 1 | 1 | 131072 | 15m | 1h | 24h | Titan |
| 2 | 8192 | 32 | 1 | 1 | 262144 | 15m | 1h | 24h | Titan |
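The pilot sizes in the table above follow from simple arithmetic: the number of tasks doubles from 32 to 8192, every task uses 32 cores, and the pilot is sized so that all tasks run concurrently in a single generation. A minimal sketch of that arithmetic, in which the function and constant names are hypothetical and the 15 minute task runtime is the approximate value given in the Workload section:

```python
CORES_PER_TASK = 32
TASK_RUNTIME_MIN = 15          # approximate executable runtime
TITAN_CORES = 18_688 * 16      # 299,008 cores in total


def weak_scaling_rows(min_tasks=32, max_tasks=8192):
    """Double the task count at each step; size the pilot so that
    every task runs at once (one generation)."""
    rows = []
    n_tasks = min_tasks
    while n_tasks <= max_tasks:
        rows.append({
            "tasks": n_tasks,
            "generations": 1,
            "pilot_cores": n_tasks * CORES_PER_TASK,
            "ideal_runtime_min": TASK_RUNTIME_MIN,
        })
        n_tasks *= 2
    return rows


rows = weak_scaling_rows()
for row in rows:
    print(row)
```

Because the problem size per core stays fixed, the Ideal Runtime is the same 15 minutes in every row; the largest pilot (262,144 cores) still fits within the 299,008 cores of the machine.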
- Workload: See Figure 4 of https://arxiv.org/pdf/1801.01843.pdf
- Plotting: See Figure 5 of https://arxiv.org/pdf/1801.01843.pdf
| N runs | N tasks | N cores/task | N generations | N pilots | N cores/pilot | Ideal Runtime | Walltime | Max Walltime | Resource |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 256 | 32 | 1 | 1 | 8192 | 15m | 1h | 12h | Titan |
| 2 | 2048 | 32 | 1 | 1 | 65536 | 15m | 1h | 24h | Titan |
Definitions:
- Strong scalability: how the solution time varies with the number of processors for a fixed total problem size
- Ideal Runtime: the time the workload takes to run on the given resources without any overhead
- Walltime: the walltime requested for the pilot
- Max Walltime: the maximum walltime that can be requested for a pilot of the given size
| N runs | N tasks | N cores/task | N generations | N pilots | N cores/pilot | Ideal Runtime | Walltime | Max Walltime | Resource |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 16384 | 32 | 32 | 1 | 16384 | 480m (8h) | 12h | 12h | Titan |
| 2 | 16384 | 32 | 16 | 1 | 32768 | 240m (4h) | 6h | 12h | Titan |
| 2 | 16384 | 32 | 8 | 1 | 65536 | 120m (2h) | 3h | 24h | Titan |
| 2 | 16384 | 32 | 4 | 1 | 131072 | 60m (1h) | 2h | 24h | Titan |
| 2 | 16384 | 32 | 2 | 1 | 262144 | 30m (0.5h) | 2h | 24h | Titan |
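The number of generations and the Ideal Runtime in the table above can be derived from the pilot size: with a fixed workload of 16,384 tasks at 32 cores each, a pilot runs as many tasks concurrently as it has cores for, and the remaining tasks wait for the next generation. A minimal sketch, assuming each generation takes the approximate 15 minute task runtime and using hypothetical names:

```python
TOTAL_TASKS = 16384
CORES_PER_TASK = 32
TASK_RUNTIME_MIN = 15  # approximate executable runtime


def strong_scaling_row(pilot_cores):
    """Return (generations, ideal runtime in minutes) for a pilot size."""
    concurrent_tasks = pilot_cores // CORES_PER_TASK
    # Ceiling division: a partially filled generation still costs a full one.
    generations = -(-TOTAL_TASKS // concurrent_tasks)
    return generations, generations * TASK_RUNTIME_MIN


for pilot_cores in (16384, 32768, 65536, 131072, 262144):
    generations, ideal_min = strong_scaling_row(pilot_cores)
    print(pilot_cores, generations, ideal_min)
```

Doubling the pilot size halves the number of generations, so the Ideal Runtime drops from 480 minutes at 16,384 cores to 30 minutes at 262,144 cores.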
The runs above were performed with Synapse as a drop-in replacement for the GROMACS executable. To close out, the largest weak-scaling and strong-scaling runs are verified with an actual GROMACS workload:
| N runs | N tasks | N cores/task | N generations | N pilots | N cores/pilot | Ideal Runtime | Walltime | Max Walltime | Resource |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 16384 | 32 | 2 | 1 | 262144 | 30m (0.5h) | 2h | 24h | Titan |
| 2 | 8192 | 32 | 1 | 1 | 262144 | 15m | 1h | 24h | Titan |