Computing resources

Our clusters are grouped by CPU generation, available RAM size and InfiniBand network. They are then sliced into partitions (see Clusters/Partitions overview).

Big picture

Hardware specifications per node:

Clusters   CPU family               nb cores   RAM (GB)            Network   main Scratch       Best use case
E5         E5                       16         62, 124, 252        56Gb/s    /scratch/E5N       training, sequential, small parallel
Lake       E5 + GPU                 8          124                 56Gb/s    /scratch/Lake      sequential, small parallel, GPU computing
           Sky Lake, Cascade Lake   32         94, 124, 190, 380                                medium parallel, sequential
           AMD Epyc                 128        510                 100Gb/s                      large parallel
Cascade    Cascade Lake             96         380                 100Gb/s   /scratch/Cascade   large parallel

(Blank cells carry over the value from the row above.)

See Clusters/Partitions overview for more hardware details and partition slicing. Available RAM sizes may vary slightly (not all RAM is available for computing, GB vs GiB, etc.).

GPU Specifications

PSMN offers two types of NVIDIA GPUs, available in the E5-GPU and Cascade-GPU partitions.

GPU specifications per partition:

Partition     login nodes   CPU Model                GPU         CUDA support   Compute Cap. Version
E5-GPU        r730gpu01     E5-2637v3 @ 3.5GHz       RTX2080Ti   11.7 -> 12.2   7.5
Cascade-GPU                 Platinum 9242 @ 2.3GHz   L4          11.7 -> 12.2   8.9
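Once connected to a GPU node, you can cross-check the GPU model and compute capability listed above. A sketch using nvidia-smi (the compute_cap query field assumes a reasonably recent NVIDIA driver):

$ nvidia-smi --query-gpu=name,compute_cap --format=csv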

Hardware specifications per GPU Type:

Specification                   L4             RTX2080Ti
Architecture                    Ada Lovelace   Turing
Cores                           7424           4352
FP32 (TFLOPS)                   30.3           13.45
TF32 Tensor Core (TFLOPS)       120            -
FP16 Tensor Core (TFLOPS)       242            26.90
BFLOAT16 Tensor Core (TFLOPS)   242            -
FP8 Tensor Core (TFLOPS)        485            -
INT8 Tensor Core (TOPS)         485            -
Boost clock speed (MHz)         2040           1640
Core clock speed (MHz)          795            1350
GPU Memory (GB)                 24             11
GPU Memory Bandwidth (GB/s)     300            616
Max Thermal Design Power (W)    72             260
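To compute on one of these GPUs, submit to the matching partition and request a GPU. A minimal sketch, assuming GPUs are declared to Slurm as gres/gpu (job name, resource values and program are illustrative):

#!/bin/bash
#SBATCH --job-name=my_gpu_job     # illustrative name
#SBATCH --partition=E5-GPU        # or Cascade-GPU for the L4 nodes
#SBATCH --gres=gpu:1              # assumes GPUs are exposed as gres/gpu
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4096M

srun ./my_cuda_program            # hypothetical executable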

Available resources

Use the sinfo [1] command to get a dynamic view of the partitions (the default partition is marked with a ‘*’; see also sinfo -l, sinfo -lNe and sinfo --summarize):

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
E5*          up 8-00:00:00      4   idle c82gluster[1-4]
Cascade      up 8-00:00:00     77   idle s92node[02-78]

Or the state of a particular partition:

$ sinfo -p Epyc
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
Epyc         up 8-00:00:00      1    mix c6525node002
Epyc         up 8-00:00:00     12  alloc c6525node[001,003-006,008-014]
Epyc         up 8-00:00:00      1   idle c6525node007

To see more information (CPU count and organization, RAM size [in MiB], state/availability), use one of these:

$ sinfo --exact --format="%9P %.8z %.8X %.8Y %.8c %.7m %.5D %N"
PARTITION    S:C:T  SOCKETS    CORES     CPUS  MEMORY NODES NODELIST
E5*          2:8:1        2        8       16  128872     4 c82gpgpu[31-34]
E5*          2:8:1        2        8       16   64328     3 c82gluster[2-4]
E5-GPU       2:4:1        2        4        8  128829     1 r730gpu20
Lake        2:16:1        2       16       32  385582     3 c6420node[172-174]
Cascade     2:48:1        2       48       96  385606    77 s92node[02-78]

$ sinfo --exact --format="%9P %.8c %.7m %.5D %.14F %N"
PARTITION     CPUS  MEMORY NODES NODES(A/I/O/T) NODELIST
E5*             16  128872     4        3/1/0/4 c82gpgpu[31-34]
E5*             16   64328     3        3/0/0/3 c82gluster[2-4]
E5-GPU           8  128829     1        0/1/0/1 r730gpu20
Lake            32  385582     3        1/2/0/3 c6420node[172-174]
Cascade         96  385606    77     47/26/4/77 s92node[02-78]

$ sinfo --exact --format="%9P %.8c %.7m %.20C %.5D %25f" --partition E5,E5-GPU
PARTITION     CPUS  MEMORY        CPUS(A/I/O/T) NODES AVAIL_FEATURES
E5*             16  256000       248/120/16/384    24 local_scratch
E5*             16  128828         354/30/0/384    24 (null)
E5*             16  257852          384/0/0/384    24 (null)
E5*             32  257843          384/0/0/384    12 (null)
E5*             16   64328            48/0/0/48     3 (null)
E5*             16  128872            64/0/0/64     4 (null)
E5-GPU           8  127000         32/128/0/160    20 gpu

A/I/O/T stands for Allocated/Idle/Other/Total, counted in nodes or CPUs depending on the column.

$ sinfo -lN | less
NODELIST     NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
[...]
c82gluster4      1       E5*        idle 16      2:8:1  64328        0      1   (null) none
s92node02        1   Cascade        idle 96     2:48:1 385606        0      1   (null) none
[...]

Important

  • HyperThreading [2] is enabled on all Intel nodes, but logical cores are not available as computing resources (real cores vs logical cores): only physical cores can be allocated.

  • RAM sizes are reported in MiB, and you cannot reserve more than 94% of a node's RAM. For example, on a Cascade node reporting 385606 MiB, at most about 362469 MiB can be requested.

Basic defaults

  • default partition: E5

  • default time: 10 minutes

  • default cpu(s): 1 core

  • default memory size: 4GiB / core
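These defaults apply to every option a job does not set explicitly. As a sketch, the following submission script overrides each of them (partition, time, resource values and program are illustrative):

#!/bin/bash
#SBATCH --job-name=my_job      # illustrative name
#SBATCH --partition=Lake       # instead of the default E5
#SBATCH --time=01:00:00        # instead of the default 10 minutes
#SBATCH --ntasks=4             # instead of the default 1 core
#SBATCH --mem-per-cpu=8G       # instead of the default 4GiB per core

srun ./my_program              # hypothetical executable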

Features

Some nodes have features [3] (gpu, local_scratch, etc.).

To request a feature/constraint, add the following line to your submission script: #SBATCH --constraint=<feature>. Example:

#!/bin/bash
#SBATCH --job-name=my_job_needs_local_scratch
#SBATCH --time=02:00:00
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=4096M
#SBATCH --constraint=local_scratch

Only nodes having features matching the job constraints will be used to satisfy the request.
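To see which features the nodes of a partition advertise before constraining a job, you can reuse the sinfo format strings shown above (%f prints AVAIL_FEATURES):

$ sinfo --exact --format="%9P %25f %N" --partition E5,E5-GPU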

Maximums

Here are the maximums of usable resources per job:

  • maximum wall-time: 8 days (‘8-0:0:0’ as ‘days-hours:minutes:seconds’)

  • maximum nodes per job and/or maximum cores per job:

Partition     nodes   cores   gpu
E5            24      384     -
E5-GPU        18      144     18
Lake          24      768     -
Epyc          14      1792    -
Cascade       76      7296    -
Cascade-GPU   12      1152    12
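For instance, a job filling the Cascade partition to its maximum would request the following (a sketch; an MPI program is assumed, and the executable name is hypothetical):

#!/bin/bash
#SBATCH --job-name=big_parallel    # illustrative name
#SBATCH --partition=Cascade
#SBATCH --nodes=76                 # partition maximum
#SBATCH --ntasks-per-node=96       # all cores of each node (76 x 96 = 7296)
#SBATCH --time=8-0:0:0             # maximum wall-time

srun ./my_mpi_program              # hypothetical MPI executable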

Anything more must be justified using our contact forms.