Calculating on GPU
PSMN offers L4 (Cascade-GPU) and RTX 2080 Ti (E5-GPU) GPUs; see Computing resources for more detailed information about their specifications.
Basic Commands
The nvidia-smi command will show information about the NVIDIA GPUs on the node.
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     On  | 00000000:82:00.0 Off |                  N/A |
| 28%   25C    P8               1W / 250W |      1MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
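For scripted monitoring (for example in a job script), nvidia-smi can also print selected fields in CSV form. A minimal sketch; adjust the queried fields to your needs:

```shell
# Print selected GPU fields as machine-readable CSV
nvidia-smi --query-gpu=name,driver_version,memory.total,memory.used,utilization.gpu \
           --format=csv
```

Run `nvidia-smi --help-query-gpu` for the full list of queryable fields.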
The nvtop command, similar to the top command, will give you real-time information about processes running on the GPU.
$ nvtop
Device 0 [NVIDIA GeForce RTX 2080 Ti] PCIe GEN 1@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 300MHz MEM 405MHz TEMP 24°C FAN 28% POW 1 / 250 W
GPU[ 0%] MEM[| 0.248Gi/11.000Gi]
┌────────────────────────────────────────────────────────────────────────────────────────────────┐
100│ GPU 0│
75%│ MEM│
│ │
50%│ │
│ │
25%│ │
0%│────────────────────────────────────────────────────────────────────────────────────────────────│
└────────────────────────────────────────────────────────────────────────────────────────────────┘
PID USER GPU TYPE GPU MEM CPU HOST MEM Command
Getting to know your GPU with CUDA
CUDA is a proprietary application programming interface (API) from NVIDIA that can help you optimize your use of GPU devices by:
Specifying thread parallelism
Optimizing memory access patterns
Managing occupancy
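As an illustration of the first point, a CUDA kernel launch specifies its thread parallelism explicitly through the grid and block dimensions. A minimal sketch (not PSMN-specific code):

```cuda
#include <cstdio>

// Each thread writes its own global index into the output array.
__global__ void fillIndices(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // guard: the grid may be slightly larger than n
        out[i] = i;
}

int main()
{
    const int n = 1 << 20;
    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));

    // Thread parallelism is chosen at launch time: 256 threads per
    // block, and enough blocks to cover all n elements.
    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    fillIndices<<<blocksPerGrid, threadsPerBlock>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```

Tuning threadsPerBlock (together with per-thread register and shared-memory use) is what the occupancy tools discussed below help with.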
Tip
To run NVIDIA CUDA you need to be connected to a node with a CUDA-capable GPU and to load a GCC compiler and toolchain.
You can gather fundamental information about the GPU by running the deviceQuery program (see below).
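The same fundamental properties that deviceQuery prints can also be read programmatically through the CUDA runtime API. A minimal sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Global memory:      %.1f GiB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        printf("  Multiprocessors:    %d\n", prop.multiProcessorCount);
    }
    return 0;
}
```

Compile it with NVCC (see Compiling Programs with CUDA (NVCC) below) and run it on a GPU node.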
Compiled CUDA Sample Programs
Samples of compiled CUDA programs for RTX2080Ti GPU are available at: /applis/PSMN/debian11/CUDA/RTX2080Ti
Samples of compiled CUDA programs for L4 GPU are available at: /applis/PSMN/debian11/CUDA/L4
All source code for programs can be found at: https://github.com/NVIDIA/cuda-samples/tree/master/Samples
List of CUDA sample programs available:
bandwidthTest
This is a simple test program to measure the memcopy bandwidth of the GPU and memcopy bandwidth across PCI-e. This test application is capable of measuring device to device copy bandwidth, host to device copy bandwidth for pageable and page-locked memory, and device to host copy bandwidth for pageable and page-locked memory.
clock
This example shows how to use the clock function to accurately measure the performance of blocks of threads of a kernel.
deviceQuery
This sample enumerates the properties of the CUDA devices present in the system.
deviceQueryDrv
This sample enumerates the properties of the CUDA devices present using CUDA Driver API calls.
eigenvalues
The computation of all or a subset of all eigenvalues is an important problem in Linear Algebra, statistics, physics, and many other fields. This sample demonstrates a parallel implementation of a bisection algorithm for the computation of all eigenvalues of a tridiagonal symmetric matrix of arbitrary size with CUDA.
graphMemoryFootprint
This sample demonstrates how graph memory nodes re-use virtual addresses and physical memory.
matrixMul
This sample implements the matrix multiplication from Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. To illustrate GPU performance for matrix multiply, this sample also shows how to use the CUDA 4.0 interface for CUBLAS to achieve high-performance matrix multiplication.
MonteCarloMultiGPU
This sample evaluates the fair call price for a given set of European options using the Monte Carlo approach, taking advantage of all CUDA-capable GPUs installed in the system. This sample uses double-precision hardware if a GTX 200 class GPU is present. The sample also takes advantage of the CUDA 4.0 capability to use a single CPU thread to control multiple GPUs.
simpleOccupancy
This sample demonstrates the basic usage of the CUDA occupancy calculator and occupancy-based launch configurator APIs by launching a kernel with the launch configurator, and measures the utilization difference against a manually configured launch.
topologyQuery
A simple example of how to query the topology of a system with multiple GPUs.
vectorAdd
This CUDA Runtime API sample is a very basic sample that implements element by element vector addition. It is the same as the sample illustrating Chapter 3 of the programming guide with some additions like error checking.
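The error checking mentioned in the vectorAdd description typically wraps every runtime call and kernel launch. A minimal, hedged sketch of that pattern (using managed memory to keep it short; the real sample uses explicit cudaMalloc/cudaMemcpy):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err = (call);                                    \
        if (err != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    float *a, *b, *c;
    CUDA_CHECK(cudaMallocManaged(&a, n * sizeof(float)));
    CUDA_CHECK(cudaMallocManaged(&b, n * sizeof(float)));
    CUDA_CHECK(cudaMallocManaged(&c, n * sizeof(float)));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vectorAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    CUDA_CHECK(cudaGetLastError());       // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches execution errors

    printf("c[0] = %f\n", c[0]);
    CUDA_CHECK(cudaFree(a));
    CUDA_CHECK(cudaFree(b));
    CUDA_CHECK(cudaFree(c));
    return 0;
}
```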
Compiling Programs with CUDA (NVCC)
You can compile your code with the NVIDIA CUDA Compiler Driver (NVCC).
$ nvcc -o clock clock.cu -I/path/to/custom/library/include
Additional documentation for the CUDA compiler is available here: https://docs.nvidia.com/cuda/pdf/CUDA_Compiler_Driver_NVCC.pdf
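You can also tell NVCC which GPU to generate code for with the -arch flag. The RTX 2080 Ti (Turing) is compute capability 7.5 and the L4 (Ada Lovelace) is compute capability 8.9:

```shell
# Target the RTX 2080 Ti (compute capability 7.5)
nvcc -arch=sm_75 -o clock clock.cu

# Target the L4 (compute capability 8.9)
nvcc -arch=sm_89 -o clock clock.cu
```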
Important
Our NVIDIA GeForce RTX 2080 Ti GPUs currently run driver version 535.183.01, which is compatible with the CUDA 12.2 Toolkit. You therefore cannot currently compile a program with NVCC using a CUDA Toolkit version greater than 12.2.
Toolchain compatibility with CUDA Toolkit 12.2:
GCC 10.2.0
Clang 9.0.0