CentOS Linux release 7.7.1908, TITAN V x2: running the CUDA 10.1 Samples nbody in single and double precision (17596.655 and 6569.545 GFLOP/s)

[chibi@centos7 nbody]$ cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
[chibi@centos7 nbody]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
[chibi@centos7 nbody]$ sudo hddtemp /dev/sda
[sudo] chibi password:
/dev/sda: ST2000LX001-1RG174: 18°C
[chibi@centos7 nbody]$ ./nbody --benchmark --numbodies=256000 -numdevices=2
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2…. for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

number of CUDA devices = 2
> Windowed mode
> Simulation data stored in system memory
> Single precision floating point simulation
> 2 Devices used for simulation
GPU Device 0: "TITAN V" with compute capability 7.0

> Compute 7.0 CUDA device: [TITAN V]
> Compute 7.0 CUDA device: [TITAN V]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 744.869 ms
= 879.833 billion interactions per second
= 17596.655 single-precision GFLOP/s at 20 flops per interaction
[chibi@centos7 nbody]$ ./nbody --benchmark -fp64 --numbodies=256000 -numdevices=2
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2…. for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

number of CUDA devices = 2
> Windowed mode
> Simulation data stored in system memory
> Double precision floating point simulation
> 2 Devices used for simulation
GPU Device 0: "TITAN V" with compute capability 7.0

> Compute 7.0 CUDA device: [TITAN V]
> Compute 7.0 CUDA device: [TITAN V]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 2992.719 ms
= 218.985 billion interactions per second
= 6569.545 double-precision GFLOP/s at 30 flops per interaction
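
The figures reported in the two runs above can be cross-checked by hand. The following is a minimal sketch in Python, assuming the sample counts N x N body-body interactions per iteration (an assumption that is consistent with the numbers printed in the log). The function name `nbody_throughput` is hypothetical and is not part of the CUDA Samples.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction):
    """Return (billion interactions per second, GFLOP/s) for an all-pairs run.

    Assumes every iteration evaluates num_bodies * num_bodies interactions.
    """
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    g_interactions_per_s = interactions / seconds / 1e9
    return g_interactions_per_s, g_interactions_per_s * flops_per_interaction


# Values copied from the benchmark log above.
sp = nbody_throughput(256000, 10, 744.869, 20)   # single precision, 20 flops per interaction
dp = nbody_throughput(256000, 10, 2992.719, 30)  # double precision, 30 flops per interaction

print("single: %.3f billion interactions/s, %.3f GFLOP/s" % sp)
print("double: %.3f billion interactions/s, %.3f GFLOP/s" % dp)
print("time ratio (double / single): %.2f" % (2992.719 / 744.869))
```

The results should agree with the log to within rounding, since the sample itself appears to do this arithmetic in single-precision floats. Note that the double-precision run takes roughly four times as long, but its GFLOP/s figure drops by a smaller factor because the sample credits 30 flops per interaction instead of 20.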

[Figure: Detailed data. CentOS Linux release 7.7.1908, TITAN V x2, CUDA 10.1 Samples nbody, single and double precision, 17596.655 and 6569.545 GFLOP/s]

[Figure: GPU temperature over time (nvidia-smi). CentOS Linux release 7.7.1908, TITAN V x2, CUDA 10.1 Samples nbody, single and double precision, 17596.655 and 6569.545 GFLOP/s]
