chibi@2004:~/NVIDIA_CUDA-11.4_Samples/5_Simulations/nbody$ ./nbody -benchmark -numbodies=256000 -device=0
Run “nbody -benchmark [-numbodies=
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=
-device=
-numdevices= (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: “Ampere
> Compute 8.6 CUDA device: [NVIDIA RTX A5000]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 932.152 ms
= 703.061 billion interactions per second
= 14061.221 single-precision GFLOP/s at 20 flops per interaction
chibi@2004:~/NVIDIA_CUDA-11.4_Samples/5_Simulations/nbody$
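The two throughput figures in the log can be reproduced from the body count, iteration count, and elapsed time. The sketch below assumes that the sample counts every body-to-body pair (n × n interactions) in each iteration, which is consistent with the numbers printed above. The function name `nbody_throughput` is made up for illustration and is not part of the CUDA Samples.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions per second, GFLOP/s) for an all-pairs run.

    Assumption: each iteration evaluates num_bodies * num_bodies interactions.
    """
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    billion_interactions_per_s = interactions / seconds / 1e9
    gflops = billion_interactions_per_s * flops_per_interaction
    return billion_interactions_per_s, gflops


# Values taken from the log: 256000 bodies, 10 iterations, 932.152 ms.
ginter, gflops = nbody_throughput(256000, 10, 932.152)
print(f"{ginter:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

The results agree with the logged 703.061 billion interactions per second and 14061.221 GFLOP/s to within a small rounding difference in the last digits.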
RTX A5000-24GB CUDA 11.4 ■ Single-precision result = 14061.221 single-precision GFLOP/s at 20 flops per interaction
Tesla V100-SXM2-16GB CUDA 10.1 ■ Single-precision result = 9033.258 single-precision GFLOP/s at 20 flops per interaction
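To put the two single-precision figures side by side, the ratio can be computed directly. This is only a rough sketch: the Tesla V100 run's settings (body count, iterations) are not shown here, so the comparison assumes they match the RTX A5000 run. The helper name `relative_throughput` is hypothetical.

```python
# GFLOP/s figures copied from the two result lines above.
results_gflops = {
    "RTX A5000-24GB (CUDA 11.4)": 14061.221,
    "Tesla V100-SXM2-16GB (CUDA 10.1)": 9033.258,
}


def relative_throughput(results, baseline):
    """Express every result as a multiple of the baseline entry."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


ratios = relative_throughput(results_gflops, "Tesla V100-SXM2-16GB (CUDA 10.1)")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Under that assumption, the RTX A5000 result is roughly 1.56 times the Tesla V100 result.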
AMD EPYC 7742 64-Core Processor x2, 512GB, Ubuntu 20.04.6 LTS, RTX A5000-24GB: ran the CUDA 11.4 Samples nbody benchmark in single precision, 14061.221 GFLOP/s
Images: deviceQuery, nvidia-smi