user0034@10-111-34-13:~/NVIDIA_CUDA-10.1_Samples/5_Simulations/nbody$ ./nbody -benchmark -numbodies=256000 -device=0
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 1450.994 ms
= 451.663 billion interactions per second
= 9033.258 single-precision GFLOP/s at 20 flops per interaction
user0034@10-111-34-13:~/NVIDIA_CUDA-10.1_Samples/5_Simulations/nbody$ ./nbody -fp64 -benchmark -numbodies=256000 -device=0
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 3783.535 ms
= 173.214 billion interactions per second
= 5196.411 double-precision GFLOP/s at 30 flops per interaction
user0034@10-111-34-13:~/NVIDIA_CUDA-10.1_Samples/5_Simulations/nbody$ cd
user0034@10-111-34-13:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
user0034@10-111-34-13:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
user0034@10-111-34-13:~$ nvidia-smi nvlink -c
GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-645fd6fb-5884-66a2-d47c-238743a797c3)
user0034@10-111-34-13:~$
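
The throughput figures printed by the two benchmark runs above can be cross-checked by hand. The following is a minimal sketch, assuming the benchmark counts numBodies squared pairwise interactions per iteration (an assumption inferred from the reported numbers, not taken from this log); the function names are hypothetical and only the timings, body count, and flops-per-interaction values come from the output above.

```python
# Sanity check of the nbody benchmark figures reported in the log.
# Assumption: each iteration evaluates num_bodies**2 interactions (all pairs).

def interactions_per_second(num_bodies, iterations, total_ms):
    """Pairwise interactions evaluated per second over the whole run."""
    return num_bodies ** 2 * iterations / (total_ms / 1000.0)


def gflops(num_bodies, iterations, total_ms, flops_per_interaction):
    """Throughput in GFLOP/s for a given flop cost per interaction."""
    rate = interactions_per_second(num_bodies, iterations, total_ms)
    return rate * flops_per_interaction / 1e9


NUM_BODIES = 256000
ITERATIONS = 10

# (label, total time in ms, flops per interaction), as printed in the log
runs = [
    ("single-precision", 1450.994, 20),
    ("double-precision", 3783.535, 30),
]

for label, total_ms, flops in runs:
    rate = interactions_per_second(NUM_BODIES, ITERATIONS, total_ms) / 1e9
    perf = gflops(NUM_BODIES, ITERATIONS, total_ms, flops)
    print(f"{label}: {rate:.3f} billion interactions/s, {perf:.3f} GFLOP/s")
```

Under that assumption the computed values agree with the logged 451.663 and 173.214 billion interactions per second, and with the logged GFLOP/s figures, to within rounding of the printed times.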