Microsoft Windows [Version 10.0.19042.928]
(c) Microsoft Corporation. All rights reserved.

C:\Windows\system32>cd C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.3\bin\win64\Debug

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.3\bin\win64\Debug>nvidia-smi
Mon Apr 19 14:21:40 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 466.11       Driver Version: 466.11       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN RTX   WDDM  | 00000000:41:00.0  On |                  N/A |
| 40%   34C    P8    22W / 280W |    670MiB / 24576MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA TITAN RTX   WDDM  | 00000000:61:00.0 Off |                  N/A |
| 41%   41C    P8    34W / 280W |    670MiB / 24576MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2856    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A     12400    C+G   ...nputApp\TextInputHost.exe    N/A      |
|    0   N/A  N/A     12736    C+G   ...artMenuExperienceHost.exe    N/A      |
|    0   N/A  N/A     13256    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    1   N/A  N/A      2856    C+G   Insufficient Permissions        N/A      |
|    1   N/A  N/A     12400    C+G   ...nputApp\TextInputHost.exe    N/A      |
|    1   N/A  N/A     12736    C+G   ...artMenuExperienceHost.exe    N/A      |
|    1   N/A  N/A     13256    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
+-----------------------------------------------------------------------------+

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.3\bin\win64\Debug>nvidia-smi nvlink -c
GPU 0: NVIDIA TITAN RTX (UUID: GPU-7fb51c1d-c1e7-35cc-aad7-66971f05ddb7)
         Link 0, P2P is supported: true
         Link 0, Access to system memory supported: true
         Link 0, P2P atomics supported: true
         Link 0, System memory atomics supported: true
         Link 0, SLI is supported: true
         Link 0, Link is supported: false
         Link 1, P2P is supported: true
         Link 1, Access to system memory supported: true
         Link 1, P2P atomics supported: true
         Link 1, System memory atomics supported: true
         Link 1, SLI is supported: true
         Link 1, Link is supported: false
GPU 1: NVIDIA TITAN RTX (UUID: GPU-5a71d61e-f130-637a-b33d-4df555b0ed88)
         Link 0, P2P is supported: true
         Link 0, Access to system memory supported: true
         Link 0, P2P atomics supported: true
         Link 0, System memory atomics supported: true
         Link 0, SLI is supported: true
         Link 0, Link is supported: false
         Link 1, P2P is supported: true
         Link 1, Access to system memory supported: true
         Link 1, P2P atomics supported: true
         Link 1, System memory atomics supported: true
         Link 1, SLI is supported: true
         Link 1, Link is supported: false

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.3\bin\win64\Debug>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:24:09_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.3\bin\win64\Debug>p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA TITAN RTX, pciBusID: 41, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA TITAN RTX, pciBusID: 61, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
     D\D     0     1
     0       1     1
     1       1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 562.27   9.38
     1   9.61 560.84
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1
     0 548.66  46.92
     1  46.90 560.78
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 549.89   9.53
     1  10.03 552.87
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 548.64  93.19
     1  90.10 552.58
P2P=Disabled Latency Matrix (us)
   GPU     0      1
     0   3.43 188.41
     1 167.31   3.44

   CPU     0      1
     0   2.57  98.02
     1 122.12   2.21
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1
     0   3.71   2.22
     1   2.42   3.43

   CPU     0      1
     0   2.35   1.32
     1   1.34   2.16

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.3\bin\win64\Debug>
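
To read the matrices above at a glance, the off-diagonal (GPU 0 to GPU 1 and back) figures can be compared with P2P disabled and enabled. The following is only a minimal Python sketch: the numbers are copied from the log, while the variable names and the `ratios` helper are illustrative and not part of the CUDA samples. As the tool itself notes, these samples are not meant for performance measurement, so the ratios are indicative at best.

```python
# Off-diagonal entries copied from the p2pBandwidthLatencyTest log.
# Keys are (row, column) exactly as printed in each matrix.
uni_off = {(0, 1): 9.38, (1, 0): 9.61}      # unidirectional GB/s, P2P disabled
uni_on = {(0, 1): 46.92, (1, 0): 46.90}     # unidirectional GB/s, P2P enabled
bi_off = {(0, 1): 9.53, (1, 0): 10.03}      # bidirectional GB/s, P2P disabled
bi_on = {(0, 1): 93.19, (1, 0): 90.10}      # bidirectional GB/s, P2P enabled
lat_off = {(0, 1): 188.41, (1, 0): 167.31}  # GPU latency in us, P2P disabled
lat_on = {(0, 1): 2.22, (1, 0): 2.42}       # GPU latency in us, P2P enabled


def ratios(numer, denom):
    """Hypothetical helper: numer/denom per (row, column) pair, 1 decimal."""
    return {pair: round(numer[pair] / denom[pair], 1) for pair in numer}


uni_gain = ratios(uni_on, uni_off)    # bandwidth: enabled over disabled
bi_gain = ratios(bi_on, bi_off)
lat_gain = ratios(lat_off, lat_on)    # latency: disabled over enabled

for name, gain in [("unidirectional bandwidth", uni_gain),
                   ("bidirectional bandwidth", bi_gain),
                   ("GPU latency", lat_gain)]:
    for (row, col), factor in sorted(gain.items()):
        print(f"{name} [{row}][{col}]: about {factor}x better with P2P enabled")
```

In this run the cross-GPU unidirectional bandwidth improves by roughly 5x and the bidirectional bandwidth by roughly 9x to 10x with P2P enabled, while the GPU-side latency drops by well over an order of magnitude.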