chibi@1804:~/NVIDIA_CUDA-11.4_Samples/1_Utilities/p2pBandwidthLatencyTest$ ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA TITAN RTX, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA TITAN RTX, pciBusID: 82, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0     1
     0       1     1
     1       1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D      0      1
     0 561.65   5.90
     1   5.96 563.37
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D      0      1
     0 539.21  47.07
     1  47.10 563.04
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D      0      1
     0 504.62   8.55
     1   8.60 551.71
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D      0      1
     0 548.72  94.12
     1  93.67 552.84
P2P=Disabled Latency Matrix (us)
   GPU      0      1
     0   1.37  13.27
     1  12.58   1.29

   CPU      0      1
     0   3.40  10.30
     1  10.11   3.34
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU      0      1
     0   1.32   1.73
     1   1.72   1.29

   CPU      0      1
     0   3.38   2.67
     1   2.94   3.36

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
chibi@1804:~/NVIDIA_CUDA-11.4_Samples/1_Utilities/p2pBandwidthLatencyTest$
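
The matrices above show what enabling P2P changes between the two TITAN RTX cards. A minimal Python sketch of that arithmetic follows. The numbers are the off-diagonal (GPU-to-GPU) entries copied from the log. The dictionary names and the `ratio` helper are hypothetical, introduced only for illustration, and are not part of the CUDA sample.

```python
# GPU-to-GPU (off-diagonal) values copied from the log above.
# Keys are (row device, column device) as printed in each matrix.
uni_bw_disabled = {(0, 1): 5.90, (1, 0): 5.96}     # GB/s
uni_bw_enabled = {(0, 1): 47.07, (1, 0): 47.10}    # GB/s
bi_bw_disabled = {(0, 1): 8.55, (1, 0): 8.60}      # GB/s
bi_bw_enabled = {(0, 1): 94.12, (1, 0): 93.67}     # GB/s
gpu_lat_disabled = {(0, 1): 13.27, (1, 0): 12.58}  # us
gpu_lat_enabled = {(0, 1): 1.73, (1, 0): 1.72}     # us


def ratio(numerator, denominator):
    """Element-wise ratio of two matrices stored as {(row, col): value}."""
    return {pair: numerator[pair] / denominator[pair] for pair in numerator}


# Bandwidth: higher is better, so divide enabled by disabled.
uni_gain = ratio(uni_bw_enabled, uni_bw_disabled)
bi_gain = ratio(bi_bw_enabled, bi_bw_disabled)
# Latency: lower is better, so divide disabled by enabled.
lat_gain = ratio(gpu_lat_disabled, gpu_lat_enabled)

for name, gains in [
    ("unidirectional bandwidth", uni_gain),
    ("bidirectional bandwidth", bi_gain),
    ("GPU latency", lat_gain),
]:
    for (row, col), gain in sorted(gains.items()):
        print(f"{name} [{row}][{col}]: {gain:.2f}x better with P2P enabled")
```

For this run, the ratios come out to roughly 8x for unidirectional bandwidth, roughly 11x for bidirectional bandwidth, and a bit over 7x for GPU-side latency. As the closing NOTE in the log says, the sample is not meant for performance measurements, so these figures are indicative only.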