chibi@1804:~/NVIDIA_CUDA-11.4_Samples/1_Utilities/p2pBandwidthLatencyTest$ ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA TITAN RTX, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA TITAN RTX, pciBusID: 82, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0     1
       0     1     1
       1     1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 561.65   5.90
     1   5.96 563.37
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1
     0 539.21  47.07
     1  47.10 563.04
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 504.62   8.55
     1   8.60 551.71
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 548.72  94.12
     1  93.67 552.84
P2P=Disabled Latency Matrix (us)
   GPU     0      1
     0   1.37  13.27
     1  12.58   1.29

   CPU     0      1
     0   3.40  10.30
     1  10.11   3.34
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1
     0   1.32   1.73
     1   1.72   1.29

   CPU     0      1
     0   3.38   2.67
     1   2.94   3.36

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
chibi@1804:~/NVIDIA_CUDA-11.4_Samples/1_Utilities/p2pBandwidthLatencyTest$
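
To put the matrices above in perspective, the following is a minimal sketch that computes how much enabling P2P changed the device 0 -> device 1 figures in this particular run. The numbers are copied by hand from the output above; the variable names are illustrative only and are not part of the CUDA sample. As the sample itself notes, these results are not meant as rigorous performance measurements, so the ratios should be read as rough indications.

```python
# Device 0 -> device 1 entries, copied by hand from the matrices printed above.
# Variable names are hypothetical; they do not come from the CUDA sample.
uni_bw_disabled_gbs = 5.90    # Unidirectional P2P=Disabled bandwidth (GB/s)
uni_bw_enabled_gbs = 47.07    # Unidirectional P2P=Enabled bandwidth (GB/s)
bi_bw_disabled_gbs = 8.55     # Bidirectional P2P=Disabled bandwidth (GB/s)
bi_bw_enabled_gbs = 94.12     # Bidirectional P2P=Enabled bandwidth (GB/s)
gpu_lat_disabled_us = 13.27   # P2P=Disabled GPU latency (us)
gpu_lat_enabled_us = 1.73     # P2P=Enabled GPU latency (us)

# Bandwidth improves when the ratio enabled/disabled is above 1.
uni_speedup = uni_bw_enabled_gbs / uni_bw_disabled_gbs
bi_speedup = bi_bw_enabled_gbs / bi_bw_disabled_gbs

# Latency improves when it shrinks, so the ratio is disabled/enabled.
latency_reduction = gpu_lat_disabled_us / gpu_lat_enabled_us

print(f"Unidirectional bandwidth gain with P2P: {uni_speedup:.2f}x")
print(f"Bidirectional bandwidth gain with P2P:  {bi_speedup:.2f}x")
print(f"GPU latency reduction with P2P:         {latency_reduction:.2f}x")
```

For this run the ratios come out at roughly 8x for unidirectional bandwidth, 11x for bidirectional bandwidth, and a little under 8x for GPU-side latency.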