C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2\bin\win64\Debug>p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, GeForce RTX 2080 Ti, pciBusID: 1, pciDeviceID: 0, pciDomainID:0
Device: 1, GeForce RTX 2080 Ti, pciBusID: 21, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0     1
     0       1     1
     1       1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 515.61   9.12
     1   9.25 520.36
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1
     0 518.01  46.95
     1  46.97 516.70
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 524.65  11.07
     1  10.79 529.32
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 528.71  92.30
     1  93.57 518.58
P2P=Disabled Latency Matrix (us)
   GPU     0      1
     0   3.97 153.08
     1 147.86   3.44

   CPU     0      1
     0   2.16  45.88
     1  48.63   2.15
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1
     0   4.77   1.69
     1   1.62   4.25

   CPU     0      1
     0   2.17   1.23
     1   1.21   2.11

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2\bin\win64\Debug>
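
To put the matrices above in perspective, the following is a minimal sketch that computes how much enabling P2P changed the GPU 0 <-> GPU 1 figures in this particular run. The numbers are copied by hand from the off-diagonal entries of the output above; the variable and function names are illustrative assumptions and not part of the CUDA sample. As the sample itself notes, these results are not meant as performance measurements, so treat the ratios as rough indications only.

```python
# Off-diagonal (GPU 0 <-> GPU 1) entries copied from the output above.
# Each list holds [row 0 -> col 1, row 1 -> col 0].
uni_disabled = [9.12, 9.25]      # GB/s, unidirectional, P2P disabled
uni_enabled = [46.95, 46.97]     # GB/s, unidirectional, P2P enabled
bi_disabled = [11.07, 10.79]     # GB/s, bidirectional, P2P disabled
bi_enabled = [92.30, 93.57]      # GB/s, bidirectional, P2P enabled
lat_disabled = [153.08, 147.86]  # us, GPU latency matrix, P2P disabled
lat_enabled = [1.69, 1.62]       # us, GPU latency matrix, P2P enabled


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def ratio(numerator, denominator):
    """Ratio of the mean of one list to the mean of another."""
    return mean(numerator) / mean(denominator)


# Bandwidth: higher is better, so divide enabled by disabled.
uni_speedup = ratio(uni_enabled, uni_disabled)
bi_speedup = ratio(bi_enabled, bi_disabled)

# Latency: lower is better, so divide disabled by enabled.
lat_reduction = ratio(lat_disabled, lat_enabled)

print(f"Unidirectional bandwidth gain with P2P: {uni_speedup:.2f}x")
print(f"Bidirectional bandwidth gain with P2P:  {bi_speedup:.2f}x")
print(f"GPU latency reduction with P2P:         {lat_reduction:.1f}x")
```

For this run the ratios come out to roughly 5x for unidirectional bandwidth, roughly 8.5x for bidirectional bandwidth, and roughly 90x for GPU-measured latency between the two devices.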