[chibi@manjaro p2pBandwidthLatencyTest]$ ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, GeForce RTX 2080 Ti, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device: 1, GeForce RTX 2080 Ti, pciBusID: 82, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
***NOTE: In case a device doesn’t have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
D\D 0 1
0 1 1
1 1 1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 475.79 5.89
1 5.85 542.91
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 527.62 47.10
1 47.10 542.91
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 532.76 8.25
1 8.24 531.49
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 527.22 94.15
1 94.04 530.54
P2P=Disabled Latency Matrix (us)
GPU 0 1
0 1.37 14.29
1 12.45 1.33
CPU 0 1
0 3.33 10.16
1 10.11 3.28
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1
0 1.32 1.68
1 1.81 1.32
CPU 0 1
0 3.34 2.68
1 2.70 3.31
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
[chibi@manjaro p2pBandwidthLatencyTest]$
AMD EPYC 7302P 16-Core Processor Manjaro Linux RTX2080Ti x2 CUDA 11.2 simpleP2P p2pBandwidthLatencyTest bandwidthTest deviceQuery