{"id":5063,"date":"2021-04-09T02:09:53","date_gmt":"2021-04-08T17:09:53","guid":{"rendered":"https:\/\/wp.study3.biz\/?p=5063"},"modified":"2021-04-09T02:10:55","modified_gmt":"2021-04-08T17:10:55","slug":"amd-epyc-7302p-16-core-processor-red-hat-enterprise-linux-release-8-3-rtx2080ti-x2-cuda-11-1-simplep2p-p2pbandwidthlatencytest-bandwidthtest-devicequery%e3%82%92%e5%8b%95%e4%bd%9c%e3%81%95%e3%81%9b","status":"publish","type":"post","link":"https:\/\/wp.study3.biz\/?p=5063","title":{"rendered":"AMD EPYC 7302P 16-Core Processor Red Hat Enterprise Linux release 8.3 RTX2080Ti x2 CUDA 11.1 simpleP2P p2pBandwidthLatencyTest bandwidthTest deviceQuery\u3092\u52d5\u4f5c\u3055\u305b\u3066\u307f\u305f"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/wp.study3.biz\/wp-content\/uploads\/2021\/01\/Screenshot-from-2021-01-11-16-31-19.jpg\" alt=\"\" width=\"1024\" height=\"576\" class=\"alignnone size-full wp-image-5069\" \/><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/wp.study3.biz\/wp-content\/uploads\/2021\/01\/nvidia-smi-nvlink-c-nvidia-smi.jpg\" alt=\"\" width=\"1920\" height=\"1080\" class=\"alignnone size-full wp-image-5070\" \/><\/p>\n<p>[chibi@rhel8 simpleP2P]$ .\/simpleP2P<br \/>\n[.\/simpleP2P] &#8211; Starting&#8230;<br \/>\nChecking for multiple GPUs&#8230;<br \/>\nCUDA-capable device count: 2<\/p>\n<p>Checking GPU(s) for support of peer to peer memory access&#8230;<br \/>\n> Peer access from GeForce RTX 2080 Ti (GPU0) -> GeForce RTX 2080 Ti (GPU1) : Yes<br \/>\n> Peer access from GeForce RTX 2080 Ti (GPU1) -> GeForce RTX 2080 Ti (GPU0) : Yes<br \/>\nEnabling peer access between GPU0 and GPU1&#8230;<br \/>\nAllocating buffers (64MB on GPU0, GPU1 and CPU Host)&#8230;<br \/>\nCreating event handles&#8230;<br \/>\ncudaMemcpyPeer \/ cudaMemcpy between GPU0 and GPU1: <strong>43.54GB\/s<\/strong><br \/>\nPreparing host buffer and memcpy to GPU0&#8230;<br \/>\nRun kernel on GPU1, taking source data from GPU0 and writing to GPU1&#8230;<br \/>\nRun kernel on GPU0, taking source data from GPU1 and writing to GPU0&#8230;<br \/>\nCopy data back to host from GPU0 and verify results&#8230;<br \/>\nDisabling peer access&#8230;<br \/>\nShutting down&#8230;<br \/>\nTest passed<br \/>\n[chibi@rhel8 simpleP2P]$<br \/>\n[chibi@rhel8 deviceQuery]$ .\/deviceQuery<br \/>\n.\/deviceQuery Starting&#8230;<\/p>\n<p> CUDA Device Query (Runtime API) version (CUDART static linking)<\/p>\n<p>Detected 2 CUDA Capable device(s)<\/p>\n<p><strong>Device 0: &#8220;GeForce RTX 2080 Ti&#8221;<\/strong><br \/>\n  CUDA Driver Version \/ Runtime Version          11.1 \/ 11.1<br \/>\n  CUDA Capability Major\/Minor version number:    7.5<br \/>\n  Total amount of global memory:                 11019 MBytes (11554324480 bytes)<br \/>\n  (68) Multiprocessors, ( 64) CUDA Cores\/MP:     4352 CUDA Cores<br \/>\n  GPU Max Clock rate:                            1635 MHz (1.63 GHz)<br \/>\n  Memory Clock rate:                             7000 Mhz<br \/>\n  Memory Bus Width:                              352-bit<br \/>\n  L2 Cache Size:                                 5767168 bytes<br \/>\n  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)<br \/>\n  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers<br \/>\n  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers<br \/>\n  Total amount of constant memory:               65536 bytes<br \/>\n  Total amount of shared memory per block:       49152 
[chibi@rhel8 deviceQuery]$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          11.1 / 11.1
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 11019 MBytes (11554324480 bytes)
  (68) Multiprocessors, ( 64) CUDA Cores/MP:     4352 CUDA Cores
  GPU Max Clock rate:                            1635 MHz (1.63 GHz)
  Memory Clock rate:                             7000 MHz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 5767168 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        65536 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 129 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          11.1 / 11.1
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 11019 MBytes (11554717696 bytes)
  (68) Multiprocessors, ( 64) CUDA Cores/MP:     4352 CUDA Cores
  GPU Max Clock rate:                            1635 MHz (1.63 GHz)
  Memory Clock rate:                             7000 MHz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 5767168 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        65536 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 130 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce RTX 2080 Ti (GPU0) -> GeForce RTX 2080 Ti (GPU1) : Yes
> Peer access from GeForce RTX 2080 Ti (GPU1) -> GeForce RTX 2080 Ti (GPU0) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.1, CUDA Runtime Version = 11.1, NumDevs = 2
Result = PASS
[chibi@rhel8 deviceQuery]$
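Nearly all of the fields in that dump come from a single cudaGetDeviceProperties call per device, and the peer-access lines at the end come from cudaDeviceCanAccessPeer over every ordered device pair. A minimal sketch of that pattern, printing only a handful of the fields shown above (this is not the shipped deviceQuery source, and error checking is again omitted):

// devquery_sketch.cu — hypothetical file name; abbreviated deviceQuery-style dump
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Detected %d CUDA Capable device(s)\n\n", count);

    // One cudaGetDeviceProperties call yields most of the per-device fields
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: \"%s\"\n", dev, prop.name);
        printf("  Compute capability:    %d.%d\n", prop.major, prop.minor);
        printf("  Global memory:         %zu bytes\n", prop.totalGlobalMem);
        printf("  Multiprocessors:       %d\n", prop.multiProcessorCount);
        printf("  Memory bus width:      %d-bit\n", prop.memoryBusWidth);
        printf("  L2 cache size:         %d bytes\n", prop.l2CacheSize);
    }

    // The trailing peer-access lines check every ordered pair of devices
    for (int i = 0; i < count; ++i)
        for (int j = 0; j < count; ++j)
            if (i != j) {
                int can = 0;
                cudaDeviceCanAccessPeer(&can, i, j);
                printf("> Peer access from GPU%d -> GPU%d : %s\n",
                       i, j, can ? "Yes" : "No");
            }
    return 0;
}

Note that the two cards report slightly different byte totals for the same nominal 11019 MBytes, and only device 0 shows a kernel run-time limit, since it is the GPU driving the display.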
Full log: AMD EPYC 7302P 16-Core Processor Red Hat Enterprise Linux release 8.3 RTX2080Ti x2 CUDA 11.1 simpleP2P p2pBandwidthLatencyTest bandwidthTest deviceQuery (https://wp.study3.biz/wp-content/uploads/2021/01/AMD-EPYC-7302P-16-Core-Processor-Red-Hat-Enterprise-Linux-release-8.3-RTX2080Ti-x2-CUDA-11.1-simpleP2P-p2pBandwidthLatencyTest-bandwidthTest-deviceQuery.txt)