AMD Ryzen Threadripper 3990X 64-Core Processor, CentOS Linux release 8.2, TITAN RTX x2 + RTX 2080 Ti x2, CUDA 11.0: running the namd 2.12-171025 STMV (virus) benchmark (1,066,628 atoms, periodic, PME) for comparison with other CPUs, 0.265825 days/ns

[chibi@centos8 ~]$ sudo nvidia-docker run -it --rm nvcr.io/hpc/namd:2.12-171025 /opt/namd/namd-multicore-memopt +p40 +setcpuaffinity +idlepoll /workspace/examples/stmv/stmv_pmecuda.namd
[sudo] password for chibi:
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 40 threads
Charm++> Using recursive bisection (scheme 3) for topology aware partitions
Converse/Charm++ Commit ID: v6.8.2
Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 unique compute nodes (128-way SMP).
Charm++> cpu topology info is gathered in 0.004 seconds.
Info: Built with CUDA version 9000
Did not find +devices i,j,k,... argument, using all
Pe 19 physical rank 19 will use CUDA device of pe 16
Pe 3 physical rank 3 will use CUDA device of pe 8
Pe 28 physical rank 28 will use CUDA device of pe 24
Pe 18 physical rank 18 will use CUDA device of pe 16
Pe 22 physical rank 22 will use CUDA device of pe 24
Pe 23 physical rank 23 will use CUDA device of pe 24
Pe 4 physical rank 4 will use CUDA device of pe 8
Pe 5 physical rank 5 will use CUDA device of pe 8
Pe 12 physical rank 12 will use CUDA device of pe 16
Pe 21 physical rank 21 will use CUDA device of pe 24
Pe 2 physical rank 2 will use CUDA device of pe 8
Pe 36 physical rank 36 will use CUDA device of pe 32
Pe 29 physical rank 29 will use CUDA device of pe 24
Pe 1 physical rank 1 will use CUDA device of pe 8
Pe 17 physical rank 17 will use CUDA device of pe 16
Pe 35 physical rank 35 will use CUDA device of pe 32
Pe 0 physical rank 0 will use CUDA device of pe 8
Pe 10 physical rank 10 will use CUDA device of pe 16
Pe 33 physical rank 33 will use CUDA device of pe 32
Pe 9 physical rank 9 will use CUDA device of pe 8
Pe 34 physical rank 34 will use CUDA device of pe 32
Pe 20 physical rank 20 will use CUDA device of pe 24
Pe 11 physical rank 11 will use CUDA device of pe 16
Pe 31 physical rank 31 will use CUDA device of pe 32
Pe 6 physical rank 6 will use CUDA device of pe 8
Pe 30 physical rank 30 will use CUDA device of pe 32
Pe 15 physical rank 15 will use CUDA device of pe 16
Pe 39 physical rank 39 will use CUDA device of pe 32
Pe 38 physical rank 38 will use CUDA device of pe 32
Pe 7 physical rank 7 will use CUDA device of pe 8
Pe 14 physical rank 14 will use CUDA device of pe 16
Pe 25 physical rank 25 will use CUDA device of pe 24
Pe 27 physical rank 27 will use CUDA device of pe 24
Pe 26 physical rank 26 will use CUDA device of pe 24
Pe 37 physical rank 37 will use CUDA device of pe 32
Pe 13 physical rank 13 will use CUDA device of pe 16
Pe 32 physical rank 32 binding to CUDA device 3 on c22b80e21951: 'GeForce RTX 2080 Ti' Mem: 11019MB Rev: 7.5
Pe 24 physical rank 24 binding to CUDA device 2 on c22b80e21951: 'GeForce RTX 2080 Ti' Mem: 11019MB Rev: 7.5
Pe 8 physical rank 8 binding to CUDA device 0 on c22b80e21951: 'TITAN RTX' Mem: 24219MB Rev: 7.5
Pe 16 physical rank 16 binding to CUDA device 1 on c22b80e21951: 'TITAN RTX' Mem: 24220MB Rev: 7.5
Info: NAMD 2.12 for Linux-x86_64-multicore-CUDA-memopt
Warning:
Warning: *** EXPERIMENTAL MEMORY OPTIMIZED VERSION ***
Warning:
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60800 for multicore-linux64-gcc
Info: Built Tue Nov 21 02:03:10 UTC 2017 by on a02d2dbfe66b
Info: 1 NAMD 2.12 Linux-x86_64-multicore-CUDA-memopt 40 c22b80e21951 root
Info: Running on 40 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.340343 s
CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level notification) but not using node-level queue
Info: 39.5391 MB of memory in use based on /proc/self/stat
Info: Configuration file is /workspace/examples/stmv/stmv_pmecuda.namd
Info: Changed directory to /workspace/examples/stmv
TCL: Suspending until startup complete.
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 1
Info: NUMBER OF STEPS 800
Info: STEPS PER CYCLE 20
Info: PERIODIC CELL BASIS 1 216.832 0 0
Info: PERIODIC CELL BASIS 2 0 216.832 0
Info: PERIODIC CELL BASIS 3 0 0 216.832
Info: PERIODIC CELL CENTER 0 0 0
Info: LOAD BALANCER Hybrid
Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
Info: LDB PERIOD 4000 steps
Info: FIRST LDB TIMESTEP 100
Info: HYBRIDLB GROUP SIZE 512
Info: LAST LDB TIMESTEP -1
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: PME BACKGROUND SCALING 1
Info: MAX SELF PARTITIONS 1
Info: MAX PAIR PARTITIONS 1
Info: SELF PARTITION ATOMS 154
Info: SELF2 PARTITION ATOMS 154
Info: PAIR PARTITION ATOMS 318
Info: PAIR2 PARTITION ATOMS 637
Info: MIN ATOMS PER PATCH 40
Info: INITIAL TEMPERATURE 298
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 ELECTROSTATICS SCALED BY 1
Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
Info: NO DCD TRAJECTORY OUTPUT
Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
Info: NO VELOCITY DCD OUTPUT
Info: NO FORCE DCD OUTPUT
Info: OUTPUT FILENAME /workspace/examples/stmv/stmv-output
Info: BINARY OUTPUT FILES WILL BE USED
Info: NO RESTART FILE
Info: SWITCHING ACTIVE
Info: SWITCHING ON 10
Info: SWITCHING OFF 12
Info: PAIRLIST DISTANCE 13.5
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 0.48
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 16.48
Info: ENERGY OUTPUT STEPS 200
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 1
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 298
Info: LANGEVIN USING BBK INTEGRATOR
Info: LANGEVIN DAMPING COEFFICIENT IS 5 INVERSE PS
Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
Info: LANGEVIN PISTON PRESSURE CONTROL ACTIVE
Info: TARGET PRESSURE IS 1.01325 BAR
Info: OSCILLATION PERIOD IS 200 FS
Info: DECAY TIME IS 100 FS
Info: PISTON TEMPERATURE IS 298 K
Info: PRESSURE CONTROL IS GROUP-BASED
Info: INITIAL STRAIN RATE IS 0 0 0
Info: CELL FLUCTUATION IS ISOTROPIC
Info: PARTICLE MESH EWALD (PME) ACTIVE
Info: PME TOLERANCE 1e-06
Info: PME EWALD COEFFICIENT 0.257952
Info: PME INTERPOLATION ORDER 8
Info: PME GRID DIMENSIONS 108 108 108
Info: PME MAXIMUM GRID SPACING 2.1
Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 4
Info: USING VERLET I (r-RESPA) MTS SCHEME.
Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
Info: RIGID BONDS TO HYDROGEN : ALL
Info: ERROR TOLERANCE : 1e-08
Info: MAX ITERATIONS : 100
Info: RIGID WATER USING SETTLE ALGORITHM
Info: NONBONDED FORCES EVALUATED EVERY 2 STEPS
Info: RANDOM NUMBER SEED 3141
Info: USE HYDROGEN BONDS? NO
Info: STRUCTURE FILE stmv.psf.inter
Info: PARAMETER file: CHARMM format!
Info: PARAMETERS par_all27_prot_na.inp
Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
Info: BINARY COORDINATES stmv.coor
Info: SUMMARY OF PARAMETERS:
Info: 250 BONDS
Info: 622 ANGLES
Info: 1049 DIHEDRAL
Info: 73 IMPROPER
Info: 0 CROSSTERM
Info: 130 VDW
Info: 0 VDW_PAIRS
Info: 0 NBTHOLE_PAIRS
Info: TIME FOR READING PSF FILE: 0.0028851
Info:
Info: Entering startup at 0.359806 s, 85.5703 MB of memory in use
Info: Startup phase 0 took 8.01086e-05 s, 85.5703 MB of memory in use
Warning: an empty exclusion signature with index 709!
Info: Startup phase 1 took 0.000342846 s, 85.5703 MB of memory in use
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 769 POINTS
Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000325096 AT 11.9556
Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 0.000324844 AT 11.9556
Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
Info: Running with 2 input processors.
Info: Running with 1 output processors (1 of them will output simultaneously).
Info: INPUT PROC LOCATIONS: 8 16
Info: OUTPUT PROC LOCATIONS: 32
Info: Startup phase 2 took 0.00796604 s, 241.273 MB of memory in use
Info: Startup phase 3 took 0.049921 s, 244.605 MB of memory in use
Info: PATCH GRID IS 13 (PERIODIC) BY 13 (PERIODIC) BY 13 (PERIODIC)
Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
Info: LOADED 1810196 TOTAL EXCLUSIONS
Info: REMOVING COM VELOCITY -0.00436736 -0.0116608 0.0017952
Info: Startup phase 4 took 0.111289 s, 559.586 MB of memory in use
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 1066628 ATOMS
Info: 769956 BONDS
Info: 605872 ANGLES
Info: 450875 DIHEDRALS
Info: 24578 IMPROPERS
Info: 0 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 977416 RIGID BONDS
Info: 2222468 DEGREES OF FREEDOM
Info: 389067 HYDROGEN GROUPS
Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
Info: 389067 MIGRATION GROUPS
Info: 4 ATOMS IN LARGEST MIGRATION GROUP
Info: TOTAL MASS = 6.69877e+06 amu
Info: TOTAL CHARGE = 0.000168104 e
Info: MASS DENSITY = 1.09115 g/cm^3
Info: ATOM DENSITY = 0.104627 atoms/A^3
Info: *****************************
Info: LARGEST PATCH (1044) HAS 541 ATOMS
Info: Startup phase 5 took 0.0533659 s, 565.773 MB of memory in use
Info: TORUS A SIZE 1 USING 0
Info: TORUS B SIZE 1 USING 0
Info: TORUS C SIZE 1 USING 0
Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
Info: Placed 100% of base nodes on same physical node as patch
Info: Startup phase 6 took 0.00464511 s, 573.051 MB of memory in use
Info: PME using 1 x 1 x 1 pencil grid for FFT and reciprocal sum.
Info: Updated CUDA force table with 4096 elements.
Info: Updated CUDA LJ table with 130 x 130 elements.
Info: Updated CUDA force table with 4096 elements.
Info: Updated CUDA LJ table with 130 x 130 elements.
Info: Updated CUDA force table with 4096 elements.
Info: Updated CUDA LJ table with 130 x 130 elements.
Info: Updated CUDA force table with 4096 elements.
Info: Updated CUDA LJ table with 130 x 130 elements.
Info: Startup phase 7 took 3.33094 s, 1593.53 MB of memory in use
Info: Startup phase 8 took 0.00655293 s, 1595.91 MB of memory in use
LDB: Hybrid LB being created...
HybridBaseLB: ThreeLevelTree is created.
Info: Startup phase 9 took 0.0024569 s, 1595.91 MB of memory in use
Info: CREATING 46457 COMPUTE OBJECTS
Info: Found 333 unique exclusion lists needing 1076 bytes
Info: Found 333 unique exclusion lists needing 1076 bytes
Info: Found 333 unique exclusion lists needing 1076 bytes
Info: Found 333 unique exclusion lists needing 1076 bytes
Info: useSync: 0 useProxySync: 0
Info: Startup phase 10 took 0.068258 s, 1634.8 MB of memory in use
Info: Startup phase 11 took 0.000103951 s, 1634.8 MB of memory in use
Info: Startup phase 12 took 0.00266719 s, 1641.88 MB of memory in use
Info: Finished startup at 3.99839 s, 1641.88 MB of memory in use
(output omitted)
Info: Benchmark time: 40 CPUs 0.0229673 s/step 0.265825 days/ns 2834.73 MB memory
← days/ns (lower is better)
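
To compare this result with runs on other CPUs, it helps to pull the numbers out of the "Benchmark time" line and cross-check them. The following is a minimal sketch, not part of NAMD: the names `parse_benchmark` and `days_per_ns` are hypothetical helpers, and the conversion assumes the 1 fs timestep reported in the log (TIMESTEP 1), so that 1 ns corresponds to 1,000,000 steps.

```python
import re

SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000

# Matches the "Benchmark time" line format shown in the log above.
BENCH_RE = re.compile(
    r"Benchmark time: (\d+) CPUs ([\d.eE+-]+) s/step ([\d.eE+-]+) days/ns"
)


def parse_benchmark(line):
    """Return (cpus, seconds_per_step, days_per_ns) or None if no match."""
    m = BENCH_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


def days_per_ns(seconds_per_step, timestep_fs=1.0):
    """Convert wall-clock seconds per step into days per simulated ns."""
    steps_per_ns = FS_PER_NS / timestep_fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


# The line is copied from the log above.
line = ("Info: Benchmark time: 40 CPUs 0.0229673 s/step "
        "0.265825 days/ns 2834.73 MB memory")
cpus, s_per_step, reported = parse_benchmark(line)
computed = days_per_ns(s_per_step)

print(f"{cpus} CPUs: reported {reported:.6f}, computed {computed:.6f} days/ns")
print(f"throughput: {1.0 / computed:.2f} ns/day")
```

For this run the computed value agrees with the reported 0.265825 days/ns, which corresponds to roughly 3.76 ns/day. Running the same parser over logs from other machines gives directly comparable figures, provided those runs use the same timestep.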
