[chibi@rhel8 ~]$ sudo nvidia-docker run --rm -ti nvcr.io/nvidia/tensorflow:19.04-py3
[sudo] password for chibi:
Unable to find image 'nvcr.io/nvidia/tensorflow:19.04-py3' locally
19.04-py3: Pulling from nvidia/tensorflow
Digest: sha256:aaebc136d5d50937362675c77afd908bd96cded68846f39163050a023c8a9851
Status: Downloaded newer image for nvcr.io/nvidia/tensorflow:19.04-py3

================
== TensorFlow ==
================

NVIDIA Release 19.04 (build 6132408)
TensorFlow Version 1.13.1

Container image Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2019 The TensorFlow Authors. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
      insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
      nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
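Before going further, the SHMEM warning above is worth acting on for long training runs. A minimal sketch of how the container could be relaunched with the flags the image itself recommends (the flag values are copied verbatim from the NOTE; the image tag is the one pulled above):

  # Re-run with NVIDIA's recommended shared-memory and ulimit settings
  # for TensorFlow workloads:
  sudo nvidia-docker run --rm -ti \
      --shm-size=1g \
      --ulimit memlock=-1 \
      --ulimit stack=67108864 \
      nvcr.io/nvidia/tensorflow:19.04-py3

The session below continues inside the container started with the defaults, so the 64MB SHMEM limit still applies.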
root@dc8a73b89477:/workspace# ls
README.md  docker-examples  nvidia-examples
root@dc8a73b89477:/workspace# cd nvidia-examples
root@dc8a73b89477:/workspace/nvidia-examples# ls
NCF              bert                 cnn           ssdv1.2
OpenSeq2Seq      big_lstm             gnmt_v2       tensorrt
UNet_Industrial  build_imagenet_data  resnet50v1.5
root@dc8a73b89477:/workspace/nvidia-examples# cd big_lstm
root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# ls
1b_word_vocab.txt  data_utils_test.py         language_model_test.py
README.md          download_1b_words_data.sh  model_utils.py
__init__.py        hparams.py                 run_utils.py
common.py          hparams_test.py            single_lm_train.py
data_utils.py      language_model.py          testdata
root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# ./download_1b_words_data.sh
Please specify root of dataset directory: data
Success: dataset root dir validated
--2020-05-28 18:34:45--  http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
Resolving www.statmt.org (www.statmt.org)... 129.215.197.184
Connecting to www.statmt.org (www.statmt.org)|129.215.197.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1792209805 (1.7G) [application/x-gzip]
Saving to: ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’

1-billion-word-lang 100%[===================>]   1.67G   414KB/s    in 88m 2s

2020-05-28 20:02:48 (331 KB/s) - ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’ saved [1792209805/1792209805]

1-billion-word-language-modeling-benchmark-r13output/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00024-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00057-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00055-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00096-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00081-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00033-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00072-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00082-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00018-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00008-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00059-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00005-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00091-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00062-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00031-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00095-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00076-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00006-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00038-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00015-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00087-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00021-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00049-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00009-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00027-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00056-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00046-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00032-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00029-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00088-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00085-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00011-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00012-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00067-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00003-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00093-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00050-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00053-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00044-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00019-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00066-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00028-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00045-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00039-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00071-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00052-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00078-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00037-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00002-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00014-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00048-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00017-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00004-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00077-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00080-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00020-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00051-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00016-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00079-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00043-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00068-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00099-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00064-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00034-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00054-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00040-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00070-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00063-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00041-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00083-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00061-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00073-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00094-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00030-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00060-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00035-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00023-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00042-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00025-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00090-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00089-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00065-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00075-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00022-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00026-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00098-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00084-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00010-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00069-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00013-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00092-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00036-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00097-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00007-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00074-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00001-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00047-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00086-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00058-of-00100
1-billion-word-language-modeling-benchmark-r13output/.svn/
1-billion-word-language-modeling-benchmark-r13output/.svn/tmp/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/de102cd0c91cd19e6612f0840e68a2f20ba8134c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/deed1b75d3bd5cc36ae6aeb85d56680b892b7948.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/86c58db52fbf362c5bc329afc33b8805085fcb0d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/9f2882e21f860a83ad6ea8898ebab140974ed301.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/bcdbc523ee7488dc438cab869b6d5e236578dbfa.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/d2718bc26d0ee0a213d7d4add99a304cb5b39ede.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/c5b24f61479da923123d0394a188da922ea0359c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/116d6ea61730d8199127596b072e981338597779.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/b0e26559cfe641245584a9400b35ba28d64f1411.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/d3ae508e3bcb0e696dd70aecd052410f1f7afc1d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/9e148bd766e8805e0eb97eeae250433ec7a2e996.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/31b645a482e0b81fda3c567cada307c6fcf7ec80.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/da39a3ee5e6b4b0d3255bfef95601890afd80709.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/c1ed42c415ec884e591fb5c70d373da640a383b5.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/e37ba0f85e94073ccaced1eed7e4f5d737a25f49.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/entries
1-billion-word-language-modeling-benchmark-r13output/.svn/format
1-billion-word-language-modeling-benchmark-r13output/.svn/wc.db
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00015-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00031-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00027-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00010-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00033-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00042-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00046-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00037-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00000-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00029-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00013-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00002-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00048-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00006-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00030-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00025-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00039-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00008-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00020-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00001-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00034-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00044-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00045-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00016-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00004-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00035-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00038-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00009-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00024-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00022-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00021-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00032-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00011-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00049-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00041-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00019-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00023-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00040-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00014-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00007-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00017-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00012-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00018-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00003-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00028-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en-00000-of-00100
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00043-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00005-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00036-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00026-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00047-of-00050
1-billion-word-language-modeling-benchmark-r13output/README

Success! One billion words dataset ready at:
data/1-billion-word-language-modeling-benchmark-r13output/
Please pass this dir to single_lm_train.py via the --datadir option.

root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=4 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

*****HYPER PARAMETERS*****
{'num_layers': 1, 'num_gpus': 4, 'num_steps': 20, 'batch_size': 128, 'num_delayed_steps': 150, 'num_shards': 8, 'state_size': 2048, 'num_sampled': 8192, 'max_time': 180, 'optimizer': 0, 'max_grad_norm': 10.0, 'do_summaries': False, 'average_params': True, 'emb_size': 512, 'keep_prob': 0.9, 'projected_size': 512, 'run_profiler': False, 'learning_rate': 0.2, 'vocab_size': 793470}
**************************
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Current time: 1590696229.3037004
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
model/model_1/state_1_0:0 (128, 2560) /gpu:1
model/model_2/state_2_0:0 (128, 2560) /gpu:2
model/model_3/state_3_0:0 (128, 2560) /gpu:3
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2020-05-28 20:03:50.149479: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1996320000 Hz
2020-05-28 20:03:50.157990: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0xbb69c10 executing computations on platform Host. Devices:
2020-05-28 20:03:50.158049: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2020-05-28 20:03:50.698264: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0xb6b19f0 executing computations on platform CUDA. Devices:
2020-05-28 20:03:50.698307: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
2020-05-28 20:03:50.698318: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (1): TITAN RTX, Compute Capability 7.5
2020-05-28 20:03:50.698327: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-05-28 20:03:50.698387: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-05-28 20:03:50.700065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:41:00.0
totalMemory: 23.65GiB freeMemory: 23.48GiB
2020-05-28 20:03:50.700100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:61:00.0
totalMemory: 23.65GiB freeMemory: 23.48GiB
2020-05-28 20:03:50.700126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 2 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:04:00.0
totalMemory: 10.76GiB freeMemory: 10.37GiB
2020-05-28 20:03:50.700158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 3 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:21:00.0
totalMemory: 10.76GiB freeMemory: 10.60GiB
2020-05-28 20:03:50.700322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1, 2, 3
2020-05-28 20:03:51.674657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 20:03:51.674707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 1 2 3
2020-05-28 20:03:51.674714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N N N N
2020-05-28 20:03:51.674718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N N N N
2020-05-28 20:03:51.674722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N N N N
2020-05-28 20:03:51.674726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N N N N
2020-05-28 20:03:51.674891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22757 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:41:00.0, compute capability: 7.5)
2020-05-28 20:03:51.675309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22757 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:61:00.0, compute capability: 7.5)
2020-05-28 20:03:51.675521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10004 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:04:00.0, compute capability: 7.5)
2020-05-28 20:03:51.675842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10224 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:21:00.0, compute capability: 7.5)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00042-of-00100
Finished processing!
2020-05-28 20:04:15.877534: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally
Iteration 1, time = 12.44s, wps = 823, train loss = 12.9932
Iteration 2, time = 9.66s, wps = 1060, train loss = 12.9346
Iteration 3, time = 0.12s, wps = 87759, train loss = 12.8111
Iteration 4, time = 0.11s, wps = 94486, train loss = 12.8659
Iteration 5, time = 0.11s, wps = 94307, train loss = 12.6673
Iteration 6, time = 0.10s, wps = 100545, train loss = 11.1822
Iteration 7, time = 0.11s, wps = 92386, train loss = 84.5649
Iteration 8, time = 0.11s, wps = 91389, train loss = 49.7504
Iteration 9, time = 0.10s, wps = 98697, train loss = 19.4051
Iteration 20, time = 1.22s, wps = 92617, train loss = 9.6552
Iteration 40, time = 2.12s, wps = 96745, train loss = 8.8411
Iteration 60, time = 2.18s, wps = 93932, train loss = 8.8114
Iteration 80, time = 2.23s, wps = 91776, train loss = 8.5576
Iteration 100, time = 2.24s, wps = 91244, train loss = 7.6533
Iteration 120, time = 2.20s, wps = 92885, train loss = 7.6763
Iteration 140, time = 2.16s, wps = 94660, train loss = 7.3734
Iteration 160, time = 2.13s, wps = 96350, train loss = 7.0701
Iteration 180, time = 2.23s, wps = 91854, train loss = 6.9310
Iteration 200, time = 2.24s, wps = 91455, train loss = 6.7095
Iteration 220, time = 2.20s, wps = 92911, train loss = 6.4997
Iteration 240, time = 2.22s, wps = 92068, train loss = 6.3318
Iteration 260, time = 2.29s, wps = 89608, train loss = 6.3565
Iteration 280, time = 2.17s, wps = 94286, train loss = 6.2229
Iteration 300, time = 2.21s, wps = 92606, train loss = 6.0949
Iteration 320, time = 2.19s, wps = 93666, train loss = 6.0544
Iteration 340, time = 2.18s, wps = 94064, train loss = 6.0039
Iteration 360, time = 2.16s, wps = 95001, train loss = 6.0821
Iteration 380, time = 2.12s, wps = 96577, train loss = 5.9918
Iteration 400, time = 2.19s, wps = 93473, train loss = 5.8902
Iteration 420, time = 2.16s, wps = 94824, train loss = 5.7979
Iteration 440, time = 2.22s, wps = 92222, train loss = 5.9237
Iteration 460, time = 2.19s, wps = 93710, train loss = 5.8556
Iteration 480, time = 2.19s, wps = 93414, train loss = 5.6904
Iteration 500, time = 2.17s, wps = 94213, train loss = 5.8135
Iteration 520, time = 2.17s, wps = 94288, train loss = 5.7154
Iteration 540, time = 2.25s, wps = 91123, train loss = 5.7460
Iteration 560, time = 2.21s, wps = 92593, train loss = 5.7053
Iteration 580, time = 2.18s, wps = 94091, train loss = 5.6911
Iteration 600, time = 2.16s, wps = 94660, train loss = 5.6143
Iteration 620, time = 2.18s, wps = 93839, train loss = 5.5928
Iteration 640, time = 2.21s, wps = 92601, train loss = 5.4966
Iteration 660, time = 2.12s, wps = 96456, train loss = 5.5977
Iteration 680, time = 2.18s, wps = 93940, train loss = 5.5815
Iteration 700, time = 2.19s, wps = 93355, train loss = 5.5538
Iteration 720, time = 2.22s, wps = 92140, train loss = 5.5204
Iteration 740, time = 2.25s, wps = 90968, train loss = 5.4444
Iteration 760, time = 2.16s, wps = 94877, train loss = 5.4845
Iteration 780, time = 2.18s, wps = 94102, train loss = 5.4815
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00001-of-00100
Finished processing!
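A quick sanity check on the wps column: after the warm-up iterations, each log line covers 20 iterations, and every iteration pushes batch_size x num_steps tokens through each of the GPUs (128 x 20 x 4, per the hyperparameter dump above). A hedged back-of-the-envelope calculation, assuming that reading of the log format:

  # words per 20-iteration log interval divided by the elapsed seconds:
  # batch_size * num_steps * num_gpus * iterations / time
  awk 'BEGIN { printf "%.0f wps\n", 128 * 20 * 4 * 20 / 2.20 }'
  # -> 93091, consistent with the ~90,000-96,000 wps reported above

The two slow lines at the very start (823 and 1060 wps) reflect one-time graph setup and CUDA library loading rather than steady-state throughput.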
Iteration 800, time = 4.22s, wps = 48567, train loss = 5.4506
Iteration 820, time = 2.20s, wps = 93154, train loss = 5.5089
Iteration 840, time = 2.19s, wps = 93613, train loss = 5.3722
Iteration 860, time = 2.22s, wps = 92170, train loss = 5.3514
Iteration 880, time = 2.23s, wps = 92007, train loss = 5.3561
Iteration 900, time = 2.20s, wps = 93108, train loss = 5.4207
Iteration 920, time = 2.15s, wps = 95384, train loss = 5.3285
Iteration 940, time = 2.16s, wps = 94656, train loss = 5.3391
Iteration 960, time = 2.25s, wps = 91064, train loss = 5.3486
Iteration 980, time = 2.24s, wps = 91449, train loss = 5.2946
Iteration 1000, time = 2.19s, wps = 93490, train loss = 5.3301
Iteration 1020, time = 2.19s, wps = 93631, train loss = 5.2767
Iteration 1040, time = 2.21s, wps = 92574, train loss = 5.2243
Iteration 1060, time = 2.19s, wps = 93343, train loss = 5.2153
Iteration 1080, time = 2.25s, wps = 90937, train loss = 5.2600
Iteration 1100, time = 2.30s, wps = 88973, train loss = 5.1437
Iteration 1120, time = 2.20s, wps = 92939, train loss = 5.2075
Iteration 1140, time = 2.17s, wps = 94165, train loss = 5.2287
Iteration 1160, time = 2.16s, wps = 94727, train loss = 5.1323
Iteration 1180, time = 2.24s, wps = 91343, train loss = 5.2006
Iteration 1200, time = 2.19s, wps = 93582, train loss = 5.0496
Iteration 1220, time = 2.19s, wps = 93677, train loss = 5.1200
Iteration 1240, time = 2.29s, wps = 89329, train loss = 5.1383
Iteration 1260, time = 2.21s, wps = 92662, train loss = 5.1366
Iteration 1280, time = 2.21s, wps = 92580, train loss = 5.1447
/usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "

real	3m17.299s
user	28m15.559s
sys	7m4.973s
root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=4 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

*****HYPER PARAMETERS*****
{'emb_size': 512, 'num_delayed_steps': 150, 'num_gpus': 4, 'batch_size': 128, 'average_params': True, 'max_grad_norm': 10.0, 'num_steps': 20, 'max_time': 180, 'run_profiler': False, 'state_size': 2048, 'vocab_size': 793470, 'do_summaries': False, 'projected_size': 512, 'num_layers': 1, 'optimizer': 0, 'keep_prob': 0.9, 'learning_rate': 0.2, 'num_sampled': 8192, 'num_shards': 8}
**************************
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Current time: 1590696852.190737
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
model/model_1/state_1_0:0 (128, 2560) /gpu:1
model/model_2/state_2_0:0 (128, 2560) /gpu:2
model/model_3/state_3_0:0 (128, 2560) /gpu:3
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2020-05-28 20:14:13.026535: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1996320000 Hz
2020-05-28 20:14:13.034745: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0xae04610 executing computations on platform Host. Devices:
2020-05-28 20:14:13.034799: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2020-05-28 20:14:13.597066: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0xc59c260 executing computations on platform CUDA. Devices:
2020-05-28 20:14:13.597110: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
2020-05-28 20:14:13.597124: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (1): TITAN RTX, Compute Capability 7.5
2020-05-28 20:14:13.597136: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-05-28 20:14:13.597150: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-05-28 20:14:13.598525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:41:00.0
totalMemory: 23.65GiB freeMemory: 23.48GiB
2020-05-28 20:14:13.598560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:61:00.0
totalMemory: 23.65GiB freeMemory: 23.48GiB
2020-05-28 20:14:13.598586: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 2 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:04:00.0
totalMemory: 10.76GiB freeMemory: 10.37GiB
2020-05-28 20:14:13.598613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 3 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:21:00.0
totalMemory: 10.76GiB freeMemory: 10.60GiB
2020-05-28 20:14:13.598780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1, 2, 3
2020-05-28 20:14:14.575989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 20:14:14.576038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 1 2 3
2020-05-28 20:14:14.576044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N N N N
2020-05-28 20:14:14.576048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N N N N
2020-05-28 20:14:14.576053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2:   N N N N
2020-05-28 20:14:14.576059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3:   N N N N
2020-05-28 20:14:14.576218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22757 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:41:00.0, compute capability: 7.5)
2020-05-28 20:14:14.576686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22757 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:61:00.0, compute capability: 7.5)
2020-05-28 20:14:14.576885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10004 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:04:00.0, compute capability: 7.5)
2020-05-28 20:14:14.577218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10224 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:21:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00093-of-00100
Finished processing!
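Note that this second run does not start from scratch: the checkpoint warnings above come from the Supervisor restoring the state the first run saved in ./logs, and the iteration counter below resumes at 1282 rather than 1. To benchmark a cold start instead, the run could be pointed at an empty log directory; a sketch, where ./logs_fresh is a hypothetical directory name:

  # --logdir controls checkpoint restore; an empty directory means a fresh start:
  mkdir -p ./logs_fresh
  time python single_lm_train.py --mode=train --logdir=./logs_fresh \
      --num_gpus=4 \
      --datadir=./data/1-billion-word-language-modeling-benchmark-r13output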
2020-05-28 20:14:34.197812: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally
Iteration 1282, time = 12.47s, wps = 821, train loss = 5.5932
Iteration 1283, time = 9.69s, wps = 1057, train loss = 5.3706
Iteration 1284, time = 0.13s, wps = 76223, train loss = 5.2951
Iteration 1285, time = 0.12s, wps = 82817, train loss = 5.2899
Iteration 1286, time = 0.13s, wps = 78196, train loss = 5.1537
Iteration 1287, time = 0.12s, wps = 83713, train loss = 5.0601
Iteration 1288, time = 0.12s, wps = 85608, train loss = 5.1372
Iteration 1289, time = 0.13s, wps = 81639, train loss = 5.1309
Iteration 1290, time = 0.12s, wps = 84725, train loss = 5.1538
Iteration 1301, time = 1.33s, wps = 84379, train loss = 5.0990
Iteration 1321, time = 2.43s, wps = 84415, train loss = 5.0791
Iteration 1341, time = 2.45s, wps = 83492, train loss = 5.0961
Iteration 1361, time = 2.48s, wps = 82525, train loss = 5.0837
Iteration 1381, time = 2.49s, wps = 82148, train loss = 5.0479
Iteration 1401, time = 2.48s, wps = 82560, train loss = 5.1369
Iteration 1421, time = 2.48s, wps = 82583, train loss = 5.0562
Iteration 1441, time = 2.47s, wps = 82822, train loss = 5.0000
Iteration 1461, time = 2.46s, wps = 83250, train loss = 5.0067
Iteration 1481, time = 2.48s, wps = 82421, train loss = 5.0704
Iteration 1501, time = 2.48s, wps = 82540, train loss = 4.9547
Iteration 1521, time = 2.47s, wps = 82769, train loss = 5.0144
Iteration 1541, time = 2.47s, wps = 82838, train loss = 4.9846
Iteration 1561, time = 2.48s, wps = 82713, train loss = 4.9756
Iteration 1581, time = 2.48s, wps = 82422, train loss = 4.9777
Iteration 1601, time = 2.53s, wps = 80838, train loss = 4.9535
Iteration 1621, time = 2.50s, wps = 81937, train loss = 4.8891
Iteration 1641, time = 2.46s, wps = 83213, train loss = 4.8758
Iteration 1661, time = 2.50s, wps = 81810, train loss = 4.8969
Iteration 1681, time = 2.50s, wps = 82066, train loss = 4.9042
Iteration 1701, time = 2.48s, wps = 82742, train loss = 4.9104
Iteration 1721, time = 2.50s, wps = 82061, train loss = 4.9082
Iteration 1741, time = 2.49s, wps = 82104, train loss = 4.8209
Iteration 1761, time = 2.49s, wps = 82290, train loss = 4.9004
Iteration 1781, time = 2.49s, wps = 82367, train loss = 4.9406
Iteration 1801, time = 2.48s, wps = 82590, train loss = 4.8976
Iteration 1821, time = 2.49s, wps = 82108, train loss = 4.8394
Iteration 1841, time = 2.51s, wps = 81541, train loss = 4.8744
Iteration 1861, time = 2.50s, wps = 82051, train loss = 4.8893
Iteration 1881, time = 2.49s, wps = 82316, train loss = 4.8230
Iteration 1901, time = 2.50s, wps = 81879, train loss = 4.8001
Iteration 1921, time = 2.50s, wps = 82030, train loss = 4.8108
Iteration 1941, time = 2.49s, wps = 82337, train loss = 4.8139
Iteration 1961, time = 2.47s, wps = 82968, train loss = 4.7781
Iteration 1981, time = 2.52s, wps = 81118, train loss = 4.8373
Iteration 2001, time = 2.49s, wps = 82186, train loss = 4.8388
Iteration 2021, time = 2.52s, wps = 81163, train loss = 4.7926
Iteration 2041, time = 2.55s, wps = 80375, train loss = 4.7480
Iteration 2061, time = 2.52s, wps = 81430, train loss = 4.7733
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00055-of-00100
Finished processing!
Iteration 2081, time = 4.46s, wps = 45960, train loss = 4.7822
Iteration 2101, time = 2.49s, wps = 82175, train loss = 4.7005
Iteration 2121, time = 2.49s, wps = 82369, train loss = 4.7103
Iteration 2141, time = 2.50s, wps = 81928, train loss = 4.7399
Iteration 2161, time = 2.48s, wps = 82682, train loss = 4.7687
Iteration 2181, time = 2.53s, wps = 81008, train loss = 4.7785
Iteration 2201, time = 2.46s, wps = 83306, train loss = 4.7693
Iteration 2221, time = 2.51s, wps = 81525, train loss = 4.7688
Iteration 2241, time = 2.50s, wps = 82058, train loss = 4.7384
Iteration 2261, time = 2.47s, wps = 82947, train loss = 4.7009
Iteration 2281, time = 2.50s, wps = 82029, train loss = 4.6940
Iteration 2301, time = 2.49s, wps = 82388, train loss = 4.6907
Iteration 2321, time = 2.50s, wps = 81946, train loss = 4.6807
Iteration 2341, time = 2.51s, wps = 81436, train loss = 4.7184
Iteration 2361, time = 2.51s, wps = 81542, train loss = 4.7199
Iteration 2381, time = 2.50s, wps = 82017, train loss = 4.6797
Iteration 2401, time = 2.50s, wps = 81935, train loss = 4.6871
Iteration 2421, time = 2.49s, wps = 82086, train loss = 4.6957
Iteration 2441, time = 2.49s, wps = 82102, train loss = 4.6913
/usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "

real	3m17.329s
user	36m8.806s
sys	7m20.752s
root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=3 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

*****HYPER PARAMETERS*****
{'num_gpus': 3, 'optimizer': 0, 'max_grad_norm': 10.0, 'average_params': True, 'run_profiler': False, 'state_size': 2048, 'batch_size': 128, 'emb_size': 512, 'learning_rate': 0.2, 'num_sampled': 8192, 'max_time': 180, 'num_shards': 8, 'num_steps': 20, 'vocab_size': 793470, 'projected_size': 512, 'do_summaries': False, 'num_delayed_steps': 150, 'keep_prob': 0.9, 'num_layers': 1}
**************************
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. Current time: 1590697746.3923519 ALL VARIABLES WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. Instructions for updating: Please use tf.global_variables instead. model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 model/global_step:0 () model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0 model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0 model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0 model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_b/Adagrad:0 (793470,) /gpu:0 model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0 model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0 TRAINABLE VARIABLES model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 LOCAL VARIABLES model/model/state_0_0:0 (128, 2560) /gpu:0 model/model_1/state_1_0:0 (128, 2560) /gpu:1 model/model_2/state_2_0:0 (128, 2560) /gpu:2 WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession 2020-05-28 20:29:07.056456: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1996320000 Hz 2020-05-28 20:29:07.065189: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0xa7b7210 executing computations on platform Host. Devices: 2020-05-28 20:29:07.065245: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): , 2020-05-28 20:29:07.656807: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0xa7b6c30 executing computations on platform CUDA. Devices: 2020-05-28 20:29:07.656840: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5 2020-05-28 20:29:07.656847: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (1): TITAN RTX, Compute Capability 7.5 2020-05-28 20:29:07.656853: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-05-28 20:29:07.656859: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-05-28 20:29:07.658046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:41:00.0 totalMemory: 23.65GiB freeMemory: 23.48GiB 2020-05-28 20:29:07.658082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:61:00.0 totalMemory: 23.65GiB freeMemory: 23.48GiB 2020-05-28 20:29:07.658109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 2 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:04:00.0 totalMemory: 10.76GiB freeMemory: 10.37GiB 2020-05-28 20:29:07.658135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 3 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:21:00.0 totalMemory: 10.76GiB freeMemory: 10.60GiB 2020-05-28 20:29:07.658297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1, 2, 3 2020-05-28 20:29:08.629112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-05-28 20:29:08.629163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2 3 2020-05-28 20:29:08.629169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N N N N 2020-05-28 20:29:08.629174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N N N 2020-05-28 20:29:08.629178: I 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N N N N 2020-05-28 20:29:08.629187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N N N N 2020-05-28 20:29:08.629361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22757 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:41:00.0, compute capability: 7.5) 2020-05-28 20:29:08.629677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22757 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:61:00.0, compute capability: 7.5) 2020-05-28 20:29:08.630061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10004 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:04:00.0, compute capability: 7.5) 2020-05-28 20:29:08.630257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10224 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:21:00.0, compute capability: 7.5) WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file utilities to get mtimes. Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00091-of-00100 Finished processing! 
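
Every run also warns that run_utils.py still drives training with tf.train.Supervisor and points to tf.train.MonitoredTrainingSession as the replacement. A rough sketch of the suggested pattern on a toy graph (the ./logs_sketch directory and the toy loss are made up for illustration; this is not the benchmark's actual code):

import tensorflow as tf

# Toy graph standing in for the LSTM model; the learning rate matches
# the hyper-parameter dump above.
x = tf.Variable(3.0)
loss = tf.square(x)
global_step = tf.train.get_or_create_global_step()
train_op = tf.train.GradientDescentOptimizer(0.2).minimize(
    loss, global_step=global_step)

# MonitoredTrainingSession takes over the checkpoint/summary duties
# that Supervisor performs in run_utils.py.
with tf.train.MonitoredTrainingSession(checkpoint_dir="./logs_sketch") as sess:
    for _ in range(100):
        sess.run(train_op)
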
2020-05-28 20:29:25.189305: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally Iteration 2451, time = 10.36s, wps = 741, train loss = 5.2095 Iteration 2452, time = 7.27s, wps = 1057, train loss = 4.7903 Iteration 2453, time = 0.11s, wps = 70819, train loss = 4.8678 Iteration 2454, time = 0.10s, wps = 75423, train loss = 4.7182 Iteration 2455, time = 0.11s, wps = 71179, train loss = 4.7166 Iteration 2456, time = 0.10s, wps = 77105, train loss = 4.6929 Iteration 2457, time = 0.10s, wps = 79203, train loss = 4.7794 Iteration 2458, time = 0.10s, wps = 76666, train loss = 4.8060 Iteration 2459, time = 0.10s, wps = 76692, train loss = 4.7285 Iteration 2470, time = 1.09s, wps = 77691, train loss = 4.6344 Iteration 2490, time = 1.97s, wps = 78148, train loss = 4.6120 Iteration 2510, time = 1.97s, wps = 77968, train loss = 4.6495 Iteration 2530, time = 2.03s, wps = 75685, train loss = 4.6046 Iteration 2550, time = 2.01s, wps = 76376, train loss = 4.7414 Iteration 2570, time = 2.02s, wps = 76146, train loss = 4.6813 Iteration 2590, time = 2.01s, wps = 76488, train loss = 4.6308 Iteration 2610, time = 1.99s, wps = 77058, train loss = 4.7029 Iteration 2630, time = 2.01s, wps = 76561, train loss = 4.6364 Iteration 2650, time = 2.00s, wps = 76772, train loss = 4.6441 Iteration 2670, time = 1.99s, wps = 77127, train loss = 4.7217 Iteration 2690, time = 2.01s, wps = 76405, train loss = 4.6752 Iteration 2710, time = 1.99s, wps = 77325, train loss = 4.6635 Iteration 2730, time = 1.98s, wps = 77486, train loss = 4.6225 Iteration 2750, time = 1.98s, wps = 77604, train loss = 4.6811 Iteration 2770, time = 1.99s, wps = 77244, train loss = 4.5836 Iteration 2790, time = 1.99s, wps = 77109, train loss = 4.5992 Iteration 2810, time = 2.00s, wps = 76895, train loss = 4.6315 Iteration 2830, time = 1.97s, wps = 77905, train loss = 4.6842 Iteration 2850, time = 1.98s, wps = 77557, train loss = 4.5718 Iteration 2870, time = 2.01s, wps = 76517, train loss = 4.6504 Iteration 2890, time = 1.99s, wps = 77038, train loss = 4.6249 Iteration 2910, time = 1.99s, wps = 77013, train loss = 4.5854 Iteration 2930, time = 1.98s, wps = 77488, train loss = 4.5631 Iteration 2950, time = 1.99s, wps = 77359, train loss = 4.6879 Iteration 2970, time = 1.98s, wps = 77564, train loss = 4.5792 Iteration 2990, time = 2.00s, wps = 76637, train loss = 4.5784 Iteration 3010, time = 2.01s, wps = 76517, train loss = 4.5824 Iteration 3030, time = 2.02s, wps = 76087, train loss = 4.5266 Iteration 3050, time = 2.00s, wps = 76794, train loss = 4.5414 Iteration 3070, time = 1.98s, wps = 77600, train loss = 4.6107 Iteration 3090, time = 1.99s, wps = 77349, train loss = 4.5748 Iteration 3110, time = 2.01s, wps = 76453, train loss = 4.5255 Iteration 3130, time = 2.00s, wps = 76981, train loss = 4.5916 Iteration 3150, time = 2.00s, wps = 76929, train loss = 4.5591 Iteration 3170, time = 2.02s, wps = 75999, train loss = 4.5912 Iteration 3190, time = 1.97s, wps = 77818, train loss = 4.5358 Iteration 3210, time = 1.99s, wps = 77340, train loss = 4.5254 Iteration 3230, time = 2.00s, wps = 76696, train loss = 4.5543 Iteration 3250, time = 2.00s, wps = 76901, train loss = 4.5145 Iteration 3270, time = 1.98s, wps = 77490, train loss = 4.5064 Iteration 3290, time = 1.98s, wps = 77759, train loss = 4.5719 Iteration 3310, time = 2.00s, wps = 76856, train loss = 4.5903 Iteration 3330, time = 2.00s, wps = 76768, train loss = 4.5218 Iteration 3350, time = 1.99s, wps = 77334, train loss = 4.5773 Iteration 3370, time 
= 1.97s, wps = 78070, train loss = 4.5230 Iteration 3390, time = 1.98s, wps = 77440, train loss = 4.5533 Iteration 3410, time = 2.01s, wps = 76479, train loss = 4.5459 Iteration 3430, time = 1.98s, wps = 77394, train loss = 4.4668 Iteration 3450, time = 2.01s, wps = 76536, train loss = 4.5895 Iteration 3470, time = 2.00s, wps = 76983, train loss = 4.4973 Iteration 3490, time = 1.98s, wps = 77519, train loss = 4.5387 Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00029-of-00100 Finished processing! Iteration 3510, time = 4.03s, wps = 38070, train loss = 4.5132 Iteration 3530, time = 1.98s, wps = 77580, train loss = 4.5082 Iteration 3550, time = 1.99s, wps = 77003, train loss = 4.5329 Iteration 3570, time = 1.99s, wps = 77146, train loss = 4.5935 Iteration 3590, time = 1.99s, wps = 77241, train loss = 4.5782 Iteration 3610, time = 2.01s, wps = 76547, train loss = 4.4661 Iteration 3630, time = 2.02s, wps = 76131, train loss = 4.5108 Iteration 3650, time = 1.98s, wps = 77663, train loss = 4.4856 Iteration 3670, time = 1.99s, wps = 77029, train loss = 4.4649 Iteration 3690, time = 2.01s, wps = 76384, train loss = 4.5279 Iteration 3710, time = 2.01s, wps = 76247, train loss = 4.4714 Iteration 3730, time = 1.99s, wps = 77005, train loss = 4.5317 Iteration 3750, time = 2.01s, wps = 76575, train loss = 4.4564 Iteration 3770, time = 1.99s, wps = 77003, train loss = 4.4617 Iteration 3790, time = 1.98s, wps = 77481, train loss = 4.4602 Iteration 3810, time = 2.00s, wps = 76699, train loss = 4.5334 Iteration 3830, time = 1.99s, wps = 77062, train loss = 4.4854 Iteration 3850, time = 1.99s, wps = 77328, train loss = 4.4562 Iteration 3870, time = 1.99s, wps = 77189, train loss = 4.5119 Iteration 3890, time = 2.02s, wps = 76150, train loss = 4.5070 Iteration 3910, time = 2.00s, wps = 76676, train loss = 4.4626 Iteration 3930, time = 1.99s, wps = 77198, train loss = 4.4995 Iteration 3950, time = 2.01s, wps = 76246, train loss = 4.4529 /usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened. warnings.warn("Attempting to use a closed FileWriter. " real 3m15.301s user 30m28.252s sys 7m25.871s root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=2 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see: * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md * https://github.com/tensorflow/addons If you depend on functionality not listed there, please file an issue. *****HYPER PARAMETERS***** {'projected_size': 512, 'num_steps': 20, 'num_shards': 8, 'num_sampled': 8192, 'num_delayed_steps': 150, 'vocab_size': 793470, 'emb_size': 512, 'average_params': True, 'keep_prob': 0.9, 'optimizer': 0, 'max_time': 180, 'run_profiler': False, 'max_grad_norm': 10.0, 'num_gpus': 2, 'state_size': 2048, 'do_summaries': False, 'num_layers': 1, 'batch_size': 128, 'learning_rate': 0.2} ************************** WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating: Colocations handled automatically by placer. WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior. WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version. Instructions for updating: Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`. WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. Current time: 1590698529.1341493 ALL VARIABLES WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. Instructions for updating: Please use tf.global_variables instead. 
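
The variable dump that follows is produced by the deprecated all_variables call flagged above; tf.global_variables is the drop-in replacement. A minimal sketch that prints the same name / shape / device triples (the demo variable is hypothetical, standing in for the model's embeddings):

import tensorflow as tf

tf.Variable(tf.zeros([99184, 512]), name="emb_demo")  # stand-in variable

# Non-deprecated equivalent of the "ALL VARIABLES" listing below:
for v in tf.global_variables():
    print(v.name, v.get_shape(), v.device)
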
model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 model/global_step:0 () model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0 model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0 model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0 model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_b/Adagrad:0 (793470,) /gpu:0 model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0 model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0 TRAINABLE VARIABLES model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 LOCAL VARIABLES model/model/state_0_0:0 (128, 2560) /gpu:0 model/model_1/state_1_0:0 (128, 2560) /gpu:1 WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession 2020-05-28 20:42:09.641567: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1996320000 Hz 2020-05-28 20:42:09.649320: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x78bcb60 executing computations on platform Host. 
Devices: 2020-05-28 20:42:09.649382: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): , 2020-05-28 20:42:10.212938: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x78bbfa0 executing computations on platform CUDA. Devices: 2020-05-28 20:42:10.212984: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5 2020-05-28 20:42:10.212996: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (1): TITAN RTX, Compute Capability 7.5 2020-05-28 20:42:10.213009: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-05-28 20:42:10.213020: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-05-28 20:42:10.214407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:41:00.0 totalMemory: 23.65GiB freeMemory: 23.48GiB 2020-05-28 20:42:10.214443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:61:00.0 totalMemory: 23.65GiB freeMemory: 23.48GiB 2020-05-28 20:42:10.214469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 2 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:04:00.0 totalMemory: 10.76GiB freeMemory: 10.37GiB 2020-05-28 20:42:10.214495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 3 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:21:00.0 totalMemory: 10.76GiB freeMemory: 10.60GiB 2020-05-28 20:42:10.214663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1, 2, 3 2020-05-28 20:42:11.193718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-05-28 20:42:11.193771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2 3 2020-05-28 20:42:11.193777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N N N N 2020-05-28 20:42:11.193781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N N N 2020-05-28 20:42:11.193786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N N N N 2020-05-28 20:42:11.193794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N N N N 2020-05-28 20:42:11.193949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22757 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:41:00.0, compute capability: 7.5) 2020-05-28 20:42:11.194234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22757 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:61:00.0, compute capability: 7.5) 2020-05-28 20:42:11.194579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10004 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:04:00.0, compute capability: 7.5) 2020-05-28 20:42:11.194914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device 
(/job:localhost/replica:0/task:0/device:GPU:3 with 10224 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:21:00.0, compute capability: 7.5) WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file utilities to get mtimes. Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00093-of-00100 Finished processing! 2020-05-28 20:42:23.281086: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally Iteration 3968, time = 6.77s, wps = 757, train loss = 4.9539 Iteration 3969, time = 4.32s, wps = 1185, train loss = 4.5360 Iteration 3970, time = 0.08s, wps = 61586, train loss = 4.4191 Iteration 3971, time = 0.08s, wps = 62234, train loss = 4.4203 Iteration 3972, time = 0.08s, wps = 66638, train loss = 4.4177 Iteration 3973, time = 0.08s, wps = 66335, train loss = 4.3937 Iteration 3974, time = 0.08s, wps = 66599, train loss = 4.4518 Iteration 3975, time = 0.07s, wps = 71254, train loss = 4.5054 Iteration 3976, time = 0.08s, wps = 67857, train loss = 4.4965 Iteration 3987, time = 0.83s, wps = 67719, train loss = 4.4401 Iteration 4007, time = 1.47s, wps = 69841, train loss = 4.4648 Iteration 4027, time = 1.50s, wps = 68342, train loss = 4.3938 Iteration 4047, time = 1.50s, wps = 68415, train loss = 4.4533 Iteration 4067, time = 1.49s, wps = 68705, train loss = 4.3665 Iteration 4087, time = 1.50s, wps = 68210, train loss = 4.3895 Iteration 4107, time = 1.50s, wps = 68073, train loss = 4.4353 Iteration 4127, time = 1.54s, wps = 66480, train loss = 4.3672 Iteration 4147, time = 1.51s, wps = 67842, train loss = 4.4194 Iteration 4167, time = 1.52s, wps = 67523, train loss = 4.4416 Iteration 4187, time = 1.52s, wps = 67545, train loss = 4.4098 Iteration 4207, time = 1.51s, wps = 67732, train loss = 4.3265 Iteration 4227, time = 1.49s, wps = 68585, train loss = 4.3827 Iteration 4247, time = 1.50s, wps = 68088, train loss = 4.3426 Iteration 4267, time = 1.51s, wps = 67603, train loss = 4.4130 Iteration 4287, time = 1.51s, wps = 67701, train loss = 4.3963 Iteration 4307, time = 1.51s, wps = 67720, train loss = 4.3788 Iteration 4327, time = 1.51s, wps = 67703, train loss = 4.4120 Iteration 4347, time = 1.50s, wps = 68266, train loss = 4.3542 Iteration 4367, time = 1.51s, wps = 67696, train loss = 4.3157 Iteration 4387, time = 1.50s, wps = 68365, train loss = 4.3104 Iteration 4407, time = 1.48s, wps = 69060, train loss = 4.4339 Iteration 4427, time = 1.50s, wps = 68378, train loss = 4.3153 Iteration 4447, time = 1.50s, wps = 68365, train loss = 4.3321 Iteration 4467, time = 1.53s, wps = 67080, train loss = 4.4120 Iteration 4487, time = 1.52s, wps = 67381, train loss = 4.3062 Iteration 4507, time = 1.51s, wps = 67806, train loss = 4.3681 Iteration 4527, time = 1.50s, wps = 68063, train loss = 4.3578 Iteration 4547, time = 1.48s, wps = 68979, train loss = 4.4950 Iteration 4567, time = 1.50s, wps = 68088, train loss = 4.3863 
Iteration 4587, time = 1.50s, wps = 68399, train loss = 4.3632 Iteration 4607, time = 1.52s, wps = 67514, train loss = 4.3729 Iteration 4627, time = 1.50s, wps = 68423, train loss = 4.3984 Iteration 4647, time = 1.52s, wps = 67157, train loss = 4.3690 Iteration 4667, time = 1.51s, wps = 67811, train loss = 4.3165 Iteration 4687, time = 1.51s, wps = 67645, train loss = 4.5067 Iteration 4707, time = 1.51s, wps = 67962, train loss = 4.3490 Iteration 4727, time = 1.52s, wps = 67385, train loss = 4.3730 Iteration 4747, time = 1.50s, wps = 68094, train loss = 4.3936 Iteration 4767, time = 1.50s, wps = 68207, train loss = 4.3903 Iteration 4787, time = 1.49s, wps = 68810, train loss = 4.3187 Iteration 4807, time = 1.50s, wps = 68230, train loss = 4.3251 Iteration 4827, time = 1.52s, wps = 67586, train loss = 4.3465 Iteration 4847, time = 1.51s, wps = 67786, train loss = 4.3849 Iteration 4867, time = 1.50s, wps = 68156, train loss = 4.2849 Iteration 4887, time = 1.52s, wps = 67355, train loss = 4.3947 Iteration 4907, time = 1.53s, wps = 67127, train loss = 4.4368 Iteration 4927, time = 1.52s, wps = 67435, train loss = 4.3975 Iteration 4947, time = 1.52s, wps = 67455, train loss = 4.4220 Iteration 4967, time = 1.51s, wps = 68005, train loss = 4.3902 Iteration 4987, time = 1.52s, wps = 67567, train loss = 4.2851 Iteration 5007, time = 1.50s, wps = 68092, train loss = 4.4013 Iteration 5027, time = 1.51s, wps = 67714, train loss = 4.4300 Iteration 5047, time = 1.52s, wps = 67405, train loss = 4.3429 Iteration 5067, time = 1.50s, wps = 68082, train loss = 4.3354 Iteration 5087, time = 1.51s, wps = 67655, train loss = 4.3469 Iteration 5107, time = 1.50s, wps = 68169, train loss = 4.3401 Iteration 5127, time = 1.50s, wps = 68043, train loss = 4.3964 Iteration 5147, time = 1.52s, wps = 67442, train loss = 4.3211 Iteration 5167, time = 1.50s, wps = 68095, train loss = 4.3906 Iteration 5187, time = 1.51s, wps = 67804, train loss = 4.3566 Iteration 5207, time = 1.50s, wps = 68271, train loss = 4.3320 Iteration 5227, time = 1.50s, wps = 68080, train loss = 4.2954 Iteration 5247, time = 1.51s, wps = 67762, train loss = 4.3189 Iteration 5267, time = 1.51s, wps = 67781, train loss = 4.3437 Iteration 5287, time = 1.52s, wps = 67547, train loss = 4.3495 Iteration 5307, time = 1.50s, wps = 68194, train loss = 4.3569 Iteration 5327, time = 1.52s, wps = 67276, train loss = 4.3950 Iteration 5347, time = 1.50s, wps = 68258, train loss = 4.3802 Iteration 5367, time = 1.51s, wps = 67795, train loss = 4.2897 Iteration 5387, time = 1.51s, wps = 67899, train loss = 4.3202 Iteration 5407, time = 1.51s, wps = 67924, train loss = 4.2183 Iteration 5427, time = 1.51s, wps = 67815, train loss = 4.4307 Iteration 5447, time = 1.52s, wps = 67357, train loss = 4.3840 Iteration 5467, time = 1.53s, wps = 67137, train loss = 4.3190 Iteration 5487, time = 1.50s, wps = 68246, train loss = 4.3218 Iteration 5507, time = 1.53s, wps = 67012, train loss = 4.3712 Iteration 5527, time = 1.52s, wps = 67197, train loss = 4.3184 Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00058-of-00100 Finished processing! 
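
Each "Processing file ... Finished processing!" pair marks the input pipeline moving to the next training shard, and the first iteration after a switch runs noticeably slower while the buffer refills (Iteration 5547 below reports ~29,000 wps against a ~67,000 wps steady state). A sketch of the shard-cycling idea against the directory layout listed later in this session; whether the benchmark samples shards with or without replacement is an assumption here, not verified code:

import glob
import random

# Shard layout as listed near the end of this session.
pattern = ("./data/1-billion-word-language-modeling-benchmark-r13output/"
           "training-monolingual.tokenized.shuffled/news.en-*-of-00100")
shards = glob.glob(pattern)

# Pick a shard at random, stream it, then move on to the next.
path = random.choice(shards)
print("Processing file:", path)
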
Iteration 5547, time = 3.55s, wps = 28865, train loss = 4.3501 Iteration 5567, time = 1.52s, wps = 67560, train loss = 4.2992 Iteration 5587, time = 1.52s, wps = 67487, train loss = 4.3439 Iteration 5607, time = 1.53s, wps = 67127, train loss = 4.3221 Iteration 5627, time = 1.53s, wps = 67029, train loss = 4.3927 Iteration 5647, time = 1.51s, wps = 67712, train loss = 4.3470 Iteration 5667, time = 1.52s, wps = 67523, train loss = 4.2838 Iteration 5687, time = 1.52s, wps = 67454, train loss = 4.3444 Iteration 5707, time = 1.51s, wps = 67876, train loss = 4.3042 Iteration 5727, time = 1.52s, wps = 67454, train loss = 4.4056 Iteration 5747, time = 1.52s, wps = 67501, train loss = 4.3285 Iteration 5767, time = 1.50s, wps = 68346, train loss = 4.2653 Iteration 5787, time = 1.51s, wps = 67944, train loss = 4.3902 Iteration 5807, time = 1.51s, wps = 67747, train loss = 4.3351 Iteration 5827, time = 1.53s, wps = 67134, train loss = 4.3587 Iteration 5847, time = 1.52s, wps = 67558, train loss = 4.3667 Iteration 5867, time = 1.51s, wps = 67786, train loss = 4.3097 Iteration 5887, time = 1.52s, wps = 67414, train loss = 4.3174 Iteration 5907, time = 1.51s, wps = 68016, train loss = 4.3789 Iteration 5927, time = 1.52s, wps = 67529, train loss = 4.3641 Iteration 5947, time = 1.52s, wps = 67387, train loss = 4.2437 Iteration 5967, time = 1.50s, wps = 68099, train loss = 4.3776 Iteration 5987, time = 1.52s, wps = 67589, train loss = 4.3350 Iteration 6007, time = 1.52s, wps = 67336, train loss = 4.2663 Iteration 6027, time = 1.52s, wps = 67342, train loss = 4.3692 Iteration 6047, time = 1.51s, wps = 67879, train loss = 4.2967 Iteration 6067, time = 1.50s, wps = 68179, train loss = 4.3443 /usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened. warnings.warn("Attempting to use a closed FileWriter. " real 3m13.231s user 25m14.534s sys 6m54.206s root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see: * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md * https://github.com/tensorflow/addons If you depend on functionality not listed there, please file an issue. *****HYPER PARAMETERS***** {'do_summaries': False, 'num_sampled': 8192, 'projected_size': 512, 'optimizer': 0, 'run_profiler': False, 'num_gpus': 1, 'num_steps': 20, 'num_layers': 1, 'max_grad_norm': 10.0, 'max_time': 180, 'num_shards': 8, 'vocab_size': 793470, 'learning_rate': 0.2, 'num_delayed_steps': 150, 'state_size': 2048, 'average_params': True, 'batch_size': 128, 'emb_size': 512, 'keep_prob': 0.9} ************************** WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating: Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior. WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version. Instructions for updating: Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`. WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. Current time: 1590699287.4263654 ALL VARIABLES WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. Instructions for updating: Please use tf.global_variables instead. model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 model/global_step:0 () model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0 model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0 model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0 model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_b/Adagrad:0 (793470,) /gpu:0 model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0 
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0 TRAINABLE VARIABLES model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 LOCAL VARIABLES model/model/state_0_0:0 (128, 2560) /gpu:0 WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession 2020-05-28 20:54:47.677566: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1996320000 Hz 2020-05-28 20:54:47.685402: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x6756a30 executing computations on platform Host. Devices: 2020-05-28 20:54:47.685456: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): , 2020-05-28 20:54:48.224400: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x6756450 executing computations on platform CUDA. Devices: 2020-05-28 20:54:48.224447: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5 2020-05-28 20:54:48.224459: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (1): TITAN RTX, Compute Capability 7.5 2020-05-28 20:54:48.224472: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-05-28 20:54:48.224484: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-05-28 20:54:48.225938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:41:00.0 totalMemory: 23.65GiB freeMemory: 23.48GiB 2020-05-28 20:54:48.225973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:61:00.0 totalMemory: 23.65GiB freeMemory: 23.48GiB 2020-05-28 20:54:48.226000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 2 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:04:00.0 totalMemory: 10.76GiB freeMemory: 10.37GiB 2020-05-28 20:54:48.226025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 3 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:21:00.0 totalMemory: 10.76GiB freeMemory: 10.60GiB 2020-05-28 20:54:48.226179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1, 2, 3 2020-05-28 20:54:49.191387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with 
strength 1 edge matrix: 2020-05-28 20:54:49.191448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2 3 2020-05-28 20:54:49.191455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N N N N 2020-05-28 20:54:49.191460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N N N 2020-05-28 20:54:49.191464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N N N N 2020-05-28 20:54:49.191468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N N N N 2020-05-28 20:54:49.191635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22757 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:41:00.0, compute capability: 7.5) 2020-05-28 20:54:49.191942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22757 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:61:00.0, compute capability: 7.5) 2020-05-28 20:54:49.192149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10004 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:04:00.0, compute capability: 7.5) 2020-05-28 20:54:49.192341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10224 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:21:00.0, compute capability: 7.5) WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file utilities to get mtimes. Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00045-of-00100 Finished processing! 
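
Note that all four GPUs are still created as TensorFlow devices even in this --num_gpus=1 run, so the process reserves memory on every card. Visibility can be checked from Python and restricted with the standard CUDA_VISIBLE_DEVICES mechanism before TensorFlow initializes CUDA (this is general CUDA/TF behavior, not something the benchmark script does itself):

import os

# Must be set before TensorFlow creates its CUDA context, or it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # keep only device 0 (a TITAN RTX here)

import tensorflow as tf
from tensorflow.python.client import device_lib

for d in device_lib.list_local_devices():
    print(d.name, d.device_type)
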
2020-05-28 20:54:58.153901: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally Iteration 6076, time = 4.44s, wps = 576, train loss = 4.6649 Iteration 6077, time = 2.24s, wps = 1143, train loss = 4.3314 Iteration 6078, time = 0.06s, wps = 41033, train loss = 4.4353 Iteration 6079, time = 0.06s, wps = 41849, train loss = 4.4128 Iteration 6080, time = 0.06s, wps = 41128, train loss = 4.4220 Iteration 6081, time = 0.05s, wps = 46959, train loss = 4.4431 Iteration 6082, time = 0.05s, wps = 51421, train loss = 4.3484 Iteration 6083, time = 0.05s, wps = 51554, train loss = 4.2707 Iteration 6084, time = 0.05s, wps = 47336, train loss = 4.3847 Iteration 6095, time = 0.57s, wps = 49583, train loss = 4.3706 Iteration 6115, time = 1.05s, wps = 48740, train loss = 4.3157 Iteration 6135, time = 1.04s, wps = 49019, train loss = 4.3490 Iteration 6155, time = 1.06s, wps = 48465, train loss = 4.3934 Iteration 6175, time = 1.03s, wps = 49515, train loss = 4.4199 Iteration 6195, time = 1.04s, wps = 49043, train loss = 4.4302 Iteration 6215, time = 1.05s, wps = 48990, train loss = 4.4137 Iteration 6235, time = 1.04s, wps = 49370, train loss = 4.3934 Iteration 6255, time = 1.05s, wps = 48661, train loss = 4.5149 Iteration 6275, time = 1.04s, wps = 49335, train loss = 4.3845 Iteration 6295, time = 1.04s, wps = 49369, train loss = 4.3459 Iteration 6315, time = 1.05s, wps = 48823, train loss = 4.2881 Iteration 6335, time = 1.06s, wps = 48507, train loss = 4.4100 Iteration 6355, time = 1.03s, wps = 49559, train loss = 4.4486 Iteration 6375, time = 1.05s, wps = 48686, train loss = 4.4206 Iteration 6395, time = 1.05s, wps = 48695, train loss = 4.3880 Iteration 6415, time = 1.06s, wps = 48177, train loss = 4.3439 Iteration 6435, time = 1.04s, wps = 49274, train loss = 4.3863 Iteration 6455, time = 1.05s, wps = 48731, train loss = 4.3324 Iteration 6475, time = 1.06s, wps = 48227, train loss = 4.2432 Iteration 6495, time = 1.05s, wps = 48574, train loss = 4.5078 Iteration 6515, time = 1.04s, wps = 49040, train loss = 4.4154 Iteration 6535, time = 1.04s, wps = 49381, train loss = 4.3156 Iteration 6555, time = 1.06s, wps = 48148, train loss = 4.3361 Iteration 6575, time = 1.06s, wps = 48448, train loss = 4.4414 Iteration 6595, time = 1.06s, wps = 48468, train loss = 4.4092 Iteration 6615, time = 1.06s, wps = 48114, train loss = 4.2183 Iteration 6635, time = 1.06s, wps = 48376, train loss = 4.2931 Iteration 6655, time = 1.05s, wps = 48556, train loss = 4.2650 Iteration 6675, time = 1.05s, wps = 48929, train loss = 4.3506 Iteration 6695, time = 1.05s, wps = 48857, train loss = 4.3824 Iteration 6715, time = 1.05s, wps = 48779, train loss = 4.4176 Iteration 6735, time = 1.04s, wps = 49358, train loss = 4.2170 Iteration 6755, time = 1.05s, wps = 48779, train loss = 4.2915 Iteration 6775, time = 1.05s, wps = 48900, train loss = 4.2651 Iteration 6795, time = 1.06s, wps = 48440, train loss = 4.3580 Iteration 6815, time = 1.04s, wps = 49192, train loss = 4.5017 Iteration 6835, time = 1.04s, wps = 49400, train loss = 4.2599 Iteration 6855, time = 1.05s, wps = 48611, train loss = 4.3187 Iteration 6875, time = 1.04s, wps = 49160, train loss = 4.2704 Iteration 6895, time = 1.07s, wps = 47872, train loss = 4.2816 Iteration 6915, time = 1.06s, wps = 48468, train loss = 4.3581 Iteration 6935, time = 1.04s, wps = 49088, train loss = 4.3684 Iteration 6955, time = 1.07s, wps = 48011, train loss = 4.3385 Iteration 6975, time = 1.05s, wps = 48776, train loss = 4.3660 Iteration 6995, time = 
1.04s, wps = 49347, train loss = 4.2657 Iteration 7015, time = 1.06s, wps = 48261, train loss = 4.3913 Iteration 7035, time = 1.06s, wps = 48464, train loss = 4.2081 Iteration 7055, time = 1.04s, wps = 49250, train loss = 4.2611 Iteration 7075, time = 1.06s, wps = 48363, train loss = 4.3218 Iteration 7095, time = 1.05s, wps = 48711, train loss = 4.2646 Iteration 7115, time = 1.04s, wps = 49209, train loss = 4.3910 Iteration 7135, time = 1.04s, wps = 49439, train loss = 4.2700 Iteration 7155, time = 1.05s, wps = 48667, train loss = 4.2960 Iteration 7175, time = 1.08s, wps = 47449, train loss = 4.2777 Iteration 7195, time = 1.06s, wps = 48493, train loss = 4.2703 Iteration 7215, time = 1.04s, wps = 49233, train loss = 4.3924 Iteration 7235, time = 1.07s, wps = 47920, train loss = 4.3761 Iteration 7255, time = 1.07s, wps = 47975, train loss = 4.3552 Iteration 7275, time = 1.05s, wps = 48551, train loss = 4.3728 Iteration 7295, time = 1.05s, wps = 48946, train loss = 4.3017 Iteration 7315, time = 1.06s, wps = 48173, train loss = 4.3909 Iteration 7335, time = 1.05s, wps = 48968, train loss = 4.3850 Iteration 7355, time = 1.06s, wps = 48147, train loss = 4.3954 Iteration 7375, time = 1.07s, wps = 48005, train loss = 4.3138 Iteration 7395, time = 1.05s, wps = 48900, train loss = 4.3072 Iteration 7415, time = 1.06s, wps = 48453, train loss = 4.3292 Iteration 7435, time = 1.07s, wps = 47929, train loss = 4.2836 Iteration 7455, time = 1.06s, wps = 48326, train loss = 4.4243 Iteration 7475, time = 1.05s, wps = 48728, train loss = 4.3847 Iteration 7495, time = 1.05s, wps = 48706, train loss = 4.3262 Iteration 7515, time = 1.05s, wps = 48719, train loss = 4.2567 Iteration 7535, time = 1.06s, wps = 48147, train loss = 4.2572 Iteration 7555, time = 1.05s, wps = 48876, train loss = 4.2914 Iteration 7575, time = 1.06s, wps = 48450, train loss = 4.2191 Iteration 7595, time = 1.07s, wps = 48020, train loss = 4.3018 Iteration 7615, time = 1.06s, wps = 48441, train loss = 4.2231 Iteration 7635, time = 1.07s, wps = 48065, train loss = 4.3403 Iteration 7655, time = 1.08s, wps = 47527, train loss = 4.4077 Iteration 7675, time = 1.07s, wps = 47989, train loss = 4.3930 Iteration 7695, time = 1.07s, wps = 47791, train loss = 4.3071 Iteration 7715, time = 1.03s, wps = 49642, train loss = 4.3962 Iteration 7735, time = 1.06s, wps = 48520, train loss = 4.1891 Iteration 7755, time = 1.06s, wps = 48153, train loss = 4.3602 Iteration 7775, time = 1.08s, wps = 47418, train loss = 4.3487 Iteration 7795, time = 1.03s, wps = 49633, train loss = 4.3162 Iteration 7815, time = 1.04s, wps = 49067, train loss = 4.3161 Iteration 7835, time = 1.04s, wps = 49078, train loss = 4.2658 Iteration 7855, time = 1.07s, wps = 47651, train loss = 4.2461 Iteration 7875, time = 1.07s, wps = 48031, train loss = 4.2406 Iteration 7895, time = 1.06s, wps = 48326, train loss = 4.3507 Iteration 7915, time = 1.05s, wps = 48986, train loss = 4.1426 Iteration 7935, time = 1.05s, wps = 48825, train loss = 4.3941 Iteration 7955, time = 1.06s, wps = 48514, train loss = 4.2206 Iteration 7975, time = 1.07s, wps = 47977, train loss = 4.2814 Iteration 7995, time = 1.05s, wps = 48731, train loss = 4.3841 Iteration 8015, time = 1.04s, wps = 49141, train loss = 4.4013 Iteration 8035, time = 1.07s, wps = 48050, train loss = 4.2684 Iteration 8055, time = 1.09s, wps = 47106, train loss = 4.3361 Iteration 8075, time = 1.08s, wps = 47518, train loss = 4.2860 Iteration 8095, time = 1.07s, wps = 48063, train loss = 4.4133 Iteration 8115, time = 1.06s, wps = 48177, train 
loss = 4.2589 Iteration 8135, time = 1.07s, wps = 47944, train loss = 4.3573 Iteration 8155, time = 1.05s, wps = 48677, train loss = 4.4310 Iteration 8175, time = 1.07s, wps = 48066, train loss = 4.4041 Iteration 8195, time = 1.07s, wps = 47876, train loss = 4.2821 Iteration 8215, time = 1.06s, wps = 48482, train loss = 4.3504 Iteration 8235, time = 1.06s, wps = 48092, train loss = 4.4324 Iteration 8255, time = 1.05s, wps = 48576, train loss = 4.3516 Iteration 8275, time = 1.05s, wps = 48578, train loss = 4.3389 Iteration 8295, time = 1.06s, wps = 48380, train loss = 4.3348 Iteration 8315, time = 1.04s, wps = 49151, train loss = 4.1997 Iteration 8335, time = 1.07s, wps = 48016, train loss = 4.1604 Iteration 8355, time = 1.06s, wps = 48462, train loss = 4.2970 Iteration 8375, time = 1.06s, wps = 48424, train loss = 4.3444 Iteration 8395, time = 1.05s, wps = 48812, train loss = 4.2659 Iteration 8415, time = 1.04s, wps = 49292, train loss = 4.2204 Iteration 8435, time = 1.05s, wps = 48596, train loss = 4.2491 Iteration 8455, time = 1.06s, wps = 48418, train loss = 4.4083 Iteration 8475, time = 1.06s, wps = 48389, train loss = 4.3619 Iteration 8495, time = 1.06s, wps = 48205, train loss = 4.2838 Iteration 8515, time = 1.07s, wps = 47890, train loss = 4.3254 Iteration 8535, time = 1.07s, wps = 47785, train loss = 4.3158 Iteration 8555, time = 1.06s, wps = 48453, train loss = 4.3042 Iteration 8575, time = 1.06s, wps = 48500, train loss = 4.3868 Iteration 8595, time = 1.06s, wps = 48112, train loss = 4.2216 Iteration 8615, time = 1.05s, wps = 48752, train loss = 4.2663 Iteration 8635, time = 1.06s, wps = 48092, train loss = 4.1551 Iteration 8655, time = 1.04s, wps = 49195, train loss = 4.2679 Iteration 8675, time = 1.07s, wps = 47988, train loss = 4.2818 Iteration 8695, time = 1.06s, wps = 48203, train loss = 4.3254 Iteration 8715, time = 1.08s, wps = 47381, train loss = 4.2523 Iteration 8735, time = 1.07s, wps = 47632, train loss = 4.3420 Iteration 8755, time = 1.06s, wps = 48219, train loss = 4.2558 Iteration 8775, time = 1.07s, wps = 48012, train loss = 4.3687 Iteration 8795, time = 1.07s, wps = 48054, train loss = 4.2169 Iteration 8815, time = 1.08s, wps = 47510, train loss = 4.1992 Iteration 8835, time = 1.07s, wps = 47669, train loss = 4.1726 Iteration 8855, time = 1.07s, wps = 47910, train loss = 4.2971 Iteration 8875, time = 1.06s, wps = 48464, train loss = 4.1624 Iteration 8895, time = 1.06s, wps = 48431, train loss = 4.1786 Iteration 8915, time = 1.07s, wps = 47989, train loss = 4.2931 Iteration 8935, time = 1.05s, wps = 48545, train loss = 4.2466 Iteration 8955, time = 1.06s, wps = 48229, train loss = 4.3273 Iteration 8975, time = 1.05s, wps = 48597, train loss = 4.2990 Iteration 8995, time = 1.06s, wps = 48241, train loss = 4.3638 Iteration 9015, time = 1.07s, wps = 47661, train loss = 4.3069 Iteration 9035, time = 1.07s, wps = 47730, train loss = 4.2197 Iteration 9055, time = 1.07s, wps = 47814, train loss = 4.1161 Iteration 9075, time = 1.06s, wps = 48429, train loss = 4.1450 Iteration 9095, time = 1.04s, wps = 49307, train loss = 4.4085 Iteration 9115, time = 1.06s, wps = 48127, train loss = 4.2554 Iteration 9135, time = 1.09s, wps = 47107, train loss = 4.1749 Iteration 9155, time = 1.07s, wps = 48034, train loss = 4.1401 Iteration 9175, time = 1.07s, wps = 47981, train loss = 4.3515 Iteration 9195, time = 1.08s, wps = 47385, train loss = 4.3475 Iteration 9215, time = 1.09s, wps = 47148, train loss = 4.1597 Processing file: 
./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00015-of-00100 Finished processing! WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:966: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to delete files with this prefix. /usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened. warnings.warn("Attempting to use a closed FileWriter. " real 3m13.006s user 16m19.698s sys 6m9.393s root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# cat /etc/os-release NAME="Ubuntu" VERSION="16.04.6 LTS (Xenial Xerus)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 16.04.6 LTS" VERSION_ID="16.04" HOME_URL="http://www.ubuntu.com/" SUPPORT_URL="http://help.ubuntu.com/" BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" VERSION_CODENAME=xenial UBUNTU_CODENAME=xenial root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# nvcc -V nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2019 NVIDIA Corporation Built on Fri_Feb__8_19:08:17_PST_2019 Cuda compilation tools, release 10.1, V10.1.105 root@dc8a73b89477:/workspace/nvidia-examples/big_lstm# cd data root@dc8a73b89477:/workspace/nvidia-examples/big_lstm/data# ls 1-billion-word-language-modeling-benchmark-r13output root@dc8a73b89477:/workspace/nvidia-examples/big_lstm/data# cd 1-billion-word-language-modeling-benchmark-r13output root@dc8a73b89477:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# ls 1b_word_vocab.txt heldout-monolingual.tokenized.shuffled README training-monolingual.tokenized.shuffled root@dc8a73b89477:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# cd training-monolingual.tokenized.shuffled root@dc8a73b89477:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# ls news.en-00001-of-00100 news.en-00034-of-00100 news.en-00067-of-00100 news.en-00002-of-00100 news.en-00035-of-00100 news.en-00068-of-00100 news.en-00003-of-00100 news.en-00036-of-00100 news.en-00069-of-00100 news.en-00004-of-00100 news.en-00037-of-00100 news.en-00070-of-00100 news.en-00005-of-00100 news.en-00038-of-00100 news.en-00071-of-00100 news.en-00006-of-00100 news.en-00039-of-00100 news.en-00072-of-00100 news.en-00007-of-00100 news.en-00040-of-00100 news.en-00073-of-00100 news.en-00008-of-00100 news.en-00041-of-00100 news.en-00074-of-00100 news.en-00009-of-00100 news.en-00042-of-00100 news.en-00075-of-00100 news.en-00010-of-00100 news.en-00043-of-00100 news.en-00076-of-00100 news.en-00011-of-00100 news.en-00044-of-00100 news.en-00077-of-00100 news.en-00012-of-00100 news.en-00045-of-00100 news.en-00078-of-00100 news.en-00013-of-00100 news.en-00046-of-00100 news.en-00079-of-00100 news.en-00014-of-00100 news.en-00047-of-00100 news.en-00080-of-00100 news.en-00015-of-00100 news.en-00048-of-00100 news.en-00081-of-00100 news.en-00016-of-00100 news.en-00049-of-00100 news.en-00082-of-00100 news.en-00017-of-00100 news.en-00050-of-00100 news.en-00083-of-00100 news.en-00018-of-00100 news.en-00051-of-00100 news.en-00084-of-00100 news.en-00019-of-00100 news.en-00052-of-00100 news.en-00085-of-00100 news.en-00020-of-00100 
news.en-00053-of-00100 news.en-00086-of-00100 news.en-00021-of-00100 news.en-00054-of-00100 news.en-00087-of-00100 news.en-00022-of-00100 news.en-00055-of-00100 news.en-00088-of-00100 news.en-00023-of-00100 news.en-00056-of-00100 news.en-00089-of-00100 news.en-00024-of-00100 news.en-00057-of-00100 news.en-00090-of-00100 news.en-00025-of-00100 news.en-00058-of-00100 news.en-00091-of-00100 news.en-00026-of-00100 news.en-00059-of-00100 news.en-00092-of-00100 news.en-00027-of-00100 news.en-00060-of-00100 news.en-00093-of-00100 news.en-00028-of-00100 news.en-00061-of-00100 news.en-00094-of-00100 news.en-00029-of-00100 news.en-00062-of-00100 news.en-00095-of-00100 news.en-00030-of-00100 news.en-00063-of-00100 news.en-00096-of-00100 news.en-00031-of-00100 news.en-00064-of-00100 news.en-00097-of-00100 news.en-00032-of-00100 news.en-00065-of-00100 news.en-00098-of-00100 news.en-00033-of-00100 news.en-00066-of-00100 news.en-00099-of-00100 root@dc8a73b89477:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# exit exit [chibi@rhel8 ~]$ cat /etc/redhat-release Red Hat Enterprise Linux release 8.2 (Ootpa) [chibi@rhel8 ~]$ nvcc -V nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2019 NVIDIA Corporation Built on Wed_Oct_23_19:24:38_PDT_2019 Cuda compilation tools, release 10.2, V10.2.89 [chibi@rhel8 ~]$ sensors k10temp-pci-00c3 Adapter: PCI adapter Tdie: +34.5°C (high = +70.0°C) Tctl: +34.5°C [chibi@rhel8 ~]$ sudo nvme smart-log /dev/nvme0n1 [sudo] password for chibi: Smart Log for NVME device:nvme0n1 namespace-id:ffffffff critical_warning : 0 temperature : 44 C available_spare : 100% available_spare_threshold : 0% percentage_used : 0% data_units_read : 559,343 data_units_written : 2,030,310 host_read_commands : 11,658,706 host_write_commands : 14,262,782 controller_busy_time : 478 power_cycles : 47 power_on_hours : 58 unsafe_shutdowns : 8 media_errors : 0 num_err_log_entries : 0 Warning Temperature Time : 0 Critical Composite Temperature Time : 0 Temperature Sensor 1 : 44 C Thermal Management T1 Trans Count : 0 Thermal Management T2 Trans Count : 0 Thermal Management T1 Total Time : 0 Thermal Management T2 Total Time : 0 [chibi@rhel8 ~]$ sudo nvme list Node SN Model Namespace Usage Format FW Rev ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- -------- /dev/nvme0n1 P02938115263 PLEXTOR PX-512M9PeG 1 512.11 GB / 512.11 GB 512 B + 0 B 1.07 [chibi@rhel8 ~]$ nvidia-smi nvlink -c GPU 0: GeForce RTX 2080 Ti (UUID: GPU-13277ce5-e1e9-0cb1-8cee-6c9e6618e774) GPU 1: GeForce RTX 2080 Ti (UUID: GPU-1ac935c2-557f-282e-14e5-3f749ffd63ac) GPU 2: TITAN RTX (UUID: GPU-5a71d61e-f130-637a-b33d-4df555b0ed88) GPU 3: TITAN RTX (UUID: GPU-7fb51c1d-c1e7-35cc-aad7-66971f05ddb7) [chibi@rhel8 ~]$ cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 49 model name : AMD EPYC 7702P 64-Core Processor stepping : 0 microcode : 0x8301034 cpu MHz : 2136.994 cache size : 512 KB physical id : 0 siblings : 128 core id : 0 cpu cores : 64 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 16 wp : yes
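
Reading the steady-state wps figures across this session: roughly 48,000-49,000 with --num_gpus=1, ~68,000 with 2, ~77,000 with 3, and ~82,000-84,000 in the first run of this excerpt, with every run's wall clock pinned near the max_time=180 cutoff (real ≈ 3m13s-3m17s). Scaling is clearly sub-linear; a quick back-of-the-envelope check from eyeballed figures (approximations read off the logs, not exact averages):

# Approximate steady-state wps read off the logs in this session.
wps = {1: 48500, 2: 68000, 3: 77000}

base = wps[1]
for n, w in sorted(wps.items()):
    print("num_gpus=%d: %.2fx speedup, %.0f%% scaling efficiency"
          % (n, w / base, 100.0 * w / (n * base)))
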