[chibi@rhel8 ~]$ sudo nvidia-docker run --rm -ti nvcr.io/nvidia/tensorflow:19.04-py3
Unable to find image 'nvcr.io/nvidia/tensorflow:19.04-py3' locally
19.04-py3: Pulling from nvidia/tensorflow
34667c7e4631: Pulling fs layer
d18d76a881a4: Pulling fs layer
119c7358fbfc: Pulling fs layer
2aaf13f3eff0: Waiting
202fa0f8874b: Waiting
3b700a61ede6: Waiting
87e6ca450d3f: Waiting
a1e76dce1aec: Waiting
9b91fa2f9276: Waiting
b5877a9add73: Pulling fs layer
bab74df105f1: Waiting
534bbf505504: Waiting
4956bf3bbbb9: Waiting
f4371944c97d: Waiting
4615a735431d: Waiting
5db2639932b5: Waiting
629d5c9d75a4: Waiting
8071b94b5429: Waiting
6eb8eba2ad5a: Waiting
e32e86c15b8b: Waiting
08db5b51b243: Waiting
f71ce95fb406: Waiting
3498ed8c5685: Waiting
62819d8896c1: Waiting
34bc85bf8bef: Waiting
4a95ca3431c4: Pulling fs layer
41bc2d0a4d4d: Waiting
a2ceadc61854: Waiting
2d0c5308ff92: Waiting
a531832992b8: Waiting
b24a8fd8f2e1: Waiting
8d9313624ab7: Waiting
e5cafe011f22: Pull complete
eca19a329cd4: Pull complete
65ee50af0bcc: Pull complete
5f60ec8c32f4: Pull complete
d7dcb657fa13: Pull complete
1f6ef6575fbe: Pull complete
d1ef346a3015: Pull complete
4ef9cb404fd5: Pull complete
f6797f45a018: Pull complete
1d4380527325: Pull complete
965f2629db02: Pull complete
5debff4c8c0a: Pull complete
b3a3a9d82be6: Pull complete
eac05f20b729: Pull complete
3ce0a7f80167: Pull complete
2a21e34a5784: Pull complete
c1ccf19e258e: Pull complete
0b6ea9d0652b: Pull complete
307bc8c3f024: Pull complete
ca75fd593a79: Pull complete
0cd3cdca1af7: Pull complete
48e857e9d372: Pull complete
3264ea403ca9: Pull complete
Digest: sha256:aaebc136d5d50937362675c77afd908bd96cded68846f39163050a023c8a9851
Status: Downloaded newer image for nvcr.io/nvidia/tensorflow:19.04-py3

================
== TensorFlow ==
================

NVIDIA Release 19.04 (build 6132408)
TensorFlow Version 1.13.1

Container image Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2019 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
      insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
      nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

root@d15f0a372786:/workspace# ls
README.md  docker-examples  nvidia-examples
root@d15f0a372786:/workspace# cd nvidia-examples
root@d15f0a372786:/workspace/nvidia-examples# ls
NCF              bert                 cnn           ssdv1.2
OpenSeq2Seq      big_lstm             gnmt_v2       tensorrt
UNet_Industrial  build_imagenet_data  resnet50v1.5
root@d15f0a372786:/workspace/nvidia-examples# cd big_lstm
root@d15f0a372786:/workspace/nvidia-examples/big_lstm# ls
1b_word_vocab.txt  data_utils_test.py         language_model_test.py
README.md          download_1b_words_data.sh  model_utils.py
__init__.py        hparams.py                 run_utils.py
common.py          hparams_test.py            single_lm_train.py
data_utils.py      language_model.py          testdata
root@d15f0a372786:/workspace/nvidia-examples/big_lstm# ./download_1b_words_data.sh
Please specify root of dataset directory: data
Success: dataset root dir validated
--2020-04-08 16:57:20--  http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
Resolving www.statmt.org (www.statmt.org)... 129.215.197.184
Connecting to www.statmt.org (www.statmt.org)|129.215.197.184|:80... connected.
HTTP request sent, awaiting response...
200 OK
Length: 1792209805 (1.7G) [application/x-gzip]
Saving to: ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’

1-billion-word-lang 100%[===================>]   1.67G   179KB/s    in 2h 10m

2020-04-08 19:07:26 (224 KB/s) - ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’ saved [1792209805/1792209805]

1-billion-word-language-modeling-benchmark-r13output/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00024-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00057-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00055-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00096-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00081-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00033-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00072-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00082-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00018-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00008-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00059-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00005-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00091-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00062-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00031-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00095-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00076-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00006-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00038-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00015-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00087-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00021-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00049-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00009-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00027-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00056-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00046-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00032-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00029-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00088-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00085-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00011-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00012-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00067-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00003-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00093-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00050-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00053-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00044-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00019-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00066-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00028-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00045-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00039-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00071-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00052-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00078-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00037-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00002-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00014-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00048-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00017-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00004-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00077-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00080-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00020-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00051-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00016-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00079-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00043-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00068-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00099-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00064-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00034-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00054-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00040-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00070-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00063-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00041-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00083-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00061-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00073-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00094-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00030-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00060-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00035-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00023-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00042-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00025-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00090-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00089-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00065-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00075-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00022-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00026-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00098-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00084-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00010-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00069-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00013-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00092-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00036-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00097-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00007-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00074-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00001-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00047-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00086-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00058-of-00100
1-billion-word-language-modeling-benchmark-r13output/.svn/
1-billion-word-language-modeling-benchmark-r13output/.svn/tmp/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/de102cd0c91cd19e6612f0840e68a2f20ba8134c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/deed1b75d3bd5cc36ae6aeb85d56680b892b7948.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/86c58db52fbf362c5bc329afc33b8805085fcb0d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/9f2882e21f860a83ad6ea8898ebab140974ed301.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/bcdbc523ee7488dc438cab869b6d5e236578dbfa.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/d2718bc26d0ee0a213d7d4add99a304cb5b39ede.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/c5b24f61479da923123d0394a188da922ea0359c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/116d6ea61730d8199127596b072e981338597779.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/b0e26559cfe641245584a9400b35ba28d64f1411.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/d3ae508e3bcb0e696dd70aecd052410f1f7afc1d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/9e148bd766e8805e0eb97eeae250433ec7a2e996.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/31b645a482e0b81fda3c567cada307c6fcf7ec80.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/da39a3ee5e6b4b0d3255bfef95601890afd80709.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/c1ed42c415ec884e591fb5c70d373da640a383b5.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/e37ba0f85e94073ccaced1eed7e4f5d737a25f49.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/entries
1-billion-word-language-modeling-benchmark-r13output/.svn/format
1-billion-word-language-modeling-benchmark-r13output/.svn/wc.db
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00015-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00031-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00027-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00010-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00033-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00042-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00046-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00037-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00000-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00029-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00013-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00002-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00048-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00006-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00030-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00025-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00039-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00008-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00020-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00001-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00034-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00044-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00045-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00016-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00004-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00035-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00038-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00009-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00024-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00022-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00021-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00032-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00011-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00049-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00041-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00019-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00023-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00040-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00014-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00007-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00017-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00012-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00018-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00003-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00028-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en-00000-of-00100
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00043-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00005-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00036-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00026-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00047-of-00050
1-billion-word-language-modeling-benchmark-r13output/README

Success!

One billion words dataset ready at:
data/1-billion-word-language-modeling-benchmark-r13output/

Please pass this dir to single_lm_train.py via the --datadir option.

root@d15f0a372786:/workspace/nvidia-examples/big_lstm# df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
overlay        overlay   239G   26G  213G  11% /
tmpfs          tmpfs      64M     0   64M   0% /dev
tmpfs          tmpfs      32G     0   32G   0% /sys/fs/cgroup
shm            tmpfs      64M     0   64M   0% /dev/shm
/dev/sda1      xfs       239G   26G  213G  11% /etc/hosts
tmpfs          tmpfs      32G   12K   32G   1% /proc/driver/nvidia
devtmpfs       devtmpfs   32G     0   32G   0% /dev/nvidia0
tmpfs          tmpfs      32G     0   32G   0% /proc/asound
tmpfs          tmpfs      32G     0   32G   0% /proc/acpi
tmpfs          tmpfs      32G     0   32G   0% /proc/scsi
tmpfs          tmpfs      32G     0   32G   0% /sys/firmware
root@d15f0a372786:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=4 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
*****HYPER PARAMETERS*****
{'num_layers': 1, 'run_profiler': False, 'optimizer': 0, 'state_size': 2048, 'num_delayed_steps': 150, 'vocab_size': 793470, 'keep_prob': 0.9, 'learning_rate': 0.2, 'max_time': 180, 'num_sampled': 8192, 'emb_size': 512, 'projected_size': 512, 'do_summaries': False, 'num_shards': 8, 'batch_size': 128, 'num_gpus': 4, 'average_params': True, 'num_steps': 20, 'max_grad_norm': 10.0}
**************************
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
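An aside on the hyperparameters printed above: the embedding and softmax shard shapes that appear later in the log fall straight out of them. With vocab_size = 793470 and num_shards = 8, each shard holds a ceil-divided slice of the vocabulary, 99184 rows of emb_size = 512. A minimal plain-Python sketch of that arithmetic (no TensorFlow needed; variable names here are illustrative, not from the example code):

```python
import math

# Values taken from the *****HYPER PARAMETERS***** dump above.
vocab_size = 793470
num_shards = 8
emb_size = 512

# Each of the 8 embedding (and softmax weight) shards holds a
# ceil-divided slice of the vocabulary.
rows_per_shard = math.ceil(vocab_size / num_shards)
shard_shape = (rows_per_shard, emb_size)

print(shard_shape)  # (99184, 512), matching the emb_*/softmax_w_* shapes in the variable dump
```

This is why the log lists eight `model/emb_*` and eight `model/softmax_w_*` tensors, each shaped (99184, 512).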
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Current time: 1586372907.9878447
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
model/model_1/state_1_0:0 (128, 2560) /gpu:1
model/model_2/state_2_0:0 (128, 2560) /gpu:2
model/model_3/state_3_0:0 (128, 2560) /gpu:3
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2020-04-08 19:08:28.943470: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200075000 Hz
2020-04-08 19:08:28.949346: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0xa869b20 executing computations on platform Host. Devices:
2020-04-08 19:08:28.949379: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2020-04-08 19:08:30.229067: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0xa868980 executing computations on platform CUDA.
Devices: 2020-04-08 19:08:30.229113: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5 2020-04-08 19:08:30.229123: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (1): TITAN RTX, Compute Capability 7.5 2020-04-08 19:08:30.229133: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-04-08 19:08:30.229141: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-04-08 19:08:30.230587: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:02:00.0 totalMemory: 23.65GiB freeMemory: 23.26GiB 2020-04-08 19:08:30.230670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:03:00.0 totalMemory: 23.65GiB freeMemory: 23.48GiB 2020-04-08 19:08:30.230711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 2 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:82:00.0 totalMemory: 10.76GiB freeMemory: 10.60GiB 2020-04-08 19:08:30.230751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 3 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:83:00.0 totalMemory: 10.76GiB freeMemory: 10.60GiB 2020-04-08 19:08:30.230954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1, 2, 3 2020-04-08 19:08:31.496747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-04-08 19:08:31.496803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2 3 2020-04-08 19:08:31.496813: I 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N N N N 2020-04-08 19:08:31.496819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N N N 2020-04-08 19:08:31.496826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N N N N 2020-04-08 19:08:31.496886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N N N N 2020-04-08 19:08:31.497114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22545 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:02:00.0, compute capability: 7.5) 2020-04-08 19:08:31.497377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22757 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:03:00.0, compute capability: 7.5) 2020-04-08 19:08:31.497549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10224 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:82:00.0, compute capability: 7.5) 2020-04-08 19:08:31.498187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10224 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:83:00.0, compute capability: 7.5) Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00090-of-00100 Finished processing! 
2020-04-08 19:09:12.668020: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally Iteration 1, time = 22.07s, wps = 464, train loss = 12.9918 Iteration 2, time = 18.84s, wps = 544, train loss = 12.9718 Iteration 3, time = 0.11s, wps = 93841, train loss = 12.8202 Iteration 4, time = 0.10s, wps = 97786, train loss = 11.3564 Iteration 5, time = 0.11s, wps = 95916, train loss = 112.6045 Iteration 6, time = 0.10s, wps = 104523, train loss = 57.9972 Iteration 7, time = 0.11s, wps = 92589, train loss = 14.9993 Iteration 8, time = 0.10s, wps = 99343, train loss = 47.8161 Iteration 9, time = 0.11s, wps = 95486, train loss = 13.9594 Iteration 20, time = 1.12s, wps = 100311, train loss = 11.4091 Iteration 40, time = 2.08s, wps = 98402, train loss = 9.8444 Iteration 60, time = 2.06s, wps = 99576, train loss = 8.4549 Iteration 80, time = 2.06s, wps = 99197, train loss = 8.5005 Iteration 100, time = 2.07s, wps = 99140, train loss = 7.6186 Iteration 120, time = 2.06s, wps = 99199, train loss = 7.5319 Iteration 140, time = 2.09s, wps = 97860, train loss = 6.8477 Iteration 160, time = 2.06s, wps = 99659, train loss = 6.8928 Iteration 180, time = 2.08s, wps = 98349, train loss = 6.5199 Iteration 200, time = 2.05s, wps = 99694, train loss = 6.4047 Iteration 220, time = 2.06s, wps = 99352, train loss = 6.2545 Iteration 240, time = 2.05s, wps = 100093, train loss = 6.2418 Iteration 260, time = 2.06s, wps = 99453, train loss = 6.1291 Iteration 280, time = 2.08s, wps = 98489, train loss = 6.1268 Iteration 300, time = 2.06s, wps = 99337, train loss = 6.1012 Iteration 320, time = 2.11s, wps = 97204, train loss = 5.9627 Iteration 340, time = 2.08s, wps = 98308, train loss = 5.9448 Iteration 360, time = 2.08s, wps = 98377, train loss = 5.8519 Iteration 380, time = 2.06s, wps = 99403, train loss = 5.8916 Iteration 400, time = 2.06s, wps = 99205, train loss = 5.8172 Iteration 420, time = 2.05s, wps = 99880, train loss = 5.7832 Iteration 440, 
time = 2.06s, wps = 99267, train loss = 5.8620 Iteration 460, time = 2.07s, wps = 98724, train loss = 5.7603 Iteration 480, time = 2.07s, wps = 98789, train loss = 5.7345 Iteration 500, time = 2.07s, wps = 99062, train loss = 5.6593 Iteration 520, time = 2.07s, wps = 99121, train loss = 5.6419 Iteration 540, time = 2.03s, wps = 100702, train loss = 5.6443 Iteration 560, time = 2.06s, wps = 99494, train loss = 5.6184 Iteration 580, time = 2.09s, wps = 97957, train loss = 5.5860 Iteration 600, time = 2.08s, wps = 98656, train loss = 5.5418 Iteration 620, time = 2.05s, wps = 99764, train loss = 5.4871 Iteration 640, time = 2.07s, wps = 98826, train loss = 5.5347 Iteration 660, time = 2.05s, wps = 99998, train loss = 5.5534 Iteration 680, time = 2.08s, wps = 98423, train loss = 5.5311 Iteration 700, time = 2.05s, wps = 100077, train loss = 5.4512 Iteration 720, time = 2.04s, wps = 100309, train loss = 5.3668 Iteration 740, time = 2.05s, wps = 99936, train loss = 5.4503 Iteration 760, time = 2.05s, wps = 99693, train loss = 5.3940 Iteration 780, time = 2.08s, wps = 98235, train loss = 5.3769 Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00063-of-00100 Finished processing! 
Iteration 800, time = 4.27s, wps = 47989, train loss = 5.3729
Iteration 820, time = 2.07s, wps = 98863, train loss = 5.2877
Iteration 840, time = 2.08s, wps = 98602, train loss = 5.3134
Iteration 860, time = 2.07s, wps = 98828, train loss = 5.3081
Iteration 880, time = 2.07s, wps = 99114, train loss = 5.2930
Iteration 900, time = 2.08s, wps = 98617, train loss = 5.3119
Iteration 920, time = 2.08s, wps = 98399, train loss = 5.2599
Iteration 940, time = 2.06s, wps = 99651, train loss = 5.2640
Iteration 960, time = 2.10s, wps = 97733, train loss = 5.2944
Iteration 980, time = 2.06s, wps = 99281, train loss = 5.2124
Iteration 1000, time = 2.06s, wps = 99526, train loss = 5.3017
Iteration 1020, time = 2.05s, wps = 99862, train loss = 5.2014
Iteration 1040, time = 2.08s, wps = 98313, train loss = 5.1441
Iteration 1060, time = 2.04s, wps = 100407, train loss = 5.1589
Iteration 1080, time = 2.08s, wps = 98361, train loss = 5.2334
/usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "

real	3m22.462s
user	16m13.484s
sys	2m0.365s
root@d15f0a372786:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=2 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
*****HYPER PARAMETERS*****
{'num_gpus': 2, 'num_shards': 8, 'run_profiler': False, 'learning_rate': 0.2, 'num_delayed_steps': 150, 'emb_size': 512, 'batch_size': 128, 'projected_size': 512, 'max_grad_norm': 10.0, 'optimizer': 0, 'num_steps': 20, 'do_summaries': False, 'keep_prob': 0.9, 'state_size': 2048, 'max_time': 180, 'num_sampled': 8192, 'average_params': True, 'num_layers': 1, 'vocab_size': 793470}
**************************
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. Current time: 1586373710.4204924 ALL VARIABLES WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. Instructions for updating: Please use tf.global_variables instead. model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 model/global_step:0 () model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0 model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0 model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0 model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_1/Adagrad:0 
(99184, 512) /gpu:0 model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_b/Adagrad:0 (793470,) /gpu:0 model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0 model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0 TRAINABLE VARIABLES model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 LOCAL VARIABLES model/model/state_0_0:0 (128, 2560) /gpu:0 model/model_1/state_1_0:0 (128, 2560) /gpu:1 WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. 
Instructions for updating: Please switch to tf.train.MonitoredTrainingSession 2020-04-08 19:21:50.980348: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200075000 Hz 2020-04-08 19:21:50.985979: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x92de520 executing computations on platform Host. Devices: 2020-04-08 19:21:50.986012: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): , 2020-04-08 19:21:52.306153: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x92ddf40 executing computations on platform CUDA. Devices: 2020-04-08 19:21:52.306190: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5 2020-04-08 19:21:52.306220: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (1): TITAN RTX, Compute Capability 7.5 2020-04-08 19:21:52.306229: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-04-08 19:21:52.306238: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5 2020-04-08 19:21:52.307554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:02:00.0 totalMemory: 23.65GiB freeMemory: 23.26GiB 2020-04-08 19:21:52.307651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:03:00.0 totalMemory: 23.65GiB freeMemory: 23.48GiB 2020-04-08 19:21:52.307691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 2 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:82:00.0 totalMemory: 10.76GiB freeMemory: 10.60GiB 2020-04-08 19:21:52.307732: I 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 3 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:83:00.0 totalMemory: 10.76GiB freeMemory: 10.60GiB 2020-04-08 19:21:52.307954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1, 2, 3 2020-04-08 19:21:53.554183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-04-08 19:21:53.554258: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2 3 2020-04-08 19:21:53.554268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N N N N 2020-04-08 19:21:53.554276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N N N 2020-04-08 19:21:53.554283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N N N N 2020-04-08 19:21:53.554307: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N N N N 2020-04-08 19:21:53.554525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22545 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:02:00.0, compute capability: 7.5) 2020-04-08 19:21:53.555321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22757 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:03:00.0, compute capability: 7.5) 2020-04-08 19:21:53.555524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10224 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:82:00.0, compute capability: 7.5) 2020-04-08 19:21:53.556210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10224 MB memory) -> physical 
GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:83:00.0, compute capability: 7.5) WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file utilities to get mtimes. Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00065-of-00100 Finished processing! 2020-04-08 19:22:11.621355: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally Iteration 1094, time = 10.98s, wps = 466, train loss = 6.7827 Iteration 1095, time = 8.51s, wps = 602, train loss = 6.3364 Iteration 1096, time = 0.08s, wps = 61684, train loss = 5.6167 Iteration 1097, time = 0.09s, wps = 60003, train loss = 5.3081 Iteration 1098, time = 0.08s, wps = 66100, train loss = 5.3154 Iteration 1099, time = 0.08s, wps = 65104, train loss = 5.2341 Iteration 1100, time = 0.08s, wps = 65852, train loss = 5.2812 Iteration 1101, time = 0.08s, wps = 63809, train loss = 5.2625 Iteration 1102, time = 0.08s, wps = 62343, train loss = 5.1965 Iteration 1113, time = 0.85s, wps = 66413, train loss = 5.1497 Iteration 1133, time = 1.54s, wps = 66291, train loss = 5.1519 Iteration 1153, time = 1.53s, wps = 66915, train loss = 5.0841 Iteration 1173, time = 1.52s, wps = 67278, train loss = 5.0527 Iteration 1193, time = 1.52s, wps = 67448, train loss = 5.0813 Iteration 1213, time = 1.52s, wps = 67231, train loss = 5.0914 Iteration 1233, time = 1.52s, wps = 
67181, train loss = 5.1241 Iteration 1253, time = 1.53s, wps = 66911, train loss = 5.0085 Iteration 1273, time = 1.52s, wps = 67509, train loss = 5.0294 Iteration 1293, time = 1.50s, wps = 68251, train loss = 5.1676 Iteration 1313, time = 1.52s, wps = 67291, train loss = 5.0994 Iteration 1333, time = 1.51s, wps = 67936, train loss = 5.0452 Iteration 1353, time = 1.53s, wps = 66758, train loss = 5.0656 Iteration 1373, time = 1.52s, wps = 67484, train loss = 4.9973 Iteration 1393, time = 1.53s, wps = 67077, train loss = 5.0474 Iteration 1413, time = 1.51s, wps = 67768, train loss = 5.0069 Iteration 1433, time = 1.51s, wps = 67759, train loss = 5.0032 Iteration 1453, time = 1.50s, wps = 68211, train loss = 5.0730 Iteration 1473, time = 1.52s, wps = 67435, train loss = 5.0175 Iteration 1493, time = 1.53s, wps = 66896, train loss = 4.9558 Iteration 1513, time = 1.51s, wps = 67682, train loss = 4.9683 Iteration 1533, time = 1.54s, wps = 66402, train loss = 5.0538 Iteration 1553, time = 1.50s, wps = 68345, train loss = 4.9766 Iteration 1573, time = 1.49s, wps = 68872, train loss = 4.9805 Iteration 1593, time = 1.53s, wps = 67129, train loss = 5.0303 Iteration 1613, time = 1.53s, wps = 66849, train loss = 5.0078 Iteration 1633, time = 1.51s, wps = 67692, train loss = 4.9945 Iteration 1653, time = 1.53s, wps = 67052, train loss = 4.8770 Iteration 1673, time = 1.51s, wps = 67841, train loss = 4.8614 Iteration 1693, time = 1.52s, wps = 67380, train loss = 4.9773 Iteration 1713, time = 1.52s, wps = 67579, train loss = 4.9668 Iteration 1733, time = 1.55s, wps = 66207, train loss = 5.0140 Iteration 1753, time = 1.53s, wps = 66931, train loss = 5.0079 Iteration 1773, time = 1.51s, wps = 67757, train loss = 4.9451 Iteration 1793, time = 1.51s, wps = 67985, train loss = 4.8996 Iteration 1813, time = 1.50s, wps = 68059, train loss = 4.9134 Iteration 1833, time = 1.51s, wps = 67861, train loss = 4.9582 Iteration 1853, time = 1.53s, wps = 67019, train loss = 4.8718 Iteration 1873, 
time = 1.50s, wps = 68048, train loss = 4.8518 Iteration 1893, time = 1.50s, wps = 68183, train loss = 4.9371 Iteration 1913, time = 1.53s, wps = 66906, train loss = 4.8951 Iteration 1933, time = 1.50s, wps = 68215, train loss = 4.8573 Iteration 1953, time = 1.49s, wps = 68506, train loss = 4.8141 Iteration 1973, time = 1.51s, wps = 67707, train loss = 4.8450 Iteration 1993, time = 1.51s, wps = 68009, train loss = 4.9198 Iteration 2013, time = 1.51s, wps = 67955, train loss = 4.8767 Iteration 2033, time = 1.50s, wps = 68328, train loss = 4.9069 Iteration 2053, time = 1.51s, wps = 67887, train loss = 4.8980 Iteration 2073, time = 1.50s, wps = 68372, train loss = 4.7764 Iteration 2093, time = 1.50s, wps = 68073, train loss = 4.8326 Iteration 2113, time = 1.51s, wps = 67835, train loss = 4.7513 Iteration 2133, time = 1.54s, wps = 66474, train loss = 4.7810 Iteration 2153, time = 1.51s, wps = 67942, train loss = 4.8577 Iteration 2173, time = 1.51s, wps = 67960, train loss = 4.7965 Iteration 2193, time = 1.51s, wps = 67803, train loss = 4.7775 Iteration 2213, time = 1.51s, wps = 67847, train loss = 4.9523 Iteration 2233, time = 1.51s, wps = 67628, train loss = 4.8503 Iteration 2253, time = 1.54s, wps = 66621, train loss = 4.7671 Iteration 2273, time = 1.54s, wps = 66331, train loss = 4.7813 Iteration 2293, time = 1.54s, wps = 66698, train loss = 4.7768 Iteration 2313, time = 1.53s, wps = 67031, train loss = 4.8374 Iteration 2333, time = 1.49s, wps = 68778, train loss = 4.8252 Iteration 2353, time = 1.51s, wps = 67898, train loss = 4.7972 Iteration 2373, time = 1.52s, wps = 67258, train loss = 4.8398 Iteration 2393, time = 1.51s, wps = 67680, train loss = 4.7957 Iteration 2413, time = 1.54s, wps = 66478, train loss = 4.7924 Iteration 2433, time = 1.52s, wps = 67335, train loss = 4.7476 Iteration 2453, time = 1.50s, wps = 68404, train loss = 4.7206 Iteration 2473, time = 1.51s, wps = 67712, train loss = 4.7337 Iteration 2493, time = 1.52s, wps = 67490, train loss = 4.7050 
Iteration 2513, time = 1.52s, wps = 67514, train loss = 4.8185 Iteration 2533, time = 1.52s, wps = 67423, train loss = 4.7101 Iteration 2553, time = 1.53s, wps = 67025, train loss = 4.8144 Iteration 2573, time = 1.50s, wps = 68093, train loss = 4.7985 Iteration 2593, time = 1.52s, wps = 67365, train loss = 4.6783 Iteration 2613, time = 1.50s, wps = 68118, train loss = 4.7983 Iteration 2633, time = 1.51s, wps = 68026, train loss = 4.7846 Iteration 2653, time = 1.49s, wps = 68843, train loss = 4.7417 Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00062-of-00100 Finished processing! Iteration 2673, time = 3.69s, wps = 27726, train loss = 4.6568 Iteration 2693, time = 1.52s, wps = 67158, train loss = 4.8170 Iteration 2713, time = 1.51s, wps = 67642, train loss = 4.7130 Iteration 2733, time = 1.52s, wps = 67250, train loss = 4.7274 Iteration 2753, time = 1.52s, wps = 67173, train loss = 4.7009 Iteration 2773, time = 1.51s, wps = 68001, train loss = 4.6991 Iteration 2793, time = 1.53s, wps = 67048, train loss = 4.7744 Iteration 2813, time = 1.51s, wps = 67783, train loss = 4.6640 Iteration 2833, time = 1.50s, wps = 68379, train loss = 4.7199 Iteration 2853, time = 1.54s, wps = 66448, train loss = 4.7327 Iteration 2873, time = 1.53s, wps = 67062, train loss = 4.7226 Iteration 2893, time = 1.47s, wps = 69480, train loss = 4.7301 Iteration 2913, time = 1.50s, wps = 68309, train loss = 4.7513 Iteration 2933, time = 1.52s, wps = 67526, train loss = 4.7736 Iteration 2953, time = 1.50s, wps = 68175, train loss = 4.7516 Iteration 2973, time = 1.50s, wps = 68063, train loss = 4.7624 Iteration 2993, time = 1.49s, wps = 68766, train loss = 4.6963 Iteration 3013, time = 1.49s, wps = 68735, train loss = 4.7237 Iteration 3033, time = 1.49s, wps = 68632, train loss = 4.6537 /usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed 
FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "

real	3m16.965s
user	14m35.362s
sys	1m38.223s
root@d15f0a372786:/workspace/nvidia-examples/big_lstm# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
root@d15f0a372786:/workspace/nvidia-examples/big_lstm# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
root@d15f0a372786:/workspace/nvidia-examples/big_lstm# cd data
root@d15f0a372786:/workspace/nvidia-examples/big_lstm/data# ls
1-billion-word-language-modeling-benchmark-r13output
root@d15f0a372786:/workspace/nvidia-examples/big_lstm/data# cd 1-billion-word-language-modeling-benchmark-r13output
root@d15f0a372786:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# ls
1b_word_vocab.txt  heldout-monolingual.tokenized.shuffled  README  training-monolingual.tokenized.shuffled
root@d15f0a372786:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# cd training-monolingual.tokenized.shuffled
root@d15f0a372786:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# ls
news.en-00001-of-00100  news.en-00034-of-00100  news.en-00067-of-00100
news.en-00002-of-00100  news.en-00035-of-00100  news.en-00068-of-00100
news.en-00003-of-00100  news.en-00036-of-00100  news.en-00069-of-00100
news.en-00004-of-00100  news.en-00037-of-00100  news.en-00070-of-00100
news.en-00005-of-00100  news.en-00038-of-00100  news.en-00071-of-00100
news.en-00006-of-00100  news.en-00039-of-00100  news.en-00072-of-00100
news.en-00007-of-00100  news.en-00040-of-00100  news.en-00073-of-00100
news.en-00008-of-00100  news.en-00041-of-00100  news.en-00074-of-00100
news.en-00009-of-00100  news.en-00042-of-00100  news.en-00075-of-00100
news.en-00010-of-00100  news.en-00043-of-00100  news.en-00076-of-00100
news.en-00011-of-00100  news.en-00044-of-00100  news.en-00077-of-00100
news.en-00012-of-00100  news.en-00045-of-00100  news.en-00078-of-00100
news.en-00013-of-00100  news.en-00046-of-00100  news.en-00079-of-00100
news.en-00014-of-00100  news.en-00047-of-00100  news.en-00080-of-00100
news.en-00015-of-00100  news.en-00048-of-00100  news.en-00081-of-00100
news.en-00016-of-00100  news.en-00049-of-00100  news.en-00082-of-00100
news.en-00017-of-00100  news.en-00050-of-00100  news.en-00083-of-00100
news.en-00018-of-00100  news.en-00051-of-00100  news.en-00084-of-00100
news.en-00019-of-00100  news.en-00052-of-00100  news.en-00085-of-00100
news.en-00020-of-00100  news.en-00053-of-00100  news.en-00086-of-00100
news.en-00021-of-00100  news.en-00054-of-00100  news.en-00087-of-00100
news.en-00022-of-00100  news.en-00055-of-00100  news.en-00088-of-00100
news.en-00023-of-00100  news.en-00056-of-00100  news.en-00089-of-00100
news.en-00024-of-00100  news.en-00057-of-00100  news.en-00090-of-00100
news.en-00025-of-00100  news.en-00058-of-00100  news.en-00091-of-00100
news.en-00026-of-00100  news.en-00059-of-00100  news.en-00092-of-00100
news.en-00027-of-00100  news.en-00060-of-00100  news.en-00093-of-00100
news.en-00028-of-00100  news.en-00061-of-00100  news.en-00094-of-00100
news.en-00029-of-00100  news.en-00062-of-00100  news.en-00095-of-00100
news.en-00030-of-00100  news.en-00063-of-00100  news.en-00096-of-00100
news.en-00031-of-00100  news.en-00064-of-00100  news.en-00097-of-00100
news.en-00032-of-00100  news.en-00065-of-00100  news.en-00098-of-00100
news.en-00033-of-00100  news.en-00066-of-00100  news.en-00099-of-00100
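The listing above shows 99 training shards, `news.en-00001-of-00100` through `news.en-00099-of-00100`. A small hypothetical check (the function name and sample usage are illustrative, not part of the example repo) for confirming no shard is missing after a download:

```python
def missing_shards(present, total=100, start=1):
    """Return the expected shard filenames that are absent from `present`.

    The 1B-word benchmark's training shards are named
    news.en-XXXXX-of-00100 for XXXXX in 00001..00099.
    """
    expected = [f"news.en-{i:05d}-of-{total:05d}" for i in range(start, total)]
    present = set(present)
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # In practice you would pass os.listdir() of the shard directory;
    # here we simulate a complete set with one shard removed.
    names = [f"news.en-{i:05d}-of-00100" for i in range(1, 100)]
    names.remove("news.en-00042-of-00100")
    print(missing_shards(names))  # -> ['news.en-00042-of-00100']
```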
root@d15f0a372786:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# exit
exit
[chibi@rhel8 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.2 Beta (Ootpa)
[chibi@rhel8 ~]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
[chibi@rhel8 ~]$ sudo hddtemp /dev/sda
[sudo] password for chibi:
/dev/sda: Samsung SSD 840 PRO Series: 22°C
[chibi@rhel8 ~]$ nvidia-smi nvlink -c
GPU 0: TITAN RTX (UUID: GPU-5a71d61e-f130-637a-b33d-4df555b0ed88)
GPU 1: TITAN RTX (UUID: GPU-7fb51c1d-c1e7-35cc-aad7-66971f05ddb7)
GPU 2: GeForce RTX 2080 Ti (UUID: GPU-13277ce5-e1e9-0cb1-8cee-6c9e6618e774)
GPU 3: GeForce RTX 2080 Ti (UUID: GPU-1ac935c2-557f-282e-14e5-3f749ffd63ac)
[chibi@rhel8 ~]$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +20.0°C  (high = +85.0°C, crit = +95.0°C)
Core 0:        +16.0°C  (high = +85.0°C, crit = +95.0°C)
Core 1:        +16.0°C  (high = +85.0°C, crit = +95.0°C)
Core 2:        +17.0°C  (high = +85.0°C, crit = +95.0°C)
Core 3:        +16.0°C  (high = +85.0°C, crit = +95.0°C)
Core 4:        +17.0°C  (high = +85.0°C, crit = +95.0°C)
Core 5:        +16.0°C  (high = +85.0°C, crit = +95.0°C)
Core 8:        +16.0°C  (high = +85.0°C, crit = +95.0°C)
Core 9:        +16.0°C  (high = +85.0°C, crit = +95.0°C)
Core 10:       +16.0°C  (high = +85.0°C, crit = +95.0°C)
Core 11:       +17.0°C  (high = +85.0°C, crit = +95.0°C)
Core 12:       +16.0°C  (high = +85.0°C, crit = +95.0°C)
Core 13:       +16.0°C  (high = +85.0°C, crit = +95.0°C)

i350bb-pci-8100
Adapter: PCI adapter
loc1:          +33.0°C  (high = +120.0°C, crit = +110.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +19.0°C  (high = +85.0°C, crit = +95.0°C)
Core 0:        +15.0°C  (high = +85.0°C, crit = +95.0°C)
Core 1:        +15.0°C  (high = +85.0°C, crit = +95.0°C)
Core 2:        +15.0°C  (high = +85.0°C, crit = +95.0°C)
Core 3:        +14.0°C  (high = +85.0°C, crit = +95.0°C)
Core 4:        +15.0°C  (high = +85.0°C, crit = +95.0°C)
Core 5:        +15.0°C  (high = +85.0°C, crit = +95.0°C)
Core 8:        +14.0°C  (high = +85.0°C, crit = +95.0°C)
Core 9:        +13.0°C  (high = +85.0°C, crit = +95.0°C)
Core 10:       +15.0°C  (high = +85.0°C, crit = +95.0°C)
Core 11:       +15.0°C  (high = +85.0°C, crit = +95.0°C)
Core 12:       +15.0°C  (high = +85.0°C, crit = +95.0°C)
Core 13:       +14.0°C  (high = +85.0°C, crit = +95.0°C)

power_meter-acpi-0
Adapter: ACPI interface
power1:        4.29 MW  (interval = 1.00 s)

[chibi@rhel8 ~]$
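Both `time python single_lm_train.py` runs above print per-iteration throughput as `wps = …` (words per second), which is the easiest number to compare across GPU counts. A minimal sketch of extracting and averaging those figures; the sample lines below are copied from the 4-GPU and 2-GPU runs, while in practice you would read them from a saved console log (the variable names are illustrative):

```python
import re

# Sample steady-state lines from the 4-GPU run above.
LOG_4GPU = """\
Iteration 200, time = 2.05s, wps = 99694, train loss = 6.4047
Iteration 220, time = 2.06s, wps = 99352, train loss = 6.2545
Iteration 240, time = 2.05s, wps = 100093, train loss = 6.2418
"""

# Sample steady-state lines from the 2-GPU run above.
LOG_2GPU = """\
Iteration 1253, time = 1.53s, wps = 66911, train loss = 5.0085
Iteration 1273, time = 1.52s, wps = 67509, train loss = 5.0294
Iteration 1293, time = 1.50s, wps = 68251, train loss = 5.1676
"""

WPS_PATTERN = re.compile(r"wps = (\d+)")

def mean_wps(log_text: str) -> float:
    """Average the 'wps = N' values found in a block of training log text."""
    values = [int(m.group(1)) for m in WPS_PATTERN.finditer(log_text)]
    return sum(values) / len(values)

if __name__ == "__main__":
    print(f"4 GPUs: {mean_wps(LOG_4GPU):.0f} wps")
    print(f"2 GPUs: {mean_wps(LOG_2GPU):.0f} wps")
```

Note that averaging raw `wps` values weights every reporting interval equally; excluding the first few warm-up iterations (e.g. the 464-wps iteration 1) gives a fairer steady-state comparison.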