[chibi@centos8 ~]$ sudo nvidia-docker run --rm -ti nvcr.io/nvidia/tensorflow:19.04-py3
Unable to find image 'nvcr.io/nvidia/tensorflow:19.04-py3' locally
19.04-py3: Pulling from nvidia/tensorflow
(... layer-by-layer pull progress omitted ...)
Digest: sha256:aaebc136d5d50937362675c77afd908bd96cded68846f39163050a023c8a9851
Status: Downloaded newer image for nvcr.io/nvidia/tensorflow:19.04-py3

================
== TensorFlow ==
================

NVIDIA Release 19.04 (build 6132408)
TensorFlow Version 1.13.1

Container image Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2019 The TensorFlow Authors. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
      insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
      nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
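The second NOTE above is worth acting on before long training runs. A minimal sketch of the recommended invocation, using exactly the flags from NVIDIA's note (the trailing "..." in the note stands for the image name and any command to run):

  sudo nvidia-docker run --rm -ti --shm-size=1g --ulimit memlock=-1 \
      --ulimit stack=67108864 nvcr.io/nvidia/tensorflow:19.04-py3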
root@0c523bf46a95:/workspace# ls
README.md  docker-examples  nvidia-examples
root@0c523bf46a95:/workspace# cd nvidia-examples
root@0c523bf46a95:/workspace/nvidia-examples# ls
NCF              bert                 cnn           ssdv1.2
OpenSeq2Seq      big_lstm             gnmt_v2       tensorrt
UNet_Industrial  build_imagenet_data  resnet50v1.5
root@0c523bf46a95:/workspace/nvidia-examples# cd big_lstm
root@0c523bf46a95:/workspace/nvidia-examples/big_lstm# ls
1b_word_vocab.txt  data_utils_test.py         language_model_test.py
README.md          download_1b_words_data.sh  model_utils.py
__init__.py        hparams.py                 run_utils.py
common.py          hparams_test.py            single_lm_train.py
data_utils.py      language_model.py          testdata
root@0c523bf46a95:/workspace/nvidia-examples/big_lstm# ./download_1b_words_data.sh
Please specify root of dataset directory: data
Success: dataset root dir validated
--2020-04-04 15:45:49--  http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
Resolving www.statmt.org (www.statmt.org)... 129.215.197.184
Connecting to www.statmt.org (www.statmt.org)|129.215.197.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1792209805 (1.7G) [application/x-gzip]
Saving to: ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’

1-billion-word-lang 100%[===================>]   1.67G   207KB/s    in 3h 0m

2020-04-04 18:46:04 (162 KB/s) - ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’ saved [1792209805/1792209805]
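The download ran for three hours at a couple of hundred KB/s, so interruptions are a real risk. If the connection drops, the tarball can be fetched or resumed manually with wget's -c flag and placed in the dataset root before re-running the script (whether download_1b_words_data.sh skips an already-present archive depends on the script, so check it first):

  wget -c http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz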
1-billion-word-language-modeling-benchmark-r13output/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00024-of-00100
(... extraction of the remaining training shards omitted: all 99 files, news.en-00001-of-00100 through news.en-00099-of-00100, are unpacked in shuffled order ...)
1-billion-word-language-modeling-benchmark-r13output/.svn/
(... Subversion bookkeeping (.svn/pristine/*, entries, format, wc.db) omitted ...)
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00015-of-00050
(... the remaining heldout shards omitted: news.en.heldout-00000-of-00050 through news.en.heldout-00049-of-00050, plus news.en-00000-of-00100 ...)
1-billion-word-language-modeling-benchmark-r13output/README
Success! One billion words dataset ready at:
data/1-billion-word-language-modeling-benchmark-r13output/
Please pass this dir to single_lm_train.py via the --datadir option.
root@0c523bf46a95:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=2 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

*****HYPER PARAMETERS*****
{'num_gpus': 2, 'do_summaries': False, 'max_grad_norm': 10.0, 'batch_size': 128, 'num_steps': 20, 'num_shards': 8, 'run_profiler': False, 'vocab_size': 793470, 'num_layers': 1, 'emb_size': 512, 'learning_rate': 0.2, 'num_delayed_steps': 150, 'projected_size': 512, 'state_size': 2048, 'average_params': True, 'optimizer': 0, 'keep_prob': 0.9, 'num_sampled': 8192, 'max_time': 180}
**************************

WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
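The hyperparameter dump above comes from hparams.py; note max_time=180, which caps each invocation at roughly 180 seconds of training. This example derives from the upstream rafaljozefowicz/lm trainer, which accepts an --hpconfig flag of comma-separated name=value overrides. Assuming NVIDIA's copy preserves that interface (check single_lm_train.py before relying on it), a longer two-GPU run might look like:

  # hypothetical override via the upstream --hpconfig interface
  python single_lm_train.py --mode=train --logdir=./logs --num_gpus=2 \
      --datadir=./data/1-billion-word-language-modeling-benchmark-r13output \
      --hpconfig max_time=3600,batch_size=128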
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Current time: 1586026220.8400578
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
(... model/emb_1:0 through model/emb_7:0, same shape and device ...)
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
(... model/softmax_w_1:0 through model/softmax_w_7:0, same shape and device ...)
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
(... one model/model/*/Adagrad:0 accumulator per trainable variable above, same shapes, plus model/model/lstm_0/LSTMCell/{W_0,B,W_P_0}/ExponentialMovingAverage:0, all on /gpu:0 ...)
TRAINABLE VARIABLES
(... the 20 model/emb_*, model/lstm_0/*, and model/softmax_* variables listed above ...)
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
model/model_1/state_1_0:0 (128, 2560) /gpu:1
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2020-04-04 18:50:21.246066: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2894555000 Hz
2020-04-04 18:50:21.253658: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x8f7f1c0 executing computations on platform Host. Devices:
2020-04-04 18:50:21.253702: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): ,
2020-04-04 18:50:21.453804: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-04 18:50:21.454739: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x8f7ebe0 executing computations on platform CUDA. Devices:
2020-04-04 18:50:21.454787: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
2020-04-04 18:50:21.454794: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (1): TITAN RTX, Compute Capability 7.5
2020-04-04 18:50:21.455483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:01:00.0
totalMemory: 23.65GiB freeMemory: 23.22GiB
2020-04-04 18:50:21.455515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:47:00.0
totalMemory: 23.65GiB freeMemory: 23.48GiB
2020-04-04 18:50:21.455561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2020-04-04 18:50:21.845359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-04 18:50:21.845399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 1
2020-04-04 18:50:21.845404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N N
2020-04-04 18:50:21.845407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   N N
2020-04-04 18:50:21.845505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22502 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-04-04 18:50:21.845994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22759 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:47:00.0, compute capability: 7.5)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00020-of-00100
Finished processing!
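Two details in the output above deserve a note. First, the embedding and softmax weights each appear as eight (99184, 512) pieces because num_shards=8 in the hyperparameters: the 793470-row vocabulary is split into ceil(793470/8) = 99184 rows per shard. A one-line sanity check in plain shell arithmetic:

  # ceiling division of vocab_size by num_shards -> rows per shard
  echo $(( (793470 + 8 - 1) / 8 ))    # prints 99184

Second, the all-"N" interconnect matrix means TensorFlow found no peer-to-peer DMA path between the two TITAN RTXs, so cross-GPU traffic is staged through host memory over PCIe; running `nvidia-smi topo -m` on the host shows the underlying PCIe/NVLink topology.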
2020-04-04 18:50:36.616926: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally
Iteration 1, time = 11.03s, wps = 464, train loss = 12.9856
Iteration 2, time = 3.95s, wps = 1297, train loss = 12.9570
Iteration 3, time = 0.07s, wps = 72449, train loss = 12.8872
Iteration 4, time = 0.07s, wps = 75560, train loss = 12.8016
Iteration 5, time = 0.07s, wps = 76521, train loss = 19.6229
Iteration 6, time = 0.06s, wps = 85010, train loss = 13.4116
Iteration 7, time = 0.06s, wps = 84470, train loss = 13.3624
Iteration 8, time = 0.06s, wps = 85293, train loss = 11.9182
Iteration 9, time = 0.06s, wps = 87246, train loss = 47.8357
Iteration 20, time = 0.73s, wps = 76648, train loss = 11.7535
Iteration 40, time = 1.20s, wps = 85319, train loss = 12.2738
Iteration 60, time = 1.20s, wps = 85180, train loss = 11.0479
Iteration 80, time = 1.20s, wps = 85015, train loss = 9.0630
Iteration 100, time = 1.20s, wps = 85402, train loss = 8.3124
Iteration 120, time = 1.20s, wps = 85577, train loss = 7.9971
Iteration 140, time = 1.20s, wps = 85669, train loss = 7.3915
Iteration 160, time = 1.20s, wps = 85418, train loss = 7.0422
Iteration 180, time = 1.20s, wps = 85294, train loss = 7.0054
Iteration 200, time = 1.20s, wps = 85363, train loss = 6.6523
Iteration 220, time = 1.19s, wps = 86040, train loss = 6.5914
Iteration 240, time = 1.19s, wps = 86023, train loss = 6.4995
Iteration 260, time = 1.20s, wps = 85434, train loss = 6.5159
Iteration 280, time = 1.20s, wps = 85388, train loss = 6.4131
Iteration 300, time = 1.20s, wps = 85014, train loss = 6.4005
Iteration 320, time = 1.20s, wps = 85313, train loss = 6.2868
Iteration 340, time = 1.20s, wps = 85267, train loss = 6.2002
Iteration 360, time = 1.20s, wps = 85379, train loss = 6.1375
Iteration 380, time = 1.21s, wps = 84945, train loss = 6.2309
Iteration 400, time = 1.20s, wps = 85153, train loss = 6.1383
Iteration 420, time = 1.20s, wps = 85456, train loss = 6.0252
Iteration 440, time = 1.19s, wps = 85831, train loss = 6.1037
Iteration 460, time = 1.20s, wps = 85533, train loss = 6.0891
Iteration 480, time = 1.19s, wps = 85735, train loss = 6.0393
Iteration 500, time = 1.21s, wps = 84912, train loss = 6.0167
Iteration 520, time = 1.20s, wps = 85075, train loss = 5.8619
Iteration 540, time = 1.20s, wps = 85080, train loss = 5.8530
Iteration 560, time = 1.19s, wps = 86083, train loss = 5.9322
Iteration 580, time = 1.20s, wps = 85524, train loss = 5.8734
Iteration 600, time = 1.20s, wps = 85329, train loss = 5.8860
Iteration 620, time = 1.19s, wps = 85934, train loss = 5.8840
Iteration 640, time = 1.20s, wps = 85239, train loss = 5.8693
Iteration 660, time = 1.20s, wps = 85223, train loss = 5.6498
Iteration 680, time = 1.19s, wps = 86015, train loss = 5.7833
Iteration 700, time = 1.20s, wps = 85438, train loss = 5.6373
Iteration 720, time = 1.20s, wps = 85412, train loss = 5.7543
Iteration 740, time = 1.20s, wps = 85424, train loss = 5.6942
Iteration 760, time = 1.19s, wps = 85804, train loss = 5.6840
Iteration 780, time = 1.21s, wps = 84893, train loss = 5.7244
Iteration 800, time = 1.20s, wps = 85028, train loss = 5.7778
Iteration 820, time = 1.20s, wps = 85447, train loss = 5.6290
Iteration 840, time = 1.21s, wps = 84871, train loss = 5.6519
Iteration 860, time = 1.20s, wps = 85459, train loss = 5.6039
Iteration 880, time = 1.21s, wps = 84648, train loss = 5.6340
Iteration 900, time = 1.20s, wps = 85574, train loss = 5.5957
Iteration 920, time = 1.20s, wps = 85036, train loss = 5.6602
Iteration 940, time = 1.20s, wps = 85528, train loss = 5.5203
Iteration 960, time = 1.20s, wps = 85485, train loss = 5.5349
Iteration 980, time = 1.21s, wps = 84348, train loss = 5.4761
Iteration 1000, time = 1.21s, wps = 84750, train loss = 5.5076
Iteration 1020, time = 1.20s, wps = 85380, train loss = 5.4338
Iteration 1040, time = 1.21s, wps = 84632, train loss = 5.4536
Iteration 1060, time = 1.20s, wps = 85049, train loss = 5.3814
Iteration 1080, time = 1.21s, wps = 84752, train loss = 5.5055
Iteration 1100, time = 1.20s, wps = 85059, train loss = 5.4134
Iteration 1120, time = 1.20s, wps = 85257, train loss = 5.4474
Iteration 1140, time = 1.20s, wps = 85357, train loss = 5.4088
Iteration 1160, time = 1.20s, wps = 85499, train loss = 5.3558
Iteration 1180, time = 1.20s, wps = 85252, train loss = 5.3685
Iteration 1200, time = 1.20s, wps = 85014, train loss = 5.3161
Iteration 1220, time = 1.21s, wps = 84962, train loss = 5.3446
Iteration 1240, time = 1.22s, wps = 84217, train loss = 5.3270
Iteration 1260, time = 1.20s, wps = 85502, train loss = 5.3913
Iteration 1280, time = 1.21s, wps = 84441, train loss = 5.2818
Iteration 1300, time = 1.20s, wps = 85484, train loss = 5.3000
Iteration 1320, time = 1.20s, wps = 84993, train loss = 5.3011
Iteration 1340, time = 1.20s, wps = 85307, train loss = 5.3329
Iteration 1360, time = 1.20s, wps = 85226, train loss = 5.2870
Iteration 1380, time = 1.20s, wps = 85108, train loss = 5.1727
Iteration 1400, time = 1.21s, wps = 84416, train loss = 5.2577
Iteration 1420, time = 1.21s, wps = 84848, train loss = 5.3098
Iteration 1440, time = 1.21s, wps = 84900, train loss = 5.2232
Iteration 1460, time = 1.21s, wps = 84921, train loss = 5.2896
Iteration 1480, time = 1.20s, wps = 85333, train loss = 5.2265
Iteration 1500, time = 1.20s, wps = 85544, train loss = 5.2288
Iteration 1520, time = 1.21s, wps = 84932, train loss = 5.1271
Iteration 1540, time = 1.20s, wps = 85055, train loss = 5.2079
Iteration 1560, time = 1.20s, wps = 85211, train loss = 5.1719
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00010-of-00100
Finished processing!
Iteration 1580, time = 2.81s, wps = 36471, train loss = 5.2575
Iteration 1600, time = 1.20s, wps = 84995, train loss = 5.2166
Iteration 1620, time = 1.21s, wps = 84481, train loss = 5.1379
Iteration 1640, time = 1.20s, wps = 85488, train loss = 5.1700
Iteration 1660, time = 1.20s, wps = 85475, train loss = 5.2017
Iteration 1680, time = 1.20s, wps = 85307, train loss = 5.1859
Iteration 1700, time = 1.20s, wps = 85075, train loss = 5.1740
Iteration 1720, time = 1.20s, wps = 85089, train loss = 5.1809
Iteration 1740, time = 1.21s, wps = 84423, train loss = 5.1381
Iteration 1760, time = 1.21s, wps = 84724, train loss = 5.1801
Iteration 1780, time = 1.20s, wps = 85281, train loss = 5.0957
Iteration 1800, time = 1.21s, wps = 84364, train loss = 5.0738
Iteration 1820, time = 1.20s, wps = 85130, train loss = 5.1357
Iteration 1840, time = 1.20s, wps = 85141, train loss = 5.1127
Iteration 1860, time = 1.21s, wps = 84616, train loss = 5.1621
Iteration 1880, time = 1.21s, wps = 84829, train loss = 5.0400
Iteration 1900, time = 1.21s, wps = 84315, train loss = 5.0016
Iteration 1920, time = 1.21s, wps = 84403, train loss = 5.1560
Iteration 1940, time = 1.21s, wps = 84733, train loss = 5.1270
Iteration 1960, time = 1.20s, wps = 85015, train loss = 5.1282
Iteration 1980, time = 1.20s, wps = 85188, train loss = 5.0336
Iteration 2000, time = 1.20s, wps = 85158, train loss = 5.0526
Iteration 2020, time = 1.21s, wps = 84642, train loss = 5.0756
Iteration 2040, time = 1.20s, wps = 85228, train loss = 5.0552
Iteration 2060, time = 1.20s, wps = 85000, train loss = 5.0011
Iteration 2080, time = 1.21s, wps = 84496, train loss = 5.0679
Iteration 2100, time = 1.20s, wps = 85139, train loss = 4.9412
Iteration 2120, time = 1.20s, wps = 85636, train loss = 4.9454
Iteration 2140, time = 1.21s, wps = 84872, train loss = 5.1141
Iteration 2160, time = 1.21s, wps = 84535, train loss = 5.0636
Iteration 2180, time = 1.21s, wps = 84782, train loss = 4.8543
Iteration 2200, time = 1.20s, wps = 85044, train loss = 5.0159
Iteration 2220, time = 1.20s, wps = 85046, train loss = 4.9128
Iteration 2240, time = 1.21s, wps = 84455, train loss = 4.9335
Iteration 2260, time = 1.21s, wps = 84319, train loss = 4.9879
Iteration 2280, time = 1.21s, wps = 84651, train loss = 4.9503
Iteration 2300, time = 1.21s, wps = 84956, train loss = 4.9419
Iteration 2320, time = 1.21s, wps = 84834, train loss = 4.9484
Iteration 2340, time = 1.21s, wps = 84334, train loss = 4.9748
Iteration 2360, time = 1.21s, wps = 84297, train loss = 4.8973
Iteration 2380, time = 1.21s, wps = 84819, train loss = 5.0397
Iteration 2400, time = 1.22s, wps = 84174, train loss = 4.9730
Iteration 2420, time = 1.21s, wps = 84621, train loss = 4.9517
Iteration 2440, time = 1.22s, wps = 84073, train loss = 4.8580
Iteration 2460, time = 1.20s, wps = 85665, train loss = 4.8647
Iteration 2480, time = 1.20s, wps = 84982, train loss = 4.9258
Iteration 2500, time = 1.20s, wps = 85092, train loss = 4.8973
Iteration 2520, time = 1.21s, wps = 84938, train loss = 4.9219
Iteration 2540, time = 1.21s, wps = 84678, train loss = 4.8572
/usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "
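The training loop stopping shortly after iteration 2540 is expected, not a crash: max_time is 180 in the hyperparameters, so the run ends after roughly 180 seconds of training. Adding container start-up, graph construction, and the data-shard loads on top of those 180 s lines up with the roughly 3 m 18 s of wall time reported below.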
" real 3m18.157s user 20m10.743s sys 4m7.426s root@0c523bf46a95:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word- language-modeling-benchmark-r13output WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see: * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md * https://github.com/tensorflow/addons If you depend on functionality not listed there, please file an issue. *****HYPER PARAMETERS***** {'average_params': True, 'emb_size': 512, 'do_summaries': False, 'projected_size': 512, 'num_gpus': 1, 'keep_prob': 0.9, 'num_steps': 20, 'vocab_size': 793470, 'max_grad_norm': 10.0, 'learning_rate': 0.2, 'max_time': 180, 'num_sampled': 8192, 'num_shards': 8, 'num_delayed_steps': 150, 'num_layers': 1, 'run_profiler': False, 'batch_size': 128, 'state_size': 2048, 'optimizer': 0} ************************** WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior. WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version. Instructions for updating: Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`. WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. Current time: 1586026738.7672887 ALL VARIABLES WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. Instructions for updating: Please use tf.global_variables instead. 
model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 model/global_step:0 () model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0 model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0 model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0 model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0 model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0 model/model/softmax_b/Adagrad:0 (793470,) /gpu:0 model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0 model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0 model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0 TRAINABLE VARIABLES model/emb_0:0 (99184, 512) /gpu:0 model/emb_1:0 (99184, 512) /gpu:0 model/emb_2:0 (99184, 512) /gpu:0 model/emb_3:0 (99184, 512) /gpu:0 model/emb_4:0 (99184, 512) /gpu:0 model/emb_5:0 (99184, 512) /gpu:0 model/emb_6:0 (99184, 512) /gpu:0 model/emb_7:0 (99184, 512) /gpu:0 model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0 model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0 model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0 model/softmax_w_0:0 (99184, 512) /gpu:0 model/softmax_w_1:0 (99184, 512) /gpu:0 model/softmax_w_2:0 (99184, 512) /gpu:0 model/softmax_w_3:0 (99184, 512) /gpu:0 model/softmax_w_4:0 (99184, 512) /gpu:0 model/softmax_w_5:0 (99184, 512) /gpu:0 model/softmax_w_6:0 (99184, 512) /gpu:0 model/softmax_w_7:0 (99184, 512) /gpu:0 model/softmax_b:0 (793470,) /gpu:0 LOCAL VARIABLES model/model/state_0_0:0 (128, 2560) /gpu:0 WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.MonitoredTrainingSession 2020-04-04 18:58:58.970063: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2894555000 Hz 2020-04-04 18:58:58.978242: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x7576f50 executing computations on platform Host. 
Devices: 2020-04-04 18:58:58.978284: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): , 2020-04-04 18:58:59.196560: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-04-04 18:58:59.197466: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x7576970 executing computations on platform CUDA. Devices: 2020-04-04 18:58:59.197511: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5 2020-04-04 18:58:59.197517: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (1): TITAN RTX, Compute Capability 7.5 2020-04-04 18:58:59.197785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:01:00.0 totalMemory: 23.65GiB freeMemory: 23.22GiB 2020-04-04 18:58:59.197817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77 pciBusID: 0000:47:00.0 totalMemory: 23.65GiB freeMemory: 23.48GiB 2020-04-04 18:58:59.197866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1 2020-04-04 18:58:59.584682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-04-04 18:58:59.584723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2020-04-04 18:58:59.584728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N N 2020-04-04 18:58:59.584732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N 2020-04-04 18:58:59.584830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22501 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:01:00.0, compute capability: 7.5) 2020-04-04 18:58:59.585200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22759 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:47:00.0, compute capability: 7.5) WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file utilities to get mtimes. Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00057-of-00100 Finished processing! 
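Two things to note before the single-GPU numbers. The checkpoint_exists warnings show this run restored from ./logs and will resume at iteration 2556 rather than 1. Also, TensorFlow still created devices for both GPUs: --num_gpus=1 only limits how many model replicas are built. To keep the second GPU fully idle, mask it at the driver level; a sketch (NV_GPU applies to the classic nvidia-docker wrapper):

  NV_GPU=0 nvidia-docker run --rm -ti nvcr.io/nvidia/tensorflow:19.04-py3
  # or, inside an already-running container:
  CUDA_VISIBLE_DEVICES=0 python single_lm_train.py --mode=train --logdir=./logs \
      --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output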
2020-04-04 18:59:06.829316: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally Iteration 2556, time = 3.99s, wps = 641, train loss = 5.4457 Iteration 2557, time = 2.05s, wps = 1247, train loss = 4.9574 Iteration 2558, time = 0.06s, wps = 44631, train loss = 4.9001 Iteration 2559, time = 0.05s, wps = 46564, train loss = 4.9579 Iteration 2560, time = 0.06s, wps = 45845, train loss = 4.8777 Iteration 2561, time = 0.06s, wps = 45181, train loss = 4.9757 Iteration 2562, time = 0.05s, wps = 52074, train loss = 4.8620 Iteration 2563, time = 0.04s, wps = 58089, train loss = 4.9408 Iteration 2564, time = 0.05s, wps = 52447, train loss = 4.8647 Iteration 2575, time = 0.51s, wps = 54813, train loss = 5.0668 Iteration 2595, time = 0.95s, wps = 54074, train loss = 4.8544 Iteration 2615, time = 0.93s, wps = 55313, train loss = 4.8538 Iteration 2635, time = 0.93s, wps = 55237, train loss = 5.0296 Iteration 2655, time = 0.95s, wps = 54051, train loss = 4.9586 Iteration 2675, time = 0.93s, wps = 55296, train loss = 4.9102 Iteration 2695, time = 0.94s, wps = 54295, train loss = 4.7617 Iteration 2715, time = 0.93s, wps = 55145, train loss = 4.9849 Iteration 2735, time = 0.94s, wps = 54404, train loss = 4.9054 Iteration 2755, time = 0.93s, wps = 54978, train loss = 4.8363 Iteration 2775, time = 0.93s, wps = 54837, train loss = 4.8954 Iteration 2795, time = 0.92s, wps = 55433, train loss = 4.9150 Iteration 2815, time = 0.92s, wps = 55707, train loss = 4.9253 Iteration 2835, time = 0.92s, wps = 55410, train loss = 4.9269 Iteration 2855, time = 0.93s, wps = 55329, train loss = 4.9642 Iteration 2875, time = 0.92s, wps = 55449, train loss = 4.9700 Iteration 2895, time = 0.93s, wps = 54765, train loss = 4.8830 Iteration 2915, time = 0.95s, wps = 53736, train loss = 4.9480 Iteration 2935, time = 0.93s, wps = 54990, train loss = 4.9372 Iteration 2955, time = 0.92s, wps = 55437, train loss = 4.7861 Iteration 2975, time = 0.93s, wps = 54842, train loss = 4.8810 Iteration 2995, time = 0.93s, wps = 55258, train loss = 4.9673 Iteration 3015, time = 0.92s, wps = 55542, train loss = 4.9127 Iteration 3035, time = 0.93s, wps = 55348, train loss = 4.9768 Iteration 3055, time = 0.93s, wps = 54853, train loss = 4.9948 Iteration 3075, time = 0.92s, wps = 55488, train loss = 4.8531 Iteration 3095, time = 0.93s, wps = 55252, train loss = 4.7673 Iteration 3115, time = 0.94s, wps = 54617, train loss = 4.8727 Iteration 3135, time = 0.94s, wps = 54729, train loss = 4.8563 Iteration 3155, time = 0.93s, wps = 55116, train loss = 4.8623 Iteration 3175, time = 0.92s, wps = 55380, train loss = 4.7871 Iteration 3195, time = 0.93s, wps = 54781, train loss = 4.7807 Iteration 3215, time = 0.93s, wps = 55041, train loss = 4.7036 Iteration 3235, time = 0.92s, wps = 55457, train loss = 4.7509 Iteration 3255, time = 0.94s, wps = 54512, train loss = 4.8365 Iteration 3275, time = 0.94s, wps = 54406, train loss = 4.8741 Iteration 3295, time = 0.93s, wps = 54992, train loss = 4.7668 Iteration 3315, time = 0.93s, wps = 55194, train loss = 4.8427 Iteration 3335, time = 0.94s, wps = 54412, train loss = 4.8875 Iteration 3355, time = 0.93s, wps = 54918, train loss = 4.7453 Iteration 3375, time = 0.94s, wps = 54435, train loss = 4.8080 Iteration 3395, time = 0.93s, wps = 55033, train loss = 4.6033 Iteration 3415, time = 0.93s, wps = 54876, train loss = 4.7742 Iteration 3435, time = 0.93s, wps = 54908, train loss = 4.8483 Iteration 3455, time = 0.93s, wps = 55118, train loss = 4.8629 Iteration 3475, time = 
0.94s, wps = 54503, train loss = 4.7824 Iteration 3495, time = 0.94s, wps = 54313, train loss = 4.8121 Iteration 3515, time = 0.94s, wps = 54496, train loss = 4.8024 Iteration 3535, time = 0.94s, wps = 54468, train loss = 4.7177 Iteration 3555, time = 0.95s, wps = 53703, train loss = 4.7370 Iteration 3575, time = 0.94s, wps = 54304, train loss = 4.6882 Iteration 3595, time = 0.94s, wps = 54596, train loss = 4.8458 Iteration 3615, time = 0.93s, wps = 54907, train loss = 4.8717 Iteration 3635, time = 0.95s, wps = 53957, train loss = 4.7562 Iteration 3655, time = 0.94s, wps = 54342, train loss = 4.6920 Iteration 3675, time = 0.92s, wps = 55459, train loss = 4.8005 Iteration 3695, time = 0.93s, wps = 55070, train loss = 4.6671 Iteration 3715, time = 0.91s, wps = 56186, train loss = 4.8861 Iteration 3735, time = 0.94s, wps = 54494, train loss = 4.6652 Iteration 3755, time = 0.95s, wps = 53817, train loss = 4.7806 Iteration 3775, time = 0.94s, wps = 54686, train loss = 4.7310 Iteration 3795, time = 0.93s, wps = 54854, train loss = 4.6961 Iteration 3815, time = 0.94s, wps = 54417, train loss = 4.8825 Iteration 3835, time = 0.94s, wps = 54725, train loss = 4.7024 Iteration 3855, time = 0.94s, wps = 54568, train loss = 4.6273 Iteration 3875, time = 0.93s, wps = 55263, train loss = 4.7453 Iteration 3895, time = 0.93s, wps = 54905, train loss = 4.7468 Iteration 3915, time = 0.93s, wps = 54978, train loss = 4.7453 Iteration 3935, time = 0.94s, wps = 54705, train loss = 4.7489 Iteration 3955, time = 0.93s, wps = 55315, train loss = 4.7188 Iteration 3975, time = 0.95s, wps = 53904, train loss = 4.7976 Iteration 3995, time = 0.94s, wps = 54744, train loss = 4.7807 Iteration 4015, time = 0.93s, wps = 55171, train loss = 4.7639 Iteration 4035, time = 0.93s, wps = 55169, train loss = 4.5727 Iteration 4055, time = 0.93s, wps = 54810, train loss = 4.7127 Iteration 4075, time = 0.95s, wps = 53796, train loss = 4.8182 Iteration 4095, time = 0.94s, wps = 54481, train loss = 4.7960 Iteration 4115, time = 0.94s, wps = 54516, train loss = 4.7135 Iteration 4135, time = 0.95s, wps = 54106, train loss = 4.7178 Iteration 4155, time = 0.94s, wps = 54563, train loss = 4.6930 Iteration 4175, time = 0.93s, wps = 55167, train loss = 4.7215 Iteration 4195, time = 0.92s, wps = 55718, train loss = 4.8075 Iteration 4215, time = 0.93s, wps = 54766, train loss = 4.7016 Iteration 4235, time = 0.92s, wps = 55640, train loss = 4.7510 Iteration 4255, time = 0.96s, wps = 53297, train loss = 4.7622 Iteration 4275, time = 0.95s, wps = 53998, train loss = 4.6780 Iteration 4295, time = 0.93s, wps = 54898, train loss = 4.8009 Iteration 4315, time = 0.94s, wps = 54746, train loss = 4.7625 Iteration 4335, time = 0.92s, wps = 55583, train loss = 4.6532 Iteration 4355, time = 0.94s, wps = 54340, train loss = 4.7359 Iteration 4375, time = 0.93s, wps = 55161, train loss = 4.7016 Iteration 4395, time = 0.94s, wps = 54424, train loss = 4.7856 Iteration 4415, time = 0.93s, wps = 54897, train loss = 4.6348 Iteration 4435, time = 0.93s, wps = 54779, train loss = 4.7159 Iteration 4455, time = 0.95s, wps = 53639, train loss = 4.5915 Iteration 4475, time = 0.92s, wps = 55457, train loss = 4.6663 Iteration 4495, time = 0.92s, wps = 55652, train loss = 4.6493 Iteration 4515, time = 0.95s, wps = 54114, train loss = 4.6804 Iteration 4535, time = 0.95s, wps = 53687, train loss = 4.6151 Iteration 4555, time = 0.95s, wps = 53775, train loss = 4.5978 Iteration 4575, time = 0.93s, wps = 54987, train loss = 4.6868 Iteration 4595, time = 0.93s, wps = 55000, train 
Iteration 4615, time = 0.93s, wps = 54841, train loss = 4.6286
Iteration 4635, time = 0.94s, wps = 54358, train loss = 4.7547
Iteration 4655, time = 0.94s, wps = 54626, train loss = 4.8280
Iteration 4675, time = 0.93s, wps = 55318, train loss = 4.6179
Iteration 4695, time = 0.95s, wps = 54132, train loss = 4.6132
Iteration 4715, time = 0.91s, wps = 55998, train loss = 4.6449
Iteration 4735, time = 0.94s, wps = 54723, train loss = 4.6167
Iteration 4755, time = 0.93s, wps = 54944, train loss = 4.7562
Iteration 4775, time = 0.93s, wps = 55022, train loss = 4.7531
Iteration 4795, time = 0.94s, wps = 54747, train loss = 4.6794
Iteration 4815, time = 0.94s, wps = 54449, train loss = 4.7036
Iteration 4835, time = 0.95s, wps = 53662, train loss = 4.6839
Iteration 4855, time = 0.92s, wps = 55823, train loss = 4.7769
Iteration 4875, time = 0.94s, wps = 54581, train loss = 4.7411
Iteration 4895, time = 0.96s, wps = 53464, train loss = 4.6581
Iteration 4915, time = 0.94s, wps = 54411, train loss = 4.7080
Iteration 4935, time = 0.96s, wps = 53579, train loss = 4.6513
Iteration 4955, time = 0.95s, wps = 54126, train loss = 4.6458
Iteration 4975, time = 0.95s, wps = 54054, train loss = 4.7226
Iteration 4995, time = 0.94s, wps = 54281, train loss = 4.6082
Iteration 5015, time = 0.93s, wps = 54790, train loss = 4.5583
Iteration 5035, time = 0.92s, wps = 55365, train loss = 4.6130
Iteration 5055, time = 0.94s, wps = 54360, train loss = 4.7849
Iteration 5075, time = 0.94s, wps = 54226, train loss = 4.6457
Iteration 5095, time = 0.93s, wps = 54798, train loss = 4.6879
Iteration 5115, time = 0.95s, wps = 54097, train loss = 4.5745
Iteration 5135, time = 0.97s, wps = 52669, train loss = 4.5077
Iteration 5155, time = 0.94s, wps = 54480, train loss = 4.6604
Iteration 5175, time = 0.95s, wps = 53993, train loss = 4.5972
Iteration 5195, time = 0.95s, wps = 54050, train loss = 4.6850
Iteration 5215, time = 0.94s, wps = 54487, train loss = 4.6367
Iteration 5235, time = 0.94s, wps = 54753, train loss = 4.6337
Iteration 5255, time = 0.95s, wps = 53740, train loss = 4.6361
Iteration 5275, time = 0.92s, wps = 55539, train loss = 4.6401
Iteration 5295, time = 0.96s, wps = 53573, train loss = 4.6028
Iteration 5315, time = 0.94s, wps = 54229, train loss = 4.7175
Iteration 5335, time = 0.94s, wps = 54420, train loss = 4.5613
Iteration 5355, time = 0.93s, wps = 55073, train loss = 4.5956
Iteration 5375, time = 0.96s, wps = 53594, train loss = 4.6441
Iteration 5395, time = 0.94s, wps = 54324, train loss = 4.7201
Iteration 5415, time = 0.95s, wps = 53720, train loss = 4.6607
Iteration 5435, time = 0.93s, wps = 55331, train loss = 4.7668
Iteration 5455, time = 0.94s, wps = 54502, train loss = 4.6502
Iteration 5475, time = 0.95s, wps = 53988, train loss = 4.6066
Iteration 5495, time = 0.94s, wps = 54325, train loss = 4.6676
Iteration 5515, time = 0.93s, wps = 54909, train loss = 4.5926
Iteration 5535, time = 0.95s, wps = 54062, train loss = 4.5794
Iteration 5555, time = 0.94s, wps = 54556, train loss = 4.6974
Iteration 5575, time = 0.94s, wps = 54576, train loss = 4.6299
Iteration 5595, time = 0.96s, wps = 53178, train loss = 4.6369
Iteration 5615, time = 0.94s, wps = 54545, train loss = 4.5497
Iteration 5635, time = 0.95s, wps = 53914, train loss = 4.5640
Iteration 5655, time = 0.95s, wps = 54025, train loss = 4.4470
Iteration 5675, time = 0.94s, wps = 54393, train loss = 4.6182
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00087-of-00100
Finished processing!
Iteration 5695, time = 2.56s, wps = 20011, train loss = 4.5242
Iteration 5715, time = 0.93s, wps = 54920, train loss = 4.7910
Iteration 5735, time = 0.94s, wps = 54208, train loss = 4.6340
Iteration 5755, time = 0.95s, wps = 54072, train loss = 4.6374
Iteration 5775, time = 0.94s, wps = 54249, train loss = 4.6196
Iteration 5795, time = 0.93s, wps = 54761, train loss = 4.5465
Iteration 5815, time = 0.94s, wps = 54327, train loss = 4.6617
Iteration 5835, time = 0.95s, wps = 54054, train loss = 4.5616
Iteration 5855, time = 0.94s, wps = 54410, train loss = 4.7220
Iteration 5875, time = 0.94s, wps = 54394, train loss = 4.4985
Iteration 5895, time = 0.93s, wps = 54771, train loss = 4.6639
Iteration 5915, time = 0.94s, wps = 54282, train loss = 4.5874
Iteration 5935, time = 0.94s, wps = 54394, train loss = 4.6265
Iteration 5955, time = 0.94s, wps = 54462, train loss = 4.5711
Iteration 5975, time = 0.96s, wps = 53604, train loss = 4.5910
Iteration 5995, time = 0.95s, wps = 54150, train loss = 4.5860
Iteration 6015, time = 0.95s, wps = 53825, train loss = 4.6927
Iteration 6035, time = 0.94s, wps = 54245, train loss = 4.6664
Iteration 6055, time = 0.94s, wps = 54709, train loss = 4.5546
Iteration 6075, time = 0.95s, wps = 53878, train loss = 4.5678
Iteration 6095, time = 0.95s, wps = 53962, train loss = 4.5844
Iteration 6115, time = 0.93s, wps = 54865, train loss = 4.6595
Iteration 6135, time = 0.94s, wps = 54437, train loss = 4.6238
/usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "
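(An aside on reading these logs, not part of the captured session: the run sustains roughly 54,000 words/s, dipping only when a new shard is loaded. A minimal sketch for summarizing such a log, assuming the training output has been redirected to a hypothetical file train.log; it also converts the cross-entropy train loss, which is in nats, to perplexity via exp():

import math
import re

# Hypothetical log file: save the training command's output here first.
LOG_PATH = "train.log"

# Matches the records shown above:
# "Iteration N, time = Ts, wps = W, train loss = L"
PATTERN = re.compile(
    r"Iteration (\d+), time = ([\d.]+)s, wps = (\d+), train loss = ([\d.]+)")

records = []
with open(LOG_PATH) as f:
    for line in f:
        for m in PATTERN.finditer(line):
            records.append(tuple(float(g) for g in m.groups()))

mean_wps = sum(r[2] for r in records) / len(records)
final_loss = records[-1][3]
print("iterations parsed:", len(records))
print("mean throughput  : %.0f words/s" % mean_wps)
print("final train loss : %.4f (perplexity ~ %.1f)"
      % (final_loss, math.exp(final_loss)))

For the tail of this run, a train loss near 4.62 corresponds to a perplexity of about exp(4.62) ~ 101.)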
" real 3m12.802s user 11m55.702s sys 3m1.048s root@0c523bf46a95:/workspace/nvidia-examples/big_lstm# cat /etc/os-release NAME="Ubuntu" VERSION="16.04.6 LTS (Xenial Xerus)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 16.04.6 LTS" VERSION_ID="16.04" HOME_URL="http://www.ubuntu.com/" SUPPORT_URL="http://help.ubuntu.com/" BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" VERSION_CODENAME=xenial UBUNTU_CODENAME=xenial root@0c523bf46a95:/workspace/nvidia-examples/big_lstm# nvcc -V nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2019 NVIDIA Corporation Built on Fri_Feb__8_19:08:17_PST_2019 Cuda compilation tools, release 10.1, V10.1.105 root@0c523bf46a95:/workspace/nvidia-examples/big_lstm# cd data root@0c523bf46a95:/workspace/nvidia-examples/big_lstm/data# ls 1-billion-word-language-modeling-benchmark-r13output root@0c523bf46a95:/workspace/nvidia-examples/big_lstm/data# cd 1-billion-word-language-modeling-benchmark-r13output root@0c523bf46a95:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# ls 1b_word_vocab.txt heldout-monolingual.tokenized.shuffled README training-monolingual.tokenized.shuffled root@0c523bf46a95:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# cd training-monolingual.tokenized.shuffled root@0c523bf46a95:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# ls news.en-00001-of-00100 news.en-00034-of-00100 news.en-00067-of-00100 news.en-00002-of-00100 news.en-00035-of-00100 news.en-00068-of-00100 news.en-00003-of-00100 news.en-00036-of-00100 news.en-00069-of-00100 news.en-00004-of-00100 news.en-00037-of-00100 news.en-00070-of-00100 news.en-00005-of-00100 news.en-00038-of-00100 news.en-00071-of-00100 news.en-00006-of-00100 news.en-00039-of-00100 news.en-00072-of-00100 news.en-00007-of-00100 news.en-00040-of-00100 news.en-00073-of-00100 news.en-00008-of-00100 news.en-00041-of-00100 news.en-00074-of-00100 news.en-00009-of-00100 news.en-00042-of-00100 news.en-00075-of-00100 news.en-00010-of-00100 news.en-00043-of-00100 news.en-00076-of-00100 news.en-00011-of-00100 news.en-00044-of-00100 news.en-00077-of-00100 news.en-00012-of-00100 news.en-00045-of-00100 news.en-00078-of-00100 news.en-00013-of-00100 news.en-00046-of-00100 news.en-00079-of-00100 news.en-00014-of-00100 news.en-00047-of-00100 news.en-00080-of-00100 news.en-00015-of-00100 news.en-00048-of-00100 news.en-00081-of-00100 news.en-00016-of-00100 news.en-00049-of-00100 news.en-00082-of-00100 news.en-00017-of-00100 news.en-00050-of-00100 news.en-00083-of-00100 news.en-00018-of-00100 news.en-00051-of-00100 news.en-00084-of-00100 news.en-00019-of-00100 news.en-00052-of-00100 news.en-00085-of-00100 news.en-00020-of-00100 news.en-00053-of-00100 news.en-00086-of-00100 news.en-00021-of-00100 news.en-00054-of-00100 news.en-00087-of-00100 news.en-00022-of-00100 news.en-00055-of-00100 news.en-00088-of-00100 news.en-00023-of-00100 news.en-00056-of-00100 news.en-00089-of-00100 news.en-00024-of-00100 news.en-00057-of-00100 news.en-00090-of-00100 news.en-00025-of-00100 news.en-00058-of-00100 news.en-00091-of-00100 news.en-00026-of-00100 news.en-00059-of-00100 news.en-00092-of-00100 news.en-00027-of-00100 news.en-00060-of-00100 news.en-00093-of-00100 news.en-00028-of-00100 news.en-00061-of-00100 news.en-00094-of-00100 news.en-00029-of-00100 news.en-00062-of-00100 news.en-00095-of-00100 news.en-00030-of-00100 news.en-00063-of-00100 news.en-00096-of-00100 
news.en-00031-of-00100  news.en-00064-of-00100  news.en-00097-of-00100
news.en-00032-of-00100  news.en-00065-of-00100  news.en-00098-of-00100
news.en-00033-of-00100  news.en-00066-of-00100  news.en-00099-of-00100
root@0c523bf46a95:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# exit
exit
[chibi@centos8 ~]$ cat /etc/redhat-release
CentOS Linux release 8.1.1911 (Core)
[chibi@centos8 ~]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
[chibi@centos8 ~]$ sensors
k10temp-pci-00c3
Adapter: PCI adapter
Tdie:  +32.5°C  (high = +70.0°C)
Tctl:  +32.5°C

asus-isa-0000
Adapter: ISA adapter
cpu_fan:  0 RPM
[chibi@centos8 ~]$ sudo hddtemp /dev/sda
[sudo] password for chibi:
/dev/sda: TS128GSSD370S: 18°C
[chibi@centos8 ~]$ nvidia-smi nvlink -c
GPU 0: TITAN RTX (UUID: GPU-7fb51c1d-c1e7-35cc-aad7-66971f05ddb7)
GPU 1: TITAN RTX (UUID: GPU-5a71d61e-f130-637a-b33d-4df555b0ed88)
[chibi@centos8 ~]$
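(A closing aside, not part of the captured session: the nvidia-smi nvlink -c query above confirms two TITAN RTX cards on the host. A minimal sketch of the same inventory done programmatically, assuming the pynvml package (nvidia-ml-py) is installed; it lists each GPU and reports which NVLink links are active:

import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        uuid = pynvml.nvmlDeviceGetUUID(handle)
        # Older pynvml releases return bytes rather than str.
        name = name.decode() if isinstance(name, bytes) else name
        uuid = uuid.decode() if isinstance(uuid, bytes) else uuid
        print("GPU %d: %s (UUID: %s)" % (i, name, uuid))
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
            except pynvml.NVMLError:
                continue  # link not wired up or not supported on this board
            if state == pynvml.NVML_FEATURE_ENABLED:
                print("  NVLink link %d: active" % link)
finally:
    pynvml.nvmlShutdown()

This queries the same NVML library that nvidia-smi uses, so the GPU names and UUIDs should match the output above.)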