chibi@1604:~$ sudo nvidia-docker run --rm -ti nvcr.io/nvidia/tensorflow:18.09-py3
[sudo] password for chibi:

================
== TensorFlow ==
================

NVIDIA Release 18.09 (build 687558)

Container image Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
Copyright 2017 The TensorFlow Authors. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
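The run above omits the flags that the banner itself recommends, so the container is left with the default 64MB SHMEM allocation. A minimal sketch of the recommended invocation, combining the banner's flags with the image tag already used here (everything below is taken from the log; nothing else is assumed):

  # re-launch with the SHMEM and ulimit settings NVIDIA recommends for TensorFlow
  sudo nvidia-docker run --rm -ti \
      --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
      nvcr.io/nvidia/tensorflow:18.09-py3

The short single-GPU runs below evidently complete without it, but it is the documented starting point for heavier workloads.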
root@583ba0346921:/workspace# ls
README.md  docker-examples  nvidia-examples
root@583ba0346921:/workspace# cd nvidia-examples
root@583ba0346921:/workspace/nvidia-examples# ls
OpenSeq2Seq  big_lstm  build_imagenet_data  cnn  tftrt
root@583ba0346921:/workspace/nvidia-examples# cd big_lstm
root@583ba0346921:/workspace/nvidia-examples/big_lstm# ls
1b_word_vocab.txt  data_utils_test.py          language_model_test.py
README.md          download_1b_words_data.sh   model_utils.py
__init__.py        hparams.py                  run_utils.py
common.py          hparams_test.py             single_lm_train.py
data_utils.py      language_model.py           testdata
root@583ba0346921:/workspace/nvidia-examples/big_lstm# ./download_1b_words_data.sh
Please specify root of dataset directory: data
Success: dataset root dir validated
--2019-03-29 09:43:27--  http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
Resolving www.statmt.org (www.statmt.org)... 129.215.197.184
Connecting to www.statmt.org (www.statmt.org)|129.215.197.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1792209805 (1.7G) [application/x-gzip]
Saving to: ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’

1-billion-word-langu 100%[===================>]   1.67G   450KB/s    in 40m 4s

2019-03-29 10:23:31 (728 KB/s) - ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’ saved [1792209805/1792209805]

1-billion-word-language-modeling-benchmark-r13output/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00024-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00057-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00055-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00096-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00081-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00033-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00072-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00082-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00018-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00008-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00059-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00005-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00091-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00062-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00031-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00095-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00076-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00006-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00038-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00015-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00087-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00021-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00049-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00009-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00027-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00056-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00046-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00032-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00029-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00088-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00085-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00011-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00012-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00067-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00003-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00093-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00050-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00053-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00044-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00019-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00066-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00028-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00045-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00039-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00071-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00052-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00078-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00037-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00002-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00014-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00048-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00017-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00004-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00077-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00080-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00020-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00051-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00016-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00079-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00043-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00068-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00099-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00064-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00034-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00054-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00040-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00070-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00063-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00041-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00083-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00061-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00073-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00094-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00030-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00060-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00035-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00023-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00042-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00025-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00090-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00089-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00065-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00075-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00022-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00026-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00098-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00084-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00010-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00069-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00013-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00092-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00036-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00097-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00007-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00074-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00001-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00047-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00086-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00058-of-00100
1-billion-word-language-modeling-benchmark-r13output/.svn/
1-billion-word-language-modeling-benchmark-r13output/.svn/tmp/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/de102cd0c91cd19e6612f0840e68a2f20ba8134c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/deed1b75d3bd5cc36ae6aeb85d56680b892b7948.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/86c58db52fbf362c5bc329afc33b8805085fcb0d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/9f2882e21f860a83ad6ea8898ebab140974ed301.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/bcdbc523ee7488dc438cab869b6d5e236578dbfa.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/d2718bc26d0ee0a213d7d4add99a304cb5b39ede.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/c5b24f61479da923123d0394a188da922ea0359c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/116d6ea61730d8199127596b072e981338597779.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/b0e26559cfe641245584a9400b35ba28d64f1411.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/d3ae508e3bcb0e696dd70aecd052410f1f7afc1d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/9e148bd766e8805e0eb97eeae250433ec7a2e996.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/31b645a482e0b81fda3c567cada307c6fcf7ec80.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/da39a3ee5e6b4b0d3255bfef95601890afd80709.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/c1ed42c415ec884e591fb5c70d373da640a383b5.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/e37ba0f85e94073ccaced1eed7e4f5d737a25f49.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/entries
1-billion-word-language-modeling-benchmark-r13output/.svn/format
1-billion-word-language-modeling-benchmark-r13output/.svn/wc.db
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00015-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00031-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00027-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00010-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00033-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00042-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00046-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00037-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00000-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00029-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00013-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00002-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00048-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00006-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00030-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00025-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00039-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00008-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00020-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00001-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00034-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00044-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00045-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00016-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00004-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00035-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00038-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00009-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00024-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00022-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00021-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00032-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00011-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00049-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00041-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00019-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00023-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00040-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00014-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00007-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00017-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00012-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00018-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00003-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00028-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en-00000-of-00100
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00043-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00005-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00036-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00026-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00047-of-00050
1-billion-word-language-modeling-benchmark-r13output/README
Success! One billion words dataset ready at:
data/1-billion-word-language-modeling-benchmark-r13output/
Please pass this dir to single_lm_train.py via the --datadir option.
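The extraction above shows 99 training shards (news.en-00001 through news.en-00099; shard 00000 lands in the heldout directory). As a rough sanity check that the corpus really is on the order of a billion words, the shards can be piped through wc; the path below is exactly the one the script printed, relative to the big_lstm directory:

  # count whitespace-delimited tokens across the training shards (slow, several minutes)
  cat ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-* | wc -w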
root@583ba0346921:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output
*****HYPER PARAMETERS*****
{'do_summaries': False, 'run_profiler': False, 'num_gpus': 1, 'num_sampled': 8192, 'keep_prob': 0.9, 'learning_rate': 0.2, 'num_steps': 20, 'projected_size': 512, 'num_delayed_steps': 150, 'optimizer': 0, 'vocab_size': 793470, 'batch_size': 128, 'num_shards': 8, 'max_grad_norm': 10.0, 'max_time': 180, 'state_size': 2048, 'num_layers': 1, 'average_params': True, 'emb_size': 512}
**************************
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Current time: 1553855196.2173831
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
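The shapes in this listing are enough for a back-of-the-envelope model size: eight embedding shards and eight sampled-softmax shards of (99184, 512), a (793470,) softmax bias, and one projected LSTM cell. The arithmetic below is mine; the shapes are straight from the listing:

  # trainable parameters implied by the shapes above
  awk 'BEGIN { embed = 8 * 99184 * 512;               # emb_0..7 (softmax_w_0..7 have the same shape)
               lstm  = 1024*8192 + 8192 + 2048*512;   # W_0, B, W_P_0
               print 2*embed + lstm + 793470 }'       # 2*embed = emb + softmax_w; 793470 = softmax_b

That prints 822754174, i.e. roughly 0.8 billion trainable parameters, almost all of them in the embedding and softmax matrices.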
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-03-29 10:26:36.743927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:05:00.0
totalMemory: 10.73GiB freeMemory: 10.33GiB
2019-03-29 10:26:36.743958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-03-29 10:26:37.654353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 10:26:37.654404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2019-03-29 10:26:37.654413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2019-03-29 10:26:37.655410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9974 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:05:00.0, compute capability: 7.5)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00021-of-00100
Finished processing!
Iteration 1, time = 8.97s, wps = 285, train loss = 12.9656
Iteration 2, time = 2.17s, wps = 1180, train loss = 13.0277
Iteration 3, time = 0.07s, wps = 37140, train loss = 12.9758
Iteration 4, time = 0.07s, wps = 34673, train loss = 12.9271
Iteration 5, time = 0.07s, wps = 37942, train loss = 12.6601
Iteration 6, time = 0.07s, wps = 36800, train loss = 25.7397
Iteration 7, time = 0.06s, wps = 41351, train loss = 31.4427
Iteration 8, time = 0.07s, wps = 38217, train loss = 14.0656
Iteration 9, time = 0.06s, wps = 43694, train loss = 18.2265
Iteration 20, time = 0.65s, wps = 43363, train loss = 14.3139
Iteration 40, time = 1.23s, wps = 41596, train loss = 10.1626
Iteration 60, time = 1.23s, wps = 41559, train loss = 9.6507
Iteration 80, time = 1.22s, wps = 41862, train loss = 9.0295
Iteration 100, time = 1.24s, wps = 41295, train loss = 8.9224
Iteration 120, time = 1.22s, wps = 42025, train loss = 7.8789
Iteration 140, time = 1.24s, wps = 41217, train loss = 7.5747
Iteration 160, time = 1.22s, wps = 41831, train loss = 7.2930
Iteration 180, time = 1.26s, wps = 40781, train loss = 7.2032
Iteration 200, time = 1.26s, wps = 40707, train loss = 7.0153
Iteration 220, time = 1.22s, wps = 41821, train loss = 6.8531
Iteration 240, time = 1.34s, wps = 38068, train loss = 7.0365
Iteration 260, time = 1.28s, wps = 40046, train loss = 6.5156
Iteration 280, time = 1.22s, wps = 41853, train loss = 6.6915
Iteration 300, time = 1.29s, wps = 39841, train loss = 6.5283
Iteration 320, time = 1.29s, wps = 39832, train loss = 6.6831
Iteration 340, time = 1.28s, wps = 39972, train loss = 6.5944
Iteration 360, time = 1.30s, wps = 39512, train loss = 6.4506
Iteration 380, time = 1.30s, wps = 39398, train loss = 6.3966
Iteration 400, time = 1.29s, wps = 39783, train loss = 6.3137
Iteration 420, time = 1.25s, wps = 40842, train loss = 6.3115
Iteration 440, time = 1.26s, wps = 40583, train loss = 6.2399
Iteration 460, time = 1.30s, wps = 39531, train loss = 6.2131
Iteration 480, time = 1.31s, wps = 39158, train loss = 6.2439
Iteration 500, time = 1.30s, wps = 39464, train loss = 6.2290
Iteration 520, time = 1.28s, wps = 39961, train loss = 6.3131
Iteration 540, time = 1.29s, wps = 39575, train loss = 6.0897
Iteration 560, time = 1.28s, wps = 39914, train loss = 6.1168
Iteration 580, time = 1.28s, wps = 40030, train loss = 6.1855
Iteration 600, time = 1.28s, wps = 39967, train loss = 6.0568
Iteration 620, time = 1.29s, wps = 39607, train loss = 6.1021
Iteration 640, time = 1.29s, wps = 39761, train loss = 5.8848
Iteration 660, time = 1.30s, wps = 39251, train loss = 5.9353
Iteration 680, time = 1.30s, wps = 39400, train loss = 5.8727
Iteration 700, time = 1.29s, wps = 39619, train loss = 6.0305
Iteration 720, time = 1.28s, wps = 40099, train loss = 5.9729
Iteration 740, time = 1.27s, wps = 40172, train loss = 5.9273
Iteration 760, time = 1.31s, wps = 39079, train loss = 5.9317
Iteration 780, time = 1.30s, wps = 39527, train loss = 5.7325
Iteration 800, time = 1.30s, wps = 39417, train loss = 5.9453
Iteration 820, time = 1.30s, wps = 39397, train loss = 5.9423
Iteration 840, time = 1.29s, wps = 39607, train loss = 5.6622
Iteration 860, time = 1.29s, wps = 39566, train loss = 5.9405
Iteration 880, time = 1.28s, wps = 39924, train loss = 5.7385
Iteration 900, time = 1.29s, wps = 39622, train loss = 5.7644
Iteration 920, time = 1.28s, wps = 39997, train loss = 5.8271
Iteration 940, time = 1.28s, wps = 39937, train loss = 5.7446
Iteration 960, time = 1.28s, wps = 40039, train loss = 5.7643
Iteration 980, time = 1.29s, wps = 39835, train loss = 5.6585
Iteration 1000, time = 1.28s, wps = 39861, train loss = 5.7029
Iteration 1020, time = 1.29s, wps = 39829, train loss = 5.7140
Iteration 1040, time = 1.30s, wps = 39451, train loss = 5.6140
Iteration 1060, time = 1.30s, wps = 39370, train loss = 5.6684
Iteration 1080, time = 1.30s, wps = 39344, train loss = 5.6818
Iteration 1100, time = 1.29s, wps = 39602, train loss = 5.6624
Iteration 1120, time = 1.28s, wps = 40109, train loss = 5.6540
Iteration 1140, time = 1.29s, wps = 39707, train loss = 5.6097
Iteration 1160, time = 1.28s, wps = 40144, train loss = 5.7306
Iteration 1180, time = 1.29s, wps = 39558, train loss = 5.6173
Iteration 1200, time = 1.30s, wps = 39261, train loss = 5.6350
Iteration 1220, time = 1.30s, wps = 39350, train loss = 5.4836
Iteration 1240, time = 1.29s, wps = 39541, train loss = 5.5266
Iteration 1260, time = 1.29s, wps = 39833, train loss = 5.5891
Iteration 1280, time = 1.31s, wps = 39188, train loss = 5.5603
Iteration 1300, time = 1.28s, wps = 39863, train loss = 5.6405
Iteration 1320, time = 1.30s, wps = 39287, train loss = 5.4017
Iteration 1340, time = 1.30s, wps = 39346, train loss = 5.5506
Iteration 1360, time = 1.30s, wps = 39435, train loss = 5.5153
Iteration 1380, time = 1.32s, wps = 38834, train loss = 5.5978
Iteration 1400, time = 1.29s, wps = 39574, train loss = 5.4449
Iteration 1420, time = 1.29s, wps = 39701, train loss = 5.4968
Iteration 1440, time = 1.29s, wps = 39663, train loss = 5.5744
Iteration 1460, time = 1.29s, wps = 39660, train loss = 5.4840
Iteration 1480, time = 1.28s, wps = 39871, train loss = 5.4562
Iteration 1500, time = 1.27s, wps = 40435, train loss = 5.4576
Iteration 1520, time = 1.28s, wps = 39987, train loss = 5.4384
Iteration 1540, time = 1.28s, wps = 40047, train loss = 5.3493
Iteration 1560, time = 1.29s, wps = 39707, train loss = 5.5560
Iteration 1580, time = 1.30s, wps = 39309, train loss = 5.4938
Iteration 1600, time = 1.29s, wps = 39714, train loss = 5.4218
Iteration 1620, time = 1.32s, wps = 38932, train loss = 5.4369
Iteration 1640, time = 1.30s, wps = 39319, train loss = 5.3821
Iteration 1660, time = 1.30s, wps = 39464, train loss = 5.4119
Iteration 1680, time = 1.31s, wps = 39125, train loss = 5.3874
Iteration 1700, time = 1.28s, wps = 39884, train loss = 5.3692
Iteration 1720, time = 1.30s, wps = 39253, train loss = 5.4511
Iteration 1740, time = 1.30s, wps = 39315, train loss = 5.5057
Iteration 1760, time = 1.33s, wps = 38548, train loss = 5.4678
Iteration 1780, time = 1.28s, wps = 39878, train loss = 5.2732
Iteration 1800, time = 1.30s, wps = 39347, train loss = 5.2969
Iteration 1820, time = 1.29s, wps = 39656, train loss = 5.1246
Iteration 1840, time = 1.28s, wps = 39894, train loss = 5.3554
Iteration 1860, time = 1.26s, wps = 40537, train loss = 5.3166
Iteration 1880, time = 1.30s, wps = 39292, train loss = 5.3485
Iteration 1900, time = 1.26s, wps = 40478, train loss = 5.3726
Iteration 1920, time = 1.27s, wps = 40200, train loss = 5.3432
Iteration 1940, time = 1.29s, wps = 39602, train loss = 5.3780
Iteration 1960, time = 1.31s, wps = 39050, train loss = 5.3716
Iteration 1980, time = 1.27s, wps = 40174, train loss = 5.3019
Iteration 2000, time = 1.30s, wps = 39353, train loss = 5.2582
Iteration 2020, time = 1.31s, wps = 39184, train loss = 5.3955
Iteration 2040, time = 1.29s, wps = 39718, train loss = 5.3158
Iteration 2060, time = 1.30s, wps = 39333, train loss = 5.3119
Iteration 2080, time = 1.30s, wps = 39348, train loss = 5.3071
Iteration 2100, time = 1.30s, wps = 39324, train loss = 5.2040
Iteration 2120, time = 1.32s, wps = 38826, train loss = 5.2138
Iteration 2140, time = 1.31s, wps = 39226, train loss = 5.2393
Iteration 2160, time = 1.31s, wps = 39037, train loss = 5.1940
Iteration 2180, time = 1.27s, wps = 40446, train loss = 5.3661
Iteration 2200, time = 1.29s, wps = 39593, train loss = 5.2864
Iteration 2220, time = 1.31s, wps = 39197, train loss = 5.1728
Iteration 2240, time = 1.30s, wps = 39451, train loss = 5.1508
Iteration 2260, time = 1.29s, wps = 39696, train loss = 5.3303

real    3m37.999s
user    5m59.150s
sys     1m12.396s
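Steady state for this first run sits around 39,000-42,000 wps after the first few warm-up iterations. Assuming the console output was captured to a file (run1.log is a hypothetical name; the trainer does not write it), the average is easy to pull out:

  # mean of the reported wps figures, skipping the first ten warm-up samples
  grep -o 'wps = [0-9]*' run1.log | tail -n +11 | awk '{ s += $3; n++ } END { printf "mean wps: %.0f\n", s / n }'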
root@583ba0346921:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output
*****HYPER PARAMETERS*****
{'do_summaries': False, 'num_steps': 20, 'emb_size': 512, 'num_shards': 8, 'projected_size': 512, 'num_gpus': 1, 'num_sampled': 8192, 'max_grad_norm': 10.0, 'keep_prob': 0.9, 'max_time': 180, 'num_layers': 1, 'optimizer': 0, 'vocab_size': 793470, 'learning_rate': 0.2, 'average_params': True, 'num_delayed_steps': 150, 'run_profiler': False, 'state_size': 2048, 'batch_size': 128}
**************************
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Current time: 1553855679.4071329
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-03-29 10:34:39.930373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:05:00.0
totalMemory: 10.73GiB freeMemory: 10.33GiB
2019-03-29 10:34:39.930405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-03-29 10:34:40.232604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 10:34:40.232636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2019-03-29 10:34:40.232644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2019-03-29 10:34:40.233141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9974 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:05:00.0, compute capability: 7.5)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00091-of-00100
Finished processing!
Iteration 2278, time = 6.83s, wps = 375, train loss = 5.5469
Iteration 2279, time = 2.90s, wps = 883, train loss = 5.2747
Iteration 2280, time = 0.07s, wps = 38296, train loss = 5.2242
Iteration 2281, time = 0.07s, wps = 35460, train loss = 5.2981
Iteration 2282, time = 0.07s, wps = 37998, train loss = 5.1874
Iteration 2283, time = 0.06s, wps = 41500, train loss = 5.1965
Iteration 2284, time = 0.06s, wps = 43754, train loss = 5.2397
Iteration 2285, time = 0.06s, wps = 40223, train loss = 5.1846
Iteration 2286, time = 0.07s, wps = 37514, train loss = 5.1908
Iteration 2297, time = 0.67s, wps = 41996, train loss = 5.1657
Iteration 2317, time = 1.22s, wps = 42107, train loss = 5.2044
Iteration 2337, time = 1.19s, wps = 42946, train loss = 5.2561
Iteration 2357, time = 1.24s, wps = 41196, train loss = 5.2116
Iteration 2377, time = 1.22s, wps = 42050, train loss = 5.1815
Iteration 2397, time = 1.25s, wps = 40837, train loss = 5.1212
Iteration 2417, time = 1.21s, wps = 42181, train loss = 5.3590
Iteration 2437, time = 1.24s, wps = 41174, train loss = 5.1255
Iteration 2457, time = 1.26s, wps = 40504, train loss = 5.1573
Iteration 2477, time = 1.22s, wps = 42026, train loss = 5.0837
Iteration 2497, time = 1.26s, wps = 40664, train loss = 5.2238
Iteration 2517, time = 1.27s, wps = 40416, train loss = 5.2194
Iteration 2537, time = 1.27s, wps = 40185, train loss = 5.1755
Iteration 2557, time = 1.30s, wps = 39464, train loss = 5.1182
Iteration 2577, time = 1.28s, wps = 39894, train loss = 4.9740
Iteration 2597, time = 1.24s, wps = 41158, train loss = 5.1256
Iteration 2617, time = 1.25s, wps = 40994, train loss = 5.0282
Iteration 2637, time = 1.22s, wps = 41884, train loss = 5.0880
Iteration 2657, time = 1.22s, wps = 42091, train loss = 5.0403
Iteration 2677, time = 1.37s, wps = 37500, train loss = 5.0657
Iteration 2697, time = 1.31s, wps = 39201, train loss = 5.1726
Iteration 2717, time = 1.28s, wps = 39915, train loss = 5.1135
Iteration 2737, time = 1.28s, wps = 39943, train loss = 5.1155
Iteration 2757, time = 1.30s, wps = 39443, train loss = 5.1384
Iteration 2777, time = 1.28s, wps = 40101, train loss = 5.0288
Iteration 2797, time = 1.30s, wps = 39383, train loss = 5.1488
Iteration 2817, time = 1.31s, wps = 38960, train loss = 5.1075
Iteration 2837, time = 1.30s, wps = 39536, train loss = 5.0430
Iteration 2857, time = 1.30s, wps = 39494, train loss = 5.0863
Iteration 2877, time = 1.29s, wps = 39700, train loss = 5.1325
Iteration 2897, time = 1.31s, wps = 39190, train loss = 5.0867
Iteration 2917, time = 1.29s, wps = 39660, train loss = 5.0636
Iteration 2937, time = 1.30s, wps = 39409, train loss = 5.1495
Iteration 2957, time = 1.31s, wps = 39034, train loss = 4.9993
Iteration 2977, time = 1.30s, wps = 39297, train loss = 5.1314
Iteration 2997, time = 1.28s, wps = 39899, train loss = 5.1049
Iteration 3017, time = 1.30s, wps = 39518, train loss = 5.0465
Iteration 3037, time = 1.30s, wps = 39234, train loss = 4.9523
Iteration 3057, time = 1.30s, wps = 39476, train loss = 5.0017
Iteration 3077, time = 1.32s, wps = 38778, train loss = 5.0475
Iteration 3097, time = 1.30s, wps = 39392, train loss = 5.1015
Iteration 3117, time = 1.31s, wps = 39206, train loss = 5.0310
Iteration 3137, time = 1.30s, wps = 39397, train loss = 5.0926
Iteration 3157, time = 1.31s, wps = 39228, train loss = 5.0650
Iteration 3177, time = 1.28s, wps = 39894, train loss = 5.0637
Iteration 3197, time = 1.30s, wps = 39417, train loss = 5.0009
Iteration 3217, time = 1.30s, wps = 39503, train loss = 5.0565
Iteration 3237, time = 1.28s, wps = 39894, train loss = 4.9824
Iteration 3257, time = 1.30s, wps = 39451, train loss = 4.9794
Iteration 3277, time = 1.28s, wps = 39945, train loss = 5.0929
Iteration 3297, time = 1.32s, wps = 38660, train loss = 5.0294
Iteration 3317, time = 1.30s, wps = 39294, train loss = 5.0332
Iteration 3337, time = 1.31s, wps = 39031, train loss = 4.9091
Iteration 3357, time = 1.29s, wps = 39700, train loss = 5.1077
Iteration 3377, time = 1.32s, wps = 38912, train loss = 5.0477
Iteration 3397, time = 1.29s, wps = 39581, train loss = 4.9147
Iteration 3417, time = 1.29s, wps = 39581, train loss = 5.0430
Iteration 3437, time = 1.30s, wps = 39277, train loss = 5.0277
Iteration 3457, time = 1.30s, wps = 39483, train loss = 5.0010
Iteration 3477, time = 1.28s, wps = 40134, train loss = 4.8894
Iteration 3497, time = 1.30s, wps = 39316, train loss = 4.9039
Iteration 3517, time = 1.28s, wps = 39924, train loss = 4.9874
Iteration 3537, time = 1.30s, wps = 39375, train loss = 5.0470
Iteration 3557, time = 1.29s, wps = 39827, train loss = 4.9740
Iteration 3577, time = 1.31s, wps = 39139, train loss = 4.9573
Iteration 3597, time = 1.29s, wps = 39636, train loss = 5.0232
Iteration 3617, time = 1.31s, wps = 39155, train loss = 4.8929
Iteration 3637, time = 1.27s, wps = 40216, train loss = 4.8221
Iteration 3657, time = 1.28s, wps = 40072, train loss = 5.1032
Iteration 3677, time = 1.30s, wps = 39536, train loss = 4.9120
Iteration 3697, time = 1.28s, wps = 40003, train loss = 4.8285
Iteration 3717, time = 1.29s, wps = 39662, train loss = 4.9870
Iteration 3737, time = 1.31s, wps = 39072, train loss = 4.9321
Iteration 3757, time = 1.29s, wps = 39761, train loss = 5.0355
Iteration 3777, time = 1.30s, wps = 39498, train loss = 4.9469
Iteration 3797, time = 1.30s, wps = 39520, train loss = 4.9777
Iteration 3817, time = 1.30s, wps = 39257, train loss = 4.8558
Iteration 3837, time = 1.32s, wps = 38715, train loss = 5.1197
Iteration 3857, time = 1.29s, wps = 39767, train loss = 4.9485
Iteration 3877, time = 1.29s, wps = 39778, train loss = 4.7769
Iteration 3897, time = 1.30s, wps = 39442, train loss = 4.9882
Iteration 3917, time = 1.30s, wps = 39297, train loss = 4.7765
Iteration 3937, time = 1.31s, wps = 39039, train loss = 4.9522
Iteration 3957, time = 1.28s, wps = 40037, train loss = 4.9045
Iteration 3977, time = 1.29s, wps = 39840, train loss = 4.8933
Iteration 3997, time = 1.30s, wps = 39448, train loss = 4.9105
Iteration 4017, time = 1.26s, wps = 40497, train loss = 4.8677
Iteration 4037, time = 1.30s, wps = 39455, train loss = 4.8007
Iteration 4057, time = 1.29s, wps = 39760, train loss = 4.9147
Iteration 4077, time = 1.29s, wps = 39811, train loss = 4.9145
Iteration 4097, time = 1.29s, wps = 39738, train loss = 4.9353
Iteration 4117, time = 1.30s, wps = 39423, train loss = 4.9402
Iteration 4137, time = 1.29s, wps = 39666, train loss = 4.8640
Iteration 4157, time = 1.29s, wps = 39596, train loss = 4.8828
Iteration 4177, time = 1.31s, wps = 39143, train loss = 4.7806
Iteration 4197, time = 1.32s, wps = 38860, train loss = 4.9483
Iteration 4217, time = 1.30s, wps = 39371, train loss = 4.7150
Iteration 4237, time = 1.29s, wps = 39766, train loss = 4.9453
Iteration 4257, time = 1.30s, wps = 39263, train loss = 4.8535
Iteration 4277, time = 1.28s, wps = 40097, train loss = 4.8783
Iteration 4297, time = 1.29s, wps = 39629, train loss = 4.7772
Iteration 4317, time = 1.30s, wps = 39252, train loss = 4.8806
Iteration 4337, time = 1.28s, wps = 40001, train loss = 4.8603
Iteration 4357, time = 1.27s, wps = 40184, train loss = 4.8114
Iteration 4377, time = 1.28s, wps = 39986, train loss = 4.8745
Iteration 4397, time = 1.29s, wps = 39773, train loss = 4.8249
Iteration 4417, time = 1.31s, wps = 38938, train loss = 4.9664
Iteration 4437, time = 1.30s, wps = 39473, train loss = 4.7577
Iteration 4457, time = 1.29s, wps = 39598, train loss = 4.7798
Iteration 4477, time = 1.31s, wps = 39142, train loss = 4.8297
Iteration 4497, time = 1.29s, wps = 39716, train loss = 4.6917
Iteration 4517, time = 1.31s, wps = 39032, train loss = 4.7524
Iteration 4537, time = 1.30s, wps = 39419, train loss = 4.8142
Iteration 4557, time = 1.32s, wps = 38921, train loss = 4.8282
Iteration 4577, time = 1.28s, wps = 39928, train loss = 4.9091
Iteration 4597, time = 1.32s, wps = 38867, train loss = 4.7730
Iteration 4617, time = 1.28s, wps = 40065, train loss = 4.8611
Iteration 4637, time = 1.30s, wps = 39304, train loss = 4.7496
Iteration 4657, time = 1.31s, wps = 39159, train loss = 4.7411
Iteration 4677, time = 1.29s, wps = 39780, train loss = 4.8527
Iteration 4697, time = 1.29s, wps = 39547, train loss = 4.8050
Iteration 4717, time = 1.28s, wps = 40101, train loss = 4.6520
Iteration 4737, time = 1.27s, wps = 40247, train loss = 4.8355
Iteration 4757, time = 1.29s, wps = 39740, train loss = 4.6325
Iteration 4777, time = 1.29s, wps = 39659, train loss = 4.6269
Iteration 4797, time = 1.29s, wps = 39548, train loss = 4.8472

real    3m34.234s
user    6m15.701s
sys     1m12.291s
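Two things about these repeated runs are worth noting. Each one stops on its own after roughly three and a half minutes, consistent with the 'max_time': 180 hyperparameter (180 s of training plus startup and data-loading overhead). And because --logdir=./logs is reused, the Supervisor restores the checkpoint from the previous run, which is why the global step resumed at 2278 above rather than 1. To benchmark truly independent runs, the log directory would presumably need to be cleared first:

  # start from scratch instead of resuming the checkpoint left in ./logs
  rm -rf ./logs
  time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output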
root@583ba0346921:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output
*****HYPER PARAMETERS*****
{'num_sampled': 8192, 'optimizer': 0, 'max_time': 180, 'run_profiler': False, 'max_grad_norm': 10.0, 'average_params': True, 'state_size': 2048, 'num_steps': 20, 'keep_prob': 0.9, 'do_summaries': False, 'num_gpus': 1, 'num_delayed_steps': 150, 'num_shards': 8, 'num_layers': 1, 'projected_size': 512, 'vocab_size': 793470, 'emb_size': 512, 'learning_rate': 0.2, 'batch_size': 128}
**************************
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Current time: 1553856274.8647537
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-03-29 10:44:35.397039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:05:00.0
totalMemory: 10.73GiB freeMemory: 10.33GiB
2019-03-29 10:44:35.397071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-03-29 10:44:35.686126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 10:44:35.686167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2019-03-29 10:44:35.686184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2019-03-29 10:44:35.686698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9974 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:05:00.0, compute capability: 7.5)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00036-of-00100
Finished processing!
Iteration 4801, time = 6.70s, wps = 382, train loss = 6.2155
Iteration 4802, time = 2.81s, wps = 910, train loss = 5.1062
Iteration 4803, time = 0.07s, wps = 37104, train loss = 4.9653
Iteration 4804, time = 0.07s, wps = 37362, train loss = 4.8959
Iteration 4805, time = 0.06s, wps = 41055, train loss = 4.9388
Iteration 4806, time = 0.06s, wps = 41185, train loss = 4.8265
Iteration 4807, time = 0.06s, wps = 40391, train loss = 4.9383
Iteration 4808, time = 0.07s, wps = 38231, train loss = 4.9529
Iteration 4809, time = 0.06s, wps = 42883, train loss = 4.8692
Iteration 4820, time = 0.67s, wps = 42154, train loss = 4.6951
Iteration 4840, time = 1.21s, wps = 42271, train loss = 4.7706
Iteration 4860, time = 1.24s, wps = 41259, train loss = 4.8538
Iteration 4880, time = 1.24s, wps = 41276, train loss = 4.8634
Iteration 4900, time = 1.22s, wps = 41990, train loss = 4.7681
Iteration 4920, time = 1.24s, wps = 41393, train loss = 4.7745
Iteration 4940, time = 1.22s, wps = 41844, train loss = 4.8469
Iteration 4960, time = 1.23s, wps = 41637, train loss = 4.7993
Iteration 4980, time = 1.20s, wps = 42840, train loss = 4.7059
Iteration 5000, time = 1.21s, wps = 42404, train loss = 4.8234
Iteration 5020, time = 1.21s, wps = 42141, train loss = 4.8306
Iteration 5040, time = 1.28s, wps = 40104, train loss = 4.8052
Iteration 5060, time = 1.25s, wps = 41118, train loss = 4.7647
Iteration 5080, time = 1.28s, wps = 39874, train loss = 4.8283
Iteration 5100, time = 1.28s, wps = 40103, train loss = 4.7459
Iteration 5120, time = 1.29s, wps = 39728, train loss = 4.8731
Iteration 5140, time = 1.25s, wps = 40883, train loss = 4.7952
Iteration 5160, time = 1.27s, wps = 40254, train loss = 4.7142
Iteration 5180, time = 1.27s, wps = 40466, train loss = 4.7883
Iteration 5200, time = 1.34s, wps = 38259, train loss = 4.7612
Iteration 5220, time = 1.31s, wps = 39197, train loss = 4.7317
Iteration 5240, time = 1.29s, wps = 39718, train loss = 4.7642
Iteration 5260, time = 1.28s, wps = 40083, train loss = 4.6986
Iteration 5280, time = 1.29s, wps = 39561, train loss = 4.7601
Iteration 5300, time = 1.29s, wps = 39787, train loss = 4.7111
Iteration 5320, time = 1.29s, wps = 39777, train loss = 4.8426
Iteration 5340, time = 1.30s, wps = 39303, train loss = 4.6760
Iteration 5360, time = 1.28s, wps = 40084, train loss = 4.7477
2019-03-29 10:44:35.397039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:05:00.0
totalMemory: 10.73GiB freeMemory: 10.33GiB
2019-03-29 10:44:35.397071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-03-29 10:44:35.686126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 10:44:35.686167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2019-03-29 10:44:35.686184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2019-03-29 10:44:35.686698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9974 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:05:00.0, compute capability: 7.5)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00036-of-00100
Finished processing!
Iteration 4801, time = 6.70s, wps = 382, train loss = 6.2155
Iteration 4802, time = 2.81s, wps = 910, train loss = 5.1062
Iteration 4803, time = 0.07s, wps = 37104, train loss = 4.9653
Iteration 4804, time = 0.07s, wps = 37362, train loss = 4.8959
Iteration 4805, time = 0.06s, wps = 41055, train loss = 4.9388
Iteration 4806, time = 0.06s, wps = 41185, train loss = 4.8265
Iteration 4807, time = 0.06s, wps = 40391, train loss = 4.9383
Iteration 4808, time = 0.07s, wps = 38231, train loss = 4.9529
Iteration 4809, time = 0.06s, wps = 42883, train loss = 4.8692
Iteration 4820, time = 0.67s, wps = 42154, train loss = 4.6951
Iteration 4840, time = 1.21s, wps = 42271, train loss = 4.7706
Iteration 4860, time = 1.24s, wps = 41259, train loss = 4.8538
Iteration 4880, time = 1.24s, wps = 41276, train loss = 4.8634
Iteration 4900, time = 1.22s, wps = 41990, train loss = 4.7681
Iteration 4920, time = 1.24s, wps = 41393, train loss = 4.7745
Iteration 4940, time = 1.22s, wps = 41844, train loss = 4.8469
Iteration 4960, time = 1.23s, wps = 41637, train loss = 4.7993
Iteration 4980, time = 1.20s, wps = 42840, train loss = 4.7059
Iteration 5000, time = 1.21s, wps = 42404, train loss = 4.8234
Iteration 5020, time = 1.21s, wps = 42141, train loss = 4.8306
Iteration 5040, time = 1.28s, wps = 40104, train loss = 4.8052
Iteration 5060, time = 1.25s, wps = 41118, train loss = 4.7647
Iteration 5080, time = 1.28s, wps = 39874, train loss = 4.8283
Iteration 5100, time = 1.28s, wps = 40103, train loss = 4.7459
Iteration 5120, time = 1.29s, wps = 39728, train loss = 4.8731
Iteration 5140, time = 1.25s, wps = 40883, train loss = 4.7952
Iteration 5160, time = 1.27s, wps = 40254, train loss = 4.7142
Iteration 5180, time = 1.27s, wps = 40466, train loss = 4.7883
Iteration 5200, time = 1.34s, wps = 38259, train loss = 4.7612
Iteration 5220, time = 1.31s, wps = 39197, train loss = 4.7317
Iteration 5240, time = 1.29s, wps = 39718, train loss = 4.7642
Iteration 5260, time = 1.28s, wps = 40083, train loss = 4.6986
Iteration 5280, time = 1.29s, wps = 39561, train loss = 4.7601
Iteration 5300, time = 1.29s, wps = 39787, train loss = 4.7111
Iteration 5320, time = 1.29s, wps = 39777, train loss = 4.8426
Iteration 5340, time = 1.30s, wps = 39303, train loss = 4.6760
Iteration 5360, time = 1.28s, wps = 40084, train loss = 4.7477
Iteration 5380, time = 1.30s, wps = 39300, train loss = 4.7793
Iteration 5400, time = 1.29s, wps = 39619, train loss = 4.7449
Iteration 5420, time = 1.31s, wps = 39206, train loss = 4.6667
Iteration 5440, time = 1.30s, wps = 39317, train loss = 4.8088
Iteration 5460, time = 1.28s, wps = 40079, train loss = 4.7306
Iteration 5480, time = 1.31s, wps = 38947, train loss = 4.7762
Iteration 5500, time = 1.30s, wps = 39284, train loss = 4.7272
Iteration 5520, time = 1.31s, wps = 39034, train loss = 4.8001
Iteration 5540, time = 1.28s, wps = 39996, train loss = 4.7158
Iteration 5560, time = 1.30s, wps = 39530, train loss = 4.7291
Iteration 5580, time = 1.31s, wps = 39186, train loss = 4.7652
Iteration 5600, time = 1.30s, wps = 39395, train loss = 4.6619
Iteration 5620, time = 1.31s, wps = 39098, train loss = 4.8341
Iteration 5640, time = 1.28s, wps = 39918, train loss = 4.6029
Iteration 5660, time = 1.26s, wps = 40620, train loss = 4.6703
Iteration 5680, time = 1.29s, wps = 39678, train loss = 4.6732
Iteration 5700, time = 1.29s, wps = 39602, train loss = 4.7512
Iteration 5720, time = 1.29s, wps = 39663, train loss = 4.7991
Iteration 5740, time = 1.32s, wps = 38876, train loss = 4.6523
Iteration 5760, time = 1.27s, wps = 40289, train loss = 4.7201
Iteration 5780, time = 1.30s, wps = 39524, train loss = 4.8063
Iteration 5800, time = 1.30s, wps = 39408, train loss = 4.7005
Iteration 5820, time = 1.26s, wps = 40609, train loss = 4.6645
Iteration 5840, time = 1.28s, wps = 39894, train loss = 4.7162
Iteration 5860, time = 1.27s, wps = 40187, train loss = 4.6751
Iteration 5880, time = 1.32s, wps = 38826, train loss = 4.7749
Iteration 5900, time = 1.30s, wps = 39389, train loss = 4.7251
Iteration 5920, time = 1.27s, wps = 40262, train loss = 4.6277
Iteration 5940, time = 1.30s, wps = 39344, train loss = 4.7406
Iteration 5960, time = 1.30s, wps = 39281, train loss = 4.7646
Iteration 5980, time = 1.30s, wps = 39382, train loss = 4.6486
Iteration 6000, time = 1.30s, wps = 39297, train loss = 4.7338
Iteration 6020, time = 1.28s, wps = 40039, train loss = 4.5735
Iteration 6040, time = 1.31s, wps = 39035, train loss = 4.6882
Iteration 6060, time = 1.32s, wps = 38656, train loss = 4.6099
Iteration 6080, time = 1.29s, wps = 39742, train loss = 4.8007
Iteration 6100, time = 1.29s, wps = 39613, train loss = 4.7106
Iteration 6120, time = 1.30s, wps = 39340, train loss = 4.7738
Iteration 6140, time = 1.30s, wps = 39517, train loss = 4.7224
Iteration 6160, time = 1.31s, wps = 39161, train loss = 4.6566
Iteration 6180, time = 1.31s, wps = 39143, train loss = 4.7401
Iteration 6200, time = 1.30s, wps = 39285, train loss = 4.6690
Iteration 6220, time = 1.30s, wps = 39430, train loss = 4.6389
Iteration 6240, time = 1.28s, wps = 39868, train loss = 4.7510
Iteration 6260, time = 1.29s, wps = 39629, train loss = 4.6941
Iteration 6280, time = 1.29s, wps = 39747, train loss = 4.6446
Iteration 6300, time = 1.31s, wps = 39232, train loss = 4.7802
Iteration 6320, time = 1.31s, wps = 38994, train loss = 4.6541
Iteration 6340, time = 1.30s, wps = 39472, train loss = 4.5597
Iteration 6360, time = 1.29s, wps = 39605, train loss = 4.5796
Iteration 6380, time = 1.28s, wps = 39866, train loss = 4.5587
Iteration 6400, time = 1.30s, wps = 39408, train loss = 4.7083
Iteration 6420, time = 1.29s, wps = 39802, train loss = 4.6346
Iteration 6440, time = 1.31s, wps = 39022, train loss = 4.7151
Iteration 6460, time = 1.29s, wps = 39794, train loss = 4.7528
Iteration 6480, time = 1.29s, wps = 39746, train loss = 4.7295
Iteration 6500, time = 1.31s, wps = 39142, train loss = 4.7090
Iteration 6520, time = 1.29s, wps = 39784, train loss = 4.6346
Iteration 6540, time = 1.31s, wps = 39031, train loss = 4.7132
Iteration 6560, time = 1.28s, wps = 40059, train loss = 4.7169
Iteration 6580, time = 1.29s, wps = 39723, train loss = 4.7048
Iteration 6600, time = 1.30s, wps = 39309, train loss = 4.7122
Iteration 6620, time = 1.29s, wps = 39678, train loss = 4.5453
Iteration 6640, time = 1.28s, wps = 39850, train loss = 4.6981
Iteration 6660, time = 1.31s, wps = 39214, train loss = 4.6563
Iteration 6680, time = 1.32s, wps = 38869, train loss = 4.6880
Iteration 6700, time = 1.31s, wps = 38997, train loss = 4.7284
Iteration 6720, time = 1.31s, wps = 39188, train loss = 4.6279
Iteration 6740, time = 1.30s, wps = 39436, train loss = 4.6513
Iteration 6760, time = 1.30s, wps = 39281, train loss = 4.6983
Iteration 6780, time = 1.30s, wps = 39374, train loss = 4.7153
Iteration 6800, time = 1.28s, wps = 40125, train loss = 4.5771
Iteration 6820, time = 1.31s, wps = 39016, train loss = 4.7465
Iteration 6840, time = 1.29s, wps = 39683, train loss = 4.6544
Iteration 6860, time = 1.30s, wps = 39466, train loss = 4.5659
Iteration 6880, time = 1.31s, wps = 39044, train loss = 4.6241
Iteration 6900, time = 1.29s, wps = 39823, train loss = 4.5431
Iteration 6920, time = 1.30s, wps = 39387, train loss = 4.5799
Iteration 6940, time = 1.31s, wps = 39194, train loss = 4.6289
Iteration 6960, time = 1.29s, wps = 39711, train loss = 4.5390
Iteration 6980, time = 1.29s, wps = 39562, train loss = 4.6603
Iteration 7000, time = 1.29s, wps = 39674, train loss = 4.6549
Iteration 7020, time = 1.32s, wps = 38829, train loss = 4.6146
Iteration 7040, time = 1.31s, wps = 39179, train loss = 4.7363
Iteration 7060, time = 1.29s, wps = 39745, train loss = 4.6569
Iteration 7080, time = 1.30s, wps = 39413, train loss = 4.6032
Iteration 7100, time = 1.31s, wps = 39025, train loss = 4.5537
Iteration 7120, time = 1.30s, wps = 39281, train loss = 4.6000
Iteration 7140, time = 1.30s, wps = 39384, train loss = 4.6607
Iteration 7160, time = 1.32s, wps = 38876, train loss = 4.6731
Iteration 7180, time = 1.29s, wps = 39601, train loss = 4.6192
Iteration 7200, time = 1.31s, wps = 39105, train loss = 4.6644
Iteration 7220, time = 1.30s, wps = 39282, train loss = 4.5652
Iteration 7240, time = 1.30s, wps = 39396, train loss = 4.6406
Iteration 7260, time = 1.30s, wps = 39461, train loss = 4.4673
Iteration 7280, time = 1.29s, wps = 39605, train loss = 4.7510
Iteration 7300, time = 1.31s, wps = 39231, train loss = 4.6858
Iteration 7320, time = 1.31s, wps = 39165, train loss = 4.5525

real	3m34.179s
user	6m15.541s
sys	1m10.962s
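The first two iterations (6.70 s and 2.81 s) absorb graph startup cost, which is why their wps is so low; from there the run settles around 39,000-42,000 words per second. A back-of-the-envelope check of those figures, assuming the example's default hyperparameters (batch_size=128 and num_steps=20, consistent with the (128, 2560) recurrent state listed earlier) and the 20-iteration logging interval:

    # Hedged sanity check of the logged wps, under the assumptions above.
    batch_size, num_steps, log_interval = 128, 20, 20
    elapsed = 1.28  # seconds for the interval ending at iteration 5360
    wps = batch_size * num_steps * log_interval / elapsed
    print(int(wps))  # 40000 -- in line with the logged 40084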
root@583ba0346921:/workspace/nvidia-examples/big_lstm# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
root@583ba0346921:/workspace/nvidia-examples/big_lstm# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
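So the container itself is Ubuntu 16.04.5 with CUDA 10.0; the 18.09-py3 image should pair that with a TensorFlow 1.10-based build, though that version string is an assumption here, not something the transcript shows. A hypothetical in-container check:

    import tensorflow as tf

    # Expected to report a 1.10-based build in the 18.09-py3 image (assumption),
    # compiled against the CUDA 10.0 toolkit that nvcc reported above.
    print(tf.__version__)
    print(tf.test.is_built_with_cuda())  # expected: True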
root@583ba0346921:/workspace/nvidia-examples/big_lstm# cd data
root@583ba0346921:/workspace/nvidia-examples/big_lstm/data# ls
1-billion-word-language-modeling-benchmark-r13output
root@583ba0346921:/workspace/nvidia-examples/big_lstm/data# cd 1-billion-word-language-modeling-benchmark-r13output
root@583ba0346921:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# ls
1b_word_vocab.txt  heldout-monolingual.tokenized.shuffled  README  training-monolingual.tokenized.shuffled
root@583ba0346921:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# cd training-monolingual.tokenized.shuffled
root@583ba0346921:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# ls
news.en-00001-of-00100  news.en-00034-of-00100  news.en-00067-of-00100
news.en-00002-of-00100  news.en-00035-of-00100  news.en-00068-of-00100
news.en-00003-of-00100  news.en-00036-of-00100  news.en-00069-of-00100
news.en-00004-of-00100  news.en-00037-of-00100  news.en-00070-of-00100
news.en-00005-of-00100  news.en-00038-of-00100  news.en-00071-of-00100
news.en-00006-of-00100  news.en-00039-of-00100  news.en-00072-of-00100
news.en-00007-of-00100  news.en-00040-of-00100  news.en-00073-of-00100
news.en-00008-of-00100  news.en-00041-of-00100  news.en-00074-of-00100
news.en-00009-of-00100  news.en-00042-of-00100  news.en-00075-of-00100
news.en-00010-of-00100  news.en-00043-of-00100  news.en-00076-of-00100
news.en-00011-of-00100  news.en-00044-of-00100  news.en-00077-of-00100
news.en-00012-of-00100  news.en-00045-of-00100  news.en-00078-of-00100
news.en-00013-of-00100  news.en-00046-of-00100  news.en-00079-of-00100
news.en-00014-of-00100  news.en-00047-of-00100  news.en-00080-of-00100
news.en-00015-of-00100  news.en-00048-of-00100  news.en-00081-of-00100
news.en-00016-of-00100  news.en-00049-of-00100  news.en-00082-of-00100
news.en-00017-of-00100  news.en-00050-of-00100  news.en-00083-of-00100
news.en-00018-of-00100  news.en-00051-of-00100  news.en-00084-of-00100
news.en-00019-of-00100  news.en-00052-of-00100  news.en-00085-of-00100
news.en-00020-of-00100  news.en-00053-of-00100  news.en-00086-of-00100
news.en-00021-of-00100  news.en-00054-of-00100  news.en-00087-of-00100
news.en-00022-of-00100  news.en-00055-of-00100  news.en-00088-of-00100
news.en-00023-of-00100  news.en-00056-of-00100  news.en-00089-of-00100
news.en-00024-of-00100  news.en-00057-of-00100  news.en-00090-of-00100
news.en-00025-of-00100  news.en-00058-of-00100  news.en-00091-of-00100
news.en-00026-of-00100  news.en-00059-of-00100  news.en-00092-of-00100
news.en-00027-of-00100  news.en-00060-of-00100  news.en-00093-of-00100
news.en-00028-of-00100  news.en-00061-of-00100  news.en-00094-of-00100
news.en-00029-of-00100  news.en-00062-of-00100  news.en-00095-of-00100
news.en-00030-of-00100  news.en-00063-of-00100  news.en-00096-of-00100
news.en-00031-of-00100  news.en-00064-of-00100  news.en-00097-of-00100
news.en-00032-of-00100  news.en-00065-of-00100  news.en-00098-of-00100
news.en-00033-of-00100  news.en-00066-of-00100  news.en-00099-of-00100
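The training directory holds 99 shards, news.en-00001-of-00100 through news.en-00099-of-00100; the held-out data sits separately under heldout-monolingual.tokenized.shuffled. A hypothetical snippet to count the shards with the same kind of glob pattern the example's data loader consumes:

    import glob

    # Count the training shards; the path assumes the download location
    # used earlier in this session.
    pattern = ("./data/1-billion-word-language-modeling-benchmark-r13output/"
               "training-monolingual.tokenized.shuffled/*")
    print(len(glob.glob(pattern)))  # 99, matching the listing above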
root@583ba0346921:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# exit
exit
chibi@1604:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
chibi@1604:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
chibi@1604:~$ sudo hddtemp /dev/sda
[sudo] password for chibi:
/dev/sda: TS128GSSD370S: 21°C
chibi@1604:~$