ubuntu@ubuntu:~$ sudo nvidia-docker run --rm -ti nvcr.io/nvidia/tensorflow:18.09-py3
[sudo] password for ubuntu:

================
== TensorFlow ==
================

NVIDIA Release 18.09 (build 687558)

Container image Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
Copyright 2017 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
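Before going further it is worth acting on that SHMEM note: the container was started without the recommended flags, and the default 64MB /dev/shm can throttle TensorFlow's input pipeline. A minimal sketch of the suggested relaunch (the host path ~/lstm_data is purely illustrative, chosen so that the dataset downloaded below survives the --rm cleanup):

  # relaunch with NVIDIA's recommended shared-memory and ulimit settings,
  # bind-mounting a host directory for the dataset
  sudo nvidia-docker run --rm -ti \
      --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
      -v ~/lstm_data:/workspace/nvidia-examples/big_lstm/data \
      nvcr.io/nvidia/tensorflow:18.09-py3

The session below simply continues with the defaults, which is adequate for a short benchmark.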
root@870cf85b00ae:/workspace# ls
README.md  docker-examples  nvidia-examples
root@870cf85b00ae:/workspace# cd nvidia-examples
root@870cf85b00ae:/workspace/nvidia-examples# ls
OpenSeq2Seq  big_lstm  build_imagenet_data  cnn  tftrt
root@870cf85b00ae:/workspace/nvidia-examples# cd big_lstm
root@870cf85b00ae:/workspace/nvidia-examples/big_lstm# ls
1b_word_vocab.txt          data_utils_test.py  language_model_test.py  README.md
download_1b_words_data.sh  model_utils.py      __init__.py             hparams.py
run_utils.py               common.py           hparams_test.py         single_lm_train.py
data_utils.py              language_model.py   testdata
root@870cf85b00ae:/workspace/nvidia-examples/big_lstm# ./download_1b_words_data.sh
Please specify root of dataset directory: data
Success: dataset root dir validated
--2019-03-19 17:21:03--  http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
Resolving www.statmt.org (www.statmt.org)... 129.215.197.184
Connecting to www.statmt.org (www.statmt.org)|129.215.197.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1792209805 (1.7G) [application/x-gzip]
Saving to: ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’

1-billion-word-langu 100%[===================>]   1.67G   496KB/s    in 61m 37s

2019-03-19 18:22:40 (473 KB/s) - ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’ saved [1792209805/1792209805]

1-billion-word-language-modeling-benchmark-r13output/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00024-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00057-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00055-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00096-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00081-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00033-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00072-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00082-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00018-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00008-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00059-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00005-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00091-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00062-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00031-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00095-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00076-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00006-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00038-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00015-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00087-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00021-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00049-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00009-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00027-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00056-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00046-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00032-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00029-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00088-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00085-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00011-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00012-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00067-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00003-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00093-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00050-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00053-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00044-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00019-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00066-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00028-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00045-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00039-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00071-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00052-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00078-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00037-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00002-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00014-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00048-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00017-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00004-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00077-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00080-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00020-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00051-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00016-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00079-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00043-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00068-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00099-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00064-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00034-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00054-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00040-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00070-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00063-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00041-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00083-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00061-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00073-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00094-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00030-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00060-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00035-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00023-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00042-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00025-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00090-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00089-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00065-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00075-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00022-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00026-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00098-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00084-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00010-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00069-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00013-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00092-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00036-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00097-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00007-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00074-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00001-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00047-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00086-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00058-of-00100
1-billion-word-language-modeling-benchmark-r13output/.svn/
1-billion-word-language-modeling-benchmark-r13output/.svn/tmp/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/de102cd0c91cd19e6612f0840e68a2f20ba8134c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/deed1b75d3bd5cc36ae6aeb85d56680b892b7948.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/86c58db52fbf362c5bc329afc33b8805085fcb0d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/9f2882e21f860a83ad6ea8898ebab140974ed301.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/bcdbc523ee7488dc438cab869b6d5e236578dbfa.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/d2718bc26d0ee0a213d7d4add99a304cb5b39ede.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/c5b24f61479da923123d0394a188da922ea0359c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/116d6ea61730d8199127596b072e981338597779.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/b0e26559cfe641245584a9400b35ba28d64f1411.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/d3ae508e3bcb0e696dd70aecd052410f1f7afc1d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/9e148bd766e8805e0eb97eeae250433ec7a2e996.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/31b645a482e0b81fda3c567cada307c6fcf7ec80.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/da39a3ee5e6b4b0d3255bfef95601890afd80709.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/c1ed42c415ec884e591fb5c70d373da640a383b5.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/e37ba0f85e94073ccaced1eed7e4f5d737a25f49.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/entries
1-billion-word-language-modeling-benchmark-r13output/.svn/format
1-billion-word-language-modeling-benchmark-r13output/.svn/wc.db
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00015-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00031-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00027-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00010-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00033-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00042-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00046-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00037-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00000-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00029-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00013-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00002-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00048-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00006-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00030-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00025-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00039-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00008-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00020-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00001-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00034-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00044-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00045-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00016-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00004-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00035-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00038-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00009-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00024-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00022-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00021-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00032-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00011-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00049-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00041-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00019-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00023-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00040-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00014-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00007-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00017-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00012-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00018-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00003-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00028-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en-00000-of-00100
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00043-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00005-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00036-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00026-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00047-of-00050
1-billion-word-language-modeling-benchmark-r13output/README

Success! One billion words dataset ready at:
data/1-billion-word-language-modeling-benchmark-r13output/
Please pass this dir to single_lm_train.py via the --datadir option.
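The run that follows trains the example's big LSTM language model for one timed burst. Note 'max_time': 180 in the hyperparameter dump below: each invocation trains for about 180 seconds and then exits, which is why every run in this session completes in a little over three minutes of wall time. For reference, the flags used here plus the obvious multi-GPU variant (only --num_gpus=1 is exercised below; scaling it up is an untested assumption on this particular box):

  # single GPU, checkpoints under ./logs, exactly as run below
  python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 \
      --datadir=./data/1-billion-word-language-modeling-benchmark-r13output

  # on a multi-GPU machine the same script can spread the work across devices
  python single_lm_train.py --mode=train --logdir=./logs --num_gpus=4 \
      --datadir=./data/1-billion-word-language-modeling-benchmark-r13output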
root@870cf85b00ae:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output
*****HYPER PARAMETERS*****
{'learning_rate': 0.2, 'num_layers': 1, 'state_size': 2048, 'batch_size': 128, 'projected_size': 512, 'vocab_size': 793470, 'max_time': 180, 'do_summaries': False, 'run_profiler': False, 'num_sampled': 8192, 'emb_size': 512, 'num_delayed_steps': 150, 'num_steps': 20, 'average_params': True, 'max_grad_norm': 10.0, 'num_gpus': 1, 'keep_prob': 0.9, 'num_shards': 8, 'optimizer': 0}
**************************
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Current time: 1553019912.2779129
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
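The (99184, 512) embedding and softmax shapes above follow directly from the hyperparameters: the 793,470-word vocabulary is split across 'num_shards': 8, and each shard holds the ceiling of 793470 / 8 rows. A quick check of that arithmetic (plain shell, nothing from the example assumed):

  awk 'BEGIN { print int((793470 + 8 - 1) / 8) }'    # ceiling division -> 99184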
2019-03-19 18:25:13.817718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:02:00.0
totalMemory: 15.75GiB freeMemory: 15.44GiB
2019-03-19 18:25:13.817797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-03-19 18:25:15.301265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-19 18:25:15.301364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2019-03-19 18:25:15.301382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2019-03-19 18:25:15.303491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14944 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 7.0)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00086-of-00100
Finished processing!
Iteration 1, time = 8.48s, wps = 302, train loss = 12.9763
Iteration 2, time = 3.02s, wps = 847, train loss = 12.9671
Iteration 3, time = 0.13s, wps = 19400, train loss = 12.9710
Iteration 4, time = 0.15s, wps = 17546, train loss = 12.9277
Iteration 5, time = 0.10s, wps = 26488, train loss = 12.7422
Iteration 6, time = 0.11s, wps = 23840, train loss = 14.8289
Iteration 7, time = 0.10s, wps = 25471, train loss = 12.4992
Iteration 8, time = 0.14s, wps = 18912, train loss = 13.5247
Iteration 9, time = 0.15s, wps = 16924, train loss = 12.3287
Iteration 20, time = 1.32s, wps = 21384, train loss = 11.9675
Iteration 40, time = 2.32s, wps = 22102, train loss = 10.7050
Iteration 60, time = 2.30s, wps = 22276, train loss = 9.9531
Iteration 80, time = 2.25s, wps = 22730, train loss = 9.2397
Iteration 100, time = 2.11s, wps = 24232, train loss = 9.2285
Iteration 120, time = 2.00s, wps = 25577, train loss = 8.4085
Iteration 140, time = 2.13s, wps = 24010, train loss = 7.4762
Iteration 160, time = 2.02s, wps = 25402, train loss = 7.2888
Iteration 180, time = 1.97s, wps = 25946, train loss = 6.9740
Iteration 200, time = 1.98s, wps = 25826, train loss = 6.9320
Iteration 220, time = 2.07s, wps = 24755, train loss = 6.9028
Iteration 240, time = 1.96s, wps = 26086, train loss = 6.6835
Iteration 260, time = 1.94s, wps = 26345, train loss = 6.7612
Iteration 280, time = 2.02s, wps = 25298, train loss = 6.4824
Iteration 300, time = 2.02s, wps = 25363, train loss = 6.7071
Iteration 320, time = 2.07s, wps = 24732, train loss = 6.3478
Iteration 340, time = 2.11s, wps = 24297, train loss = 6.3705
Iteration 360, time = 2.16s, wps = 23661, train loss = 6.3252
Iteration 380, time = 1.94s, wps = 26433, train loss = 6.3214
Iteration 400, time = 1.92s, wps = 26614, train loss = 6.1811
Iteration 420, time = 2.07s, wps = 24714, train loss = 6.3317
Iteration 440, time = 2.05s, wps = 25002, train loss = 6.3121
Iteration 460, time = 2.01s, wps = 25416, train loss = 6.2292
Iteration 480, time = 2.03s, wps = 25264, train loss = 6.1926
Iteration 500, time = 1.90s, wps = 26959, train loss = 6.0980
Iteration 520, time = 1.86s, wps = 27465, train loss = 6.1198
Iteration 540, time = 1.98s, wps = 25894, train loss = 5.9989
Iteration 560, time = 2.04s, wps = 25115, train loss = 6.2120
Iteration 580, time = 1.88s, wps = 27269, train loss = 6.1759
Iteration 600, time = 1.88s, wps = 27210, train loss = 6.0263
Iteration 620, time = 2.04s, wps = 25159, train loss = 6.0840
Iteration 640, time = 1.97s, wps = 25956, train loss = 6.0374
Iteration 660, time = 1.93s, wps = 26504, train loss = 5.9034
Iteration 680, time = 2.10s, wps = 24400, train loss = 5.9277
Iteration 700, time = 2.03s, wps = 25210, train loss = 6.0301
Iteration 720, time = 1.95s, wps = 26285, train loss = 5.8357
Iteration 740, time = 1.93s, wps = 26574, train loss = 5.8831
Iteration 760, time = 1.96s, wps = 26109, train loss = 5.8949
Iteration 780, time = 1.87s, wps = 27360, train loss = 5.8523
Iteration 800, time = 1.97s, wps = 26012, train loss = 5.7922
Iteration 820, time = 2.08s, wps = 24588, train loss = 5.7551
Iteration 840, time = 2.04s, wps = 25046, train loss = 6.0713
Iteration 860, time = 2.01s, wps = 25448, train loss = 5.8377
Iteration 880, time = 2.00s, wps = 25607, train loss = 5.8862
Iteration 900, time = 2.05s, wps = 24952, train loss = 5.8857
Iteration 920, time = 2.04s, wps = 25066, train loss = 5.8624
Iteration 940, time = 1.91s, wps = 26842, train loss = 5.8002
Iteration 960, time = 2.00s, wps = 25575, train loss = 5.7316
Iteration 980, time = 2.02s, wps = 25294, train loss = 5.7173
Iteration 1000, time = 1.98s, wps = 25809, train loss = 5.7696
Iteration 1020, time = 2.06s, wps = 24824, train loss = 5.7362
Iteration 1040, time = 1.83s, wps = 27976, train loss = 5.6019
Iteration 1060, time = 2.03s, wps = 25226, train loss = 5.6288
Iteration 1080, time = 2.02s, wps = 25343, train loss = 5.7230
Iteration 1100, time = 2.01s, wps = 25438, train loss = 5.6531
Iteration 1120, time = 1.94s, wps = 26411, train loss = 5.5846
Iteration 1140, time = 2.02s, wps = 25339, train loss = 5.7373
Iteration 1160, time = 2.06s, wps = 24841, train loss = 5.6172
Iteration 1180, time = 1.89s, wps = 27160, train loss = 5.6582
Iteration 1200, time = 1.99s, wps = 25710, train loss = 5.5089
Iteration 1220, time = 1.97s, wps = 26007, train loss = 5.5542
Iteration 1240, time = 1.89s, wps = 27104, train loss = 5.6841
Iteration 1260, time = 2.03s, wps = 25268, train loss = 5.5210
Iteration 1280, time = 2.02s, wps = 25365, train loss = 5.4682
Iteration 1300, time = 1.97s, wps = 25964, train loss = 5.5193
Iteration 1320, time = 2.01s, wps = 25517, train loss = 5.5635
Iteration 1340, time = 2.06s, wps = 24859, train loss = 5.5254
Iteration 1360, time = 2.02s, wps = 25343, train loss = 5.6777

real    3m21.320s
user    6m50.512s
sys     1m22.780s
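The throughput numbers are internally consistent: each iteration consumes batch_size * num_steps = 128 * 20 = 2560 words, and after warm-up the log lines are printed every 20 iterations, so wps is roughly 20 * 2560 divided by the reported interval time. Checking one line from the run above (the small residual comes from the time being truncated to two decimals in the log):

  # "Iteration 500, time = 1.90s, ..., wps = 26959" from the log above
  awk 'BEGIN { printf "%d\n", 20 * 128 * 20 / 1.90 }'    # prints 26947, close to 26959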
root@870cf85b00ae:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output
*****HYPER PARAMETERS*****
{'do_summaries': False, 'projected_size': 512, 'num_shards': 8, 'num_delayed_steps': 150, 'emb_size': 512, 'batch_size': 128, 'num_gpus': 1, 'max_time': 180, 'max_grad_norm': 10.0, 'optimizer': 0, 'average_params': True, 'keep_prob': 0.9, 'state_size': 2048, 'num_layers': 1, 'num_steps': 20, 'num_sampled': 8192, 'learning_rate': 0.2, 'vocab_size': 793470, 'run_profiler': False}
**************************
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Current time: 1553020173.3489347
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-03-19 18:29:34.786804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:02:00.0
totalMemory: 15.75GiB freeMemory: 15.44GiB
2019-03-19 18:29:34.786857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-03-19 18:29:35.219176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-19 18:29:35.219233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2019-03-19 18:29:35.219244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2019-03-19 18:29:35.220029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14944 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 7.0)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00081-of-00100
Finished processing!
Iteration 1362, time = 6.20s, wps = 413, train loss = 6.2350
Iteration 1363, time = 3.95s, wps = 649, train loss = 5.5481
Iteration 1364, time = 0.16s, wps = 16324, train loss = 5.5337
Iteration 1365, time = 0.12s, wps = 21030, train loss = 5.7579
Iteration 1366, time = 0.11s, wps = 23703, train loss = 5.6191
Iteration 1367, time = 0.11s, wps = 22715, train loss = 5.5767
Iteration 1368, time = 0.12s, wps = 21135, train loss = 5.6066
Iteration 1369, time = 0.10s, wps = 25241, train loss = 5.4979
Iteration 1370, time = 0.11s, wps = 23082, train loss = 5.4529
Iteration 1381, time = 1.25s, wps = 22470, train loss = 5.6198
Iteration 1401, time = 2.32s, wps = 22105, train loss = 5.4135
Iteration 1421, time = 2.49s, wps = 20551, train loss = 5.5543
Iteration 1441, time = 2.25s, wps = 22790, train loss = 5.4535
Iteration 1461, time = 2.34s, wps = 21907, train loss = 5.5261
Iteration 1481, time = 2.37s, wps = 21633, train loss = 5.4089
Iteration 1501, time = 2.23s, wps = 22964, train loss = 5.4118
Iteration 1521, time = 2.23s, wps = 23003, train loss = 5.3828
Iteration 1541, time = 2.33s, wps = 21948, train loss = 5.3881
Iteration 1561, time = 2.30s, wps = 22255, train loss = 5.3796
Iteration 1581, time = 2.20s, wps = 23235, train loss = 5.4765
Iteration 1601, time = 2.25s, wps = 22708, train loss = 5.3870
Iteration 1621, time = 2.21s, wps = 23143, train loss = 5.3940
Iteration 1641, time = 2.25s, wps = 22727, train loss = 5.3696
Iteration 1661, time = 2.25s, wps = 22720, train loss = 5.2764
Iteration 1681, time = 2.24s, wps = 22827, train loss = 5.3303
Iteration 1701, time = 2.19s, wps = 23375, train loss = 5.3006
Iteration 1721, time = 2.32s, wps = 22094, train loss = 5.3673
Iteration 1741, time = 2.21s, wps = 23155, train loss = 5.4786
Iteration 1761, time = 2.25s, wps = 22780, train loss = 5.2976
Iteration 1781, time = 2.25s, wps = 22774, train loss = 5.1870
Iteration 1801, time = 2.18s, wps = 23482, train loss = 5.4369
Iteration 1821, time = 2.13s, wps = 24029, train loss = 5.3331
Iteration 1841, time = 2.26s, wps = 22639, train loss = 5.4620
Iteration 1861, time = 2.32s, wps = 22030, train loss = 5.3807
Iteration 1881, time = 2.27s, wps = 22563, train loss = 5.3897
Iteration 1901, time = 2.20s, wps = 23224, train loss = 5.2174
Iteration 1921, time = 2.29s, wps = 22394, train loss = 5.3190
Iteration 1941, time = 2.25s, wps = 22772, train loss = 5.2187
Iteration 1961, time = 2.24s, wps = 22829, train loss = 5.2195
Iteration 1981, time = 2.28s, wps = 22414, train loss = 5.2504
Iteration 2001, time = 2.26s, wps = 22691, train loss = 5.2044
Iteration 2021, time = 2.27s, wps = 22511, train loss = 5.2902
Iteration 2041, time = 2.23s, wps = 22972, train loss = 5.4509
Iteration 2061, time = 2.19s, wps = 23328, train loss = 5.2262
Iteration 2081, time = 2.21s, wps = 23200, train loss = 5.2927
Iteration 2101, time = 2.20s, wps = 23238, train loss = 5.3520
Iteration 2121, time = 2.24s, wps = 22907, train loss = 5.2722
Iteration 2141, time = 2.14s, wps = 23956, train loss = 5.1386
Iteration 2161, time = 2.28s, wps = 22408, train loss = 5.2628
Iteration 2181, time = 2.39s, wps = 21396, train loss = 5.3091
Iteration 2201, time = 2.14s, wps = 23935, train loss = 5.0901
Iteration 2221, time = 2.21s, wps = 23198, train loss = 5.2012
Iteration 2241, time = 2.17s, wps = 23567, train loss = 5.1349
Iteration 2261, time = 2.19s, wps = 23428, train loss = 5.1331
Iteration 2281, time = 2.31s, wps = 22183, train loss = 5.0867
Iteration 2301, time = 2.28s, wps = 22418, train loss = 5.1376
Iteration 2321, time = 2.26s, wps = 22638, train loss = 5.1759
Iteration 2341, time = 2.22s, wps = 23089, train loss = 5.0826
Iteration 2361, time = 2.21s, wps = 23139, train loss = 5.2431
Iteration 2381, time = 2.29s, wps = 22318, train loss = 5.1956
Iteration 2401, time = 2.26s, wps = 22703, train loss = 5.1086
Iteration 2421, time = 2.34s, wps = 21863, train loss = 5.1816
Iteration 2441, time = 2.37s, wps = 21582, train loss = 5.2082
Iteration 2461, time = 2.25s, wps = 22732, train loss = 5.1111
Iteration 2481, time = 2.26s, wps = 22672, train loss = 5.2158
Iteration 2501, time = 2.33s, wps = 21978, train loss = 5.2114
Iteration 2521, time = 2.20s, wps = 23251, train loss = 5.0143
Iteration 2541, time = 2.42s, wps = 21183, train loss = 5.2782
Iteration 2561, time = 2.18s, wps = 23463, train loss = 5.1917
Iteration 2581, time = 2.19s, wps = 23331, train loss = 5.2340
Iteration 2601, time = 2.18s, wps = 23509, train loss = 5.2228
Iteration 2621, time = 2.20s, wps = 23315, train loss = 5.1107
Iteration 2641, time = 2.19s, wps = 23334, train loss = 5.1778
Iteration 2661, time = 2.34s, wps = 21853, train loss = 5.2136
Iteration 2681, time = 2.14s, wps = 23965, train loss = 5.0715
Iteration 2701, time = 2.24s, wps = 22849, train loss = 5.2489
Iteration 2721, time = 2.21s, wps = 23142, train loss = 5.0951
Iteration 2741, time = 2.17s, wps = 23603, train loss = 5.0645
Iteration 2761, time = 2.21s, wps = 23151, train loss = 5.0896

real    3m18.087s
user    7m38.620s
sys     1m11.164s
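Note that this second run picked up at iteration 1362 rather than 1: because --logdir=./logs was reused, the (deprecated) Supervisor restored the checkpoint the first run had saved there. The state can be inspected with a plain listing; the exact checkpoint file names depend on the TensorFlow version, so treat them as an assumption:

  # checkpoints persist across runs because --logdir=./logs is reused
  ls ./logs    # expect a 'checkpoint' index file plus model.ckpt-* data/index/meta files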
root@870cf85b00ae:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output
*****HYPER PARAMETERS*****
{'num_shards': 8, 'learning_rate': 0.2, 'optimizer': 0, 'max_time': 180, 'num_delayed_steps': 150, 'state_size': 2048, 'vocab_size': 793470, 'num_steps': 20, 'average_params': True, 'max_grad_norm': 10.0, 'run_profiler': False, 'emb_size': 512, 'do_summaries': False, 'num_sampled': 8192, 'num_gpus': 1, 'keep_prob': 0.9, 'projected_size': 512, 'batch_size': 128, 'num_layers': 1}
**************************
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Current time: 1553020981.7967772
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-03-19 18:43:03.284683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:02:00.0
totalMemory: 15.75GiB freeMemory: 15.44GiB
2019-03-19 18:43:03.284767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-03-19 18:43:03.739611: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-19 18:43:03.739670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2019-03-19 18:43:03.739679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2019-03-19 18:43:03.740452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14944 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 7.0)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00032-of-00100
Finished processing!
Iteration 2770, time = 6.56s, wps = 390, train loss = 5.5948
Iteration 2771, time = 4.35s, wps = 588, train loss = 5.0823
Iteration 2772, time = 0.13s, wps = 19848, train loss = 5.1371
Iteration 2773, time = 0.11s, wps = 23405, train loss = 5.0828
Iteration 2774, time = 0.10s, wps = 25349, train loss = 5.2435
Iteration 2775, time = 0.17s, wps = 15188, train loss = 5.1593
Iteration 2776, time = 0.10s, wps = 26364, train loss = 5.1481
Iteration 2777, time = 0.11s, wps = 22762, train loss = 5.1168
Iteration 2778, time = 0.10s, wps = 24511, train loss = 5.1631
Iteration 2789, time = 1.21s, wps = 23308, train loss = 4.9713
Iteration 2809, time = 2.36s, wps = 21705, train loss = 5.1434
Iteration 2829, time = 2.28s, wps = 22489, train loss = 5.0395
Iteration 2849, time = 2.31s, wps = 22148, train loss = 5.0984
Iteration 2869, time = 2.20s, wps = 23315, train loss = 5.0897
Iteration 2889, time = 2.37s, wps = 21581, train loss = 5.0006
Iteration 2909, time = 2.19s, wps = 23351, train loss = 5.0918
Iteration 2929, time = 2.25s, wps = 22769, train loss = 5.0587
Iteration 2949, time = 2.22s, wps = 23015, train loss = 5.1586
Iteration 2969, time = 2.30s, wps = 22287, train loss = 4.9636
Iteration 2989, time = 2.30s, wps = 22290, train loss = 4.9810
Iteration 3009, time = 2.32s, wps = 22060, train loss = 5.0801
Iteration 3029, time = 2.24s, wps = 22884, train loss = 5.0730
Iteration 3049, time = 2.19s, wps = 23415, train loss = 5.0786
Iteration 3069, time = 2.24s, wps = 22877, train loss = 4.9007
Iteration 3089, time = 2.33s, wps = 21965, train loss = 5.1625
Iteration 3109, time = 2.24s, wps = 22812, train loss = 5.0389
Iteration 3129, time = 2.35s, wps = 21808, train loss = 4.9490
Iteration 3149, time = 2.40s, wps = 21338, train loss = 5.0371
Iteration 3169, time = 2.29s, wps = 22340, train loss = 4.9576
Iteration 3189, time = 2.24s, wps = 22896, train loss = 5.1494
Iteration 3209, time = 2.28s, wps = 22502, train loss = 5.0900
Iteration 3229, time = 2.18s, wps = 23486, train loss = 5.1559
Iteration 3249, time = 2.28s, wps = 22446, train loss = 5.1341
Iteration 3269, time = 2.18s, wps = 23497, train loss = 5.0340
Iteration 3289, time = 2.22s, wps = 23105, train loss = 5.0093
Iteration 3309, time = 2.28s, wps = 22442, train loss = 4.9637
Iteration 3329, time = 2.40s, wps = 21292, train loss = 5.0279
Iteration 3349, time = 2.24s, wps = 22866, train loss = 5.0317
Iteration 3369, time = 2.28s, wps = 22479, train loss = 5.1923
Iteration 3389, time = 2.27s, wps = 22516, train loss = 5.0870
Iteration 3409, time = 2.31s, wps = 22187, train loss = 4.9634
Iteration 3429, time = 2.42s, wps = 21140, train loss = 4.9188
Iteration 3449, time = 2.31s, wps = 22162, train loss = 5.0517
Iteration 3469, time = 2.42s, wps = 21157, train loss = 4.9004
Iteration 3489, time = 2.23s, wps = 22947, train loss = 4.9519
Iteration 3509, time = 2.22s, wps = 23056, train loss = 5.0505
Iteration 3529, time = 2.19s, wps = 23401, train loss = 4.9921
Iteration 3549, time = 2.28s, wps = 22446, train loss = 4.9737
Iteration 3569, time = 2.24s, wps = 22821, train loss = 4.9682
Iteration 3589, time = 2.22s, wps = 23027, train loss = 4.8610
Iteration 3609, time = 2.15s, wps = 23809, train loss = 5.0442
Iteration 3629, time = 2.19s, wps = 23329, train loss = 4.8797
Iteration 3649, time = 2.23s, wps = 23001, train loss = 4.8267
Iteration 3669, time = 2.32s, wps = 22058, train loss = 4.9466
Iteration 3689, time = 2.34s, wps = 21927, train loss = 4.9434
Iteration 3709, time = 2.30s, wps = 22242, train loss = 5.0254
Iteration 3729, time = 2.21s, wps = 23122, train loss = 4.9338
Iteration 3749, time = 2.43s, wps = 21064, train loss = 4.9714
Iteration 3769, time = 2.36s, wps = 21685, train loss = 4.9036
Iteration 3789, time = 2.42s, wps = 21164, train loss = 4.9453
Iteration 3809, time = 2.27s, wps = 22597, train loss = 4.9611
Iteration 3829, time = 2.26s, wps = 22681, train loss = 4.8766
Iteration 3849, time = 2.23s, wps = 23001, train loss = 5.0155
Iteration 3869, time = 2.41s, wps = 21217, train loss = 4.9775
Iteration 3889, time = 2.36s, wps = 21680, train loss = 4.9087
Iteration 3909, time = 2.20s, wps = 23301, train loss = 4.8473
Iteration 3929, time = 2.36s, wps = 21720, train loss = 4.9163
Iteration 3949, time = 2.30s, wps = 22241, train loss = 4.8683
Iteration 3969, time = 2.31s, wps = 22145, train loss = 4.8345
Iteration 3989, time = 2.20s, wps = 23241, train loss = 5.0844
Iteration 4009, time = 2.21s, wps = 23150, train loss = 4.9201
Iteration 4029, time = 2.21s, wps = 23126, train loss = 4.9395
Iteration 4049, time = 2.27s, wps = 22556, train loss = 4.8643
Iteration 4069, time = 2.35s, wps = 21789, train loss = 4.9246
Iteration 4089, time = 2.31s, wps = 22133, train loss = 4.9030
Iteration 4109, time = 2.17s, wps = 23573, train loss = 4.8557
Iteration 4129, time = 2.19s, wps = 23354, train loss = 4.8338
Iteration 4149, time = 2.20s, wps = 23241, train loss = 4.8710

real    3m18.060s
user    7m20.656s
sys     1m12.896s
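Across all three 180-second bursts the V100 settles at roughly 21,000-27,000 wps after the first few warm-up iterations. If a run's output is captured to a file, the steady-state average is easy to extract; a sketch assuming the log is saved as train.log (the file name and the tee step are not part of the original session):

  time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 \
      --datadir=./data/1-billion-word-language-modeling-benchmark-r13output 2>&1 | tee train.log
  # average the reported wps, skipping the slow warm-up iterations
  grep -o 'wps = [0-9]*' train.log | awk '$3 > 10000 { s += $3; n++ } END { print s / n, "wps" }'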
root@870cf85b00ae:/workspace/nvidia-examples/big_lstm# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
root@0208ae644042:/workspace/nvidia-examples/big_lstm# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
root@0208ae644042:/workspace/nvidia-examples/big_lstm# exit
exit
ubuntu@ubuntu:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
ubuntu@ubuntu:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
ubuntu@ubuntu:~$
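The version checks show the container shipping its own CUDA 10.0 toolchain on an Ubuntu 16.04.5 userland, while the host runs CUDA 10.1 on Ubuntu 16.04.6. That mismatch is expected and harmless: NGC containers bundle their CUDA toolkit and rely only on the host's GPU driver. To confirm the driver and the GPU from the host side:

  nvidia-smi    # reports the driver version and the Tesla V100-PCIE-16GB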