[chibi@manjaro ~]$ sudo nvidia-docker run --rm -ti nvcr.io/nvidia/tensorflow:19.04-py3
[sudo] password for chibi:

================
== TensorFlow ==
================

NVIDIA Release 19.04 (build 6132408)
TensorFlow Version 1.13.1

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
Copyright 2017-2019 The TensorFlow Authors.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
      insufficient for TensorFlow.  NVIDIA recommends the use of the following flags:
      nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

root@33a7548b3728:/workspace# ls
README.md  docker-examples  nvidia-examples
root@33a7548b3728:/workspace# cd nvidia-example
bash: cd: nvidia-example: No such file or directory
root@33a7548b3728:/workspace# cd nvidia-examples
root@33a7548b3728:/workspace/nvidia-examples# ls
NCF              bert                 cnn           ssdv1.2
OpenSeq2Seq      big_lstm             gnmt_v2       tensorrt
UNet_Industrial  build_imagenet_data  resnet50v1.5
root@33a7548b3728:/workspace/nvidia-examples# cd big_lstm
root@33a7548b3728:/workspace/nvidia-examples/big_lstm# ls
1b_word_vocab.txt  data_utils_test.py         language_model_test.py
README.md          download_1b_words_data.sh  model_utils.py
__init__.py        hparams.py                 run_utils.py
common.py          hparams_test.py            single_lm_train.py
data_utils.py      language_model.py          testdata
root@33a7548b3728:/workspace/nvidia-examples/big_lstm# ./download_1b_words_data.sh
Please specify root of dataset directory: data
Success: dataset root dir validated
--2019-04-28 00:26:48--  http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
Resolving www.statmt.org (www.statmt.org)... 129.215.197.184
Connecting to www.statmt.org (www.statmt.org)|129.215.197.184|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1792209805 (1.7G) [application/x-gzip]
Saving to: ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’

1-billion-word-langu 100%[===================>]   1.67G  2.30MB/s    in 10m 47s

2019-04-28 00:37:36 (2.64 MB/s) - ‘1-billion-word-language-modeling-benchmark-r13output.tar.gz’ saved [1792209805/1792209805]

1-billion-word-language-modeling-benchmark-r13output/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00024-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00057-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00055-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00096-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00081-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00033-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00072-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00082-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00018-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00008-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00059-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00005-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00091-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00062-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00031-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00095-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00076-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00006-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00038-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00015-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00087-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00021-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00049-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00009-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00027-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00056-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00046-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00032-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00029-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00088-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00085-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00011-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00012-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00067-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00003-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00093-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00050-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00053-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00044-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00019-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00066-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00028-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00045-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00039-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00071-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00052-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00078-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00037-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00002-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00014-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00048-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00017-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00004-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00077-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00080-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00020-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00051-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00016-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00079-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00043-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00068-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00099-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00064-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00034-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00054-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00040-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00070-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00063-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00041-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00083-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00061-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00073-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00094-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00030-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00060-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00035-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00023-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00042-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00025-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00090-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00089-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00065-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00075-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00022-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00026-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00098-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00084-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00010-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00069-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00013-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00092-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00036-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00097-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00007-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00074-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00001-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00047-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00086-of-00100
1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00058-of-00100
1-billion-word-language-modeling-benchmark-r13output/.svn/
1-billion-word-language-modeling-benchmark-r13output/.svn/tmp/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/de102cd0c91cd19e6612f0840e68a2f20ba8134c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/de/deed1b75d3bd5cc36ae6aeb85d56680b892b7948.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/86/86c58db52fbf362c5bc329afc33b8805085fcb0d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9f/9f2882e21f860a83ad6ea8898ebab140974ed301.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/bc/bcdbc523ee7488dc438cab869b6d5e236578dbfa.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d2/d2718bc26d0ee0a213d7d4add99a304cb5b39ede.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c5/c5b24f61479da923123d0394a188da922ea0359c.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/11/116d6ea61730d8199127596b072e981338597779.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/b0/b0e26559cfe641245584a9400b35ba28d64f1411.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/d3/d3ae508e3bcb0e696dd70aecd052410f1f7afc1d.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/9e/9e148bd766e8805e0eb97eeae250433ec7a2e996.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/31/31b645a482e0b81fda3c567cada307c6fcf7ec80.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/da/da39a3ee5e6b4b0d3255bfef95601890afd80709.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/c1/c1ed42c415ec884e591fb5c70d373da640a383b5.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/
1-billion-word-language-modeling-benchmark-r13output/.svn/pristine/e3/e37ba0f85e94073ccaced1eed7e4f5d737a25f49.svn-base
1-billion-word-language-modeling-benchmark-r13output/.svn/entries
1-billion-word-language-modeling-benchmark-r13output/.svn/format
1-billion-word-language-modeling-benchmark-r13output/.svn/wc.db
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00015-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00031-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00027-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00010-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00033-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00042-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00046-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00037-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00000-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00029-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00013-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00002-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00048-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00006-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00030-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00025-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00039-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00008-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00020-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00001-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00034-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00044-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00045-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00016-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00004-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00035-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00038-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00009-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00024-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00022-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00021-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00032-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00011-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00049-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00041-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00019-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00023-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00040-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00014-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00007-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00017-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00012-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00018-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00003-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00028-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en-00000-of-00100
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00043-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00005-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00036-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00026-of-00050
1-billion-word-language-modeling-benchmark-r13output/heldout-monolingual.tokenized.shuffled/news.en.heldout-00047-of-00050
1-billion-word-language-modeling-benchmark-r13output/README

Success!
One billion words dataset ready at:
data/1-billion-word-language-modeling-benchmark-r13output/
Please pass this dir to single_lm_train.py via the --datadir option.
root@33a7548b3728:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=2 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

*****HYPER PARAMETERS*****
{'max_time': 180, 'average_params': True, 'num_steps': 20, 'batch_size': 128, 'num_gpus': 2, 'state_size': 2048, 'keep_prob': 0.9, 'num_layers': 1, 'learning_rate': 0.2, 'emb_size': 512, 'num_delayed_steps': 150, 'max_grad_norm': 10.0, 'run_profiler': False, 'projected_size': 512, 'num_shards': 8, 'optimizer': 0, 'num_sampled': 8192, 'do_summaries': False, 'vocab_size': 793470}
**************************
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
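For reference, the wps (words per second) figures in the iteration logs below follow directly from the hyperparameters just printed: every iteration consumes batch_size x num_steps x num_gpus = 128 x 20 x 2 = 5120 words, and each steady-state log line covers 20 iterations (inferred from the iteration numbers in the log, not from the code). A minimal sketch of that arithmetic:

    # Sanity-check the wps values printed by single_lm_train.py.
    # Assumes, per the log itself, that a steady-state line covers 20 iterations.
    batch_size = 128   # from the HYPER PARAMETERS dump above
    num_steps = 20     # tokens per sample per iteration
    num_gpus = 2

    words_per_iteration = batch_size * num_steps * num_gpus   # 5120
    iterations_per_log = 20
    elapsed = 1.61     # seconds, e.g. "Iteration 160, time = 1.61s"

    wps = words_per_iteration * iterations_per_log / elapsed
    print(round(wps))  # -> 63602, matching the logged "wps = 63743" within noise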
Current time: 1556411925.423171
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
model/model_1/state_1_0:0 (128, 2560) /gpu:1
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-04-28 00:38:45.992117: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999735000 Hz
2019-04-28 00:38:45.993715: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x81ae5b0 executing computations on platform Host. Devices:
2019-04-28 00:38:45.993759: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-28 00:38:46.306699: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x81adfd0 executing computations on platform CUDA. Devices:
2019-04-28 00:38:46.306742: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-04-28 00:38:46.306755: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-04-28 00:38:46.307783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:05:00.0
totalMemory: 10.73GiB freeMemory: 10.57GiB
2019-04-28 00:38:46.308600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:09:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2019-04-28 00:38:46.308652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2019-04-28 00:38:47.217960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-28 00:38:47.217999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 1
2019-04-28 00:38:47.218014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N Y
2019-04-28 00:38:47.218020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   Y N
2019-04-28 00:38:47.219033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10196 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:05:00.0, compute capability: 7.5)
2019-04-28 00:38:47.219479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10157 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:09:00.0, compute capability: 7.5)
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00078-of-00100
Finished processing!
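Both cards report ~10.7 GiB total memory with peer-to-peer access enabled (the Y entries in the interconnect matrix). A quick way to confirm device visibility inside the container before committing to a long run, using the TF 1.x device_lib API (this check is an addition for illustration, not part of the big_lstm example):

    # List the devices TensorFlow can see inside the container (TF 1.x API).
    from tensorflow.python.client import device_lib

    devices = device_lib.list_local_devices()
    for d in devices:
        print(d.name, d.device_type, d.physical_device_desc)

    gpus = [d for d in devices if d.device_type == "GPU"]
    assert len(gpus) == 2, "expected both RTX 2080 Ti cards to be visible"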
2019-04-28 00:39:19.915304: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally
Iteration 1, time = 13.79s, wps = 371, train loss = 12.9597
Iteration 2, time = 9.76s, wps = 525, train loss = 12.9444
Iteration 3, time = 0.08s, wps = 61061, train loss = 12.8988
Iteration 4, time = 0.08s, wps = 60284, train loss = 12.7064
Iteration 5, time = 0.08s, wps = 61174, train loss = 34.5448
Iteration 6, time = 0.08s, wps = 65949, train loss = 36.3255
Iteration 7, time = 0.09s, wps = 55255, train loss = 15.2265
Iteration 8, time = 0.08s, wps = 61904, train loss = 13.3023
Iteration 9, time = 0.08s, wps = 64051, train loss = 13.1975
Iteration 20, time = 0.86s, wps = 65479, train loss = 32.8259
Iteration 40, time = 1.61s, wps = 63772, train loss = 8.9011
Iteration 60, time = 1.61s, wps = 63508, train loss = 9.2358
Iteration 80, time = 1.60s, wps = 64158, train loss = 9.8567
Iteration 100, time = 1.64s, wps = 62415, train loss = 8.2404
Iteration 120, time = 1.63s, wps = 62863, train loss = 7.8700
Iteration 140, time = 1.64s, wps = 62524, train loss = 7.6383
Iteration 160, time = 1.61s, wps = 63743, train loss = 7.1656
Iteration 180, time = 1.65s, wps = 62053, train loss = 6.9706
Iteration 200, time = 1.60s, wps = 64087, train loss = 6.7891
Iteration 220, time = 1.62s, wps = 63094, train loss = 6.6002
Iteration 240, time = 1.61s, wps = 63647, train loss = 6.4411
Iteration 260, time = 1.61s, wps = 63630, train loss = 6.4339
Iteration 280, time = 1.65s, wps = 62014, train loss = 6.4271
Iteration 300, time = 1.60s, wps = 63903, train loss = 6.2785
Iteration 320, time = 1.60s, wps = 63815, train loss = 6.3221
Iteration 340, time = 1.61s, wps = 63732, train loss = 6.1421
Iteration 360, time = 1.61s, wps = 63487, train loss = 6.0584
Iteration 380, time = 1.60s, wps = 63917, train loss = 6.1164
Iteration 400, time = 1.59s, wps = 64288, train loss = 6.1214
Iteration 420, time = 1.60s, wps = 63885, train loss = 6.0470
Iteration 440, time = 1.59s, wps = 64203, train loss = 6.0398
Iteration 460, time = 1.65s, wps = 62162, train loss = 6.0719
Iteration 480, time = 1.63s, wps = 62991, train loss = 6.0706
Iteration 500, time = 1.60s, wps = 63873, train loss = 5.9236
Iteration 520, time = 1.64s, wps = 62507, train loss = 5.8470
Iteration 540, time = 1.63s, wps = 62636, train loss = 5.9780
Iteration 560, time = 1.62s, wps = 63312, train loss = 6.0229
Iteration 580, time = 1.65s, wps = 62136, train loss = 5.8848
Iteration 600, time = 1.61s, wps = 63475, train loss = 5.8820
Iteration 620, time = 1.61s, wps = 63599, train loss = 5.8441
Iteration 640, time = 1.62s, wps = 63138, train loss = 5.7133
Iteration 660, time = 1.61s, wps = 63713, train loss = 5.8342
Iteration 680, time = 1.61s, wps = 63643, train loss = 5.7198
Iteration 700, time = 1.62s, wps = 63215, train loss = 5.7212
Iteration 720, time = 1.62s, wps = 63391, train loss = 5.7142
Iteration 740, time = 1.65s, wps = 62054, train loss = 5.6916
Iteration 760, time = 1.59s, wps = 64480, train loss = 5.6115
Iteration 780, time = 1.62s, wps = 63026, train loss = 5.6435
Iteration 800, time = 1.64s, wps = 62520, train loss = 5.5626
Iteration 820, time = 1.63s, wps = 62924, train loss = 5.6337
Iteration 840, time = 1.60s, wps = 64109, train loss = 5.6714
Iteration 860, time = 1.59s, wps = 64273, train loss = 5.6126
Iteration 880, time = 1.64s, wps = 62270, train loss = 5.5172
Iteration 900, time = 1.59s, wps = 64396, train loss = 5.5961
Iteration 920, time = 1.60s, wps = 63814, train loss = 5.4999
Iteration 940, time = 1.62s, wps = 63046, train loss = 5.5473
Iteration 960, time = 1.62s, wps = 63177, train loss = 5.5295
Iteration 980, time = 1.62s, wps = 63384, train loss = 5.4986
Iteration 1000, time = 1.60s, wps = 63811, train loss = 5.5579
Iteration 1020, time = 1.63s, wps = 63014, train loss = 5.3945
Iteration 1040, time = 1.61s, wps = 63511, train loss = 5.4253
Iteration 1060, time = 1.61s, wps = 63515, train loss = 5.4546
Iteration 1080, time = 1.62s, wps = 63180, train loss = 5.4374
Iteration 1100, time = 1.62s, wps = 63040, train loss = 5.4299
Iteration 1120, time = 1.65s, wps = 62235, train loss = 5.4246
Iteration 1140, time = 1.60s, wps = 64082, train loss = 5.4792
Iteration 1160, time = 1.60s, wps = 63953, train loss = 5.3636
Iteration 1180, time = 1.62s, wps = 63056, train loss = 5.3960
Iteration 1200, time = 1.65s, wps = 61995, train loss = 5.3834
Iteration 1220, time = 1.62s, wps = 63223, train loss = 5.4136
Iteration 1240, time = 1.60s, wps = 63965, train loss = 5.4183
Iteration 1260, time = 1.61s, wps = 63793, train loss = 5.3447
Iteration 1280, time = 1.60s, wps = 64194, train loss = 5.3302
Iteration 1300, time = 1.62s, wps = 63363, train loss = 5.2852
Iteration 1320, time = 1.61s, wps = 63680, train loss = 5.3653
Iteration 1340, time = 1.61s, wps = 63800, train loss = 5.3207
Iteration 1360, time = 1.60s, wps = 64092, train loss = 5.2513
Iteration 1380, time = 1.64s, wps = 62423, train loss = 5.3015
Iteration 1400, time = 1.61s, wps = 63715, train loss = 5.1799
Iteration 1420, time = 1.61s, wps = 63567, train loss = 5.1817
Iteration 1440, time = 1.60s, wps = 64116, train loss = 5.2870
Iteration 1460, time = 1.61s, wps = 63416, train loss = 5.2286
Iteration 1480, time = 1.60s, wps = 63868, train loss = 5.2270
Iteration 1500, time = 1.64s, wps = 62501, train loss = 5.1794
Iteration 1520, time = 1.62s, wps = 63323, train loss = 5.1743
Iteration 1540, time = 1.57s, wps = 65241, train loss = 5.0818
Iteration 1560, time = 1.64s, wps = 62268, train loss = 5.1499
Iteration 1580, time = 1.65s, wps = 62200, train loss = 5.2196
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00055-of-00100
Finished processing!
Iteration 1600, time = 4.68s, wps = 21886, train loss = 5.2003
Iteration 1620, time = 1.60s, wps = 64160, train loss = 5.0723
/usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "

real    3m19.007s
user    8m44.262s
sys     1m5.871s
root@33a7548b3728:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=2 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
*****HYPER PARAMETERS*****
{'num_layers': 1, 'num_shards': 8, 'num_gpus': 2, 'do_summaries': False, 'max_grad_norm': 10.0, 'state_size': 2048, 'projected_size': 512, 'average_params': True, 'num_delayed_steps': 150, 'batch_size': 128, 'num_sampled': 8192, 'emb_size': 512, 'max_time': 180, 'num_steps': 20, 'keep_prob': 0.9, 'learning_rate': 0.2, 'run_profiler': False, 'vocab_size': 793470, 'optimizer': 0}
**************************
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Current time: 1556412582.5321198
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
model/model_1/state_1_0:0 (128, 2560) /gpu:1
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-04-28 00:49:43.102040: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999735000 Hz
2019-04-28 00:49:43.103404: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x9851b60 executing computations on platform Host. Devices:
2019-04-28 00:49:43.103441: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-28 00:49:43.478212: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x9851580 executing computations on platform CUDA. Devices:
2019-04-28 00:49:43.478271: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-04-28 00:49:43.478296: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-04-28 00:49:43.479888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:05:00.0
totalMemory: 10.73GiB freeMemory: 10.57GiB
2019-04-28 00:49:43.481145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:09:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2019-04-28 00:49:43.481227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2019-04-28 00:49:44.194013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-28 00:49:44.194053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 1
2019-04-28 00:49:44.194066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N Y
2019-04-28 00:49:44.194072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   Y N
2019-04-28 00:49:44.195068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10196 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:05:00.0, compute capability: 7.5)
2019-04-28 00:49:44.195556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10157 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:09:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00017-of-00100
Finished processing!
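This second run resumes from the checkpoint the first run left under --logdir (hence the checkpoint_exists warnings above, and the iteration counter continuing at 1636 below rather than 1). A minimal sketch for inspecting the restored step, assuming the checkpoints land directly under ./logs (adjust the path if the script writes to a subdirectory):

    # Read the global step back out of the latest checkpoint (TF 1.x API).
    import tensorflow as tf

    ckpt = tf.train.latest_checkpoint("./logs")   # assumed checkpoint directory
    if ckpt is not None:
        reader = tf.train.NewCheckpointReader(ckpt)
        # "model/global_step" matches the variable name in the dumps above.
        print(ckpt, "global_step =", reader.get_tensor("model/global_step"))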
2019-04-28 00:50:03.847050: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally
Iteration 1636, time = 13.70s, wps = 374, train loss = 5.5702
Iteration 1637, time = 9.75s, wps = 525, train loss = 5.1788
Iteration 1638, time = 0.08s, wps = 62005, train loss = 5.1816
Iteration 1639, time = 0.08s, wps = 66916, train loss = 5.2493
Iteration 1640, time = 0.08s, wps = 66067, train loss = 5.1906
Iteration 1641, time = 0.08s, wps = 62537, train loss = 5.1462
Iteration 1642, time = 0.08s, wps = 65137, train loss = 5.1706
Iteration 1643, time = 0.08s, wps = 62869, train loss = 5.2408
Iteration 1644, time = 0.08s, wps = 65022, train loss = 5.1744
Iteration 1655, time = 0.87s, wps = 64749, train loss = 5.0697
Iteration 1675, time = 1.60s, wps = 64018, train loss = 5.1137
Iteration 1695, time = 1.54s, wps = 66478, train loss = 5.1324
Iteration 1715, time = 1.60s, wps = 63892, train loss = 5.0929
Iteration 1735, time = 1.61s, wps = 63617, train loss = 5.1117
Iteration 1755, time = 1.67s, wps = 61143, train loss = 5.1405
Iteration 1775, time = 1.62s, wps = 63103, train loss = 5.1155
Iteration 1795, time = 1.63s, wps = 62879, train loss = 5.0829
Iteration 1815, time = 1.63s, wps = 62922, train loss = 5.0924
Iteration 1835, time = 1.59s, wps = 64401, train loss = 5.0183
Iteration 1855, time = 1.61s, wps = 63596, train loss = 5.1545
Iteration 1875, time = 1.59s, wps = 64230, train loss = 5.1650
Iteration 1895, time = 1.66s, wps = 61621, train loss = 5.0763
Iteration 1915, time = 1.60s, wps = 64064, train loss = 5.0696
Iteration 1935, time = 1.63s, wps = 62851, train loss = 5.0408
Iteration 1955, time = 1.61s, wps = 63642, train loss = 5.0050
Iteration 1975, time = 1.62s, wps = 63045, train loss = 5.0222
Iteration 1995, time = 1.62s, wps = 63047, train loss = 4.9776
Iteration 2015, time = 1.62s, wps = 63318, train loss = 4.9363
Iteration 2035, time = 1.62s, wps = 63326, train loss = 4.9926
Iteration 2055, time = 1.61s, wps = 63441, train loss = 5.0541
Iteration 2075, time = 1.63s, wps = 62935, train loss = 4.9523
Iteration 2095, time = 1.65s, wps = 61977, train loss = 5.0463
Iteration 2115, time = 1.61s, wps = 63660, train loss = 5.0128
Iteration 2135, time = 1.65s, wps = 62235, train loss = 4.9846
Iteration 2155, time = 1.61s, wps = 63545, train loss = 5.0162
Iteration 2175, time = 1.60s, wps = 63980, train loss = 4.9579
Iteration 2195, time = 1.65s, wps = 62165, train loss = 4.9849
Iteration 2215, time = 1.61s, wps = 63524, train loss = 5.0168
Iteration 2235, time = 1.61s, wps = 63614, train loss = 4.9403
Iteration 2255, time = 1.59s, wps = 64286, train loss = 5.0217
Iteration 2275, time = 1.62s, wps = 63360, train loss = 4.9553
Iteration 2295, time = 1.61s, wps = 63495, train loss = 4.9689
Iteration 2315, time = 1.63s, wps = 62677, train loss = 4.9083
Iteration 2335, time = 1.64s, wps = 62302, train loss = 4.8876
Iteration 2355, time = 1.62s, wps = 63143, train loss = 4.9363
Iteration 2375, time = 1.63s, wps = 62970, train loss = 4.8548
Iteration 2395, time = 1.60s, wps = 63855, train loss = 4.8866
Iteration 2415, time = 1.62s, wps = 63303, train loss = 4.9194
Iteration 2435, time = 1.64s, wps = 62397, train loss = 4.9045
Iteration 2455, time = 1.61s, wps = 63603, train loss = 4.8802
Iteration 2475, time = 1.63s, wps = 63008, train loss = 4.7980
Iteration 2495, time = 1.63s, wps = 62971, train loss = 4.8983
Iteration 2515, time = 1.62s, wps = 63248, train loss = 4.8908
Iteration 2535, time = 1.64s, wps = 62478, train loss = 4.8597
Iteration 2555, time = 1.60s, wps = 64153, train loss = 4.8556
Iteration 2575, time = 1.61s, wps = 63438, train loss = 4.9171
Iteration 2595, time = 1.65s, wps = 62214, train loss = 4.8533
Iteration 2615, time = 1.61s, wps = 63531, train loss = 4.8217
Iteration 2635, time = 1.62s, wps = 63196, train loss = 4.8729
Iteration 2655, time = 1.61s, wps = 63640, train loss = 4.8526
Iteration 2675, time = 1.59s, wps = 64522, train loss = 4.9248
Iteration 2695, time = 1.62s, wps = 63040, train loss = 4.8525
Iteration 2715, time = 1.63s, wps = 62954, train loss = 4.8400
Iteration 2735, time = 1.65s, wps = 62135, train loss = 4.8860
Iteration 2755, time = 1.66s, wps = 61865, train loss = 4.7353
Iteration 2775, time = 1.60s, wps = 63936, train loss = 4.7831
Iteration 2795, time = 1.61s, wps = 63425, train loss = 4.8003
Iteration 2815, time = 1.64s, wps = 62431, train loss = 4.8338
Iteration 2835, time = 1.61s, wps = 63586, train loss = 4.8671
Iteration 2855, time = 1.62s, wps = 63215, train loss = 4.8183
Iteration 2875, time = 1.60s, wps = 64182, train loss = 4.7446
Iteration 2895, time = 1.62s, wps = 63172, train loss = 4.8078
Iteration 2915, time = 1.64s, wps = 62454, train loss = 4.7571
Iteration 2935, time = 1.60s, wps = 63933, train loss = 4.7869
Iteration 2955, time = 1.60s, wps = 63890, train loss = 4.8440
Iteration 2975, time = 1.62s, wps = 63237, train loss = 4.8568
Iteration 2995, time = 1.65s, wps = 62094, train loss = 4.7659
Iteration 3015, time = 1.63s, wps = 62901, train loss = 4.7601
Iteration 3035, time = 1.62s, wps = 63045, train loss = 4.7535
Iteration 3055, time = 1.62s, wps = 63265, train loss = 4.7754
Iteration 3075, time = 1.63s, wps = 62753, train loss = 4.7962
Iteration 3095, time = 1.63s, wps = 62787, train loss = 4.7795
Iteration 3115, time = 1.65s, wps = 61994, train loss = 4.7490
Iteration 3135, time = 1.61s, wps = 63726, train loss = 4.7976
Iteration 3155, time = 1.59s, wps = 64311, train loss = 4.7953
Iteration 3175, time = 1.63s, wps = 62726, train loss = 4.7330
Iteration 3195, time = 1.65s, wps = 62089, train loss = 4.6979
Iteration 3215, time = 1.62s, wps = 63344, train loss = 4.7777
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00031-of-00100
Finished processing!
Iteration 3235, time = 4.70s, wps = 21800, train loss = 4.7080
Iteration 3255, time = 1.65s, wps = 62074, train loss = 4.7395
Iteration 3275, time = 1.62s, wps = 63099, train loss = 4.7182
Iteration 3295, time = 1.60s, wps = 63970, train loss = 4.7416
Iteration 3315, time = 1.61s, wps = 63507, train loss = 4.7339
Iteration 3335, time = 1.58s, wps = 64662, train loss = 4.8762
Iteration 3355, time = 1.61s, wps = 63535, train loss = 4.7146
Iteration 3375, time = 1.64s, wps = 62626, train loss = 4.7589
Iteration 3395, time = 1.60s, wps = 64169, train loss = 4.7271
Iteration 3415, time = 1.61s, wps = 63475, train loss = 4.6897
/usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "

real    3m18.273s
user    9m16.914s
sys     1m3.304s
root@33a7548b3728:/workspace/nvidia-examples/big_lstm# time python single_lm_train.py --mode=train --logdir=./logs --num_gpus=2 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

*****HYPER PARAMETERS*****
{'state_size': 2048, 'keep_prob': 0.9, 'num_sampled': 8192, 'num_steps': 20, 'emb_size': 512, 'num_layers': 1, 'num_shards': 8, 'run_profiler': False, 'vocab_size': 793470, 'max_grad_norm': 10.0, 'do_summaries': False, 'batch_size': 128, 'num_gpus': 2, 'max_time': 180, 'average_params': True, 'learning_rate': 0.2, 'num_delayed_steps': 150, 'projected_size': 512, 'optimizer': 0}
**************************
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/model_utils.py:33: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/language_model.py:107: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py:1444: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Current time: 1556413043.0484493
ALL VARIABLES
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:18: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
model/global_step:0 ()
model/model/emb_0/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_1/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_2/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_3/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_4/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_5/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_6/Adagrad:0 (99184, 512) /gpu:0
model/model/emb_7/Adagrad:0 (99184, 512) /gpu:0
model/model/lstm_0/LSTMCell/W_0/Adagrad:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/Adagrad:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/Adagrad:0 (2048, 512) /gpu:0
model/model/softmax_w_0/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_1/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_2/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_3/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_4/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_5/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_6/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_w_7/Adagrad:0 (99184, 512) /gpu:0
model/model/softmax_b/Adagrad:0 (793470,) /gpu:0
model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage:0 (1024, 8192) /gpu:0
model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage:0 (8192,) /gpu:0
model/model/lstm_0/LSTMCell/W_P_0/ExponentialMovingAverage:0 (2048, 512) /gpu:0
TRAINABLE VARIABLES
model/emb_0:0 (99184, 512) /gpu:0
model/emb_1:0 (99184, 512) /gpu:0
model/emb_2:0 (99184, 512) /gpu:0
model/emb_3:0 (99184, 512) /gpu:0
model/emb_4:0 (99184, 512) /gpu:0
model/emb_5:0 (99184, 512) /gpu:0
model/emb_6:0 (99184, 512) /gpu:0
model/emb_7:0 (99184, 512) /gpu:0
model/lstm_0/LSTMCell/W_0:0 (1024, 8192) /gpu:0
model/lstm_0/LSTMCell/B:0 (8192,) /gpu:0
model/lstm_0/LSTMCell/W_P_0:0 (2048, 512) /gpu:0
model/softmax_w_0:0 (99184, 512) /gpu:0
model/softmax_w_1:0 (99184, 512) /gpu:0
model/softmax_w_2:0 (99184, 512) /gpu:0
model/softmax_w_3:0 (99184, 512) /gpu:0
model/softmax_w_4:0 (99184, 512) /gpu:0
model/softmax_w_5:0 (99184, 512) /gpu:0
model/softmax_w_6:0 (99184, 512) /gpu:0
model/softmax_w_7:0 (99184, 512) /gpu:0
model/softmax_b:0 (793470,) /gpu:0
LOCAL VARIABLES
model/model/state_0_0:0 (128, 2560) /gpu:0
model/model_1/state_1_0:0 (128, 2560) /gpu:1
WARNING:tensorflow:From /opt/tensorflow/nvidia-examples/big_lstm/run_utils.py:32: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-04-28 00:57:23.622035: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999735000 Hz
2019-04-28 00:57:23.623569: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x88dde60 executing computations on platform Host. Devices:
2019-04-28 00:57:23.623616: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): ,
2019-04-28 00:57:23.976116: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x86cc2b0 executing computations on platform CUDA. Devices:
2019-04-28 00:57:23.976164: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-04-28 00:57:23.976180: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-04-28 00:57:23.977428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:05:00.0
totalMemory: 10.73GiB freeMemory: 10.57GiB
2019-04-28 00:57:23.978439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:09:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2019-04-28 00:57:23.978500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2019-04-28 00:57:24.704792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-28 00:57:24.704832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 1
2019-04-28 00:57:24.704861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N Y
2019-04-28 00:57:24.704868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   Y N
2019-04-28 00:57:24.705830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10196 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:05:00.0, compute capability: 7.5)
2019-04-28 00:57:24.706267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10155 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:09:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00024-of-00100
Finished processing!
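
[Editor's note] Two sanity checks tie the HYPER PARAMETERS dump to the shapes and throughput in this log. The vocabulary of 793470 words is split over num_shards = 8, so each emb_* / softmax_w_* shard has ceil(793470 / 8) = 99184 rows, matching the variable list above. Each logged step consumes batch_size * num_steps tokens per GPU, which at the ~0.08 s per-iteration times seen below implies roughly the 63,000-64,000 wps the trainer reports. Also, 'max_time': 180 appears to cap each run at 180 seconds of training, which is consistent with both runs finishing in about the same real time (~3m18s), so wps rather than wall time is the figure to compare between runs. A minimal back-of-the-envelope sketch (standalone Python, not part of the example's code; values copied from the log, the 0.08 s step time read off the per-iteration lines below):

    import math

    # Values copied from the *****HYPER PARAMETERS***** dump above.
    batch_size = 128      # tokens per step, per GPU
    num_steps  = 20       # LSTM unroll length
    num_gpus   = 2
    vocab_size = 793470
    num_shards = 8

    # Rows per vocabulary shard, matching the (99184, 512) shapes
    # printed for the emb_* and softmax_w_* variables.
    print(math.ceil(vocab_size / num_shards))          # -> 99184

    # Tokens processed per iteration across both GPUs.
    words_per_iter = batch_size * num_steps * num_gpus # -> 5120

    # At ~0.08 s per iteration after warm-up, the implied throughput
    # is close to the wps values the trainer prints.
    print(words_per_iter / 0.08)                       # -> 64000.0
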
2019-04-28 00:57:44.469879: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally
Iteration 3431, time = 13.73s, wps = 373, train loss = 5.0563
Iteration 3432, time = 9.76s, wps = 525, train loss = 4.7930
Iteration 3433, time = 0.09s, wps = 56109, train loss = 4.7167
Iteration 3434, time = 0.08s, wps = 64209, train loss = 4.7180
Iteration 3435, time = 0.08s, wps = 61204, train loss = 4.7031
Iteration 3436, time = 0.09s, wps = 59641, train loss = 4.6660
Iteration 3437, time = 0.08s, wps = 62710, train loss = 4.7192
Iteration 3438, time = 0.08s, wps = 64817, train loss = 4.9075
Iteration 3439, time = 0.09s, wps = 59812, train loss = 4.7786
Iteration 3450, time = 0.89s, wps = 63540, train loss = 4.7344
Iteration 3470, time = 1.60s, wps = 63980, train loss = 4.7379
Iteration 3490, time = 1.62s, wps = 63129, train loss = 4.7246
Iteration 3510, time = 1.60s, wps = 64193, train loss = 4.7408
Iteration 3530, time = 1.62s, wps = 63169, train loss = 4.8164
Iteration 3550, time = 1.63s, wps = 62785, train loss = 4.6811
Iteration 3570, time = 1.62s, wps = 63128, train loss = 4.6651
Iteration 3590, time = 1.60s, wps = 63905, train loss = 4.6777
Iteration 3610, time = 1.60s, wps = 63871, train loss = 4.7126
Iteration 3630, time = 1.60s, wps = 64062, train loss = 4.6588
Iteration 3650, time = 1.63s, wps = 62818, train loss = 4.7067
Iteration 3670, time = 1.60s, wps = 63936, train loss = 4.6712
Iteration 3690, time = 1.60s, wps = 64133, train loss = 4.7586
Iteration 3710, time = 1.61s, wps = 63642, train loss = 4.6967
Iteration 3730, time = 1.60s, wps = 64034, train loss = 4.6832
Iteration 3750, time = 1.62s, wps = 63061, train loss = 4.5413
Iteration 3770, time = 1.60s, wps = 63971, train loss = 4.7273
Iteration 3790, time = 1.61s, wps = 63422, train loss = 4.6815
Iteration 3810, time = 1.60s, wps = 63924, train loss = 4.6280
Iteration 3830, time = 1.60s, wps = 64061, train loss = 4.6135
Iteration 3850, time = 1.63s, wps = 62779, train loss = 4.6245
Iteration 3870, time = 1.60s, wps = 63810, train loss = 4.6535
Iteration 3890, time = 1.62s, wps = 63119, train loss = 4.5852
Iteration 3910, time = 1.61s, wps = 63498, train loss = 4.6286
Iteration 3930, time = 1.64s, wps = 62550, train loss = 4.6353
Iteration 3950, time = 1.61s, wps = 63547, train loss = 4.6033
Iteration 3970, time = 1.63s, wps = 62943, train loss = 4.6387
Iteration 3990, time = 1.61s, wps = 63689, train loss = 4.7194
Iteration 4010, time = 1.59s, wps = 64226, train loss = 4.6185
Iteration 4030, time = 1.61s, wps = 63652, train loss = 4.6202
Iteration 4050, time = 1.65s, wps = 62145, train loss = 4.6748
Iteration 4070, time = 1.59s, wps = 64274, train loss = 4.5819
Iteration 4090, time = 1.63s, wps = 62922, train loss = 4.6544
Iteration 4110, time = 1.60s, wps = 63915, train loss = 4.6162
Iteration 4130, time = 1.64s, wps = 62382, train loss = 4.6638
Iteration 4150, time = 1.61s, wps = 63702, train loss = 4.6429
Iteration 4170, time = 1.63s, wps = 62693, train loss = 4.5383
Iteration 4190, time = 1.62s, wps = 63190, train loss = 4.6428
Iteration 4210, time = 1.63s, wps = 62809, train loss = 4.5609
Iteration 4230, time = 1.63s, wps = 62989, train loss = 4.6309
Iteration 4250, time = 1.59s, wps = 64384, train loss = 4.6468
Iteration 4270, time = 1.63s, wps = 62712, train loss = 4.6280
Iteration 4290, time = 1.59s, wps = 64361, train loss = 4.6337
Iteration 4310, time = 1.61s, wps = 63648, train loss = 4.6745
Iteration 4330, time = 1.61s, wps = 63443, train loss = 4.5360
Iteration 4350, time = 1.61s, wps = 63643, train loss = 4.6209
Iteration 4370, time = 1.58s, wps = 64707, train loss = 4.5795
Iteration 4390, time = 1.60s, wps = 64176, train loss = 4.6141
Iteration 4410, time = 1.60s, wps = 63945, train loss = 4.6497
Iteration 4430, time = 1.61s, wps = 63513, train loss = 4.6351
Iteration 4450, time = 1.60s, wps = 63955, train loss = 4.6029
Iteration 4470, time = 1.62s, wps = 63249, train loss = 4.5871
Iteration 4490, time = 1.59s, wps = 64421, train loss = 4.5390
Iteration 4510, time = 1.61s, wps = 63767, train loss = 4.4735
Iteration 4530, time = 1.60s, wps = 63811, train loss = 4.5823
Iteration 4550, time = 1.61s, wps = 63430, train loss = 4.5945
Iteration 4570, time = 1.60s, wps = 64198, train loss = 4.5381
Iteration 4590, time = 1.61s, wps = 63620, train loss = 4.5488
Iteration 4610, time = 1.64s, wps = 62259, train loss = 4.5480
Iteration 4630, time = 1.63s, wps = 62835, train loss = 4.6283
Iteration 4650, time = 1.63s, wps = 62833, train loss = 4.4722
Iteration 4670, time = 1.61s, wps = 63780, train loss = 4.5882
Iteration 4690, time = 1.60s, wps = 63865, train loss = 4.4992
Iteration 4710, time = 1.63s, wps = 62969, train loss = 4.4917
Iteration 4730, time = 1.64s, wps = 62404, train loss = 4.4792
Iteration 4750, time = 1.63s, wps = 63015, train loss = 4.5587
Iteration 4770, time = 1.63s, wps = 62887, train loss = 4.5349
Iteration 4790, time = 1.63s, wps = 62975, train loss = 4.6011
Iteration 4810, time = 1.59s, wps = 64441, train loss = 4.5937
Iteration 4830, time = 1.61s, wps = 63628, train loss = 4.5640
Iteration 4850, time = 1.63s, wps = 62840, train loss = 4.5311
Iteration 4870, time = 1.63s, wps = 62754, train loss = 4.5507
Iteration 4890, time = 1.63s, wps = 62745, train loss = 4.6125
Iteration 4910, time = 1.59s, wps = 64520, train loss = 4.6389
Iteration 4930, time = 1.62s, wps = 63173, train loss = 4.5209
Iteration 4950, time = 1.63s, wps = 62723, train loss = 4.5718
Iteration 4970, time = 1.61s, wps = 63557, train loss = 4.5989
Iteration 4990, time = 1.61s, wps = 63682, train loss = 4.4892
Processing file: ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en-00063-of-00100
Finished processing!
Iteration 5010, time = 4.71s, wps = 21743, train loss = 4.5478
Iteration 5030, time = 1.62s, wps = 63353, train loss = 4.5155
Iteration 5050, time = 1.62s, wps = 63307, train loss = 4.4878
Iteration 5070, time = 1.60s, wps = 64200, train loss = 4.5517
Iteration 5090, time = 1.63s, wps = 62980, train loss = 4.5614
Iteration 5110, time = 1.64s, wps = 62511, train loss = 4.5320
Iteration 5130, time = 1.57s, wps = 65075, train loss = 4.4969
Iteration 5150, time = 1.61s, wps = 63456, train loss = 4.4787
Iteration 5170, time = 1.62s, wps = 63079, train loss = 4.5372
Iteration 5190, time = 1.63s, wps = 62709, train loss = 4.4670
Iteration 5210, time = 1.63s, wps = 62864, train loss = 4.4559
/usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "
" real 3m18.162s user 9m17.859s sys 1m2.117s root@33a7548b3728:/workspace/nvidia-examples/big_lstm# cat /etc/os-release NAME="Ubuntu" VERSION="16.04.6 LTS (Xenial Xerus)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 16.04.6 LTS" VERSION_ID="16.04" HOME_URL="http://www.ubuntu.com/" SUPPORT_URL="http://help.ubuntu.com/" BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" VERSION_CODENAME=xenial UBUNTU_CODENAME=xenial root@33a7548b3728:/workspace/nvidia-examples/big_lstm# nvcc -V nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2019 NVIDIA Corporation Built on Fri_Feb__8_19:08:17_PST_2019 Cuda compilation tools, release 10.1, V10.1.105 root@33a7548b3728:/workspace/nvidia-examples/big_lstm# cd data root@33a7548b3728:/workspace/nvidia-examples/big_lstm/data# ls 1-billion-word-language-modeling-benchmark-r13output root@33a7548b3728:/workspace/nvidia-examples/big_lstm/data# cd 1-billion-word-language-modeling-benchmark-r13output root@33a7548b3728:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# ls 1b_word_vocab.txt heldout-monolingual.tokenized.shuffled README training-monolingual.tokenized.shuffled root@33a7548b3728:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output# cd training-monolingual.tokenized.shuffled root@33a7548b3728:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# ls news.en-00001-of-00100 news.en-00034-of-00100 news.en-00067-of-00100 news.en-00002-of-00100 news.en-00035-of-00100 news.en-00068-of-00100 news.en-00003-of-00100 news.en-00036-of-00100 news.en-00069-of-00100 news.en-00004-of-00100 news.en-00037-of-00100 news.en-00070-of-00100 news.en-00005-of-00100 news.en-00038-of-00100 news.en-00071-of-00100 news.en-00006-of-00100 news.en-00039-of-00100 news.en-00072-of-00100 news.en-00007-of-00100 news.en-00040-of-00100 news.en-00073-of-00100 news.en-00008-of-00100 news.en-00041-of-00100 news.en-00074-of-00100 news.en-00009-of-00100 news.en-00042-of-00100 news.en-00075-of-00100 news.en-00010-of-00100 news.en-00043-of-00100 news.en-00076-of-00100 news.en-00011-of-00100 news.en-00044-of-00100 news.en-00077-of-00100 news.en-00012-of-00100 news.en-00045-of-00100 news.en-00078-of-00100 news.en-00013-of-00100 news.en-00046-of-00100 news.en-00079-of-00100 news.en-00014-of-00100 news.en-00047-of-00100 news.en-00080-of-00100 news.en-00015-of-00100 news.en-00048-of-00100 news.en-00081-of-00100 news.en-00016-of-00100 news.en-00049-of-00100 news.en-00082-of-00100 news.en-00017-of-00100 news.en-00050-of-00100 news.en-00083-of-00100 news.en-00018-of-00100 news.en-00051-of-00100 news.en-00084-of-00100 news.en-00019-of-00100 news.en-00052-of-00100 news.en-00085-of-00100 news.en-00020-of-00100 news.en-00053-of-00100 news.en-00086-of-00100 news.en-00021-of-00100 news.en-00054-of-00100 news.en-00087-of-00100 news.en-00022-of-00100 news.en-00055-of-00100 news.en-00088-of-00100 news.en-00023-of-00100 news.en-00056-of-00100 news.en-00089-of-00100 news.en-00024-of-00100 news.en-00057-of-00100 news.en-00090-of-00100 news.en-00025-of-00100 news.en-00058-of-00100 news.en-00091-of-00100 news.en-00026-of-00100 news.en-00059-of-00100 news.en-00092-of-00100 news.en-00027-of-00100 news.en-00060-of-00100 news.en-00093-of-00100 news.en-00028-of-00100 news.en-00061-of-00100 news.en-00094-of-00100 news.en-00029-of-00100 news.en-00062-of-00100 news.en-00095-of-00100 news.en-00030-of-00100 news.en-00063-of-00100 news.en-00096-of-00100 
news.en-00031-of-00100  news.en-00064-of-00100  news.en-00097-of-00100
news.en-00032-of-00100  news.en-00065-of-00100  news.en-00098-of-00100
news.en-00033-of-00100  news.en-00066-of-00100  news.en-00099-of-00100
root@33a7548b3728:/workspace/nvidia-examples/big_lstm/data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled# exit
exit
[chibi@manjaro ~]$ cat /etc/os-release
NAME="Manjaro Linux"
ID=manjaro
ID_LIKE=arch
PRETTY_NAME="Manjaro Linux"
ANSI_COLOR="1;32"
HOME_URL="https://www.manjaro.org/"
SUPPORT_URL="https://www.manjaro.org/"
BUG_REPORT_URL="https://bugs.manjaro.org/"
[chibi@manjaro ~]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
[chibi@manjaro ~]$ nvidia-smi nvlink -c
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-1ac935c2-557f-282e-14e5-3f749ffd63ac)
         Link 0, P2P is supported: true
         Link 0, Access to system memory supported: true
         Link 0, P2P atomics supported: true
         Link 0, System memory atomics supported: true
         Link 0, SLI is supported: true
         Link 0, Link is supported: false
         Link 1, P2P is supported: true
         Link 1, Access to system memory supported: true
         Link 1, P2P atomics supported: true
         Link 1, System memory atomics supported: true
         Link 1, SLI is supported: true
         Link 1, Link is supported: false
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-13277ce5-e1e9-0cb1-8cee-6c9e6618e774)
         Link 0, P2P is supported: true
         Link 0, Access to system memory supported: true
         Link 0, P2P atomics supported: true
         Link 0, System memory atomics supported: true
         Link 0, SLI is supported: true
         Link 0, Link is supported: false
         Link 1, P2P is supported: true
         Link 1, Access to system memory supported: true
         Link 1, P2P atomics supported: true
         Link 1, System memory atomics supported: true
         Link 1, SLI is supported: true
         Link 1, Link is supported: false
[chibi@manjaro ~]$ sudo nvme list
[sudo] password for chibi:
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     2I2720039250         ADATA SX8200NP                           1         240.06 GB / 240.06 GB      512 B + 0 B      SVN139B
[chibi@manjaro ~]$ sudo nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 30 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 1,178,921
data_units_written                  : 1,309,586
host_read_commands                  : 18,603,966
host_write_commands                 : 18,134,235
controller_busy_time                : 192
power_cycles                        : 75
power_on_hours                      : 406
unsafe_shutdowns                    : 7
media_errors                        : 0
num_err_log_entries                 : 0
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0
[chibi@manjaro ~]$
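
[Editor's note] If the console output of each run is captured to a file (for example by appending "| tee train_2gpu.log" to the training command; the filename is hypothetical), the steady-state throughput of different --num_gpus configurations can be compared mechanically instead of by eyeballing the logs above. A minimal standalone sketch, not part of the big_lstm example; the regex mirrors the trainer's "Iteration N, time = Ts, wps = W, train loss = L" lines, and the 30,000-wps floor is an assumed filter that skips warm-up steps and the slow iterations that coincide with "Processing file:" input stalls:

    import re
    import sys

    # Matches the trainer's progress lines, e.g.
    #   Iteration 5210, time = 1.63s, wps = 62864, train loss = 4.4559
    PATTERN = re.compile(
        r"Iteration (\d+), time = ([\d.]+)s, wps = (\d+), train loss = ([\d.]+)")

    def mean_steady_wps(path, floor=30000):
        """Mean reported wps, ignoring iterations below the assumed floor."""
        values = []
        with open(path) as log:
            for line in log:
                m = PATTERN.search(line)
                if m and int(m.group(3)) >= floor:
                    values.append(int(m.group(3)))
        return sum(values) / len(values) if values else 0.0

    if __name__ == "__main__":
        # Usage (hypothetical log file): python mean_wps.py train_2gpu.log
        print("mean steady-state wps: %.0f" % mean_steady_wps(sys.argv[1]))

On the logs in this session, that average would come out near 63,000 wps for the two-GPU run, which is the figure to set against other runs, since 'max_time': 180 fixes the wall time.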