I am using fairseq to run multi-GPU training of NLP models. After upgrading PyTorch from 1.4.0 to 1.5.0 through conda (on the pytorch channel), I consistently get this error:
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Traceback (most recent call last):
  File "/data/mwright/anaconda3/envs/gpu/bin/fairseq-train", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
  File "/data/mwright/fairseq/fairseq_cli/train.py", line 355, in cli_main
    nprocs=args.distributed_world_size,
  File "/data/mwright/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/data/mwright/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/data/mwright/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 113, in join
    (error_index, exitcode)
Exception: process 0 terminated with exit code 1
The above error message was generated when training on 3 GPUs, so I assume the three repetitions of the incompatibility error mean that each spawned process generates one.
This error does not occur if I downgrade to PyTorch 1.4.0.
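For reference, the error text itself names two knobs: setting the MKL threading layer, or setting MKL_SERVICE_FORCE_INTEL. Below is a minimal sketch of building an environment for the training command with one of them set. The helper name `mkl_env` and its parameters are hypothetical, the choice of `GNU` as the threading layer is an assumption based on the message complaining about libgomp, and whether either setting actually resolves this failure is not confirmed here.

```python
import os


def mkl_env(base=None, threading_layer="GNU", force_intel=False):
    """Return a copy of an environment with one MKL setting applied.

    Hypothetical helper: `base` defaults to the current process environment.
    With force_intel=True it sets MKL_SERVICE_FORCE_INTEL=1 (the option named
    in the error text); otherwise it sets MKL_THREADING_LAYER to the given
    value (assumed here to be "GNU", to match libgomp).
    """
    env = dict(os.environ if base is None else base)
    if force_intel:
        env["MKL_SERVICE_FORCE_INTEL"] = "1"
        # Drop any explicit threading layer so the two settings do not conflict.
        env.pop("MKL_THREADING_LAYER", None)
    else:
        env["MKL_THREADING_LAYER"] = threading_layer
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching `fairseq-train`, so the setting is in place before any worker process imports numpy or torch. Exporting the same variable in the shell before running the command should be equivalent.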
To Reproduce
Steps to reproduce the behavior:
Expected behavior: the above error does not occur.
Environment
Please copy and paste the output from our environment collection script (or fill out the checklist below manually).
You can get the script and run it with:
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
How PyTorch was installed (conda, pip, source): conda

name: gpu
channels:
  - pytorch
  - defaults
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=main
  - _tflow_select=2.1.0=gpu
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=3.1.1=py_0
  - certifi=2020.4.5.1=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.1.1=py_0
  - cryptography=2.8=py37h1ba5d50_0
  - cudatoolkit=10.1.243=h6bb024c_0
  - cudnn=7.6.5=cuda10.1_0
  - cupti=10.1.168=0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - dbus=1.13.12=h746ee38_0
  - expat=2.2.6=he6710b0_0
  - fontconfig=2.13.0=h9420a91_0
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_0
  - gast=0.2.2=py37_0
  - glib=2.63.1=h5a9c865_0
  - google-auth=1.13.1=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gst-plugins-base=1.14.0=hbbd80ab_1
  - gstreamer=1.14.0=hb453b48_1
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=h9c2bf20_1
  - idna=2.9=py_1
  - intel-openmp=2020.0=166
  - jpeg=9b=h024ee3a_2
  - keras-applications=1.0.8=py_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.1.0=py37he6710b0_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libprotobuf=3.11.4=hd408876_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libuuid=1.0.3=h1bed415_2
  - libxcb=1.13=h1bed415_1
  - libxml2=2.9.9=hea5a465_1
  - markdown=3.1.1=py37_0
  - matplotlib=3.1.3=py37_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_0
  - ninja=1.9.0=py37hfd86e86_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - openssl=1.1.1g=h7b6447c_0
  - opt_einsum=3.1.0=py_0
  - pcre=8.43=he6710b0_0
  - pip=20.0.2=py37_1
  - portalocker=1.5.2=py37_0
  - protobuf=3.11.4=py37he6710b0_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.20=py_0
  - pyjwt=1.7.1=py37_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pyqt=5.9.2=py37h05f1152_2
  - pysocks=1.7.1=py37_0
  - python=3.7.7=hcf32534_0_cpython
  - python-dateutil=2.8.1=py_0
  - pytorch=1.5.0=py3.7_cuda10.1.243_cudnn7.6.3_0
  - qt=5.9.7=h5867ecd_1
  - readline=8.0=h7b6447c_0
  - regex=2020.4.4=py37h7b6447c_0
  - requests=2.23.0=py37_0
  - requests-oauthlib=1.3.0=py_0
  - rsa=4.0=py_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=46.1.3=py37_0
  - sip=4.19.8=py37hf484d3e_0
  - six=1.14.0=py37_0
  - sqlite=3.31.1=h62c20be_1
  - tensorboard=2.1.0=py3_0
  - tensorboardx=2.0=py_0
  - tensorflow=2.1.0=gpu_py37h7a4bb67_0
  - tensorflow-base=2.1.0=gpu_py37h6c5654b_0
  - tensorflow-estimator=2.1.0=pyhd54b08b_0
  - tensorflow-gpu=2.1.0=h0d30ee6_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - tornado=6.0.4=py37h7b6447c_1
  - tqdm=4.45.0=py_0
  - typing=3.6.4=py37_0
  - urllib3=1.25.8=py37_0
  - werkzeug=1.0.1=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.12.1=py37h7b6447c_1
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - mecab-python3==0.996.5
    - sacrebleu==1.4.8
prefix: /data/mwright/anaconda3/envs/gpu
cc @ezyang @gchanan @zou3519 @malfet