Showing content from https://github.com/pytorch/pytorch/issues/37377 below:

Pytorch 1.5.0 (installed from conda) errors with complaints about incompatibility between MKL and libgomp when using Pytorch's multiprocessing · Issue #37377 · pytorch/pytorch · GitHub

🐛 Bug

I am using fairseq to run multi-GPU training of NLP models. After upgrading PyTorch from 1.4.0 to 1.5.0 through conda (from the pytorch channel), I consistently get this error:

Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Traceback (most recent call last):
  File "/data/mwright/anaconda3/envs/gpu/bin/fairseq-train", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
  File "/data/mwright/fairseq/fairseq_cli/train.py", line 355, in cli_main
    nprocs=args.distributed_world_size,
  File "/data/mwright/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/data/mwright/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/data/mwright/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 113, in join
    (error_index, exitcode)
Exception: process 0 terminated with exit code 1

The error message above was generated while training on 3 GPUs, so I assume the three repetitions of the incompatibility error mean that each spawned process emits one.
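
Since the message complains about libgomp.so.1 being present alongside MKL's INTEL threading layer, it may help to check which OpenMP runtimes a process has actually loaded. The following is a minimal diagnostic sketch, not part of PyTorch or mkl-service: the helper names `find_openmp_runtimes` and `loaded_openmp_runtimes` are hypothetical, the library-name prefixes are assumptions about typical Linux builds, and it relies on the Linux-only /proc/<pid>/maps file.

```python
import os

# Assumed file-name prefixes for common OpenMP runtimes on Linux.
OPENMP_RUNTIMES = {
    "libgomp": "GNU OpenMP",
    "libiomp5": "Intel OpenMP",
    "libomp": "LLVM OpenMP",
}


def find_openmp_runtimes(maps_text):
    """Scan the text of /proc/<pid>/maps and return {runtime label: [library paths]}."""
    found = {}
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping, no backing file
        path = parts[5].strip()
        name = os.path.basename(path)
        for prefix, label in OPENMP_RUNTIMES.items():
            if name.startswith(prefix + ".so"):
                found.setdefault(label, set()).add(path)
    return {label: sorted(paths) for label, paths in found.items()}


def loaded_openmp_runtimes(pid="self"):
    """Report the OpenMP runtimes mapped into a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return find_openmp_runtimes(f.read())
```

Calling `loaded_openmp_runtimes()` after `import torch` (or after `import numpy`) in the affected environment would show whether both a GNU and an Intel OpenMP runtime are mapped into the same process, which is the situation the error message describes.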

This error does not occur if I downgrade to PyTorch 1.4.0.
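
The error text itself suggests setting the threading layer accordingly. Below is a minimal sketch of that idea, assuming that selecting MKL's GNU threading layer is acceptable for the workload. Whether this resolves the regression reported here is not confirmed. `MKL_THREADING_LAYER` is the environment variable named in the error message; the helper names `set_mkl_threading_layer` and `child_sees_layer` are hypothetical. The variable has to be set before numpy or torch is imported, and child processes inherit it from the parent's environment.

```python
import os
import subprocess
import sys


def set_mkl_threading_layer(layer="GNU"):
    """Set MKL_THREADING_LAYER unless the user already chose a value.

    Returns the value that is in effect afterwards. Call this before
    importing numpy or torch so MKL reads it at load time.
    """
    return os.environ.setdefault("MKL_THREADING_LAYER", layer)


def child_sees_layer():
    """Start a fresh interpreter and report the value it inherits."""
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('MKL_THREADING_LAYER', ''))"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

The same effect can be had by exporting the variable in the shell before launching fairseq-train. The error message also names two alternatives: importing numpy first, or setting MKL_SERVICE_FORCE_INTEL to force the Intel layer.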

To Reproduce

Steps to reproduce the behavior:

1. Install PyTorch 1.5.0 through conda.
2. Install fairseq from source into the conda environment.
3. Run a multi-GPU training job with fairseq.

Expected behavior

Multi-GPU training starts without the MKL/libgomp incompatibility error shown above.

Environment

Please copy and paste the output from our
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

PyTorch Version (e.g., 1.0): 1.5.0
OS (e.g., Linux): Ubuntu Linux 18.04.2
How you installed PyTorch (conda, pip, source): conda
Build command you used (if compiling from source): N/A
Python version: 3.7.7
CUDA/cuDNN version: 10.1/7.6.5 (also from conda)
GPU models and configuration: 3x Tesla V100
Any other relevant information:
Here is the yaml of my conda environment:

name: gpu
channels:
  - pytorch
  - defaults
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=main
  - _tflow_select=2.1.0=gpu
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=3.1.1=py_0
  - certifi=2020.4.5.1=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.1.1=py_0
  - cryptography=2.8=py37h1ba5d50_0
  - cudatoolkit=10.1.243=h6bb024c_0
  - cudnn=7.6.5=cuda10.1_0
  - cupti=10.1.168=0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - dbus=1.13.12=h746ee38_0
  - expat=2.2.6=he6710b0_0
  - fontconfig=2.13.0=h9420a91_0
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_0
  - gast=0.2.2=py37_0
  - glib=2.63.1=h5a9c865_0
  - google-auth=1.13.1=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gst-plugins-base=1.14.0=hbbd80ab_1
  - gstreamer=1.14.0=hb453b48_1
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=h9c2bf20_1
  - idna=2.9=py_1
  - intel-openmp=2020.0=166
  - jpeg=9b=h024ee3a_2
  - keras-applications=1.0.8=py_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.1.0=py37he6710b0_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libprotobuf=3.11.4=hd408876_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libuuid=1.0.3=h1bed415_2
  - libxcb=1.13=h1bed415_1
  - libxml2=2.9.9=hea5a465_1
  - markdown=3.1.1=py37_0
  - matplotlib=3.1.3=py37_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_0
  - ninja=1.9.0=py37hfd86e86_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - openssl=1.1.1g=h7b6447c_0
  - opt_einsum=3.1.0=py_0
  - pcre=8.43=he6710b0_0
  - pip=20.0.2=py37_1
  - portalocker=1.5.2=py37_0
  - protobuf=3.11.4=py37he6710b0_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.20=py_0
  - pyjwt=1.7.1=py37_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pyqt=5.9.2=py37h05f1152_2
  - pysocks=1.7.1=py37_0
  - python=3.7.7=hcf32534_0_cpython
  - python-dateutil=2.8.1=py_0
  - pytorch=1.5.0=py3.7_cuda10.1.243_cudnn7.6.3_0
  - qt=5.9.7=h5867ecd_1
  - readline=8.0=h7b6447c_0
  - regex=2020.4.4=py37h7b6447c_0
  - requests=2.23.0=py37_0
  - requests-oauthlib=1.3.0=py_0
  - rsa=4.0=py_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=46.1.3=py37_0
  - sip=4.19.8=py37hf484d3e_0
  - six=1.14.0=py37_0
  - sqlite=3.31.1=h62c20be_1
  - tensorboard=2.1.0=py3_0
  - tensorboardx=2.0=py_0
  - tensorflow=2.1.0=gpu_py37h7a4bb67_0
  - tensorflow-base=2.1.0=gpu_py37h6c5654b_0
  - tensorflow-estimator=2.1.0=pyhd54b08b_0
  - tensorflow-gpu=2.1.0=h0d30ee6_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - tornado=6.0.4=py37h7b6447c_1
  - tqdm=4.45.0=py_0
  - typing=3.6.4=py37_0
  - urllib3=1.25.8=py37_0
  - werkzeug=1.0.1=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.12.1=py37h7b6447c_1
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - mecab-python3==0.996.5
    - sacrebleu==1.4.8
prefix: /data/mwright/anaconda3/envs/gpu

cc @ezyang @gchanan @zou3519 @malfet

