It is possible to set the number of threads used by OpenBLAS via `openblas_set_num_threads`. For the "custom thread" solution this works quite well: independent of what the application may do, the configured number of threads is used inside OpenBLAS.
However, for OpenMP this is not the case. An application might want OpenBLAS to use 4 of 16 threads while using OpenMP itself to schedule other work, or to run 4 OpenBLAS operations in parallel, each using 4 threads (whether the latter is possible at all is up to the runtime, but the first use case should be). Another use case: OpenBLAS should use only 4 threads (e.g. for performance reasons, typical matrix sizes, ...) while the application wants to use OpenMP with all 16 threads at other times, i.e. not in parallel to OpenBLAS.
Now OpenBLAS does something nasty: it takes the maximum number of OpenMP threads and sets its own thread count to that value. So it is impossible to have OpenBLAS use fewer threads than the OpenMP maximum.
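To make the effect concrete, here is a minimal, simplified model of the behaviour described above. It is not the OpenBLAS source: `model_set_num_threads` and `model_num_cpu_avail_current` are hypothetical stand-ins, and the `openmp_max_threads` parameter stands in for whatever the OpenMP runtime reports as its maximum thread count.

```c
/* Simplified model of the described behaviour, NOT the actual OpenBLAS code.
 * All names prefixed with model_ are hypothetical. */

/* Thread count OpenBLAS would use for its next operation. */
static int blas_cpu_number = 16;

/* Stand-in for openblas_set_num_threads(): records the requested count. */
static void model_set_num_threads(int n) {
    blas_cpu_number = n;
}

/* Stand-in for the query as described in this issue: it is supposed to
 * only report a thread count, but it also overwrites the configured one.
 * openmp_max_threads models the OpenMP runtime's maximum thread count. */
static int model_num_cpu_avail_current(int openmp_max_threads) {
    if (blas_cpu_number != openmp_max_threads) {
        blas_cpu_number = openmp_max_threads;  /* side effect: user setting is lost */
    }
    return blas_cpu_number;
}
```

In this model, calling `model_set_num_threads(4)` and then `model_num_cpu_avail_current(16)` yields 16 and leaves `blas_cpu_number` at 16, so the earlier request for 4 threads has no lasting effect.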
In code the problem is 2-fold:
`num_cpu_avail`, which should only query, does a modification: https://github.com/xianyi/OpenBLAS/blob/develop/common_thread.h#L154-L156

So for a first fix I'd suggest making `num_cpu_avail` return the lesser of `blas_cpu_number` and `openmp_nthreads` instead of setting anything.
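A rough sketch of the suggested fix, under the assumption that the query only needs the two values named above. This is an illustration and not a patch against OpenBLAS: `model_num_cpu_avail_fixed` is a hypothetical name, and `openmp_nthreads` is passed in as a parameter here instead of being read from the OpenMP runtime.

```c
/* Sketch of the proposed behaviour, NOT the actual OpenBLAS code.
 * model_num_cpu_avail_fixed is a hypothetical name. */

/* Thread count previously requested by the user, e.g. via
 * openblas_set_num_threads(4). */
static int blas_cpu_number = 4;

/* Pure query: returns the lesser of the configured OpenBLAS thread count
 * and the OpenMP thread count, and modifies nothing. */
static int model_num_cpu_avail_fixed(int openmp_nthreads) {
    return blas_cpu_number < openmp_nthreads ? blas_cpu_number
                                             : openmp_nthreads;
}
```

With `blas_cpu_number` set to 4, `model_num_cpu_avail_fixed(16)` returns 4 and `model_num_cpu_avail_fixed(2)` returns 2; in both cases `blas_cpu_number` stays at 4, so the user's setting survives the query.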