To use the Stan math library include exactly one of the following header files.
| include | contents | also includes |
|---------|----------|---------------|
| `stan/math/prim.hpp` | primitives | n/a |
| `stan/math/rev.hpp` | reverse-mode autodiff | `prim.hpp` |
| `stan/math/fwd.hpp` | forward-mode autodiff | `prim.hpp` |
| `stan/math/mix.hpp` | mixed-mode autodiff | `prim.hpp`, `fwd.hpp`, `rev.hpp` |
These top-level header files ensure that all necessary traits files are included in the appropriate order. For more detail, see
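For example, a minimal reverse-mode program needs only the single `stan/math/rev.hpp` include. The sketch below evaluates a scalar function and its derivative:

```cpp
#include <stan/math/rev.hpp>

#include <iostream>

int main() {
  // Independent variable recorded on the reverse-mode autodiff tape.
  stan::math::var x = 2.0;

  // Build an expression; its value is computed immediately.
  stan::math::var y = x * x + 3.0 * x;

  // Propagate adjoints from y back to x.
  y.grad();
  std::cout << "y = " << y.val() << ", dy/dx = " << x.adj() << std::endl;

  // Release the memory held by the autodiff tape.
  stan::math::recover_memory();
  return 0;
}
```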
The message passing interface (MPI) allows the exchange of messages between different processes. We can use MPI to parallelize the evaluation of a single log probability function across multiple processes. The target audience for MPI is users with access to large compute clusters. Users looking for parallel computation on a single computer should instead consider a threading-based approach, which is easier to use and provides similar performance gains.
Stan supports MPI on Mac OS X and Linux. Windows is not supported, though we have heard that some people have successfully used it.
A base MPI installation must be installed on the system. See the instructions from `boost.mpi` to verify that there is a working MPI system. For Mac OS X and Linux, any MPI installation that works with `boost.mpi` is supported. The two major open-source base MPI implementations are `mpich` and `openMPI`. The Math library is tested with these two implementations, while others supported by `boost.mpi` may work as well. The base MPI installation provides the command line tools:

- `mpicxx`: the recommended compiler command to use when building any MPI application.
- `mpirun`: a wrapper binary used to start an MPI-enabled binary on a given machine.

Please ask your system administrator for details on how to compile, execute, and submit MPI applications on a cluster.
On Mac OS X, install `mpich` from MacPorts or Homebrew.

On Linux, the package distribution system of your distribution should have pre-built `mpich` (or `openmpi`) available. In addition to that, you must have the following packages installed (Ubuntu package names listed): `python-dev`, `libxml2-dev`, `libxslt-dev`. You may also be required to add the following to your `make/local`: `LDLIBS+=-lpthread`.
Stan builds its own `boost.mpi` and `boost.serialization` libraries and installs these into its library subfolder. If the operating system provides these Boost libraries and they must be used instead, additional configuration is needed (through `make/local`) to use that installation. The Boost libraries are built using the Boost build system, which will attempt to auto-detect the MPI installation specifics on your system and the toolset to use. Should Boost's auto-detection fail, or should a specific configuration be required, users can manually configure the Boost build system through the configuration file `stan-math/lib/boost_1.xx.x/user_config.jam` as needed.
We strongly recommend using the `mpicxx` command to build any program using MPI within Math. While it is possible to change the compiler used with these commands (openMPI has a `-cxx=` option, for example), this should only be done with great caution. The complication is that during compilation of the base MPI libraries the exact bit representation of each type is analyzed, and deviations due to compiler changes may lead to unexpected behavior. In case of a compiler mismatch between the base MPI libraries and `boost.mpi` (and Math), changes in the compiler ABI can lead to unexpected segfaults. Therefore, we recommend using `mpicxx` as the compiler and not deviating from the compiler used to build MPI. Often this means using the system default compiler, which may be rather old and not ideal for Stan. In such cases a more modern gcc (if gcc is the system compiler) can be considered, as long as no ABI changes are known.
Stan uses the `boost.mpi` library to interface with the installed MPI implementation. `boost.mpi` is built automatically by the Math library when the Math library is configured for MPI. To configure MPI for the Math library, add the following to `make/local` (if the file does not exist, create it):

```
STAN_MPI=true
CXX=mpicxx
TBB_CXX_TYPE=gcc
```

Instead of `CXX=mpicxx`, the user can specify the compiler along with the proper compiler and linker options needed to build an MPI-enabled binary (the command `mpicxx -show` displays what `mpich` executes, while `openmpi` uses `mpicxx -show-me`), but please read the note on compilers above.

After changing `make/local`, all the tests should be rebuilt from a clean state (for example with `make clean-all`). Once the Math library is configured for MPI, the tests will be built with MPI. Note that the `boost.mpi` and `boost.serialization` libraries are built and linked against dynamically.
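As a quick smoke test of the toolchain, a small `boost.mpi` program can be compiled with `mpicxx`, linked against the `boost.mpi` and `boost.serialization` libraries (for example the copies built by the Math library in its library subfolder), and launched with `mpirun`. This is only a sketch to confirm the pieces fit together; it is not part of the Math test suite:

```cpp
#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>

#include <iostream>

int main(int argc, char* argv[]) {
  boost::mpi::environment env(argc, argv);  // initializes the MPI runtime
  boost::mpi::communicator world;           // wraps MPI_COMM_WORLD
  std::cout << "process " << world.rank()
            << " of " << world.size() << std::endl;
  return 0;
}
```

Launched with, for example, `mpirun -n 4 ./a.out`, each of the four processes prints its own rank.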
OpenCL is an open standard for writing programs that run on platforms with heterogeneous hardware. Stan uses OpenCL to implement GPU routines for the Cholesky decomposition and its derivative. Other routines will be available in the future. These routines are suitable for programs which require solving large N×M matrices (N > 600), such as algorithms that utilize large covariance matrices.
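As a sketch of the intended use (the matrix size below is chosen arbitrarily), the calling code is the same with or without OpenCL; when `STAN_OPENCL` is enabled and the matrix is large enough, the Cholesky factorization can be routed to the device:

```cpp
#include <stan/math/prim.hpp>  // pulls in Eigen as well

#include <iostream>

int main() {
  const int N = 1000;  // in the size range where the GPU routines pay off

  // Build a symmetric positive-definite matrix A = B * B^T + N * I.
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(N, N);
  Eigen::MatrixXd A = B * B.transpose() + N * Eigen::MatrixXd::Identity(N, N);

  // With STAN_OPENCL defined in make/local this factorization can run on
  // the OpenCL device; otherwise the CPU implementation is used.
  Eigen::MatrixXd L = stan::math::cholesky_decompose(A);

  std::cout << "reconstruction error: "
            << (L * L.transpose() - A).norm() << std::endl;
  return 0;
}
```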
Users must have suitable hardware (e.g. an Nvidia or AMD GPU) that supports OpenCL 1.2, a valid OpenCL driver, and a suitable C/C++ compiler installed on their computer.
The following guide is for Ubuntu, but it should be similar for any other Linux distribution. You should have the GNU compiler suite or clang compiler installed beforehand.
Install the Nvidia CUDA toolkit and the clinfo tool if you have an Nvidia GPU:

```
apt update
apt install nvidia-cuda-toolkit clinfo
```
Those with AMD devices can install the OpenCL driver available through apt:

```
apt install -y libclc-amdgcn mesa-opencl-icd clinfo
```
If your device is not supported by the currently available drivers, you can try the Paulo Miguel Dias PPA:

```
add-apt-repository ppa:paulo-miguel-dias/mesa
apt-get update
apt-get install libclc-amdgcn mesa-opencl-icd
```
Macs should already have the OpenCL driver installed if you have the appropriate hardware. Note that if you are building on a Mac laptop you may not have a GPU device; you can still use the OpenCL routines for parallelization on your CPU.
Install the latest Rtools suite if you don't already have it. During the installation make sure that the 64-bit toolchain is installed. You also need to verify that the `Path` system environment variable includes the path to the g++ compiler (`<Rtools installation path>\mingw_64\bin`).
If you have an Nvidia card, install the latest Nvidia CUDA toolkit. AMD users should use the AMD APP SDK.
Users can check that their installation is valid by downloading and running clinfo.
Setting up the Math Library to run on a GPU

To turn on GPU computation, first list the available OpenCL platforms and devices and note the indices of the ones you want to use, for example with clinfo:

```
clinfo -l
# Platform #0: Clover
# Platform #1: Portable Computing Language
#  `-- Device #0: pthread-AMD Ryzen Threadripper 2950X 16-Core Processor
# Platform #2: NVIDIA CUDA
#  +-- Device #0: TITAN Xp
#  `-- Device #1: GeForce GTX 1080 Ti
```

Then add these lines to the `make/local` file (create the file if it does not exist):

```
STAN_OPENCL=true
OPENCL_DEVICE_ID=${CHOSEN_INDEX}
OPENCL_PLATFORM_ID=${CHOSEN_INDEX}
```

where the user will replace ${CHOSEN_INDEX} with the index of the device and platform they would like to use. In most cases these two will be 0. If you are using Windows, append the following lines at the end of the `make/local` file in order to link with the appropriate OpenCL library.

For Nvidia:

```
CC = g++
LDFLAGS_OPENCL= -L"$(CUDA_PATH)\lib\x64" -lOpenCL
```

For AMD:

```
CC = g++
LDFLAGS_OPENCL= -L"$(AMDAPPSDKROOT)lib\x86_64" -lOpenCL
```

Running Tests with OpenCL
Once you have done the above step, runTests.py should execute with the GPU enabled. All OpenCL tests match the phrase `*_opencl_*` and can be filtered with:

```
./runTests.py test/unit -f opencl
```
We currently have support for the following methods
TODO(Rok): provide example models for GLMs and GP
If you see the following error:

```
clBuildProgram CL_OUT_OF_HOST_MEMORY: Unknown error -6
```

you have most likely run out of available memory on your host system. OpenCL kernels are compiled just-in-time at the start of any OpenCL-enabled Stan/Stan Math program and thus may require more memory than when running without OpenCL support. If several CmdStan processes are started at the same time, each process needs that memory for a moment. If there is not enough memory to compile the OpenCL kernels, you will see this error. Try running your model with fewer processes. Upgrading your GPU driver may also reduce the RAM usage for OpenCL kernel compilation.
By default the stan-math library is not thread safe, because the autodiff stack uses a global tape which records all operations of the functions being evaluated. Starting with version 2.18 of stan-math, threading support can be switched on using the compile-time switch `STAN_THREADS`. Defining this variable at compile time can be achieved by adding the following to `make/local`:

```
CXXFLAGS += -DSTAN_THREADS
```

On develop you can also achieve the same by adding `STAN_THREADS=true` to `make/local`.
Once this is set, stan-math will use the C++11 `thread_local` facility so that the autodiff stack is maintained per thread instead of globally. This allows autodiff to be used in a threaded application, since the derivatives of a function can then be calculated inside a thread (the function itself may not use threads). Only if the function to be evaluated is composed of independent tasks may it be possible to evaluate its derivatives in a threaded approach; an example of a threaded gradient evaluation is the `map_rect` function in stan-math.
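To make this concrete, here is a sketch (assuming the program is compiled with `STAN_THREADS` defined and thread support enabled) in which independent gradient evaluations run on separate threads, each on its own autodiff tape:

```cpp
#include <stan/math/rev.hpp>

#include <iostream>
#include <thread>
#include <vector>

// d/dx of x^2, evaluated on the calling thread's own autodiff tape.
static double grad_square(double x_val) {
  stan::math::var x = x_val;
  stan::math::var y = x * x;
  y.grad();
  double dx = x.adj();
  stan::math::recover_memory();  // frees only this thread's tape
  return dx;
}

int main() {
  std::vector<double> inputs = {1.0, 2.0, 3.0, 4.0};
  std::vector<double> grads(inputs.size());

  // Each task is independent, so each thread can run its own
  // reverse-mode sweep without touching the other tapes.
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < inputs.size(); ++i)
    workers.emplace_back(
        [&grads, &inputs, i] { grads[i] = grad_square(inputs[i]); });
  for (auto& w : workers)
    w.join();

  for (std::size_t i = 0; i < inputs.size(); ++i)
    std::cout << "d/dx x^2 at x = " << inputs[i] << " is " << grads[i] << "\n";
  return 0;
}
```

Without `STAN_THREADS` the same program would have all threads writing to one global tape and would not be safe.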
In addition to making stan-math thread safe, this also turns on parallel execution support for the `map_rect` function. Currently, the maximal number of threads used by the function is controlled at runtime by the environment variable `STAN_NUM_THREADS`. Setting this variable to a positive integer defines the maximal number of threads to use. Setting it to the special value `-1` requests as many threads as there are physical cores. If the variable is not set, a single thread is used. Any illegal value (not an integer, zero, or another negative number) will cause an exception to be thrown.
The Intel TBB library is used in stan-math since version 2.21.0. The Intel TBB library uses a threadpool internally and distributes work through a task-based approach. The tasks are dispatched to the threadpool via the Intel TBB work-stealing scheduler. For example, whenever threading is enabled via `STAN_THREADS`, the `map_rect` function in stan-math will use the TBB's `tbb::parallel_for`. This executes the work chunks given to `map_rect` through the scheduler and thus load-balances CPU core utilization.
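The following standalone sketch (plain TBB, not Stan-specific) illustrates the primitive involved: `tbb::parallel_for` splits an index range into chunks, and the work-stealing scheduler hands those chunks to the threads of the pool:

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

#include <cmath>
#include <iostream>
#include <vector>

int main() {
  const std::size_t n = 1000000;
  std::vector<double> out(n);

  // The range [0, n) is split into chunks; each chunk is picked up by a
  // worker thread of the TBB thread pool, which balances the load across
  // the available CPU cores.
  tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n),
                    [&out](const tbb::blocked_range<std::size_t>& r) {
                      for (std::size_t i = r.begin(); i != r.end(); ++i)
                        out[i] = std::sqrt(static_cast<double>(i));
                    });

  std::cout << out.back() << std::endl;
  return 0;
}
```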
By default stan-math builds only the main `tbb` library, by defining the makefile variable `TBB_LIBRARIES=tbb`.

In addition to the main library, the Intel TBB provides memory allocators which are specifically designed to speed up threaded programs. These speedups have so far only been observed on MacOS systems for Stan programs, so on MacOS the default is set to

```
TBB_LIBRARIES=tbb tbbmalloc tbbmalloc_proxy
```

Users may override the default choice by defining `TBB_LIBRARIES` in the `make/local` file manually. Please refer to the pull request which merged the Intel TBB for further details on the performance evaluations.
Threading support requires a fully C++11-compliant compiler with a working `thread_local` implementation. Below you will find, for each operating system, what is known to work; a "known to work" configuration is one that developers have run successfully.
The compiler support for the C++11 `thread_local` keyword for the major open-source compilers is available since these versions:

- add `-pthread` to the `CXXFLAGS` variable
- older clang versions have issues with `thread_local`, which should be fixed with clang >= 4.0

Known to work:

Should work:

Apple's clang supports the `thread_local` keyword since Mac OS Sierra (Xcode 8.0).

Known to work:

With clang on Linux there are issues during the linking step of programs, which happens on old distributions like Ubuntu trusty 14.04 (the 16.04 LTS is fine). A solution can be found here. It is likely that clang 4.0, if used with libc++ 4.0, will work just fine, but developers have not yet confirmed this.

Known to work: