
CPU threading and TorchScript inference#

Created On: Jul 29, 2019 | Last Updated On: Mar 26, 2020

PyTorch allows using multiple CPU threads during TorchScript model inference. There are several levels of parallelism one would find in a typical application:

One or more inference threads execute a model’s forward pass on the given inputs. Each inference thread invokes a JIT interpreter that executes the ops of a model inline, one by one. A model can utilize a fork TorchScript primitive to launch an asynchronous task. Forking several operations at once results in a task that is executed in parallel. The fork operator returns a Future object which can be used to synchronize on later, for example:

import torch

@torch.jit.script
def compute_z(x, w_z):
    return torch.mm(x, w_z)

@torch.jit.script
def forward(x, w_y, w_z):
    # launch compute_z asynchronously:
    fut = torch.jit._fork(compute_z, x, w_z)
    # execute the next operation in parallel to compute_z:
    y = torch.mm(x, w_y)
    # wait for the result of compute_z:
    z = torch.jit._wait(fut)
    return y + z

PyTorch uses a single thread pool for inter-op parallelism; this thread pool is shared by all inference tasks that are forked within the application process.
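
As a rough illustration of that shared pool (a sketch, not from the original page; torch.jit.fork and torch.jit.wait are the public counterparts of the _fork/_wait calls used above, and the pool size chosen here is arbitrary), the snippet below forks several independent matrix multiplications that all execute as tasks on the same process-wide inter-op pool:

import torch
from typing import List

# must be called before any inter-op work starts; size is an illustrative choice
torch.set_num_interop_threads(4)

def gemm(x, w):
    return torch.mm(x, w)

@torch.jit.script
def fan_out(x: torch.Tensor, ws: List[torch.Tensor]) -> List[torch.Tensor]:
    futs: List[torch.jit.Future[torch.Tensor]] = []
    for w in ws:
        # each fork becomes a task scheduled on the shared inter-op pool
        futs.append(torch.jit.fork(gemm, x, w))
    outs: List[torch.Tensor] = []
    for fut in futs:
        outs.append(torch.jit.wait(fut))
    return outs

x = torch.randn(256, 256)
ws = [torch.randn(256, 256) for _ in range(8)]
outs = fan_out(x, ws)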

In addition to the inter-op parallelism, PyTorch can also utilize multiple threads within the ops (intra-op parallelism). This can be useful in many cases, including element-wise ops on large tensors, convolutions, GEMMs, embedding lookups and others.

Build options#

PyTorch uses an internal ATen library to implement ops. In addition to that, PyTorch can also be built with support for external libraries, such as MKL and MKL-DNN, to speed up computations on CPU.

ATen, MKL and MKL-DNN support intra-op parallelism and depend on the following parallelization libraries to implement it:

OpenMP has historically been used by a large number of libraries. It is known for its relative ease of use and its support for loop-based parallelism and other primitives.

TBB is used to a lesser extent in external libraries, but is optimized for concurrent environments. PyTorch's TBB backend guarantees that there is a separate, single, per-process intra-op thread pool used by all of the ops running in the application.

Depending on the use case, one might find one or the other parallelization library a better choice for their application.

PyTorch allows the parallelization backend used by ATen and the other libraries to be selected at build time, with a build option per library that chooses between OpenMP and TBB.

It is recommended not to mix OpenMP and TBB within one build.

Selecting TBB for any of these options requires the USE_TBB=1 build setting (default: OFF). A separate setting, USE_OPENMP=1 (default: ON), is required for OpenMP parallelism.
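
To check which of these backends a given binary was actually built with, the build configuration can be inspected at runtime (a small sketch; torch.__config__.show() prints the build-time settings, including the OpenMP, MKL and MKL-DNN options):

import torch

# prints compiler and build settings for the installed PyTorch binary,
# including the OpenMP / MKL / MKL-DNN related options mentioned above
print(torch.__config__.show())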

Runtime API#

The following API is used to control thread settings: the inter-op pool size is controlled with at::set_num_interop_threads / at::get_num_interop_threads in C++ and torch.set_num_interop_threads / torch.get_num_interop_threads in Python, while the intra-op pool size is controlled with at::set_num_threads / at::get_num_threads in C++, torch.set_num_threads / torch.get_num_threads in Python, and the OMP_NUM_THREADS and MKL_NUM_THREADS environment variables.

For the intra-op parallelism settings, at::set_num_threads and torch.set_num_threads always take precedence over the environment variables, and the MKL_NUM_THREADS variable takes precedence over OMP_NUM_THREADS.
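
As a brief sketch of the Python side of this API (the thread counts below are placeholders, not recommendations), the pool sizes can be set and read as follows; note that the inter-op pool size has to be set before any inter-op parallel work is launched:

import torch

# set the inter-op pool size; must happen before any inter-op work starts
torch.set_num_interop_threads(4)
# set the intra-op pool size; takes precedence over OMP_NUM_THREADS / MKL_NUM_THREADS
torch.set_num_threads(4)

print(torch.get_num_interop_threads(), torch.get_num_threads())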

Tuning the number of threads#

The following simple script shows how the runtime of a matrix multiplication changes with the number of threads:

import timeit
import torch

runtimes = []
threads = [1] + [t for t in range(2, 49, 2)]
for t in threads:
    torch.set_num_threads(t)
    r = timeit.timeit(
        setup="import torch; x = torch.randn(1024, 1024); y = torch.randn(1024, 1024)",
        stmt="torch.mm(x, y)",
        number=100,
    )
    runtimes.append(r)
# ... plotting (threads, runtimes) ...

Running the script on a system with 24 physical CPU cores (Xeon E5-2680, MKL- and OpenMP-based build) illustrates how the runtime varies with the thread count.

The following considerations should be taken into account when tuning the number of intra- and inter-op threads:

Warning

OpenMP does not guarantee that a single per-process intra-op thread pool is going to be used in the application. On the contrary, two different application or inter-op threads may use different OpenMP thread pools for intra-op work. This might result in a large number of threads used by the application. Extra care in tuning the number of threads is needed to avoid oversubscription in multi-threaded applications in the OpenMP case.
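
As a hedged example of such tuning (the core and thread counts below are illustrative assumptions, not recommendations from the original page), one common approach in an application that runs several of its own inference threads is to size the intra-op pool so that the product of inference threads and intra-op threads stays at or below the number of physical cores:

import torch

physical_cores = 24      # illustrative assumption; query your deployment environment
inference_threads = 4    # application-level inference threads (illustrative assumption)

# keep inference_threads * intra-op threads <= physical cores to avoid
# oversubscribing the machine when each thread drives its own OpenMP pool
torch.set_num_threads(physical_cores // inference_threads)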

Note

Pre-built PyTorch releases are compiled with OpenMP support.

Note

The parallel_info utility prints information about thread settings and can be used for debugging. Similar output can also be obtained in Python with a torch.__config__.parallel_info() call.
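
For example (a minimal sketch of the Python call mentioned above):

import torch

# prints the intra-op and inter-op thread counts, the detected OpenMP/MKL
# versions, and the ATen parallel backend in use
print(torch.__config__.parallel_info())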

