

NVIDIA/nvbench: CUDA Kernel Benchmarking Library

This project is a work-in-progress. Everything is subject to change.

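NVBench is a C++17 library designed to simplify CUDA kernel benchmarking. Its features include parameter sweeps, a rich command-line interface for runtime configuration, throughput and global memory bandwidth reporting, and both cold and batch GPU measurements.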

Check out this talk for an overview of the challenges inherent to CUDA kernel benchmarking and how NVBench solves them for you!

Supported Compilers and Tools

A basic kernel benchmark can be created with just a few lines of CUDA C++:

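#include <nvbench/nvbench.cuh>

// Placeholder kernel; replace it with the kernel you want to time.
__global__ void my_kernel() {}

void my_benchmark(nvbench::state& state) {
  const int num_blocks = 1024; // example grid size; size it for your problem
  state.exec([&](nvbench::launch& launch) {
    my_kernel<<<num_blocks, 256, 0, launch.get_stream()>>>();
  });
}
NVBENCH_BENCH(my_benchmark);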

See the Benchmarks documentation for information on customizing benchmarks and implementing parameter sweeps.

Each benchmark executable produced by NVBench provides a rich set of command-line options for configuring benchmark execution at runtime. See the CLI overview and CLI axis specification for more information.

This repository provides a number of examples that demonstrate various NVBench features and use cases.

To build the examples:

mkdir -p build
cd build
cmake -DNVBench_ENABLE_EXAMPLES=ON -DCMAKE_CUDA_ARCHITECTURES=70 .. && make

Be sure to set CMAKE_CUDA_ARCHITECTURES based on the GPU you are running on.
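CMAKE_CUDA_ARCHITECTURES takes the GPU's compute capability with the dot removed, so a compute capability 7.0 device such as the Quadro GV100 in the example output uses 70. NVBench's device report shows the same information as "SM Version", scaled by ten (700 for that GPU). The helpers below are hypothetical and exist only to illustrate the mapping; they are not part of NVBench or CMake.

```cpp
#include <string>

// Hypothetical helper (not part of NVBench or CMake): converts a compute
// capability major.minor into the value CMAKE_CUDA_ARCHITECTURES expects.
int cuda_architecture(int major, int minor) {
  return major * 10 + minor;  // 7.0 -> 70, 8.6 -> 86
}

// Hypothetical helper: converts the "SM Version" number from NVBench's
// device report (for example 700) into the same CMake value.
int cuda_architecture_from_sm_version(int sm_version) {
  return sm_version / 10;  // 700 -> 70, 860 -> 86
}

// Builds the matching CMake cache argument for the configure step.
std::string cmake_arch_flag(int major, int minor) {
  return "-DCMAKE_CUDA_ARCHITECTURES=" +
         std::to_string(cuda_architecture(major, minor));
}
```

For the GV100 in the example output, `cmake_arch_flag(7, 0)` produces the same `-DCMAKE_CUDA_ARCHITECTURES=70` used in the build command.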

Examples are built by default into build/bin and are prefixed with nvbench.example.

Example output from `nvbench.example.throughput`
# Devices

## [0] `Quadro GV100`
* SM Version: 700 (PTX Version: 700)
* Number of SMs: 80
* SM Default Clock Rate: 1627 MHz
* Global Memory: 32163 MiB Free / 32508 MiB Total
* Global Memory Bus Peak: 870 GiB/sec (4096-bit DDR @850MHz)
* Max Shared Memory: 96 KiB/SM, 48 KiB/Block
* L2 Cache Size: 6144 KiB
* Maximum Active Blocks: 32/SM
* Maximum Active Threads: 2048/SM, 1024/Block
* Available Registers: 65536/SM, 65536/Block
* ECC Enabled: No

# Log

Run:  throughput_bench [Device=0]
Warn: Current measurement timed out (15.00s) while over noise threshold (1.26% > 0.50%)
Pass: Cold: 0.262392ms GPU, 0.267860ms CPU, 7.19s total GPU, 27393x
Pass: Batch: 0.261963ms GPU, 7.18s total GPU, 27394x

# Benchmark Results

## throughput_bench

### [0] Quadro GV100

| NumElements |  DataSize  | Samples |  CPU Time  | Noise |  GPU Time  | Noise | Elem/s  | GlobalMem BW  | BWPeak | Batch GPU  | Batch  |
|-------------|------------|---------|------------|-------|------------|-------|---------|---------------|--------|------------|--------|
|    16777216 | 64.000 MiB |  27393x | 267.860 us | 1.25% | 262.392 us | 1.26% | 63.940G | 476.387 GiB/s | 58.77% | 261.963 us | 27394x |
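The log and the derived table columns can be reproduced from raw numbers, which helps when reading results for your own kernels. The plain C++ sketch below is not NVBench code and rests on stated assumptions: "Noise" is taken to be the relative standard deviation of the timing samples (computed here with the n-1 sample standard deviation), the stopping rule is a simplification suggested by the `Warn:` line (stop once noise is under the threshold, or give up at the timeout), and the benchmark is assumed to read and write its 64 MiB buffer once each per launch, which is inferred from the reported bandwidth rather than stated. Peak bandwidth comes from the device report's bus description: 4096 bits at 850 MHz, two transfers per clock, gives 870.4 × 10^9 bytes/s (about 810.6 GiB/s), and that is the figure that reproduces the 58.77% BWPeak value.

```cpp
#include <cmath>
#include <limits>
#include <numeric>
#include <vector>

// Relative standard deviation (stddev / mean) of timing samples, in percent.
// Assumption: this mirrors the "Noise" columns; uses the n-1 sample stddev.
double noise_percent(const std::vector<double>& samples) {
  if (samples.size() < 2) {
    // Too few samples to estimate spread: treat as arbitrarily noisy.
    return std::numeric_limits<double>::infinity();
  }
  const double n = static_cast<double>(samples.size());
  const double mean = std::accumulate(samples.begin(), samples.end(), 0.0) / n;
  double sum_sq = 0.0;
  for (double s : samples) sum_sq += (s - mean) * (s - mean);
  const double stddev = std::sqrt(sum_sq / (n - 1.0));
  return 100.0 * stddev / mean;  // timing samples are positive, so mean > 0
}

enum class StopReason { KeepGoing, Converged, TimedOut };

// Simplified stopping rule (hypothetical, not NVBench's implementation).
StopReason check_stop(double noise_pct, double max_noise_pct,
                      double elapsed_s, double timeout_s) {
  if (noise_pct <= max_noise_pct) return StopReason::Converged;
  if (elapsed_s >= timeout_s) return StopReason::TimedOut;  // log a warning
  return StopReason::KeepGoing;
}

struct Throughput {
  double elements_per_s;       // "Elem/s" column
  double bandwidth_gib_per_s;  // "GlobalMem BW" column
  double percent_of_peak;      // "BWPeak" column
};

// Peak DRAM bandwidth in bytes/s from bus width and memory clock (DDR: x2).
double peak_bytes_per_s(double bus_width_bits, double mem_clock_hz) {
  return bus_width_bits / 8.0 * mem_clock_hz * 2.0;
}

// gpu_time_s: mean GPU time per launch; bytes: bytes read + written per launch.
Throughput throughput(double elements, double bytes, double gpu_time_s,
                      double peak_bytes) {
  const double bytes_per_s = bytes / gpu_time_s;
  return {elements / gpu_time_s,
          bytes_per_s / (1024.0 * 1024.0 * 1024.0),
          100.0 * bytes_per_s / peak_bytes};
}
```

With the numbers from the table, `throughput(16777216, 2.0 * 64 * 1024 * 1024, 262.392e-6, peak_bytes_per_s(4096, 850e6))` gives about 63.94 × 10^9 elements/s, 476.39 GiB/s, and 58.77% of peak, and `check_stop(1.26, 0.50, 15.0, 15.0)` reports a timeout, matching the warning in the log.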

To get started using NVBench with your own kernels, consider trying out the NVBench Demo Project.

nvbench_demo provides a simple CMake project that uses NVBench to build an example benchmark. It's a great way to experiment with the library without a lot of investment.

Contributions are welcome!

For current issues, see the issue board. Some issues are labeled as suitable for first-time contributors.

To build nvbench tests:

mkdir -p build
cd build
cmake -DNVBench_ENABLE_TESTING=ON .. && make

Tests are built by default into build/bin and prefixed with nvbench.test.

To run all tests from the build directory:

make test

or

ctest

NVBench is released under the Apache 2.0 License with LLVM exceptions. See LICENSE.

Scope and Related Projects

NVBench measures the CPU and CUDA GPU execution time of a single host-side critical region per benchmark. It is intended for regression testing and parameter tuning of individual kernels. For in-depth analysis of the end-to-end performance of multiple applications, the NVIDIA Nsight tools are more appropriate.

NVBench is focused on evaluating the performance of CUDA kernels. It also provides CPU-only benchmarking facilities intended for non-trivial CPU workloads, but it is not optimized for CPU microbenchmarks. This may change in the future, but for now, consider using Google Benchmark for high-resolution CPU benchmarks.
