Finite Field Operations on GPGPU
In recent times I've been interested in finite field operations, so I decided to implement a few fields in SYCL DPC++, targeting accelerators ( specifically GPGPUs ).
This repository currently holds arithmetic implementations for two finite fields, accompanied by relevant benchmarks on both CPU and GPGPU.
- F(2 ** 32)
- F(2 ** 64 - 2 ** 32 + 1)
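To give a feel for what arithmetic in the second of these fields involves, here's a minimal plain-Python sketch of prime-field operations modulo p = 2 ** 64 - 2 ** 32 + 1 ( purely illustrative — the repository's kernels are written in SYCL DPC++, and the function names here are made up for the sketch ):

```python
# Illustrative prime-field arithmetic in F(p), p = 2**64 - 2**32 + 1.
P = 2**64 - 2**32 + 1

def ff_add(a: int, b: int) -> int:
    return (a + b) % P

def ff_mul(a: int, b: int) -> int:
    return (a * b) % P

def ff_inv(a: int) -> int:
    # Fermat's little theorem: a**(P-2) is the multiplicative inverse of a
    return pow(a, P - 2, P)

# every nonzero element has a multiplicative inverse
assert ff_mul(7, ff_inv(7)) == 1
```

This prime is popular for 64-bit implementations precisely because elements fit in a single machine word, which is what makes it attractive on GPUs.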
I've also written the following implementations, along with benchmark results on CPU and GPU.
Make sure you have `make`, `clang-format` and `dpcpp` / `clang++` installed.

```bash
$ lsb_release -d
Description:    Ubuntu 20.04.3 LTS
```

```bash
$ dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2022.0.0 (2022.0.0.20211123)
Target: x86_64-unknown-linux-gnu
Thread model: posix
```

or

```bash
$ clang++ --version
clang version 14.0.0 (https://github.com/intel/llvm dc9bd3fafdeacd28528eb4b1fef3ad9b76ef3b92)
Target: x86_64-unknown-linux-gnu
Thread model: posix
```
```bash
make # JIT kernel compilation on *default* device; for AOT read below
./run
```

```bash
DEVICE=cpu make  # still JIT, but at runtime use CPU
DEVICE=gpu make  # still JIT, but at runtime use GPU
DEVICE=host make # still JIT, but at runtime use HOST
```
The aforementioned steps compile kernels JIT, but if the target device is already known, it's better to compile them AOT, which saves some time at runtime, though the compiled binary is then device specific.
I provide an AOT kernel compilation recipe for CPUs using `avx2` instructions; you can check whether your CPU supports those. If you have other hardware, consider taking a look at the AOT compilation guidelines & making the necessary changes in the `Makefile`.
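One quick way to check for AVX2 support on Linux ( not part of this repository's tooling, just a shell probe of the kernel's CPU flags ):

```shell
# Prints "avx2" once if the CPU advertises the AVX2 extension (Linux only).
grep -o -m 1 'avx2' /proc/cpuinfo || echo "no avx2 support"
```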
Targeting Nvidia GPU with CUDA backend:
For targeting an Nvidia GPU, you want to run `DEVICE=gpu make cuda`, so that the benchmark suite is compiled for the CUDA backend. I suggest you read this for setting up your machine with an Nvidia GPU, if you haven't yet.
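Under the hood, compiling SYCL for the CUDA backend with the intel/llvm `clang++` typically involves a target flag along these lines ( a sketch only — file names here are hypothetical, and the exact invocation lives in the `Makefile` ):

```bash
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -O3 main.cpp -o run
```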
I've run the benchmark suite on both Intel CPU/ GPU and Nvidia GPU; results are kept below 👇
Intel CPU/ GPU
F(2 ** 64 - 2 ** 32 + 1)
Nvidia GPU
F(2 ** 64 - 2 ** 32 + 1)
You can run basic test cases using
```bash
# set variable to runtime target device
DEVICE=cpu|gpu|host make test
```
There's another set of randomised test cases, which asserts results ( obtained from my prime field implementation ) against another finite field implementation: the Python module `galois`.
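The idea behind such randomised cross-checking can be sketched in plain Python ( using Python's big-integer arithmetic as the reference instead of `galois`; the specialised reduction shown is a well-known trick for this particular prime, and is not necessarily the exact routine this repository uses ):

```python
import random

# p = 2**64 - 2**32 + 1; note 2**64 ≡ 2**32 - 1 and 2**96 ≡ -1 (mod p),
# which lets a 128-bit product be reduced with cheap shifts and masks.
P = 2**64 - 2**32 + 1
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def reduce128(x: int) -> int:
    """Reduce x < 2**128 modulo P using the identities above."""
    lo = x & MASK64            # x mod 2**64
    mid = (x >> 64) & MASK32   # bits 64..95
    hi = x >> 96               # bits 96..127
    return (lo + mid * MASK32 - hi) % P

# randomised cross-check against Python's reference arithmetic
for _ in range(10_000):
    a, b = random.randrange(P), random.randrange(P)
    assert reduce128(a * b) == (a * b) % P
```

The repository's tests follow the same shape, except the reference implementation is `galois` rather than Python's built-in arithmetic.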
For running those, I suggest you first compile a shared object using

```bash
# set variable to runtime target device
DEVICE=cpu|gpu|host make genlib
```
After that, you can follow the next steps here.