RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://github.com/kaanolgu/bfs-sycl-fpga/wiki/Optimisation-Guide below:

Optimisation Guide · kaanolgu/bfs-sycl-fpga Wiki · GitHub

Attempts for Optimising the Code

This page is used for the paper (doi soon) to showcase the code optimisations done and their effect on the performance metrics All the test results are according to the tests with RMAT-19-32 with 4 Compute units

         // Old code
         unsigned int nodes_start = usm_nodes_start[idx];
        unsigned int nodes_end = usm_nodes_start[idx +1 ];

        // New code with split device pointers
        device_ptr<unsigned int> DevicePtr_start(usm_nodes_start+offset);  
        device_ptr<unsigned int> DevicePtr_end(usm_nodes_start + 1+offset);

        unsigned int nodes_start = DevicePtr_start[idx];
        unsigned int nodes_end = DevicePtr_end[idx];

Vector version - execution time : 0.10s Single version - execution time : 0.08s

// Vector code
std::vector<event> e_vec;
e_vec.push_back(e1);
e_vec.push_back(e2);
e_vec.push_back(e3);

auto e =q.single_task<class Task>( e_vec, [=](){
...

// Seperate Events
 auto e =q.single_task<class Task>( {e1,e2,e3}, [=]() {
 ...

Single Task vs NDRange vs Paralllel_for

Throughput : parallel_for > singletask > NDrange

  q.single_task<class SingleTask>(  [=]() [[intel::kernel_args_restrict]] {
           #pragma unroll 16
           for(int tid =0; tid < no_of_nodes; tid++){
             unsigned int condition = usm_updating_mask[tid];
             if(condition){
               usm_updating_mask[tid]=0;  
            }
           }
         });
  q.parallel_for<class ParallelFor>(no_of_nodes, [=]() [[intel::kernel_args_restrict]] {
             unsigned int condition = usm_updating_mask[tid];
             if(condition){
               usm_updating_mask[tid]=0;  
            }
         });

  int BLOCK_SIZE = 512;
  int global_work_size = (no_of_nodes + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
  range<1> gws (global_work_size);
  range<1> lws (BLOCK_SIZE);
  q.parallel_for<class NDRange>(nd_range<1>(gws, lws), [=] (nd_item<1> item) [[intel::kernel_args_restrict]]  {
              int gid = item.get_global_id();
              if(gid<no_of_nodes){
             unsigned int condition = usm_updating_mask[tid];
             if(condition){
               usm_updating_mask[tid]=0;  
            }
           }
         });

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4