Showing content from https://kokkos.github.io/kokkos-core-wiki/API/core/policies/ThreadVectorRange.html below:

ThreadVectorRange - Kokkos documentation

ThreadVectorRange

Header File: Kokkos_Core.hpp

Usage:

parallel_for(ThreadVectorRange(team,range), [=] (int i) {...});
parallel_reduce(ThreadVectorRange(team,begin,end),
  [=] (int i, double& lsum) {...},sum);
parallel_scan(ThreadVectorRange(team,range),
  [=] (int i, double& lsum, bool final) {...});

ThreadVectorRange is a nested execution policy used inside hierarchical parallelism. In contrast to global policies, the public interface for nested policies is implemented as functions, in order to enable implicit templating on the execution space type via the team handle.
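The effect of exposing a policy through a function rather than a class can be sketched in plain C++ without Kokkos. Every name below (`HostSpace`, `DeviceSpace`, `TeamMember`, `VectorRangeBounds`, `make_vector_range`) is hypothetical and chosen for illustration only; none of them is a Kokkos type or the library's implementation. The sketch assumes only that the team handle exposes its execution space as a nested type, so that a function template can deduce it from the argument.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for two back ends; these are not Kokkos types.
struct HostSpace {};
struct DeviceSpace {};

// Hypothetical team handle that carries its execution space as a nested type.
template <class ExecSpace>
struct TeamMember {
  using execution_space = ExecSpace;
  int league_rank = 0;
};

// Hypothetical policy object: the execution space is part of its type.
template <class ExecSpace, class Index>
struct VectorRangeBounds {
  using execution_space = ExecSpace;
  Index begin;
  Index end;
};

// Factory function: both template arguments are deduced from the call,
// so the caller never has to name the execution space.
template <class Member, class Index>
VectorRangeBounds<typename Member::execution_space, Index>
make_vector_range(const Member& /*team*/, Index count) {
  assert(count >= 0);
  return {Index(0), count};
}
```

A call such as `make_vector_range(team, 8)` yields a policy typed on the execution space of `team`, which mirrors how `ThreadVectorRange(team, range)` is written in the usage lines above.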

Synopsis
template<class TeamMemberType, class iType>
/* implementation defined */ ThreadVectorRange(TeamMemberType team, iType count);
template<class TeamMemberType, class iType1, class iType2>
/* implementation defined */ ThreadVectorRange(TeamMemberType team, iType1 begin, iType2 end);
Description
template<class TeamMemberType, class iType>
/* Implementation defined */ ThreadVectorRange(TeamMemberType team, iType count);

Splits the index range 0 to count-1 over the vector lanes of the calling thread.

template<class TeamMemberType, class iType1, class iType2>
/* Implementation defined */ ThreadVectorRange(TeamMemberType team, iType1 begin, iType2 end);

Splits the index range begin to end-1 over the vector lanes of the calling thread.
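To make the two overloads concrete, the following standalone C++ sketch models one possible way of distributing a range over vector lanes. The round-robin assignment and the helper name `lane_indices` are assumptions made for illustration; the actual mapping of indices to lanes is implementation defined and depends on the back end.

```cpp
#include <cassert>
#include <vector>

// Illustrative model only: returns the indices in [begin, end) that one lane
// would visit if the range were dealt out round-robin over `lanes` lanes.
std::vector<int> lane_indices(int begin, int end, int lane, int lanes) {
  assert(lanes > 0 && lane >= 0 && lane < lanes);
  std::vector<int> out;
  for (int i = begin + lane; i < end; i += lanes) out.push_back(i);
  return out;
}

// The count form behaves like the begin/end form with begin = 0.
std::vector<int> lane_indices(int count, int lane, int lanes) {
  return lane_indices(0, count, lane, lanes);
}
```

Under this model every index in the range is visited by exactly one lane, and the upper bound is exclusive in both overloads, matching the "0 to count-1" and "begin to end-1" wording above.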

Examples
typedef TeamPolicy<>::member_type team_handle;
parallel_for(TeamPolicy<>(N,AUTO,4), KOKKOS_LAMBDA (const team_handle& team) {
 int n = team.league_rank();
 parallel_for(TeamThreadRange(team,M), [&] (const int i) {
   parallel_for(ThreadVectorRange(team,K), [&] (const int j) {
     A(n,i,j) = B(n,i) + j;
   });
 });
 team.team_barrier();
 int team_sum;
 parallel_reduce(TeamThreadRange(team,M), [&] (const int& i, int& threadsum) {
   int tsum = 0;
   parallel_reduce(ThreadVectorRange(team,K), [&] (const int& j, int& lsum) {
     lsum += A(n,i,j);
   },tsum);
   single(PerThread(team),[&] () {
     threadsum += tsum;
   });
 },team_sum);

 single(PerTeam(team),[&] () {
   A_rowsum(n) += team_sum;
 });
});
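When experimenting with the example, a serial reference can help check the result. The sketch below is plain C++ without Kokkos and computes what the nested loops are expected to produce, assuming `A_rowsum` starts at zero and `B` is stored as a flat row-major array of size N*M. The function name `reference_rowsum` and the storage layout are assumptions of this sketch, not part of the Kokkos API.

```cpp
#include <cassert>
#include <vector>

// Serial reference for the example: A(n,i,j) = B(n,i) + j, followed by
// rowsum(n) = sum over i and j of A(n,i,j).
// B is a flat row-major N*M vector (an assumption of this sketch).
std::vector<long> reference_rowsum(const std::vector<int>& B, int N, int M, int K) {
  assert(static_cast<int>(B.size()) == N * M);
  std::vector<long> rowsum(N, 0);
  for (int n = 0; n < N; ++n)
    for (int i = 0; i < M; ++i)
      for (int j = 0; j < K; ++j)
        rowsum[n] += B[n * M + i] + j;
  return rowsum;
}
```

Comparing `A_rowsum(n)` from the parallel version against `reference_rowsum(...)[n]` is one simple way to confirm that the team, thread, and vector levels each contributed every index exactly once.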
