ThreadVectorMDRange
¶
Header File: <Kokkos_Core.hpp>
ThreadVectorMDRange is a nested execution policy used inside of hierarchical parallelism.
Interface¶Constructor
Splits the index range 0
to extent
over the vector lanes of the calling thread, where extent
is the backend-dependent rank that will be vectorized
team – TeamHandle to the calling team execution context
extent_1, extent_2, ... – index range lengths of each rank
Requirements
TeamHandle
is a type that models TeamHandle
extent_1, extent_2, ...
are ints
extent_i
is such that i >= 2 && i <= 8
is true. For example:
ThreadVectorMDRange(team, 4); // NOT OK, violates i>=2 ThreadVectorMDRange(team, 4,5); // OK ThreadVectorMDRange(team, 4,5,6); // OK ThreadVectorMDRange(team, 4,5,6,2,3,4,5,6); // OK, max num of extents allowed
The constructor can not be called inside a parallel operation dispatched using a TeamVectorRange
policy, TeamVectorRange
policy, TeamVectorMDRange
policy or ThreadVectorMDRange
policy.
Note that when used in parallel_reduce, the reduction is limited to a sum.
Examples¶using TeamHandle = TeamPolicy<>::member_type; parallel_for(TeamPolicy<>(N, Kokkos::AUTO), KOKKOS_LAMBDA(TeamHandle const& team) { int leagueRank = team.league_rank(); auto teamThreadRange = TeamThreadRange(team, n0); auto threadVectorMDRange = ThreadVectorMDRange<Rank<3>, TeamHandle>( team, n1, n2, n3); parallel_for(teamThreadRange, [=](int i0) { parallel_for(threadVectorMDRange, [=](int i1, int i2, int i3) { A(leagueRank, i0, i1, i2, i3) += B(leagueRank, i1) + C(i1, i2, i3); }); }); team.team_barrier(); int teamSum = 0; parallel_for(teamThreadRange, [=, &teamSum](int const& i0) { int threadSum = 0; parallel_reduce(threadVectorMDRange, [=](int i1, int i2, int i3, int& vectorSum) { vectorSum += D(leagueRank, i0, i1, i2, i3); }, threadSum ); teamSum += threadSum; }); });
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4