Arrays with acceleration indices.
AcceleratedArrays provides (secondary) acceleration indexes for Julia AbstractArray
s. Such acceleration indexes can be used to speed up certain operations, particularly those involving searching through the values - for example, an AcceleratedArray
may have more efficient implementations of functions such as findall
, filter
, and unique
.
As a general rule, this package has been implemented for the purposes of accelerating analytics workloads and is designed to support functional, non-mutating workflows. It is currently not supported to add an index to data you expect to mutate afterwards.
To download this package, from Julia v1.0 press ]
to enter package mode and type:
pkg> dev https://github.com/andyferris/AcceleratedArrays.jl
An AcceleratedArray
is generally created by using the accelerate
and accelerate!
functions.
# Construct a hash mapping to unique names a = accelerate(["Alice", "Bob", "Charlie"], UniqueHashIndex) # Rearrange an array of random numbers into ascending order b = accelerate!(rand(1:100, 100), SortIndex)
The resulting arrays can be used just like regular Julia arrays, except some operations become faster. For example, the hash map will let us find a certain element without exhaustively searching the array, or we can easily find all the elements within a given interval with a sorted array.
# Find the index of "Bob" in `a` findall(isequal("Bob"), a) # Return all the numbers in `b` between 40 and 60 filter(in(40..60), b)
Accelerations are fully implemented for the following functions, where a
is an AcceleratedArray
:
x ∈ a
count(pred, a)
findall(pred, a)
filter(pred, a)
There is some work-in-progress on a variety of other functions, including some from SplitApplyCombine:
findfirst(pred, a)
and findlast(pred, a)
unique(a)
group
, groupinds
, groupview
and groupreduce
innerjoin
Accelerations are only available for some predicates pred
, which naturally depend on the acceleration index used (see below for a full set).
The package intruduces the AbstractIndex
supertype and the following concrete implemetations. Generally, an index is created when the user calls accelerate
or accelerate!
.
This index constructs a hashmap between values in the array, and the corresponding array indices. For example, invoking findall
to search for the locations of certain values will be reduced to a simple dictionary lookup. Primarily accelerates commands using the isequal
predicate.
Like HashIndex
, except each value in the array can only appear once. Apart from guaranteeing uniqueness, certain operations may be faster with a UniqueHashIndex
than with a HashIndex
.
This index determines the order of the elements (with respect to isless
). This index can accelerate not only the isequal
predicate, but a variety of other order-based predicates as well (see below).
The accelerate!
function will rearrange the input array, like sort!
. This can speed up operations due to simplified algorithms and cache locality.
Like SortIndex
, except each value in the array can only appear once. Apart from guaranteeing uniqueness, certain operations may be faster with a UniqueSortIndex
than with a SortIndex
.
It is simple for a user or another package to implement an AbstractIndex
- for instance a third-party package may provide a spatial acceleration index, or an index for fast textual search. Simply overload accelerate
(and optionally accelerate!
) as well as the operations you would like to accelerate, such as filter
, findall
, etc. Indices for unique sets of values may inherit from AbstractUniqueIndex <: AbstractIndex
.
In Julia, sorting is (typically) achieved using the isless
and isequal
predicates, which are designed to provide a canonical total order for values. Currently, the acceleration indices rely on these rather than the comparison operators ==
, <
, <=
, >
, >=
and !=
.
To make life easier, this package introduces a number of new convenience functions:
islessequal(a, b) = isless(a, b) || isequal(a, b)
isgreater(a, b) = isless(b, a)
isgreaterequal(a, b) = isless(b, a) || isequal(a, b)
Any of these support "currying", which is a simple syntax for creating a closure such as isequal(a) = (b -> isequal(a, b))
. Such curried predicates are picked up by multiple dispatch to accelerate operations like findall(isequal(3.0), accelerated_array)
.
It is common to want to search for all values in a range. This package introduces an Interval
type to represent the set of of values between two endpoints (with respect to isless
and isequal
).
An interval is easily created with the ..
operator via the syntax a .. b
. To find if a value is in this range, use the in
function/operator (alternatively spelled ∈
, which can be inserted at the REPL via \in <TAB>
). For example, 3 ∈ 0 .. 10
is true
but 13 ∈ 0 .. 10
is false
.
By default, an interval is inclusive of its endpoints, such that 10 ∈ 0 .. 10
. An endpoint can be excluded via the lessthan
or greaterthan
function, which returns a value almost equal to but slightly less/greater than its input. An interval exclusive of both its endpoints can be expressed as greaterthan(a) .. lessthan(b)
. For example 10 ∉ 0 .. lessthan(10)
.
This package is still young, and could support some more features, such as:
SplitApplyCombine
.missing
, ==
, <
with either a hash- or sort-based index.Interval
s into their own package, potentially reconcile with IntervalSets.jl (which currently uses <=
and >=
for comparisons).RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4