The Core module is the crucial module in OpenCV; all other modules depend on it.
It must form a solid foundation for a future-proof OpenCV.
What are the desirable key properties of the module:
OpenCV's Core module in its current form was created around 2009, when cv::Mat, a multi-dimensional dense array, was introduced as a complete replacement for CvMat, CvMatND and IplImage. The whole OpenCV API (a C API before 2009) was rebuilt around cv::Mat and a few other basic structures like std::vector<> (to handle point clouds etc.). The idea of combining image, matrix and multi-dimensional array (tensor) in one type was borrowed from Matlab, where toolboxes, including the image processing toolbox, the basic linear algebra toolbox, Jean-Yves Bouguet's camera calibration toolbox etc., all happily use Matlab matrices, so it's super-easy to create pipelines that use algorithms from different areas.
It seems that numpy, Python's famous numerical extension, borrowed the same idea and implemented a ubiquitous matrix/array type called ndarray. Bigger packages such as scipy and scikit-learn have been developed on top of numpy. An efficient and yet quite comprehensive set of basic operations, extended by the derived packages, mostly eliminated the problem of very slow hand-written Python code (because all the kernels in numpy are implemented in efficient C or Fortran). That made Python a sound substitute for Fortran and Matlab in the new century.
With the rise of Deep Learning the idea has been greatly extended. An efficient, GPU-accelerated, comprehensive set of operations (very similar to numpy) that can be put together into graphs, together with automatic differentiation tools, formed the foundation of modern Deep Learning technology. Anyone who looks at PyTorch, Tensorflow, JAX, the ONNX specification etc. will find many similarities with numpy. In particular, many ONNX operations follow numpy quite closely and use numpy to illustrate those operations. Of course, there are some deep learning-specific operations like Convolution, SoftMax, Dropout or Attention, but most ONNX operations have numpy counterparts.
The Python community (since all the aforementioned frameworks, except for OpenCV, mainly use Python) noticed this close resemblance between many array processing frameworks and decided to introduce the so-called Python array API standard. It's clear that this is an emerging standard: its API lacks some important numpy functionality and some important PyTorch/ONNX operations, and it lacks the notion of an external accelerator (like a GPU or NPU) to which the user may want to transfer an array/tensor, perform a set of operations there and transfer the results back. This is crucial functionality for deep learning frameworks, for OpenCV, its deep learning module, its GPU-accelerated image processing functionality etc. So the standard will definitely evolve, but it makes sense for us in OpenCV 5 to comply with it more or less even now. Besides implementing the already specified API, for us it's an opportunity to offer the community extra kernels that are important for computer vision and image processing use cases.
The list of functions to implement/improve in Core module in OpenCV 5.0
Basically, OpenCV's core module should implement a big subset of the "Python array API standard" with certain extensions that we consider useful.
The list of functions below has been directly copied from https://data-apis.org/array-api/latest/API_specification/index.html. Probably, the following content should be presented in a table.
Unary/binary arithmetic, math and logic operations. We already have implementations of many of those operations (sometimes under slightly different names), but we need to support broadcasting for binary operations.
abs // cv::absdiff with 0 as a second parameter
acos
acosh // via cv::log()
add // cv::add
asin
asinh // via cv::log()
atan
atan2 // cv::phase, cv::cartToPolar
atanh // via cv::log()
bitwise_and // cv::bitwise_and
bitwise_left_shift
bitwise_invert // cv::bitwise_not
bitwise_or // cv::bitwise_or
bitwise_right_shift
bitwise_xor // cv::bitwise_xor
ceil
conj
cos // cv::polarToCart
cosh // via cv::exp()
divide // cv::divide
equal // cv::compare(..., CMP_EQ)
exp // cv::exp
expm1
floor
floor_divide
greater // cv::compare(..., CMP_GT)
greater_equal // cv::compare(..., CMP_GE)
imag
isfinite
isinf
isnan
less // cv::compare(..., CMP_LT)
less_equal // cv::compare(..., CMP_LE)
log // cv::log
log1p
log2
log10
logaddexp
logical_and
logical_not
logical_or
logical_xor
multiply // cv::multiply
negative
not_equal // compare(...,CMP_NE)
positive // copyTo()
pow // cv::pow
real
remainder
round // convertTo()
sign
sin // only via cv::polarToCart
sinh // via cv::exp()
square // via cv::multiply()
sqrt // cv::sqrt()
subtract // cv::subtract()
tan
tanh // no direct function. Can be computed via cv::exp()
trunc
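To make the broadcasting requirement for binary operations concrete, here is a minimal sketch of the shape rule used by numpy and the array API standard: shapes are aligned at the trailing dimension, and two dimensions are compatible if they are equal or one of them is 1. The function name broadcastShape is hypothetical and is not an existing OpenCV API; only the C++ standard library is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): computes the result shape of
// broadcasting shapes `a` and `b` against each other.
// Returns false if the shapes are not broadcast-compatible.
bool broadcastShape(const std::vector<int>& a,
                    const std::vector<int>& b,
                    std::vector<int>& out)
{
    const std::size_t n = std::max(a.size(), b.size());
    out.assign(n, 1);
    for (std::size_t i = 0; i < n; ++i) {
        // Walk both shapes from the trailing dimension;
        // a missing leading dimension is treated as 1.
        const int da = i < a.size() ? a[a.size() - 1 - i] : 1;
        const int db = i < b.size() ? b[b.size() - 1 - i] : 1;
        if (da != db && da != 1 && db != 1)
            return false;  // e.g. 2 vs 4 cannot be broadcast
        out[n - 1 - i] = (da == 1) ? db : da;
    }
    return true;
}
```

For example, a 480x640x3 array combined with a 3-element vector gives a 480x640x3 result, and a 4x1 array combined with a 1x5 array gives 4x5, while 2x3 against 4x3 is rejected.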
Linear algebra functions. Same situation.
matmul // cv::gemm()
matrix_transpose // cv::transpose()
tensordot
vecdot
Array permutation functions. Same situation:
broadcast_arrays
broadcast_to // +
concat // in OpenCV we have 2D hconcat and vconcat
expand_dims
flip // 2D only for now
permute_dims // in cv::dnn we have general Transpose. in core we have 2D transpose
reshape
roll
squeeze
stack
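As an illustration of what extending the 2D transpose to a general permute_dims involves, the sketch below permutes the axes of a dense row-major array stored in a flat buffer. It is a naive reference loop, not the cv::dnn implementation; the name permuteDims and its signature are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation (not an OpenCV function).
// Output axis i takes its data from input axis axes[i], as in the array API
// permute_dims. `src` holds the input in row-major order.
std::vector<float> permuteDims(const std::vector<float>& src,
                               const std::vector<int>& shape,
                               const std::vector<int>& axes,
                               std::vector<int>& outShape)
{
    const std::size_t nd = shape.size();
    assert(axes.size() == nd);

    // Row-major strides of the input, in elements.
    std::vector<std::size_t> stride(nd, 1);
    for (std::size_t i = nd; i-- > 1; )
        stride[i - 1] = stride[i] * static_cast<std::size_t>(shape[i]);

    outShape.resize(nd);
    std::size_t total = 1;
    for (std::size_t i = 0; i < nd; ++i) {
        outShape[i] = shape[axes[i]];
        total *= static_cast<std::size_t>(shape[i]);
    }
    assert(src.size() == total);

    std::vector<float> dst(total);
    std::vector<std::size_t> idx(nd, 0);  // multi-index into the output
    for (std::size_t k = 0; k < total; ++k) {
        std::size_t srcOfs = 0;
        for (std::size_t i = 0; i < nd; ++i)
            srcOfs += idx[i] * stride[axes[i]];
        dst[k] = src[srcOfs];
        // Advance the output multi-index in row-major order.
        for (std::size_t i = nd; i-- > 0; ) {
            if (++idx[i] < static_cast<std::size_t>(outShape[i]))
                break;
            idx[i] = 0;
        }
    }
    return dst;
}
```

With shape {2, 3} and axes {1, 0} this reduces to the ordinary 2D transpose. A production version would of course copy contiguous runs instead of single elements.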
Statistical functions. Same situation:
max // minMaxIdx() computes min, max and their indices.
mean // cv::mean
min // via minMaxIdx()
prod
std // cv::meanStdDev() computes both mean and standard deviation
sum // cv::sum
var // via cv::meanStdDev()
Misc functions from several other groups. Mostly implemented as well in one form or another:
// searching functions
argmax
argmin
nonzero // ~ cv::findNonZero(); cv::countNonZero() returns only the count
where // element-wise ternary operator ?:
// set functions
unique_all
unique_counts
unique_inverse
unique_values
// sorting
argsort // called cv::sortIdx() in OpenCV
sort // cv::sort()
// utility functions
all // as cv::countNonZero(m) == m.total()
any // cv::hasNonZero
// initialization functions
arange
asarray // many non-mat array can be converted to cv::Mat using getMat().
// in Python bindings Mat is implicitly constructed from ndarray and vice versa
empty // Mat()
empty_like
eye // Mat::eye()
from_dlpack
full
full_like
linspace
meshgrid
ones // Mat::ones()
ones_like
tril
triu
zeros // Mat::zeros()
zeros_like
Some useful extra operations not included in the "Python array API standard", but included in the numpy and/or ONNX specifications:
einsum // already implemented in cv::dnn
einops.* // a family of operations from excellent einops package:
// https://github.com/arogozhnikov/einops
reduce(..., sum|min|max|avg|...) // in core we already have 2D reduce(),
// need to extend it to ND, as in ONNX or cv::dnn
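Extending reduce() from 2D to ND can be sketched by viewing the array as [outer, len, inner], where len is the size of the reduced axis. The example below sums along one axis and keeps that axis with size 1 in the output shape. The name reduceSum and the keep-dimension choice are assumptions for illustration, not the existing cv::reduce or cv::dnn behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation (not an OpenCV function): sums a dense
// row-major ND array along `axis`. The reduced axis is kept with size 1.
std::vector<double> reduceSum(const std::vector<double>& src,
                              const std::vector<int>& shape,
                              int axis,
                              std::vector<int>& outShape)
{
    assert(axis >= 0 && static_cast<std::size_t>(axis) < shape.size());

    // Collapse the dimensions before and after `axis`.
    std::size_t outer = 1, inner = 1;
    const std::size_t len = static_cast<std::size_t>(shape[axis]);
    for (int i = 0; i < axis; ++i)
        outer *= static_cast<std::size_t>(shape[i]);
    for (std::size_t i = static_cast<std::size_t>(axis) + 1; i < shape.size(); ++i)
        inner *= static_cast<std::size_t>(shape[i]);
    assert(src.size() == outer * len * inner);

    outShape = shape;
    outShape[axis] = 1;

    std::vector<double> dst(outer * inner, 0.0);
    for (std::size_t o = 0; o < outer; ++o)
        for (std::size_t k = 0; k < len; ++k)
            for (std::size_t j = 0; j < inner; ++j)
                dst[o * inner + j] += src[(o * len + k) * inner + j];
    return dst;
}
```

The same [outer, len, inner] loop structure works for min, max or average by swapping the accumulation step.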
Also, in Core we already have a bunch of functions implemented in numpy, but missing in the standard, like various matrix decomposition and backward substitution algorithms (LU, Cholesky, SVD, QR), FFT etc.
As you can see, many of the operations are already implemented in Core or in DNN module.
What needs to be done, basically: we partially support broadcasting in core (A op A and A op scalar) and fully support broadcasting in dnn. Need to merge the dnn implementation into core.
Efficient, high-quality implementations of basic array processing functions will allow us to reduce code duplication, use those functions in the DNN module and maybe implement higher-level image processing algorithms in imgproc, photo and other modules more efficiently.
We can introduce a more or less future-proof HAL for vendors who would like to accelerate OpenCV 5+. They will see that we ask for the same API (at least at the semantic level) as the whole Python+numpy+PyTorch+... community, which is a huge number of people and many companies.
The goal for OpenCV 5 is to introduce not just the new CPU HAL, but also a non-CPU HAL. All the above-mentioned functions should be able to use such a HAL. Then all the functionality that is built on top of this basic API (which we and the community will gradually extend) will automatically run on GPUs or other HAL-supporting accelerators. See the dedicated feature requests (New CPU HAL for OpenCV 5.0 #25019, Introducing non-CPU HAL for OpenCV 5+ #25025) where this HAL is described.