Background
As a concession to performance and ease of construction, most of our sparse matrix formats allow duplicate entries: several stored values at the same (row, column) position, which are implicitly summed. For example:
>>> from scipy.sparse import coo_matrix
>>> foo = coo_matrix(([4, 5], ([0, 0], [1, 1])), shape=(2, 2))
>>> foo.nnz
2
>>> foo.toarray()
array([[0, 9],
       [0, 0]])
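Unfortunately, this causes many problems for sparse matrix operations (#4409, #5394, #5806) and creates confusion about the true meaning of nnz, which counts stored entries (2 in the example above, even though only one position is nonzero), and that is before even considering the issue of explicit zeros (#3343).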
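A short sketch, assuming a recent SciPy and using coo_matrix as in the example above, of how duplicates and explicit zeros both inflate nnz relative to the number of positions that are actually nonzero. The count is taken on the dense array so the comparison does not depend on how the sparse formats count internally.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Two stored entries at (0, 1) that sum to 9, plus an explicitly stored zero at (1, 0).
m = coo_matrix(([4, 5, 0], ([0, 0, 1], [1, 1, 0])), shape=(2, 2))

print(m.nnz)                          # 3: counts every stored entry
print(np.count_nonzero(m.toarray()))  # 1: only one position is actually nonzero
```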
Most sparse matrix formats have a sum_duplicates() method, which canonicalizes the internal storage in place, but it's unclear whether other methods are allowed to call it without first making a copy (see #5741 (comment)).
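The copy-or-mutate question can be made concrete with a minimal sketch, assuming a CSR matrix built directly from (data, indices, indptr), a constructor that keeps duplicates instead of summing them. The helper canonical_copy is hypothetical, not a SciPy API; it shows the "copy first" option, while the last call shows what an in-place sum_duplicates() does to the caller's matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Row 0 stores column 1 twice (values 4 and 5); building a CSR matrix directly
# from (data, indices, indptr) keeps the duplicate rather than summing it.
A = csr_matrix((np.array([4, 5]), np.array([1, 1]), np.array([0, 2, 2])), shape=(2, 2))

def canonical_copy(m):
    """Hypothetical helper: return a canonicalized copy and leave m untouched."""
    c = m.copy()
    c.sum_duplicates()  # mutates only the copy
    return c

B = canonical_copy(A)
print(A.has_canonical_format, A.nnz)  # False 2: A still holds the duplicate
print(B.has_canonical_format, B.nnz)  # True 1: the copy has it summed into one entry

A.sum_duplicates()  # in place: A's own storage is rewritten
print(A.has_canonical_format, A.nnz)  # True 1
```

Copying is safe but doubles memory for large matrices, which is exactly why it is tempting for internal methods to canonicalize in place instead.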
Proposal
I think that allowing duplicate entries in internal sparse matrix representations was an API mistake, but now that we allow them, it would be difficult to disallow them without breaking a lot of existing user code. Therefore, I propose that we make it a policy that:
whenever a call to sum_duplicates() triggers in-place canonicalization, a SparseEfficiencyWarning is issued to alert the user that something potentially unexpected is going on.

If this gets traction here, I'll send it along to the scipy-dev mailing list as well. Thoughts?
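A minimal sketch of what the proposed policy could look like, assuming CSR/CSC matrices (which expose has_canonical_format) and the existing scipy.sparse.SparseEfficiencyWarning class. The wrapper canonicalize_with_warning is hypothetical and only illustrates the idea; it is not how SciPy currently behaves. Note that has_canonical_format is also False for rows whose indices are merely unsorted, and sum_duplicates() sorts those in place too, so warning in that case is consistent with the proposal's intent.

```python
import warnings
import numpy as np
from scipy.sparse import csr_matrix, SparseEfficiencyWarning

def canonicalize_with_warning(m):
    """Hypothetical sketch of the proposed policy for CSR/CSC matrices:
    warn before sum_duplicates() rewrites the caller's matrix in place."""
    if not m.has_canonical_format:
        warnings.warn(
            "summing duplicate entries in place; the matrix's internal storage changes",
            SparseEfficiencyWarning,
            stacklevel=2,
        )
        m.sum_duplicates()
    return m

A = csr_matrix((np.array([4, 5]), np.array([1, 1]), np.array([0, 2, 2])), shape=(2, 2))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    canonicalize_with_warning(A)  # warns: A holds a duplicate entry
    canonicalize_with_warning(A)  # silent: A is already canonical
print(len(caught), A.nnz)  # 1 1
```

Because the warning fires only when storage actually changes, code that already works with canonical matrices would see no new noise.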