establish an official policy for duplicate elements · Issue #5807 · scipy/scipy · GitHub

Background
As a concession to performance and ease of construction, most of our sparse matrix formats allow duplicate entries. For example:

>>> from scipy.sparse import coo_matrix
>>> foo = coo_matrix(([4, 5], ([0, 0], [1, 1])), shape=(2, 2))
>>> foo.nnz
2
>>> foo.toarray()
array([[0, 9],
       [0, 0]])

Unfortunately, this causes problems for many sparse matrix operations (#4409, #5394, #5806) and muddies the meaning of nnz, even setting aside the issue of explicit zeros (#3343).
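To make the nnz ambiguity concrete, here is a small sketch using only public scipy.sparse calls. Two duplicate entries that cancel each other are stored at the same position, so nnz reports two stored entries while the dense matrix has no nonzeros at all.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Two entries stored at the same position (0, 1) that cancel when summed.
m = coo_matrix(([1, -1], ([0, 0], [1, 1])), shape=(2, 2))

stored = m.nnz                          # counts stored entries, duplicates included
actual = np.count_nonzero(m.toarray())  # counts true nonzeros after summation

print(stored, actual)  # 2 0
```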

Most sparse matrix formats have a method sum_duplicates() which operates in-place to canonicalize the internal storage, but it's unclear whether other methods are allowed to call this without first making a copy (see #5741 (comment)).
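The copy question is visible directly: sum_duplicates() rewrites the matrix's own storage and returns None, so a method that wants to leave the caller's matrix untouched has to canonicalize a copy instead. A minimal sketch, reusing the matrix from the example above:

```python
from scipy.sparse import coo_matrix

foo = coo_matrix(([4, 5], ([0, 0], [1, 1])), shape=(2, 2))

# Copy-first: canonicalize a copy so the caller's matrix keeps its duplicates.
bar = foo.copy()
bar.sum_duplicates()
print(foo.nnz, bar.nnz)  # 2 1

# In-place: sum_duplicates() mutates foo itself and returns None.
result = foo.sum_duplicates()
print(result, foo.nnz)  # None 1
```

The copy-first route is safe but pays for an extra allocation on every call, which is the cost the in-place policy below would avoid.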

Proposal
I think that allowing duplicate entries in internal sparse matrix representations was an API mistake, but now that we allow it, it's difficult to disallow without breaking lots of existing user code. Therefore, I propose that we make it a policy that:

- Duplicate entries are not guaranteed to be preserved. That is, it's okay for methods to canonicalize in-place.
- Whenever a method other than sum_duplicates() triggers in-place canonicalization, a SparseEfficiencyWarning is issued, to alert the user that something potentially unexpected is going on.
- The presence or absence of duplicate entries is remembered with a boolean flag, which we will document and encourage users to toggle if they manually modify a sparse matrix's internal members.
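A rough sketch of how the three rules could fit together. The helper canonicalize_inplace and the flag name _no_duplicates are hypothetical illustrations, not SciPy APIs; only coo_matrix, sum_duplicates() and SparseEfficiencyWarning are real scipy.sparse names.

```python
import warnings
from scipy.sparse import coo_matrix, SparseEfficiencyWarning


def canonicalize_inplace(mat, caller):
    """Hypothetical helper illustrating the proposed policy (not a SciPy API).

    Sums duplicates in place, warns if this was triggered by a method other
    than sum_duplicates(), and records the result in a boolean flag.
    """
    # Hypothetical flag; a matrix without it is treated as possibly having duplicates.
    if getattr(mat, "_no_duplicates", False):
        return mat
    mat.sum_duplicates()  # rule 1: duplicates are not preserved
    if caller != "sum_duplicates":
        # Rule 2: warn when canonicalization happens as a side effect.
        warnings.warn(
            f"{caller}() canonicalized the matrix in place (duplicates summed)",
            SparseEfficiencyWarning,
            stacklevel=2,
        )
    mat._no_duplicates = True  # rule 3: remember that the storage is now canonical
    return mat
```

For example, calling canonicalize_inplace(foo, "dot") on the duplicated matrix from the first example would sum its two entries into one and issue a single warning; a second call would find the flag set and do nothing. A user who later edits foo.data or foo.row by hand would be expected to reset the flag to False.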

If this gets traction here, I'll send it along to the Scipy-dev mailing list as well. Thoughts?
