Showing content from https://github.com/scipy/scipy/issues/22194 below:

sunsetting `scipy.stats.mstats` · Issue #22194 · scipy/scipy · GitHub

scipy.stats.mstats is a mostly-separate re-implementation of scipy.stats with support for NumPy masked arrays. Masked values are treated as missing: for 1-D slices, the result is typically the same as if the masked value were not present.
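As a rough illustration of "treated as missing", the sketch below compares three routes on a 1-D sample: the scipy.stats.mstats function on a masked array, the scipy.stats function on the data with masked entries dropped, and the scipy.stats function with NaN sentinels and nan_policy='omit'. The sample values and the choice of skew are arbitrary assumptions made for the example, not taken from the issue.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

# Arbitrary illustrative sample; two entries are flagged as missing.
data = np.array([2.0, 8.0, 0.0, 4.0, 1.0, 9.0, 9.0, 0.0])
mask = np.array([False, False, True, False, False, False, False, True])
masked = np.ma.masked_array(data, mask=mask)

# Route 1: the masked-array implementation.
skew_masked = float(mstats.skew(masked))

# Route 2: drop the masked entries, then use the regular implementation.
skew_compressed = float(stats.skew(masked.compressed()))

# Route 3: mark missing entries with NaN and ask scipy.stats to omit them.
with_nan = np.where(mask, np.nan, data)
skew_omit = float(stats.skew(with_nan, nan_policy="omit"))

print(skew_masked, skew_compressed, skew_omit)
```

For a 1-D slice like this one, the three values should agree up to floating-point error.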

While there seems to be demand for statistical functions to support missing values, I'd suggest that having two separate implementations of these functions is not the best way to satisfy the need.

I have seen the opinion that we can combine the implementations but must maintain a separate scipy.stats.mstats namespace. While this does not double the workload, maintaining two interfaces is still more work than maintaining one. For instance, many scipy.stats.mstats functions are missing the "Returns" (#22065 (comment)) and "Examples" (gh-7168) sections of their documentation. Also, having separate interfaces for essentially identical functionality is unnecessarily complicated for users.

I see two other reasons why a namespace should not be devoted to NumPy masked arrays.

Fortunately, many scipy.stats functions already offer the same functionality as their scipy.stats.mstats counterparts, making the separate namespace redundant. There are actually two obvious ways (see footnote 2) to ignore missing values in most scipy.stats functions with a scipy.stats.mstats counterpart:

Both of these avoid a common pitfall of NumPy masked arrays, which mask non-finite values that arise during calculations. This behavior is problematic because NaNs and infinities should not always be treated the same as missing data.
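The pitfall can be seen with plain numpy.ma, without any SciPy involvement. In this small sketch (the values are made up for illustration), an element that was never marked as missing becomes masked because an intermediate calculation produced an infinity, so it silently drops out of later statistics.

```python
import numpy as np

# Only the second element is marked as missing by the user.
x = np.ma.masked_array([1.0, 4.0, 0.0, 9.0], mask=[False, True, False, False])

# Dividing by the zero at index 2 would give inf in a plain array;
# numpy.ma masks that element instead.
with np.errstate(divide="ignore", invalid="ignore"):
    result = 1.0 / x

print(x.count())       # number of unmasked elements before the calculation
print(result.count())  # number of unmasked elements after the calculation
print(result.mask)     # index 2 is now masked as well
```

After the division, the infinity is indistinguishable from data that was missing to begin with, which is the behavior the issue describes as problematic.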

Update April 2025: The specific plan suggested here has changed; see #22194 (comment) for an update.

Here is the proposed alternative:

Looking beyond this, I would also suggest that as scipy.stats functions are translated to use the Python Array API, they can also be adapted to natively support marray, which adds masks to any Python Array API compatible backend. In most cases, the only special consideration for MArrays is that the count of non-masked elements along axis should be used in place of the length of the array along axis.
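The "count instead of length" point can be sketched with a mean along an axis. The example uses numpy.ma as a stand-in for a masked array type, since it is only meant to show the idea and does not use the marray API; masked_mean is a hypothetical helper written for this illustration.

```python
import numpy as np

def masked_mean(x, axis=0):
    """Mean along `axis` that divides by the number of unmasked elements.

    Hypothetical helper for illustration; `x` is a numpy.ma masked array.
    """
    total = x.sum(axis=axis)   # masked elements are skipped in the sum
    n = x.count(axis=axis)     # unmasked elements along the axis
    return total / n

x = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    mask=[[False, False], [False, True], [False, False]],
)

means = masked_mean(x, axis=0)

# Dividing by the array length instead would bias the second column.
naive = x.sum(axis=0) / x.shape[0]

print(means, naive)
```

The second column has one masked entry, so its sum of 8 is divided by 2 rather than 3.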

Closing this will close gh-5474

  1. If the scipy.stats version did not already support masked arrays - but many do. (Addressed below.)

  2. Ideally, nan_policy='omit' could also be eliminated, and the same behavior could be achieved by passing an MArray (discussed below) to the function. MArrays do not automatically mask non-finite values that arise during calculations.
