Towards gh-18867
This issue tracks progress toward the addition of array-API support to scipy.stats functions. The functions listed below look ready for conversion, and I'd be happy to review PRs for them. Priority, balancing the ease and importance of the task, is roughly in the order listed.
moment (ENH: stats.moment: add array API support #20292)
lmoment
skew (ENH: stats.skew: add array-API support #20541 - please see this PR as an example for the roughly similar functions kurtosis through directional_stats)
kurtosis (ENH: stats.kurtosis: add array API support #20658)
describe (ENH: stats.describe: add array API support #20667)
entropy (ENH: stats.entropy, special.{entr, rel_entr}: add array API support #20673)
variation (ENH: stats.variation: add array-API support #20647)
sem (ENH: stats.sem: add array-API support #20631)
kstat (ENH: stats: add array-API support to kstat/kstatvar #20634)*
kstatvar (ENH: stats: add array-API support to kstat/kstatvar #20634)*
circmean (ENH: stats.circ___: add array-API support #20595)
circvar (ENH: stats.circ___: add array-API support #20595)
circstd (ENH: stats.circ___: add array-API support #20595)
directional_stats (ENH: stats: add array API support for directional_stats #20794)
pearsonr (ENH: stats.pearsonr: add array API support #20284)
ttest_1samp (ENH: stats.ttest_1samp: add array-API support #20545 - please see this PR as an example for ttest_rel through normaltest)
ttest_rel (ENH: stats: rewrite ttest_rel in terms of ttest_1samp #20883)
ttest_ind (ENH: stats.ttest_ind: add array API support #20771)
skewtest (ENH: stats.skewtest: add array-API support #20597)
kurtosistest (ENH: stats.kurtosistest: add array API support #20715)
normaltest (ENH: stats.normaltest/jarque_bera: add array-API support #20736)
jarque_bera (ENH: stats.normaltest/jarque_bera: add array-API support #20736)
power_divergence (ENH: stats.chisquare/power_divergence: add array API support #20753)
chisquare (ENH: stats.chisquare/power_divergence: add array API support #20753)
combine_pvalues (ENH: stats: add array API support to combine_pvalues #20900)
gstd (ENH: stats.gstd: add array API support #22455)
ttest_ind_from_stats (ENH: stats.ttest_ind: add array API support #20771)
alexandergovern (ENH: stats.alexandergovern: vectorize calculation for n-D arrays #21089)
find_repeats - deprecated in DEP: stats.find_repeats: deprecate function #21157, removed in DEP: stats: remove find_repeats #23023

After that:
stats._xp_mean, an array API compatible mean with weights and nan_policy (#20743)
gmean (ENH: stats.gmean: add array API support #20946)
hmean (MAINT: stats.hmean/pmean: simplify prior to array API conversion #20954, DOC/MAINT: stats.gmean/gstd/hmean/pmean: document/treat invalid input consistently #20962, ENH: stats.hmean/pmean: add array API support #21035)
pmean (MAINT: stats.hmean/pmean: simplify prior to array API conversion #20954, DOC/MAINT: stats.gmean/gstd/hmean/pmean: document/treat invalid input consistently #20962, ENH: stats.hmean/pmean: add array API support #21035)

After that:
_SimpleNormal (ENH: stats: end-to-end array-API support for normality tests #20777)
_SimpleChi2 (ENH: stats: end-to-end array-API support for NHSTs with chi-squared null distribution #20782)
_SimpleBeta (ENH: stats: end-to-end array-API support for NHSTs with beta null distribution #20793)
_SimpleStudentT (ENH: stats: end-to-end array-API support for NHSTs with Student's t null distribution #20884)
stdtrit where possible (ENH: special/stats: implement xp-compatible stdtrit and use in stats #22222)

I'd like to implement the following using the approach of _masked_array (gh-20363):
tmean (ENH: stats.tmean: add array API support #20965)
tvar (ENH: stats.tvar/tstd/tsem: add array API support #21036)
tmin (ENH: stats.tmin/tmax: add array API support #21028)
tmax (ENH: stats.tmin/tmax: add array API support #21028)
tstd (ENH: stats.tvar/tstd/tsem: add array API support #21036)
tsem (ENH: stats.tvar/tstd/tsem: add array API support #21036)

I left the transformation functions off this list initially, but most of them should be relatively easy.
xp_var (ENH: stats.xp_var: array-API compatible variance with scipy.stats interface #21034)
zmap (ENH: stats.zmap/zscore/gzscore: add array API support #21068)
zscore (ENH: stats.zmap/zscore/gzscore: add array API support #21068)
gzscore (ENH: stats.zmap/zscore/gzscore: add array API support #21068)
obrientransform (ENH: stats.obrientransform: add array API support #21055)
boxcox_llf (ENH: stats.boxcox_llf: add array API support #21097; come back after xp logsumexp is done)
yeojohnson_llf
boxcox_normmax (Can use _chandrupatla or, when it merges, optimize.elementwise.find_root, ENH: optimize.elementwise: vectorized scalar optimization and root finding tools #20800)
yeojohnson_normmax
boxcox
yeojohnson
trim1 (would benefit from an xp.partition, but could use xp.sort) See ENH: stats.quantile: methods to support trimming/Winsorizing #22644.
trimboth (same) See ENH: stats.quantile: methods to support trimming/Winsorizing #22644.
sigmaclip (same)

After that:
_array_api.cov; consider making it public if array API won't offer it. Not really necessary. We don't need something very general, so let's not get hung up on it.
linregress: add axis and array API support
expectile: add axis and array API support
ks_2samp: consider natively vectorizing, then adding array API support
mode: consider natively vectorizing (e.g. see ENH: ndimage: majority voting filter #9873 (comment) for implementation), then adding array API support. (ENH: stats: add array API support to some of _axis_nan_policy decorator #22857)
bartlett: consider natively vectorizing, then adding array API support (ENH: stats.bartlett: add native axis and array API support #20751)
levene: consider natively vectorizing, then adding array API support
anderson_ksamp: might be able to vectorize, then add array API support
wasserstein_distance: consider natively vectorizing, then adding array API support
energy_distance: consider natively vectorizing, then adding array API support

These functions are held up by rankdata (possibly among other things), which is waiting for improved array-API support. See gh-20639.
kendalltau
mannwhitneyu
wilcoxon
kruskal
cramervonmises_2samp
friedmanchisquare
brunnermunzel
ansari
fligner
mood
spearmanr (also need to re-define axis behavior)
chatterjeexi

These functions need median, quantile, or similar, either directly or via iqr. See data-apis/array-api#795.
_xp_quantile function using xp.sort (Done. scipy.stats.quantile added in ENH: stats.quantile: add array API compatible quantile function #22352.) I'd like for it to include the following features, which will be useful elsewhere:
axis and nan_policy support. sort will typically push all the NaNs to one end or the other; we can count the finite values in each slice rather than using .shape[axis] to determine the index to take_along_axis (see ENH: stats.rankdata: add array API standard support #20639 for an implementation).
quantile accepts a 1D array of probabilities. The specified quantiles are taken for all slices and aligned along a new axis 0 of the output. While convenient in the case that the user wants all quantiles for all slices, this does not follow normal broadcasting rules, and it does not allow for different probabilities for each slice (needed by bootstrap, for example). In similar situations in stats, we would allow for an n-d array of probabilities and follow normal broadcasting rules with the additional requirement that the length of the probabilities array along axis must be 1. (For example, see how ttest_1samp handles broadcasting with popmean.) This allows the use of different probabilities for each slice or computation at all probabilities for all slices, depending on the alignment of the probability array. However, there is an improvement to be made. When keepdims=True, we can relax the rule that the length of the percentiles along axis must be 1, and we can accept an array of percentiles aligned along the dimension(s) specified by axis. The quantile is computed at all of those probabilities for the corresponding slice, and these quantiles are aligned along the axis dimension(s) of the output array. Compared to aligning the percentiles orthogonal to the input sample array, this has the advantage that each slice needs to be sorted (or partitioned) only once rather than once per percentile, and it offers the convenience of the existing NumPy interface. @seberg is this intelligible to you, at least, based on our conversation at the summit?
iqr
siegelslopes
theilslopes
median_test
median_abs_deviation
epps_singleton_2samp
levene (optional)
fligner (optional)
sen_seasonal_slopes
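
The sort-and-count idea described for the quantile function above can be sketched roughly as follows. This is only an illustration written against NumPy, not the scipy.stats.quantile implementation: the name nan_omit_quantile and its signature are hypothetical, linear interpolation is assumed, and it relies on np.sort placing NaNs at the end of each slice. The probabilities p must broadcast against x after the reduction; they may have length 1 along axis (one probability per slice) or be aligned along axis (several probabilities per slice, each slice sorted only once).

```python
import numpy as np


def nan_omit_quantile(x, p, axis=-1):
    """Sort-based quantile that ignores NaNs (hypothetical helper).

    The reduced axis is kept in the output, as with keepdims=True.
    """
    x = np.asarray(x, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)

    # np.sort pushes NaNs to the end of each slice.
    y = np.sort(x, axis=axis)

    # Count the non-NaN values per slice instead of using x.shape[axis].
    n = np.sum(~np.isnan(y), axis=axis, keepdims=True)
    last = np.maximum(n - 1, 0)

    # Fractional position of each requested quantile within the
    # non-NaN part of the corresponding slice.
    h = p * last
    lo = np.floor(h).astype(np.intp)
    hi = np.minimum(lo + 1, last)
    frac = h - lo

    # Gather the two neighbouring order statistics and interpolate.
    y_lo = np.take_along_axis(y, lo, axis=axis)
    y_hi = np.take_along_axis(y, hi, axis=axis)
    q = y_lo + frac * (y_hi - y_lo)

    # Slices with no non-NaN values have no defined quantile.
    return np.where(n > 0, q, np.nan)


x = np.array([[1.0, np.nan, 3.0, 2.0, np.nan],
              [4.0, 5.0, 6.0, 7.0, 8.0]])

# Same probability for every slice: medians 2.0 and 6.0.
medians = nan_omit_quantile(x, 0.5, axis=-1)

# A different probability for each slice (length 1 along `axis`).
per_slice = nan_omit_quantile(x, np.array([[0.0], [1.0]]), axis=-1)

# Several probabilities aligned along `axis`; output has shape (2, 3).
several = nan_omit_quantile(x, np.array([[0.25, 0.5, 0.75]]), axis=-1)
```

Because the index is derived from the per-slice count rather than the slice length, slices with different numbers of NaNs are handled in one vectorized pass.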
I wrote the following, so I'd prefer to do the upgrades on those personally.
monte_carlo_test (ENH: stats.monte_carlo_test: add array API support #20604)
permutation_test (accepts RNG; uses random, permuted, and permutation)
bootstrap (accepts RNG; uses randint/integers)
goodness_of_fit (probably not very useful until we can fit distributions with array API)
power (main...mdhaber:scipy:xp_power)
false_discovery_control (uses take_along_axis/put_along_axis)
differential_entropy (ENH: stats.differential_entropy: add array API support #21076)

Toward gh-22194, we'll be adding a few new functions to scipy.stats, and those should be array API compatible from the start:
marray support (#22505)
plotting_positions
trim See ENH: stats.quantile: methods to support trimming/Winsorizing #22644.
winsorize See ENH: stats.quantile: methods to support trimming/Winsorizing #22644.
mjci, mquantiles_cimj, and rsh
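
For the trimming functions, the note above that a full sort can stand in for a partition might look something like the sketch below. It is written with NumPy for concreteness; the name trim1_via_sort is hypothetical, and the cut-count rule (truncating proportiontocut * n to an integer) is an assumption meant to loosely mirror trim1, not a copy of its implementation. Unlike a partition-based version, the result comes back sorted along axis.

```python
import numpy as np


def trim1_via_sort(a, proportiontocut, tail="right", axis=0):
    """Drop a proportion of observations from one tail using a full sort.

    Hypothetical illustration; a partition would avoid sorting the
    retained values but is not required for correctness.
    """
    a = np.asarray(a)
    n = a.shape[axis]
    ncut = int(proportiontocut * n)  # number of observations to drop

    y = np.sort(a, axis=axis)
    if tail == "right":
        keep = np.arange(0, n - ncut)   # drop the largest values
    elif tail == "left":
        keep = np.arange(ncut, n)       # drop the smallest values
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return np.take(y, keep, axis=axis)


a = np.array([5, 1, 9, 3, 7, 2, 8, 4, 6, 0])
right_trimmed = trim1_via_sort(a, 0.2)               # drops 8 and 9
left_trimmed = trim1_via_sort(a, 0.2, tail="left")   # drops 0 and 1
```

The only operations used are sort, arange, and take, so the same logic should carry over to other array namespaces without needing a partition function.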
After all that, it may be worth doing:
CensoredData
ecdf - relies on CensoredData. Might be worth doing after array API standard has diff with prepend, append. I wrote xp_diff in ENH: stats.rankdata: add array API standard support #20639, but it's slow.
logrank - relies on ecdf
bws_test - needs permutation_test
tukey_hsd - probably not too bad, but most of the time is calculating studentized_range SF. Could vectorize computation with _tanhsinh, though.
pointbiserialr - deprecate or implement using shortcut specific to binary data? It's just an alias for pearsonr right now.

I am not interested in working on or reviewing work on the following functions:
bayes_mvs (consider deprecating)
mvsdist (consider deprecating)
weightedtau (consider deprecating)
multiscale_graphcorr (consider deprecating)
tiecorrect (consider deprecating)
ranksums (consider deprecating)
somersd
page_trend_test
f_oneway
cumfreq
percentileofscore
scoreatpercentile
relfreq
binned_statistic
binned_statistic_2d
binned_statistic_dd
ppcc_max
ppcc_plot
probplot
boxcox_normplot
yeojohnson_normplot
(rv_continuous, rv_discrete, rv_histogram, etc.) and distributions

Some of the scipy.stats.contingency functions would be feasible to work on, but some would probably need to be vectorized for it to make sense:
relative_risk
expected_freq
margins
chi2_contingency
association
fisher_exact - needs hypergeometric distribution
barnard_exact - uses shgo; probably not a good candidate
boschloo_exact - uses shgo; probably not a good candidate
odds_ratio - I don't think the cost/benefit ratio looks good
crosstab
I don't think these are good candidates for translation.
trim_mean - would benefit from array API partition
binomtest - could probably be made elementwise, but would be a bit of work
quantile_test - probably not bad, but needs binomial distribution functions (and inverses)
shapiro - compiled. I've written the Shapiro test in pure Python, but normal distribution order statistic stuff was computed via numerical integration, and p-value was just monte_carlo_test, so it's probably faster to convert to NumPy, perform the test, and convert back.
anderson - technically not too hard, but there are interface questions to be answered
cramervonmises - probably not too bad once we have array API distributions
ks_1samp - probably not too bad, but we need array API distributions and array API null distribution CDF/SF
kstest - dispatches to ks_1samp and ks_2samp; easy once those are done!
poisson_means_test - theoretically this could be an elementwise function, but implementing would be tricky because scalar arguments are used to create arange, which would naturally lead to ragged arrays
dunnett - statistic is easy to vectorize; p-value is not.
scipy.stats.qmc - mostly compiled
sobol_indices - relies on scipy.stats.qmc
scipy.stats.sampling - all compiled. Long-term goal: rewrite vectorized versions in terms of scipy.interpolate.
scipy.stats.mstats - deprecate; accept marrays in regular stats functions.
fit. Relies heavily on differential_evolution, which would need array API support first.
wasserstein_distance_nd. Uses linear programming. Unlikely to get efficient array API support any time soon.
gaussian_kde. There could be an efficient array API implementation in terms of multivariate_normal if Covariance gets support for batch covariance matrices, which was drafted in ENH: stats.multivariate: introduce Covariance class and subclasses mdhaber/scipy#88. But it would be a complete rewrite.