RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://github.com/scipy/scipy/issues/20544 below:

add array API-support · Issue #20544 · scipy/scipy · GitHub

This issue tracks progress toward the addition of array-API support to scipy.stats functions. The functions listed below look ready for conversion, and I'd be happy to review PRs for them. Priority, balancing the ease and importance of the task, is roughly in the order listed.

moment ( ENH: stats.moment: add array API support #20292)
lmoment
skew ( ENH: stats.skew: add array-API support #20541 - please see this PR as an example roughly similar kurtosis through directional_stats)
kurtosis ( ENH: stats.kurtosis: add array API support #20658)
describe ( ENH: stats.describe: add array API support #20667)
entropy ( ENH: stats.entropy, special.{entr, rel_entr}: add array API support #20673)
variation ( ENH: stats.variation: add array-API support #20647)
sem ( ENH: stats.sem: add array-API support #20631)
kstat ( ENH: stats: add array-API support to kstat/kstatvar #20634)*
kstatvar ( ENH: stats: add array-API support to kstat/kstatvar #20634)*
circmean ( ENH: stats.circ___: add array-API support #20595)
circvar ( ENH: stats.circ___: add array-API support #20595)
circstd ( ENH: stats.circ___: add array-API support #20595)
directional_stats ( ENH: stats: add array API support for directional_stats #20794)
pearsonr ( ENH: stats.pearsonr: add array API support #20284)
ttest_1samp ( ENH: stats.ttest_1samp: add array-API support #20545 - please see this PR as an example for ttest_rel through normaltest)
ttest_rel ( ENH: stats: rewrite ttest_rel in terms of ttest_1samp #20883)
ttest_ind ( ENH: stats.ttest_ind: add array API support #20771)
skewtest ( ENH: stats.skewtest: add array-API support #20597)
kurtosistest ( ENH: stats.kurtosistest: add array API support #20715)
normaltest ( ENH: stats.normaltest/jarque_bera: add array-API support #20736)
jarque_bera ( ENH: stats.normaltest/jarque_bera: add array-API support #20736)
power_divergence ( ENH: stats.chisquare/power_divergence: add array API support #20753)
chisquare ( ENH: stats.chisquare/power_divergence: add array API support #20753)
combine_pvalues ( ENH: stats: add array API support to combine_pvalues #20900)
gstd ( ENH: stats.gstd: add array API support #22455)
ttest_ind_from_stats ( ENH: stats.ttest_ind: add array API support #20771)
alexandergovern ( ENH: stats.alexandergovern: vectorize calculation for n-D arrays #21089)
find_repeats - deprecated in DEP: stats.find_repeats: deprecate function #21157, removed in DEP: stats: remove find_repeats #23023

After that:

Add _SimpleNormal ( ENH: stats: end-to-end array-API support for normality tests #20777)
Add _SimpleChi2 ( ENH: stats: end-to-end array-API support for NHSTs with chi-squared null distribution #20782)
Add _SimpleBeta ( ENH: stats: end-to-end array-API support for NHSTs with beta null distribution #20793)
Add _SimpleStudentT ( ENH: stats: end-to-end array-API support for NHSTs with Student's t null distribution #20884)
Dispatch to array backend stdtrit where possible ( ENH: special/stats: implement xp-compatible stdtrit and use in stats #22222)

I'd like to implement the following using the approach of _masked_array (gh-20363):

tmean ( ENH: stats.tmean: add array API support #20965)
tvar ( ENH: stats.tvar/tstd/tsem: add array API support #21036)
tmin ( ENH: stats.tmin/tmax: add array API support #21028)
tmax ( ENH: stats.tmin/tmax: add array API support #21028)
tstd ( ENH: stats.tvar/tstd/tsem: add array API support #21036)
tsem ( ENH: stats.tvar/tstd/tsem: add array API support #21036)

I left the transformation functions off this list initially, but most of them should be relatively easy.

xp_var ( ENH: stats.xp_var: array-API compatible variance with scipy.stats interface #21034)
zmap ( ENH: stats.zmap/zscore/gzscore: add array API support #21068)
zscore ( ENH: stats.zmap/zscore/gzscore: add array API support #21068)
gzscore ( ENH: stats.zmap/zscore/gzscore: add array API support #21068)
obrientransform ( ENH: stats.obrientransform: add array API support #21055)
boxcox_llf ( ENH: stats.boxcox_llf: add array API support #21097; come back after xp logsumexp is done)
yeojohnson_llf
boxcox_normmax (Can use _chandrupatla or, when it merges, optimize.elementwise.find_root, ENH: optimize.elementwise: vectorized scalar optimization and root finding tools #20800)
yeojohnson_normmax
boxcox
yeojohnson
trim1 (would benefit from an xp.partition, but could use xp.sort) See ENH: stats.quantile: methods to support trimming/Winsorizing #22644.
trimboth (same) See ENH: stats.quantile: methods to support trimming/Winsorizing #22644.
sigmaclip (same)

After that:

add N-D support to _array_api.cov; consider making it public if array API won't offer it Not really necessary. We don't need something very general, so let's not get hung up on it.
linregress: add axis and array API support
expectile: add axis and array API support
ks_2samp: consider natively vectorizing, then adding array API support
mode: consider natively vectorizing (e.g. see ENH: ndimage: majority voting filter #9873 (comment) for implementation), then adding array API support. ( ENH: stats: add array API support to some of _axis_nan_policy decorator #22857)
bartlett: consider natively vectorizing, then adding array API support ( ENH: stats.bartlett: add native axis and array API support #20751)
levene: consider natively vectorizing, then adding array API support
anderson_ksamp: might be able to vectorize, then add array API support
wasserstein_distance: consider natively vectorizing, then adding array API support
energy_distance: consider natively vectorizing, then adding array API support

These functions are held up by rankdata (possibly among other things), which is waiting for improved array-API support. See gh-20639.

kendalltau
mannwhitneyu
wilcoxon
kruskal
cramervonmises_2samp
friedmanchisquare
brunnermunzel
ansari
fligner
mood
spearmanr (also need to re-define axis behavior)
chatterjeexi

These functions need median, quantile, or similar, either directly or via iqr. See data-apis/array-api#795.

So we don't need to wait for the array API, it would be helpful to write an _xp_quantile function using xp.sort (Done. scipy.stats.quantile added in ENH: stats.quantile: add array API compatible quantile function #22352.) I'd like for it to include the following features, which will be useful elsewhere:
- Native axis and nan_policy support. sort will typically push all the NaNs to one end or the other; we can count the finite values in each slice rather than using .shape[axis] to determine the index to take_along_axis (see ENH: stats.rankdata: add array API standard support #20639 for an implementation).
- Improved broadcasting behavior. (The following may not be intelligble without real-time discussion. I just wanted to record the thoughts somewhere.) The NumPy version of quantile accepts a 1D array of probabilities. The specified quantiles are taken for all slices and aligned along a new axis 0 of the output. While convenient in the case that the user wants all quantiles for all slices, this does not follow normal broadcasting rules, and it does not allow for different probabilities for each slice (needed by bootstrap, for example). In similar situations in stats, we would allow for an n-d array of probabilities and follow normal broadcasting rules with the additional requirement that the length of the probabilities array along axis must be 1. (For example, see how ttest_1samp handles broadcasting with popmean.) This allows the use of different probabilities for each slice or computation at all probabilites for all slices, depending on the alignment of the probability array. However, there is an improvement to be made. When keepdims=True, we can relax the rule that the length of the percentiles along axis must be 1, and we can accept an array of percentiles aligned along the dimension(s) specified by axis. The quantile is computed at all of those probabilites for the corresponding slice, and these quantiles are aligned along the axis dimension(s) of the output array. Compared to aligning the percentiles orthogonal to the input sample array, this has the advantage that each slice needs to be sorted (or partitioned) only once rather than once per percentile, and it offers the convenience of the existing NumPy interface. @seberg is this intelligible to you, at least, based on our conversation at the summit?
iqr
siegelslopes
theilslopes
median_test
median_abs_deviation
epps_singleton_2samp
levene (optional)
fligner (optional)
sen_seasonal_slopes

I wrote the following, so I'd prefer to do the upgrades on those personally.

All new distribution infrastructure features
monte_carlo_test ( ENH: stats.monte_carlo_test: add array API support #20604)
permutation_test (accepts RNG; uses random, permuted, and permutation)
bootstrap (accepts RNG; uses randint/integers)
goodness_of_fit (probably not very useful until we can fit distributions with array API)
power (main...mdhaber:scipy:xp_power)
false_discovery_control (uses take_along_axis/put_along_axis)
differential_entropy ( ENH: stats.differential_entropy: add array API support #21076)

Toward gh-22194, we'll be adding a few new functions to scipy.stats, and those should be array API compatible from the start:

quantile ( ENH: stats.quantile: add discontinuous (HF 1-3) and Harrell-Davis methods; add marray support #22505)
something to replace plotting_positions
trim See ENH: stats.quantile: methods to support trimming/Winsorizing #22644.
winsorize See ENH: stats.quantile: methods to support trimming/Winsorizing #22644.
Possibly functions to replace mjci, mquantiles_cimj, and rsh

After all that, it may be worth doing:

CensoredData
ecdf - relies on CensoredData. Might be worth doing after array API standard has diff with prepend, append. I wrote xp_diff in ENH: stats.rankdata: add array API standard support #20639, but it's slow.
logrank - relies on ecdf
bws_test - needs permutation_test
tukey_hsd - probably not too bad, but most of the time is calculating studentized_range SF. Could vectorize computation with _tanhsinh, though.
pointbiserialr - deprecate or implement using shortcut specific to binary data? It's just an alias for pearsonr right now.

I am not interested in working on or reviewing work on the following functions:

bayes_mvs (consider deprecating)
mvsdist (consider deprecating)
weightedtau (consider deprecating)
multiscale_graphcorr (consider deprecating)
tiecorrect (consider deprecating)
ranksums (consider deprecating)
somersd
page_trend_test
f_oneway
frequency statistics
- cumfreq
- percentileofscore
- scoreatpercentile
- relfreq
- binned_statistic
- binned_statistic_2d
- binned_statistic_dd
plot tests
- ppcc_max
- ppcc_plot
- probplot
- boxcox_normplot
- yeojohnson_normplot
Old univariate and multivariate distribution infrastructure (rv_continuous, rv_discrete, rv_histogram, etc.) and distributions

Some of the scipy.stats.contingency functions would be feasible to work on, but some would probably need to be vectorized for it to make sense

relative_risk
expected_freq
margins
chi2_contingency
association
fisher_exact - needs hypergeometric distribution
barnard_exact - uses shgo; probably not a good candidate
boschloo_exact - uses shgo; probably not a good candidate
odds_ratio - I don't think the cost/benefit ratio looks good
crosstab

I don't think these are good candidates for translation.

trim_mean - would benefit from array API partition
binomtest - could probably be made elementwise, but would be a bit of work
quantile_test - probably not bad, but needs binomial distribution functions (and inverses)
shapiro - compiled. I've written the Shapiro test in pure Python, but normal distribution order statistic stuff was computed via numerical integration, and p-value was just monte_carlo_test, so it's probably faster to convert to NumPy, perform the test, and convert back.
anderson - technically not too hard, but there are interface questions to be answered
cramervonmises - probably not too bad once we have array API distributions
ks_1samp - probably not too bad, but we need array API distributions and array API null distribution CDF/SF
kstest - dispatches to ks_1samp and ks_2samp; easy once those are done!
poisson_means_test - theoretically this could be an elementwise function, but implementing would be tricky because scalar arguments are used to create arange which would naturally lead to ragged arrays
dunnett - statistic is easy to vectorize; p-value is not.
scipy.stats.qmc - mostly compiled
sobol_indices - relies on scipy.stats.qmc
scipy.stats.sampling - all compiled. Long-term goal: rewrite vectorized versions in terms of scipy.interpolate.
scipy.stats.mstats - deprecate; accept marrays in regular stats functions.
fit. Relies heavily on differential_evolution, which would need array API support first.
wasserstein_distance_nd. Uses linear programming. Unlikely to get efficient array API support any time soon.
gaussian_kde. There could be an efficient array API implementation in terms of multivariate_normal if Covariance gets support for batch covariance matrices, which was drafted in ENH: stats.multivariate: introduce Covariance class and subclasses mdhaber/scipy#88. But it would be a complete rewrite.

rgommers and fancidevlucascolleytupui and arkanoid87

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4