I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
import pandas as pd
import numpy as np

a = pd.Series(np.random.normal(scale=0.1, size=(1_000_000,)).astype(np.float32)).pow(2)
assert isinstance(np.mean(a), float)
assert isinstance(np.mean(a.values), np.float32)
assert abs(1 - np.mean(a)/np.mean(a.values)) > 4e-4

Problem description
pd.DataFrame.mean / pd.Series.mean / np.mean(pd.Series) outputs a Python float instead of a numpy float. Since np.mean(pd.Series.values) does return a numpy float, I'm assuming for now that this should be fixed in pandas.
If dtype==np.float32, then calling mean on a pandas object gives a significantly different result vs. calling mean on the underlying numpy ndarray.
The output of np.mean(a) should be the same as np.mean(a.values).
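One possible reading of the float32 discrepancy, offered only as a hedged sketch and not as a confirmed diagnosis of pandas internals: a float32 mean depends on how the sum is accumulated. NumPy documents pairwise summation for `np.sum` on contiguous float arrays, whereas a running total kept in float32 loses low-order bits as it grows. The snippet below uses NumPy only. `naive_f32_mean` is a hypothetical helper name, and the fixed seed is an assumption added for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
x = rng.normal(scale=0.1, size=1_000_000).astype(np.float32) ** 2


def naive_f32_mean(values):
    """Hypothetical helper: mean from a strictly sequential float32 running sum."""
    # np.cumsum accumulates element by element, so its last entry is the
    # left-to-right float32 sum, without pairwise blocking.
    running = np.cumsum(values, dtype=np.float32)
    return running[-1] / np.float32(len(values))


reference = np.mean(x, dtype=np.float64)  # float64 accumulator
pairwise = np.mean(x)                     # float32 result, NumPy's own summation
naive = naive_f32_mean(x)                 # float32 result, sequential summation

pairwise_err = abs(1 - float(pairwise) / float(reference))
naive_err = abs(1 - float(naive) / float(reference))

print(f"relative error, np.mean in float32:     {pairwise_err:.3e}")
print(f"relative error, sequential float32 sum: {naive_err:.3e}")
```

If the sequential error comes out several orders of magnitude larger than the `np.mean` error, that would be consistent with the gap reported here, though it does not show which code path pandas actually takes.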
additional tests
# both b and c ~1e-2
b = a.mean()  # the pandas impl of mean
assert isinstance(b, float)  # PYTHON float, not numpy float? Ergo implicit f64
h = np.mean(a)
assert isinstance(h, float)
assert h == b
c = a.values.mean()  # the numpy impl of mean
assert isinstance(c, np.float32)  # as expected

print('\nerrors between pandas mean and numpy mean')
print(f'relative error: {abs(1-b/c):.3e}')  # ~ 5e-4
print(f'absolute error: {abs(b -c):.3e}')  # ~ 5e-6
print(f'relative error after casting: {abs(1-np.float32(b)/c):.3e}')  # ~ 5e-4
print(f'absolute error after casting: {abs(np.float32(b) -c):.3e}')  # ~ 5e-6

d = a.sum() / len(a)
assert isinstance(d, np.float64)  # expected, because division. Note `sum` returns an np.float32
e = a.values.sum() / len(a)
assert isinstance(e, np.float64)  # expected, because division
# these methods are equivalent
assert d==e
# and up to f32 precision equal to the numpy impl
assert d.astype(np.float32) == c

# the cherry on the cake
f = a.astype(np.float64).mean()
assert isinstance(f, float)  # still not ideal, should be np.float64
g = a.astype(np.float64).values.mean()
print('\nrelative error between pandas f64 mean and numpy f64 mean')
print(f'relative error numpy f64/pandas f64: {abs(1-g/f):.3e}')  # ~ 1e-14 -- 1e-16, not bad but I would have expected equality
print('\nerrors between pandas f64 mean and numpy/pandas f32 mean')
print(f'relative error pandas f32/pandas f64: {abs(1-b/f):.3e}')  # ~ 5e-4
print(f'absolute error numpy f32/pandas f64: {abs(1-c/f):.3e}')  # ~ 1e-7 -- 1e-9

# finally...
h = np.mean(a)
assert isinstance(h, float)
assert h == b
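A side note on the isinstance checks in the tests above: np.float64 subclasses Python's built-in float, so isinstance(x, float) holds for both and cannot tell them apart, while np.float32 does not subclass float. Comparing type(x) directly is stricter. The sketch below uses NumPy only, and `describe` is a hypothetical helper name.

```python
import numpy as np


def describe(x):
    """Hypothetical helper: exact type name, and whether isinstance(x, float) holds."""
    return type(x).__name__, isinstance(x, float)


print(describe(1.0))              # built-in float
print(describe(np.float64(1.0)))  # NumPy double, also passes isinstance(..., float)
print(describe(np.float32(1.0)))  # NumPy single, does not pass isinstance(..., float)

# An exact check that separates a built-in float from np.float64:
value = np.float64(1.0)
is_builtin_float = type(value) is float
```

With this in mind, `assert isinstance(f, float)` alone would not rule out an np.float64 return value. `type(f) is float` would.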
output
errors between pandas mean and numpy mean
relative error: 5.210e-04
absolute error: 5.204e-06
relative error after casting: 5.210e-04
absolute error after casting: 5.204e-06

relative error between pandas f64 mean and numpy f64 mean
relative error numpy f64/pandas f64: 1.066e-14

errors between pandas f64 mean and numpy/pandas f32 mean
relative error pandas f32/pandas f64: 5.214e-04
absolute error numpy f32/pandas f64: 2.399e-07

Output of pd.show_versions()
INSTALLED VERSIONS
commit : c7f7443
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-80-generic
Version : #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.1
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.1
setuptools : 52.0.0.post20210125
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : 4.0.1
blosc : None
feather : None
xlsxwriter : 1.3.8
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 3.0.0
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.9.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.15
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.51.2