Showing content from https://github.com/pandas-dev/pandas/issues/13146 below:

Invalid value encountered in median RuntimeWarning · Issue #13146 · pandas-dev/pandas · GitHub

I just upgraded to pandas 0.18.1 with conda and started noticing this problem in some notebooks that I created before the upgrade and recently revisited for further analysis.

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
df = pd.DataFrame({'task_complete':['success','success','fail','fail','success','fail','success'],
    'value':[np.nan,4.5,5.7,3.0,np.nan,6.7,3.78]})

df.value.describe() triggers a RuntimeWarning from numpy, and the quantiles unexpectedly come back as NaN:

In [5]: df.value.describe()
/Users/adrianpalacios/anaconda/lib/python3.4/site-packages/numpy/lib/function_base.py:3403: RuntimeWarning: Invalid value encountered in median
  RuntimeWarning)
Out[5]:
count    5.000000
mean     4.736000
std      1.480703
min      3.000000
25%           NaN
50%           NaN
75%           NaN
max      6.700000
Name: value, dtype: float64
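The symptom matches how numpy's plain reductions treat NaN, although this report does not confirm that as the root cause inside pandas. A minimal sketch, assuming a reasonably recent numpy, comparing the plain and the NaN-aware functions on the same values (the variable names are illustrative only):

```python
import numpy as np

# Same values as the 'value' column in the report above.
values = np.array([np.nan, 4.5, 5.7, 3.0, np.nan, 6.7, 3.78])

# The plain reduction propagates NaN to its result.
plain_median = np.median(values)

# The NaN-aware variants skip missing entries before reducing.
nan_median = np.nanmedian(values)
nan_quartiles = np.nanpercentile(values, [25, 50, 75])

print(plain_median, nan_median, nan_quartiles)
```

The NaN-aware quartiles agree with the expected describe() output, which suggests the NaNs are reaching a plain median/percentile call somewhere in the 0.18.1 code path.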

Expected Output

I got this from a different conda environment that has not been upgraded to the latest pandas version:

In [5]: df.value.describe()
Out[5]:
count    5.000000
mean     4.736000
std      1.480703
min      3.000000
25%      3.780000
50%      4.500000
75%      5.700000
max      6.700000
Name: value, dtype: float64

Dropping the NaNs first works in pandas 0.18.1:

In [9]: df.value.dropna().describe()
Out[9]:
count    5.000000
mean     4.736000
std      1.480703
min      3.000000
25%      3.780000
50%      4.500000
75%      5.700000
max      6.700000
Name: value, dtype: float64

However, this workaround is not a great option when multiple columns with NaNs are present, because dropna() on a DataFrame removes every row that has a NaN in any column:

df2 = pd.DataFrame({'task_complete':['success','success','fail','fail','success','fail','success'],
    'value':[np.nan,4.5,5.7,3.0,np.nan,6.7,3.78],
    'more_values':[8.2,np.nan,np.nan,np.nan,9.4,np.nan,np.nan]
})

In [17]: df2[['value','more_values']].dropna().describe()
Out[17]:
       value  more_values
count    0.0          0.0
mean     NaN          NaN
std      NaN          NaN
min      NaN          NaN
25%      NaN          NaN
50%      NaN          NaN
75%      NaN          NaN
max      NaN          NaN
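One possible way around the empty result is to drop NaNs column by column rather than row by row. This is a sketch of an interim workaround rather than a fix for the underlying behaviour, and the name `summary` is just for illustration:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'task_complete': ['success', 'success', 'fail', 'fail', 'success', 'fail', 'success'],
    'value': [np.nan, 4.5, 5.7, 3.0, np.nan, 6.7, 3.78],
    'more_values': [8.2, np.nan, np.nan, np.nan, 9.4, np.nan, np.nan],
})

# Drop NaNs within each column separately, then summarise that column.
# Each call returns a Series, so apply() assembles them into one DataFrame.
summary = df2[['value', 'more_values']].apply(lambda col: col.dropna().describe())
print(summary)
```

Because each column is cleaned on its own, 'value' keeps its 5 observations and 'more_values' keeps its 2, instead of both collapsing to a count of 0.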

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: 0.22.1
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None
