Showing content from https://github.com/pandas-dev/pandas/issues/14784 below:

na_position doesn't work for sort_index() with MultiIndex · Issue #14784 · pandas-dev/pandas · GitHub

Code Sample, a copy-pastable example if possible
In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: u'0.19.0'

In [4]: mi = pd.MultiIndex.from_tuples([[1, 1, 3], [1, 1, np.nan]], names=list('ABC'))

In [5]: df = pd.DataFrame([[1, 2], [3, 4]], mi)

In [6]: df.sort_index(na_position="first")
Out[6]: 
         0  1
A B C        
1 1 NaN  3  4
    3    1  2

In [7]: df.sort_index(na_position="last")
Out[7]: 
         0  1
A B C        
1 1 NaN  3  4
    3    1  2

Problem description

The na_position argument has no effect in DataFrame.sort_index() or Series.sort_index() when the index is a MultiIndex, because of the way we sort the MultiIndex. A MultiIndex does not store its values directly: each level holds the unique values, and the labels are integer positions into those levels. For instance, if we have the following MultiIndex:

MultiIndex.from_tuples([[1, 1, 3], [1, 1, np.nan]], names=list('ABC'))

the values get stored as

MultiIndex(levels=[[1], [1], [3]],
           labels=[[0, 0], [0, 0], [0, -1]],
           names=[u'A', u'B', u'C'])

with a NaN placeholder of -1.
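To check this storage layout directly, here is a small sketch. It assumes a recent pandas release (0.24 or later), where the attribute this issue calls `labels` is exposed as `codes`; on 0.19.0 the equivalent attribute would be `mi.labels`.

```python
import numpy as np
import pandas as pd

# Same index as in the report, written with tuples.
mi = pd.MultiIndex.from_tuples([(1, 1, 3), (1, 1, np.nan)], names=list("ABC"))

# Unique values of level C; NaN is not stored as a level value.
levels_c = list(mi.levels[2])

# Integer positions into levels_c, one per row; -1 marks the missing value.
# Assumption: pandas >= 0.24, where `labels` was renamed to `codes`.
codes_c = [int(c) for c in mi.codes[2]]

print(levels_c, codes_c)
```

The second row carries no value for level C at all, only the -1 marker.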

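These label values are what get passed to the sorting algorithm for both DataFrames and Series. Since the sort sees only the integer labels, it has no notion of NaN: the -1 placeholder is compared like any other integer, so in the example above the NaN row comes first for either value of na_position.

This has been discussed in #14015 and #14672.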
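The effect can be reproduced without pandas. The sketch below is a simplified model, not the pandas internals: `factorize` and `sort_by_codes` are hypothetical helpers written for illustration. It encodes one level as integer codes with -1 for NaN and then sorts on those codes alone.

```python
import math


def factorize(values):
    """Hypothetical helper: map values to integer codes, NaN becomes -1."""
    levels = sorted({v for v in values if not math.isnan(v)})
    position = {v: i for i, v in enumerate(levels)}
    codes = [-1 if math.isnan(v) else position[v] for v in values]
    return levels, codes


def sort_by_codes(codes, na_position="last"):
    """Hypothetical helper: sort row positions using only the codes.

    na_position is accepted but has nothing to act on, because the
    keys are plain integers and contain no NaN.
    """
    return sorted(range(len(codes)), key=lambda i: codes[i])


level_c = [3.0, float("nan")]
levels, codes = factorize(level_c)

order_first = sort_by_codes(codes, na_position="first")
order_last = sort_by_codes(codes, na_position="last")
print(levels, codes, order_first, order_last)
```

Both calls return the same order, with the NaN row (position 1) ahead of the row holding 3, which matches the two identical outputs in the code sample.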

My original naive solution was to change these lines from:

indexer = _lexsort_indexer(labels.labels, orders=ascending,
                           na_position=na_position)

to

index_values_list = np.dstack(labels.get_values())[0].tolist()
indexer = _lexsort_indexer(index_values_list, orders=ascending,
                           na_position=na_position)

This didn't break any tests, but it isn't necessarily the best approach.
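The idea behind that change, sorting on the actual values so that NaN is visible to the sort, can be sketched in plain Python. This is only an illustration of the principle: `lexsort_values` is a hypothetical stand-in and does not reproduce `_lexsort_indexer`.

```python
import math


def lexsort_values(rows, na_position="last"):
    """Hypothetical helper: sort row positions level by level on the values,
    placing NaN according to na_position."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    # Real values get rank 1; NaN gets a rank below or above them.
    na_rank = 0 if na_position == "first" else 2

    def key(i):
        # Each level becomes (rank, value). NaN is replaced by 0 so that
        # it is never compared directly with a real value.
        return tuple(
            (na_rank, 0) if isinstance(v, float) and math.isnan(v) else (1, v)
            for v in rows[i]
        )

    return sorted(range(len(rows)), key=key)


rows = [(1, 1, 3), (1, 1, float("nan"))]
first = lexsort_values(rows, "first")
last = lexsort_values(rows, "last")
print(first, last)
```

With `"last"` the row holding 3 now sorts ahead of the NaN row, which is the ordering asked for under Expected Output.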

Expected Output
In [7]: df.sort_index(na_position="last")
Out[7]: 
         0  1
A B C        
1 1 3    1  2
    NaN  3  4

Output of pd.show_versions()

In [3]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.4
pip: 9.0.0
setuptools: 27.2.0
Cython: 0.21
numpy: 1.11.2
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.5.0
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.5
sqlalchemy: 0.9.7
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.7.3
boto: 2.32.1
pandas_datareader: None

