originally #5190
xref #9816
xref #3942
This issue is for creating a unified API to Series & DataFrame sorting methods. Panels are not addressed (yet) but a unified API should be easy to extend to them. Related are #2094, #5190, #6847, #7121, #2615. As discussion proceeds, this post will be edited.
For reference, the 0.14.1 signatures are:
Series.sort(axis=0, ascending=True, kind='quicksort', na_position='last', inplace=True)
Series.sort_index(ascending=True)
Series.sortlevel(level=0, ascending=True, sort_remaining=True)
DataFrame.sort(columns=None, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
DataFrame.sort_index(axis=0, by=None, ascending=True, inplace=False, kind='quicksort', na_position='last')
DataFrame.sortlevel(level=0, axis=0, ascending=True, inplace=False, sort_remaining=True)
Proposed unified signature for Series.sort and DataFrame.sort (except Series version retains current inplace=True):
def sort(self, by=None, axis=0, level=None, ascending=True, inplace=False,
         kind='quicksort', na_position='last', sort_remaining=True):
    """Sort by labels (along either axis), by the values in column(s) or both.

    If both, labels take precedence over columns. If neither is specified,
    behavior is object-dependent: series = on values, dataframe = on index.

    Parameters
    ----------
    by : column name or list of column names
        if not None, sort on values in specified column name; perform nested
        sort if list of column names specified. this argument ignored by series
    axis : {0, 1}
        sort index/rows (0) or columns (1); for Series, only 0 can be specified
    level : int or level name or list of ints or list of level names
        if not None, sort on values in specified index level(s)
    ascending : bool or list of bool
        Sort ascending vs. descending. Specify list for multiple sort orders.
    inplace : bool
        if True, perform operation in-place (without creating new instance)
    kind : {'quicksort', 'mergesort', 'heapsort'}
        Choice of sorting algorithm. See np.sort for more information.
        'mergesort' is the only stable algorithm. For data frames, this option
        is only applied when sorting on a single column or label.
    na_position : {'first', 'last'}
        'first' puts NaNs at the beginning, 'last' puts NaNs at the end
    sort_remaining : bool
        if true and sorting by level and index is multilevel, sort by other
        levels too (in order) after sorting by specified level
    """
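A rough sketch of how the dispatch described in the docstring could behave, written against recent pandas releases where sort_values and sort_index exist. The helper name unified_sort is hypothetical, only row-wise sorting (axis=0) is covered, and resolving the combined case by passing index level names to sort_values is an assumption of this sketch, not part of the proposal.

```python
import pandas as pd


def unified_sort(obj, by=None, level=None, ascending=True,
                 kind="quicksort", na_position="last", sort_remaining=True):
    """Hypothetical helper: approximate the proposed sort() for axis=0 only."""
    if isinstance(obj, pd.Series):
        # Series ignore `by`: sort on values unless a level is requested.
        if level is None:
            return obj.sort_values(ascending=ascending, kind=kind,
                                   na_position=na_position)
        return obj.sort_index(level=level, ascending=ascending, kind=kind,
                              na_position=na_position,
                              sort_remaining=sort_remaining)

    if by is None:
        # Labels only (or neither): a DataFrame falls back to its row index.
        return obj.sort_index(level=level, ascending=ascending, kind=kind,
                              na_position=na_position,
                              sort_remaining=sort_remaining)

    columns = by if isinstance(by, list) else [by]
    if level is None:
        # Columns only: nested sort on the listed columns.
        return obj.sort_values(by=columns, ascending=ascending, kind=kind,
                               na_position=na_position)

    # Both: labels take precedence, so the level names lead the sort keys.
    # Assumption: the levels are named and the names do not clash with columns.
    levels = level if isinstance(level, list) else [level]
    names = [obj.index.names[lv] if isinstance(lv, int) else lv for lv in levels]
    return obj.sort_values(by=names + columns, ascending=ascending,
                           na_position=na_position)


df = pd.DataFrame({"A": [2, 1, 2, 1], "B": [1, 2, 3, 4]},
                  index=pd.Index(["y", "x", "x", "y"], name="spam"))
by_index = unified_sort(df)                        # neither: sort on index
by_columns = unified_sort(df, by=["A", "B"])       # columns only
by_both = unified_sort(df, by="A", level="spam")   # 'spam' first, then 'A'
```

In the combined call the rows are grouped by the 'spam' label first and ordered by 'A' within each label, which is the precedence the docstring describes.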
The sort_index signatures change too and sort_columns is created:
Series.sort_index(level=0, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True)
DataFrame.sort_index(level=0, axis=0, by=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True)  # by is DEPRECATED, see change 7
DataFrame.sort_columns(by=None, level=0, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True)  # or maybe level=None
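To illustrate the label-sorting part of sort_columns, here is a small sketch on recent pandas, where sorting column labels is spelled sort_index(axis=1). The free function sort_columns below is a hypothetical stand-in for the proposed method; the by argument is left out because its axis=1 semantics are not pinned down here.

```python
import pandas as pd


def sort_columns(df, level=None, ascending=True, sort_remaining=True):
    """Hypothetical stand-in for the proposed DataFrame.sort_columns (labels only)."""
    # Sorting the column labels is the axis=1 case of a label sort.
    return df.sort_index(axis=1, level=level, ascending=ascending,
                         sort_remaining=sort_remaining)


df = pd.DataFrame([[1, 2, 3]], columns=["c", "a", "b"])
sorted_df = sort_columns(df)
print(list(sorted_df.columns))  # ['a', 'b', 'c']
```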
Proposed changes:
1. make inplace=False the default (changes Series.sort); maybe, possibly in 1.0
2. new by argument to accept column-name/list-of-column-names in first position
   a. deprecate the columns keyword of DataFrame.sort, replaced with by (the df.sort signature would need to retain the columns keyword until it is finally removed, but that's not shown in the proposal)
   b. no tuples for addressing multi-index levels (the columns arg of DataFrame.sort allows tuples); use the new level argument instead
   c. keep the current order of by/axis in DataFrame.sort_index (see change 7)
   d. by is ignored by series, but axis is too, so for the sake of working with dataframes, it gets first position
3. new level argument to accept integer/level-name/list-of-ints/list-of-level-names for sorting (multi)index by particular level(s)
   a. replaces the tuple handling of the columns arg of DataFrame.sort
   b. add level argument to sort_index in first position so level(s) of multilevel index can be specified; this makes sort_index == sortlevel (see change 8)
   c. add sort_remaining arg to handle multi-level indexes
4. new method DataFrame.sort_columns == sort(axis=1) (see syntax below)
5. deprecate Series.order since change 1 makes Series.sort equivalent (?)
6. add inplace, kind, and na_position arguments to Series.sort_index (to match DataFrame.sort_index); by and axis args are not added since they don't make sense for series
7. deprecate and remove the by argument from DataFrame.sort_index since it makes sort_index equivalent to sort
8. deprecate sortlevel since change 3b makes sort_index equivalent
Notes:
- the default behavior of sort is still object-dependent: for series, sorts by values and for data frames, sorts by index
- the new level arg makes sort_index and sortlevel equivalent. if sortlevel is retained:
  - rename sortlevel to sort_level for naming conventions
  - Series.sortlevel should have inplace argument added
  - maybe leave the level and sort_remaining args out of sort_index so it's not equivalent to sort_level (intentionally limiting sort_index seems like a bad idea though)
- the default could be level=None for sort_columns. probably not since level=None falls back to level=0 anyway
- both by and axis arguments should be ignored by Series.sort
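Change 2a implies a transition period in which DataFrame.sort accepts both the old and the new keyword. A minimal sketch of such a shim, using a toy function over a list of dict rows rather than pandas itself; the name sort_rows, the FutureWarning category and the message text are all assumptions:

```python
import warnings


def sort_rows(rows, by=None, columns=None, ascending=True):
    """Toy stand-in for DataFrame.sort; `rows` is a list of dicts."""
    if columns is not None:
        if by is not None:
            raise TypeError("pass either 'by' or the deprecated 'columns', not both")
        # Old keyword still works during the transition, but warns.
        warnings.warn("'columns' is deprecated, use 'by' instead",
                      FutureWarning, stacklevel=2)
        by = columns
    if by is None:
        return list(rows)
    keys = [by] if isinstance(by, str) else list(by)
    # Nested sort: compare rows on the listed keys in order.
    return sorted(rows, key=lambda row: tuple(row[k] for k in keys),
                  reverse=not ascending)


rows = [{"A": 2, "B": 1}, {"A": 1, "B": 2}]
new_style = sort_rows(rows, by="A")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = sort_rows(rows, columns="A")  # same result, plus a warning
```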
Syntax:
DataFrame:
- sort() == sort(level=0) == sort_index() == sortlevel()
- sort(['A','B'])
- sort(level='spam') == sort_index('spam') == sortlevel('spam')
- sort(['A','B'], level='spam')
  - level controls here even though columns are specified, so sort happens along row index named 'spam' first, then nested sort occurs using columns 'A' and 'B'
- sort(axis=1) == sort(axis=1, level=0) == sort_columns()
- sort(['A','B'], axis=1) == sort_columns(['A','B'])
- sort(['A','B'], axis=1, level='spam') == sort_columns(['A','B'], level='spam')
  - axis controls level, so sort will be on columns named 'A' and 'B' in column index named 'spam'
Series:
- sort() == order() (sorts on values)
- with level specified, sorts on index/named index/level of multi-index:
  - sort(level=0) == sort_index() == sortlevel()
  - sort(level='spam') == sort_index('spam') == sortlevel('spam')
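The precedence rule for sort(['A','B'], level='spam') can be modelled without pandas. The following is a toy sketch over a list of dict rows, in which the hypothetical nested_sort helper treats 'spam' as an ordinary key. It relies on Python's list.sort being stable and applies the keys from lowest to highest precedence, so the final pass on 'spam' dominates:

```python
def nested_sort(rows, keys, ascending=True):
    """Sort dict rows on `keys`, listed from highest to lowest precedence."""
    if isinstance(ascending, bool):
        ascending = [ascending] * len(keys)
    if len(ascending) != len(keys):
        raise ValueError("ascending must match the number of keys")
    out = list(rows)
    # Stable passes, lowest precedence first: ties in a later pass keep
    # the order established by the earlier ones.
    for key, asc in reversed(list(zip(keys, ascending))):
        out.sort(key=lambda row, k=key: row[k], reverse=not asc)
    return out


rows = [
    {"spam": "y", "A": 2, "B": 1},
    {"spam": "x", "A": 1, "B": 2},
    {"spam": "x", "A": 2, "B": 3},
    {"spam": "y", "A": 1, "B": 4},
]
# The level 'spam' leads, then the nested sort on columns 'A' and 'B'.
result = nested_sort(rows, ["spam", "A", "B"])
print([row["B"] for row in result])  # [2, 3, 4, 1]
```

Passing a list for ascending, for example [True, False, True], gives each key its own direction, matching the "list of bool" form in the docstring.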
Comments welcome.