
Showing content from https://github.com/pandas-dev/pandas/issues/49473 below:

CoW: Use the "lazy copy" (with Copy-on-Write) optimization in more methods where appropriate · Issue #49473 · pandas-dev/pandas · GitHub

With the Copy-on-Write implementation (see #36195 / proposal described in more detail in https://docs.google.com/document/d/1ZCQ9mx3LBMy-nhwRl33_jgcvWo9IWdEfxDNQ2thyTb0/edit, and overview follow up issue #48998), we can avoid doing an actual copy of the data in DataFrame and Series methods that typically return a copy / new object.
A typical example is the following:

df2 = df.rename(columns=str.lower)

By default, the rename() method returns a new object (DataFrame) with a copy of the data of the original DataFrame (and thus, mutating values in df2 never mutates df). With CoW enabled (pd.options.mode.copy_on_write = True), we can still return a new object, but one that points to the same data under the hood (avoiding an initial copy). The observed behaviour is preserved: df2 still acts as a copy, and mutating df2 does not mutate df. This works through the CoW mechanism, which only copies the data in df2 when actually needed upon mutation, i.e. a delayed or lazy copy.
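The user-visible guarantee described above can be checked with a small sketch. It only asserts the behaviour that should hold whether or not CoW is enabled (the parent stays untouched); the column names and values are made up for illustration.

```python
import pandas as pd

# Illustrative data; any small frame would do.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# rename() returns a new object. Whether it holds an eager copy or a lazy
# (CoW) copy of the data is an implementation detail at this point.
df2 = df.rename(columns=str.lower)

# Mutate the result. If the data was shared lazily, this write is the
# moment the actual copy is expected to happen.
df2.iloc[0, 0] = 100

print(df.iloc[0, 0])   # value in the parent
print(df2.iloc[0, 0])  # value in the renamed result
```

In both modes the parent keeps its original value; only the point in time at which the data is copied differs.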

In practice, a method like rename() or reset_index() relies on the fact that copy(deep=None) makes a true deep copy (the current default behaviour) when CoW is not enabled, and a "lazy" copy when CoW is enabled. For example:

if inplace:
    new_obj = self
else:
    new_obj = self.copy(deep=None)
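To make the idea of a lazy copy more concrete, here is a toy model in pure Python. It is not how pandas implements CoW: the class name LazyFrame, the shared reference counter, and the set_value / shares_data_with helpers are all hypothetical and exist only to illustrate "share now, copy on first write".

```python
class LazyFrame:
    """Toy container illustrating a lazy copy (hypothetical, not pandas code)."""

    def __init__(self, values, names):
        self._values = list(values)   # the "data buffer"
        self.names = list(names)      # cheap metadata, always copied
        self._refs = [1]              # counter shared by all objects viewing the buffer

    def copy(self, deep=True):
        if deep:
            # Eager copy: new buffer, new counter.
            return LazyFrame(self._values, self.names)
        # Lazy copy: new object, same buffer, shared counter.
        new = LazyFrame.__new__(LazyFrame)
        new._values = self._values
        new.names = list(self.names)
        new._refs = self._refs
        self._refs[0] += 1
        return new

    def rename(self, func, inplace=False):
        # Same shape as the snippet above, with a lazy copy in the else branch.
        if inplace:
            new_obj = self
        else:
            new_obj = self.copy(deep=False)
        new_obj.names = [func(name) for name in new_obj.names]
        return new_obj

    def set_value(self, i, value):
        if self._refs[0] > 1:
            # Someone else still views this buffer: detach before writing.
            self._refs[0] -= 1
            self._values = list(self._values)
            self._refs = [1]
        self._values[i] = value

    def shares_data_with(self, other):
        return self._values is other._values


a = LazyFrame([1, 2, 3], ["A", "B", "C"])
b = a.rename(str.lower)
shared_before = b.shares_data_with(a)   # True: no data copied yet
b.set_value(0, 99)                      # first write triggers the copy
shared_after = b.shares_data_with(a)    # False: b now owns its buffer
```

The point of the sketch is that rename() itself never touches the data; the cost of copying is paid only by the first write to an object whose buffer is still shared.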

The initial CoW implementation in #46958 only added this logic to a few methods (to ensure this mechanism was working): rename, reset_index, reindex (when reindexing the columns), select_dtypes, to_frame and copy itself.
But there are more methods that can make use of this mechanism, and this issue is meant to serve as the overview issue to summarize and keep track of the progress on this front.

There is a class of methods that perform an actual operation on the data and return newly calculated data (eg typically reductions or the methods wrapping binary operators) that don't have to be considered here. It's only methods that can (potentially, in certain cases) return the original data that could make use of this optimization.

Series / DataFrame methods to update (a ? marks the ones I wasn't directly sure about; I still have to look into what exactly those do, but left them here to keep track of them, and they can be removed from the list once we know more):

add_prefix / add_suffix -> TST/CoW: copy-on-write tests for add_prefix and add_suffix #49991
align -> ENH: Add lazy copy to align #50432
- Needs a follow-up, see comment -> ENH: Make shallow copy for align nocopy with CoW #50917
asfreq -> ENH: Add test for asfreq CoW when doing noop #50916
assign -> ENH/TST: expand copy-on-write to assign() method #50010
astype -> ENH: Add lazy copy to astype #50802
between_time -> ENH: Add lazy copy for take and between_time #50476
bfill / backfill -> ENH: Add CoW optimization to interpolate #51249
clip -> TST: Add tests for clip with CoW #51492
convert_dtypes -> ENH: Implement CoW for convert_dtypes #51265
copy (tackled in initial implementation in #46958)
drop -> ENH: Add copy-on-write to DataFrame.drop #49689
drop_duplicates (in case no duplicates are dropped) -> ENH: Add lazy copy for drop duplicates #50431
droplevel -> ENH: test CoW for drop_level #50552
dropna -> ENH: Use lazy copy for dropna #50429
eval -> ENH / CoW: Add lazy copy to eval #53746
ffill / pad -> ENH: Add CoW optimization to interpolate #51249
fillna -> ENH: Add CoW optimization for fillna #51279
filter -> TST: Copy on Write for filter #50589
get -> TST: add CoW tests for xs() and get() #51292
head -> TST/CoW: copy-on-write tests for df.head and df.tail #49963
infer_objects -> ENH: Use lazy copy in infer objects #50428
insert?
interpolate -> ENH: Add CoW optimization to interpolate #51249
isetitem -> TST: CoW with df.isetitem() #50692
items -> TST: Test CoW with DataFrame.items() #50595
iterrows? -> CoW: Ensure that iterrows does not allow mutating parent #51271
join / merge -> ENH: enable lazy copy in merge() for CoW #51297
mask -> ENH: Add lazy copy to where #51336
- this is covered by where, but could use an independent test -> TST / CoW: Add test for mask #53745
pipe -> ENH: Add lazy copy to pipe #50567
pop -> TST: Add test for CoW in pop #50569
reindex
- Already handled for reindexing the columns in the initial implementation (#46958), but we can still optimize row selection as well? (in case no actual reindexing takes place) -> TST: add test for reindexing rows with matching index uses shallow copy with CoW #53723
reindex_like -> ENH: Use cow for reindex_like #50426
rename (tackled in initial implementation in #46958)
rename_axis -> ENH: add lazy copy (CoW) mechanism to rename_axis #50415
reorder_levels -> ENH: add copy on write for df reorder_levels GH49473 #50016
replace -> ENH: Add lazy copy to replace #50746
- ENH: Optimize replace to avoid copying when not necessary #50918
- TODO: Optimize when column not explicitly provided in to_replace?
- TODO: Optimize list-like
- TODO: Add note in docs that this is not fully optimized for 2.0 (not necessary if everything is finished by then)
reset_index (tackled in initial implementation in #46958)
round (for columns that are not rounded) -> ENH: Add lazy copy to concat and round #50501
select_dtypes (tackled in initial implementation in #46958)
set_axis -> ENH/CoW: use lazy copy in set_axis method #49600
set_flags -> TST: Test cow for set_flags #50489
set_index -> ENH/CoW: use lazy copy in set_index method #49557
- TODO: check what happens if parent is mutated -> shouldn't mutate the index! (is the data copied when creating the index?)
shift -> ENH: Add lazy copy to shift #50753
sort_index / sort_values (optimization if nothing needs to be sorted)
- sort_index -> ENH: Add lazy copy for sort_index #50491
- sort_values -> ENH: Add lazy copy for sort_values #50643
squeeze -> TST: Test squeeze with CoW #50590
style. (phofl: I don't think there is anything to do here)
swapaxes -> ENH: Add lazy copy for swapaxes no op #50573
swaplevel -> ENH: Add lazy copy to swaplevel #50478
T / transpose -> BUG: transpose not respecting CoW #51430
tail -> TST/CoW: copy-on-write tests for df.head and df.tail #49963
take (optimization if everything is taken?) -> ENH: Add lazy copy for take and between_time #50476
to_timestamp / to_period -> ENH: Add lazy copy to to_timestamp and to_period #50575
transform -> BUG / CoW: Series.transform not respecting CoW #53747
truncate -> ENH: Add lazy copy for truncate #50477
tz_convert / tz_localize -> ENH: Add lazy copy for tz_convert and tz_localize #50490
unstack (in optimized case where each column is a slice?)
update -> TST: add CoW test for update() #51426
where -> ENH: Add lazy copy to where #51336
xs -> TST: add CoW tests for xs() and get() #51292
Series.to_frame() (tackled in initial implementation in #46958)

Top-level functions:

pd.concat -> ENH: Add lazy copy to concat and round #50501
pd.merge et al? -> ENH: enable lazy copy in merge() for CoW #51297, ENH: Avoid copy when possible in merge #51327
- add tests for join

Want to contribute to this issue?

Pull requests tackling one of the bullet points above are certainly welcome!

Pick one of the methods above (best to stick to one method per PR)
Update the method to make use of a lazy copy (in many cases this might mean using copy(deep=None) somewhere, but for some methods it will be more involved)
Add a test for it in /pandas/tests/copy_view/test_methods.py (you can mimic one of the existing ones, eg test_select_dtypes)
- You can run the test with PANDAS_COPY_ON_WRITE=1 pytest pandas/tests/copy_view/test_methods.py to test it with CoW enabled (pandas will check that environment variable). The test needs to pass with both CoW disabled and enabled.
- The tests make use of a using_copy_on_write fixture that can be used within the test function to test different expected results depending on whether CoW is enabled or not.
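As a rough idea of the shape such a test can take, here is a standalone sketch modelled on select_dtypes. It is an approximation, not a copy of an existing test: the function name is made up, it reads the column buffers with to_numpy() instead of any test-suite helper, and it returns the "shared before the write" flag rather than asserting on it, since the expected value depends on whether CoW is enabled. A real test would take the using_copy_on_write fixture and assert each case.

```python
import numpy as np
import pandas as pd


def check_select_dtypes_keeps_parent_intact():
    # Hypothetical helper name; data chosen only for illustration.
    df = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
    df_orig = df.copy()

    df2 = df.select_dtypes("int64")

    # With CoW enabled the result is expected to share memory with the
    # parent (lazy copy); with CoW disabled it is expected to be an eager
    # copy. A real test would branch on using_copy_on_write here.
    shared_before = bool(
        np.shares_memory(df2["a"].to_numpy(), df["a"].to_numpy())
    )

    # Mutating the result must never leak into the parent.
    df2.iloc[0, 0] = 0

    # After the write the two objects must not share the buffer any more,
    # and the parent must be unchanged, in both modes.
    assert not np.shares_memory(df2["a"].to_numpy(), df["a"].to_numpy())
    pd.testing.assert_frame_equal(df, df_orig)
    return shared_before


shared = check_select_dtypes_keeps_parent_intact()
print("result shared memory with parent before the write:", shared)
```

The two assertions after the write are the part that has to pass with CoW both disabled and enabled; the memory-sharing check before the write is where the two modes are expected to differ.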
