import pandas as pd
import numpy as np

np_df = np.random.randn(10000, 4000)
df = pd.DataFrame(np_df)

%timeit np.round(np_df, 2)
# 416 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.round(2)
# 1.69 s ± 27.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.round(df, 2)
# 1.74 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Problem description
Completely unexpectedly, DataFrame.round() showed up as a major hotspot during profiling.
Looking at the code, we see that even when the complete data frame is rounded to a single number of decimals, it is split into Series objects, which are then rounded one by one.
I am wondering whether there is a reason not to pass the underlying array to NumPy and do the rounding there in this case.
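To make the column-by-column behaviour described above concrete, here is a rough sketch of what such a split amounts to. It is only an illustration and not the actual pandas source; the helper name `columnwise_round` is made up for this example.

```python
import numpy as np
import pandas as pd


def columnwise_round(df, decimals):
    # Hypothetical helper (not pandas internals): round every column as its
    # own Series, then stitch the rounded Series back into a DataFrame.
    rounded_columns = [series.round(decimals) for _, series in df.items()]
    return pd.concat(rounded_columns, axis=1)


small = pd.DataFrame(np.random.randn(5, 3))
per_column = columnwise_round(small, 2)

# Rounding the whole underlying array in one call gives the same values.
whole_array = np.round(small.to_numpy(), 2)
print(np.allclose(per_column.to_numpy(), whole_array))
```

With 4000 columns, a loop of this shape creates and rounds 4000 separate Series, which is where per-column overhead could plausibly add up.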
A quick test showed that something like this would give us the numpy performance:
def faster_round(df, decimals):
    rounded = np.round(df.values, decimals)
    return pd.DataFrame(rounded, columns=df.columns, index=df.index)

%timeit faster_round(df, 2)
# 417 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
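One caveat with faster_round is that it assumes every column is numeric and of the same dtype. On a frame with mixed dtypes, df.values becomes an object array and the fast path no longer applies cleanly. Below is a minimal sketch of a guarded variant, assuming the fast path should be taken only when all columns share a single NumPy float dtype. The name `faster_round_checked` is hypothetical, and this is not a proposal for the exact pandas implementation.

```python
import numpy as np
import pandas as pd


def faster_round_checked(df, decimals):
    # Hypothetical variant of faster_round: use the whole-array NumPy path
    # only when all columns share one plain NumPy float dtype.
    dtypes = set(df.dtypes)
    single_float_dtype = (
        len(dtypes) == 1
        and all(isinstance(dt, np.dtype) and dt.kind == "f" for dt in dtypes)
    )
    if single_float_dtype:
        rounded = np.round(df.to_numpy(), decimals)
        return pd.DataFrame(rounded, columns=df.columns, index=df.index)
    # Otherwise fall back to the existing DataFrame.round behaviour.
    return df.round(decimals)


floats = pd.DataFrame(np.random.randn(6, 3), columns=list("abc"))
mixed = floats.assign(label="x")

fast = faster_round_checked(floats, 2)     # takes the NumPy path
fallback = faster_round_checked(mixed, 2)  # falls back to df.round
```

The fallback keeps non-numeric columns untouched, as DataFrame.round already does, so the guarded version should only differ from df.round in speed.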