Showing content from https://github.com/pandas-dev/pandas/issues/7683 below:

.iterrows takes too long and generate large memory footprint · Issue #7683 · pandas-dev/pandas · GitHub

When using df.iterrows on a large data frame, it takes a long time to run and consumes a huge amount of memory.

The name of the method implies that it is an iterator and should return almost immediately. However, the implementation uses the built-in 'zip', which on Python 2 materializes a full list of tuples rather than iterating lazily, so a huge temporary list can be created up front.

Below is code that reproduces the issue on a box with 16 GB of memory.

import numpy as np
import pandas as pd

s1 = range(30000000)
s2 = np.random.randn(30000000)
ts = pd.date_range('20140101', freq='S', periods=30000000)
df = pd.DataFrame({'s1': s1, 's2': s2}, index=ts)
for r in df.iterrows():
    break  # expected to return immediately, yet it takes more than 2 minutes and uses ~4 GB of memory
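For comparison, a minimal sketch (using a smaller frame than the report's 30M rows, just to show the pattern): df.itertuples() yields lightweight namedtuples lazily, so pulling the first row does not require building a per-row Series for the whole frame the way the buggy iterrows path did.

```python
import numpy as np
import pandas as pd

# Smaller frame than in the report, same shape of data.
n = 100_000
df = pd.DataFrame(
    {"s1": range(n), "s2": np.random.randn(n)},
    index=pd.date_range("20140101", freq="s", periods=n),
)

# itertuples returns a lazy iterator of namedtuples; taking the first
# element returns immediately instead of materializing all rows.
first = next(df.itertuples())
print(first.Index, first.s1)
```

Note this is only a workaround sketch: itertuples rows are namedtuples, not Series, so code that relies on the (index, Series) pairs from iterrows would need adjusting.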
