    # Run on Pandas 0.19.1
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(1000,5000), columns=['C'+str(c) for c in range(5000)])

    %%prun
    for col in df:
        df[col]

             65011 function calls (65010 primitive calls) in 2.012 seconds

       Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         5000    1.935    0.000    1.955    0.000 base.py:1362(__contains__)
         5000    0.025    0.000    2.002    0.000 frame.py:2033(__getitem__)
            1    0.011    0.011    2.012    2.012 <string>:2(<module>)
         5000    0.010    0.000    0.016    0.000 frame.py:2059(_getitem_column)
         5001    0.007    0.000    0.007    0.000 {method 'view' of 'numpy.ndarray' objects}
         5001    0.004    0.000    0.012    0.000 base.py:492(values)
         5000    0.004    0.000    0.007    0.000 generic.py:1381(_get_item_cache)
         5000    0.004    0.000    0.018    0.000 base.py:1275(<lambda>)
         5000    0.003    0.000    0.014    0.000 base.py:874(_values)
         5000    0.002    0.000    0.002    0.000 {built-in method builtins.isinstance}
         5000    0.002    0.000    0.003    0.000 common.py:438(_apply_if_callable)
         5000    0.002    0.000    0.002    0.000 {method 'get' of 'dict' objects}
         5000    0.001    0.000    0.001    0.000 {built-in method builtins.hash}
         5000    0.001    0.000    0.001    0.000 {built-in method builtins.callable}
            1    0.000    0.000    2.012    2.012 {built-in method builtins.exec}
            1    0.000    0.000    0.000    0.000 generic.py:833(__iter__)
            1    0.000    0.000    0.000    0.000 generic.py:401(_info_axis)
          2/1    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
            1    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
            1    0.000    0.000    0.000    0.000 base.py:1315(__iter__)
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    %time df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1000 entries, 0 to 999
    Columns: 5000 entries, C0 to C4999
    dtypes: float64(5000)
    memory usage: 38.1 MB
    Wall time: 4.39 s

Problem description

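It appears that _get_item_cache() or __contains__ may have something to do with the slowdown: in the 0.19.1 profile, __contains__ (base.py:1362) accounts for 1.935 s of the 2.012 s total, compared with 0.004 s of 0.032 s on 0.18.1. This also affects other functionality such as df.info(), which is now ~100X slower (4.39 s versus 40 ms).
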
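The measurements in this report rely on the IPython magics %%prun and %time. As a rough sketch, the same loop and the df.info() call can be measured from a plain Python script, which makes it easier to run the identical file under both pandas versions. The helper names (make_frame, iterate_columns, profile_iteration, time_info) are made up for this illustration and are not part of pandas; only standard-library profiling calls and the public DataFrame API are used.

```python
import cProfile
import io
import pstats
import time

import numpy as np
import pandas as pd


def make_frame(n_rows=1000, n_cols=5000):
    """Build a frame shaped like the one in the report: floats, columns C0..C{n_cols-1}."""
    columns = ["C" + str(c) for c in range(n_cols)]
    return pd.DataFrame(np.random.randn(n_rows, n_cols), columns=columns)


def iterate_columns(df):
    """The loop from the report: look up every column by its label."""
    count = 0
    for col in df:
        df[col]
        count += 1
    return count


def profile_iteration(df, top=5):
    """Profile iterate_columns and return (elapsed_seconds, report_text).

    The report is sorted by internal time, like the %%prun output.
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    iterate_columns(df)
    profiler.disable()
    elapsed = time.perf_counter() - start
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats(top)
    return elapsed, buffer.getvalue()


def time_info(df):
    """Time df.info(), writing its summary to a throwaway buffer instead of stdout."""
    start = time.perf_counter()
    df.info(buf=io.StringIO())
    return time.perf_counter() - start


if __name__ == "__main__":
    frame = make_frame()
    seconds, report = profile_iteration(frame)
    print("pandas", pd.__version__)
    print("column loop:", round(seconds, 3), "s")
    print("df.info():  ", round(time_info(frame), 3), "s")
    print(report)
```

Running the script once in a 0.18.1 environment and once in a 0.19.1 environment should show whether __contains__ still dominates the top of the report. Absolute timings will differ from the numbers quoted here because profiling overhead and hardware vary.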
    # Run on Pandas 0.18.1
    In [6]: %%prun
       ...: for col in df:
       ...:     df[col]
       ...:

             45011 function calls (45010 primitive calls) in 0.032 seconds

       Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         5000    0.012    0.000    0.028    0.000 frame.py:1973(__getitem__)
         5000    0.005    0.000    0.008    0.000 frame.py:1999(_getitem_column)
         5000    0.004    0.000    0.005    0.000 base.py:1233(__contains__)
            1    0.004    0.004    0.032    0.032 <string>:2(<module>)
         5000    0.002    0.000    0.003    0.000 generic.py:1345(_get_item_cache)
         5000    0.001    0.000    0.002    0.000 common.py:1846(_apply_if_callable)
         5000    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
         5000    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}
         5000    0.001    0.000    0.001    0.000 {built-in method builtins.hash}
         5000    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
            1    0.000    0.000    0.032    0.032 {built-in method builtins.exec}
            1    0.000    0.000    0.000    0.000 generic.py:808(__iter__)
            1    0.000    0.000    0.000    0.000 base.py:440(values)
          2/1    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
            1    0.000    0.000    0.000    0.000 {method 'view' of 'numpy.ndarray' objects}
            1    0.000    0.000    0.000    0.000 generic.py:381(_info_axis)
            1    0.000    0.000    0.000    0.000 base.py:1186(__iter__)
            1    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    %time df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1000 entries, 0 to 999
    Columns: 5000 entries, C0 to C4999
    dtypes: float64(5000)
    memory usage: 38.1 MB
    Wall time: 40 ms

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.10.1
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.7.1
IPython: 4.2.0
sphinx: 1.3.6
patsy: 0.4.0
dateutil: 2.5.0
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.2.5
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: 3.6.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None