This might be a potential bug: doing grouped timeseries operations fails silently on a multi-indexed DataFrame.
import pandas as pd import pandas.io.data as web # Get some market data df = web.DataReader(['AAPL', 'GOOG'], 'yahoo', pd.Timestamp('2013'), pd.Timestamp('2014')).to_frame() df.index.names = ('dt', 'symbol') In [21]: df.head() Out[21]: Open High Low Close Volume \ dt symbol 2013-01-02 AAPL 553.82001 555.00000 541.62994 549.03003 140129500 2013-01-03 AAPL 547.88000 549.67004 541.00000 542.10004 88241300 2013-01-04 AAPL 536.96997 538.63000 525.82996 527.00000 148583400 2013-01-07 AAPL 522.00000 529.30005 515.20001 523.90002 121039100 2013-01-08 AAPL 529.21002 531.89001 521.25000 525.31000 114676800 Adj Close dt symbol 2013-01-02 AAPL 74.63931 2013-01-03 AAPL 73.69719 2013-01-04 AAPL 71.64438 2013-01-07 AAPL 71.22294 2013-01-08 AAPL 71.41463
Let's say we want to resample this to monthly data. This fails and returns an empty DataFrame:
df_M = df.groupby(level='symbol').resample('M', how='mean') In [23]: df_M Out[23]: Empty DataFrame Columns: [] Index: []
This, however, works, but requires a seemingly-unnecessary re-indexing:
df_M = df.reset_index().set_index('dt').groupby('symbol').resample('M', how='mean') In [26]: df_M.head() Out[26]: Adj Close Close High Low Open \ symbol dt AAPL 2013-01-31 67.677750 497.822382 504.407623 492.969997 500.083329 2013-02-28 62.388477 456.808942 463.231056 452.106325 458.503692 2013-03-31 60.417287 441.841000 446.803495 437.337996 442.011512 2013-04-30 57.398619 419.765001 425.553183 414.722271 419.766820 2013-05-31 61.340151 446.452734 451.658190 441.495455 446.400919 Volume symbol dt AAPL 2013-01-31 1.562312e+08 2013-02-28 1.229478e+08 2013-03-31 1.147110e+08 2013-04-30 1.245851e+08 2013-05-31 1.073583e+08
The fact that you need to do the reset_index().set_index('dt')
and then groupby('symbol')
instead of groupby(level='symbol')
seems to defeat the purpose of a multi-index! What gives?
I also realize that data like this is perhaps more well-suited to a Panel than a DataFrame, but when dealing with very large amounts of (often sparse) data, the 3D Panel structure presents performance and memory issues compared to the flat DataFrame.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4