A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/pandas-dev/pandas/issues/2665 below:

Incorrect resampling from hourly to monthly values · Issue #2665 · pandas-dev/pandas · GitHub

Skip to content Navigation Menu

Saved searches Use saved searches to filter your results more quickly

Sign up You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session. Dismiss alert Additional navigation options

Incorrect resampling from hourly to monthly values #2665

Description

EDIT: Modern reproducible example:

In [1]: import pandas as pd

In [2]: dates = pd.date_range('3/20/2000', '5/20/2000', freq='1h')

In [4]: ts = pd.Series(range(len(dates)), index=dates)

In [5]: def start(x):
   ...:     return x.index[0]
   ...:
   ...: def end(x):
   ...:     return x.index[-1]
   ...:

In [6]: ts.resample('1M', closed='right', label='right').apply(start)
Out[6]:
2000-03-31   2000-03-20
2000-04-30   2000-04-01
2000-05-31   2000-05-01
Freq: M, dtype: datetime64[ns]

In [7]: ts.resample('1M', closed='right', label='right').apply(end)
Out[7]:
2000-03-31   2000-03-31 23:00:00
2000-04-30   2000-04-30 23:00:00
2000-05-31   2000-05-20 00:00:00
Freq: M, dtype: datetime64[ns]

In [8]: ts.resample('1M', closed='left', label='right').apply(start)
Out[8]:
2000-03-31   2000-03-20
2000-04-30   2000-03-31
2000-05-31   2000-04-30
Freq: M, dtype: datetime64[ns]

In [9]: ts.resample('1M', closed='left', label='right').apply(end)
Out[9]:
2000-03-31   2000-03-30 23:00:00
2000-04-30   2000-04-29 23:00:00
2000-05-31   2000-05-20 00:00:00
Freq: M, dtype: datetime64[ns]

Expected behavior: #2665 (comment)

There seems to be something wrong when resampling hourly values to monthly values. The 'closed=' argument does not do what it should. I am using pandas 0.10.0.

import pandas as pd
from pandas import TimeSeries
import numpy as np

# Timeseries with hourly values
dates = pd.date_range('3/20/2000', '5/20/2000', freq='1h')
ts = TimeSeries(np.arange(len(dates)), index=dates)

# I want monthly values equal to the summed values of the hours in that month
# With "closed=left" I want the hourly value (1 value) at midnight to be 
# included in the month after that very midnight
data_monthly_1 = ts.resample('1M', how='sum', closed='left', label='right') 
sum_april1 = data_monthly_1.ix[1]

# If we check this, the summed values per month do not match : 
data_hourly_april2 = ts[ts.index.month == 4]
sum_april2 = sum(data_hourly_april2)

# Instead, "closed=left" above resulted in an inclusion of the entire last day of the
# previous month (25 values !) (and exclusion of the last day of the month)
# This is also strange as I am nowhere concerned with using 'days'.
data_hourly_april3 = ts[   ((ts.index.month == 3) & (ts.index.day == 31)) | 
                           ((ts.index.month == 4) & (ts.index.day < 30))      ]
sum_april3 = sum(data_hourly_april3)

# PS: "closed=right" results in the answer I would expect for "closed = left":
data_monthly_4 = ts.resample('1M', how='sum', closed='right', label='right') 
sum_april4 = data_monthly_4.ix[1]

When resampling these hourly to daily values, there was no problem.

You can’t perform that action at this time.


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4