A Grouper allows the user to specify a groupby instruction for an object.
This specification will select a column via the key parameter, or if the level parameter is given, a level of the index of the target object.
If level
is passed as a keyword to both Grouper and groupby, the values passed to Grouper take precedence.
Currently unused, reserved for future use.
Dictionary of the keyword arguments to pass to Grouper.
Attributes
A TimeGrouper is returned if freq
is not None
. Otherwise, a Grouper is returned.
Examples
df.groupby(pd.Grouper(key="Animal"))
is equivalent to df.groupby('Animal')
>>> df = pd.DataFrame( ... { ... "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"], ... "Speed": [100, 5, 200, 300, 15], ... } ... ) >>> df Animal Speed 0 Falcon 100 1 Parrot 5 2 Falcon 200 3 Falcon 300 4 Parrot 15 >>> df.groupby(pd.Grouper(key="Animal")).mean() Speed Animal Falcon 200.0 Parrot 10.0
Specify a resample operation on the column âPublish dateâ
>>> df = pd.DataFrame( ... { ... "Publish date": [ ... pd.Timestamp("2000-01-02"), ... pd.Timestamp("2000-01-02"), ... pd.Timestamp("2000-01-09"), ... pd.Timestamp("2000-01-16"), ... ], ... "ID": [0, 1, 2, 3], ... "Price": [10, 20, 30, 40], ... } ... ) >>> df Publish date ID Price 0 2000-01-02 0 10 1 2000-01-02 1 20 2 2000-01-09 2 30 3 2000-01-16 3 40 >>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean() ID Price Publish date 2000-01-02 0.5 15.0 2000-01-09 2.0 30.0 2000-01-16 3.0 40.0
If you want to adjust the start of the bins based on a fixed timestamp:
>>> start, end = "2000-10-01 23:30:00", "2000-10-02 00:30:00" >>> rng = pd.date_range(start, end, freq="7min") >>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng) >>> ts 2000-10-01 23:30:00 0 2000-10-01 23:37:00 3 2000-10-01 23:44:00 6 2000-10-01 23:51:00 9 2000-10-01 23:58:00 12 2000-10-02 00:05:00 15 2000-10-02 00:12:00 18 2000-10-02 00:19:00 21 2000-10-02 00:26:00 24 Freq: 7min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min")).sum() 2000-10-01 23:14:00 0 2000-10-01 23:31:00 9 2000-10-01 23:48:00 21 2000-10-02 00:05:00 54 2000-10-02 00:22:00 24 Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", origin="epoch")).sum() 2000-10-01 23:18:00 0 2000-10-01 23:35:00 18 2000-10-01 23:52:00 27 2000-10-02 00:09:00 39 2000-10-02 00:26:00 24 Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", origin="2000-01-01")).sum() 2000-10-01 23:24:00 3 2000-10-01 23:41:00 15 2000-10-01 23:58:00 45 2000-10-02 00:15:00 45 Freq: 17min, dtype: int64
If you want to adjust the start of the bins with an offset Timedelta, the two following lines are equivalent:
>>> ts.groupby(pd.Grouper(freq="17min", origin="start")).sum() 2000-10-01 23:30:00 9 2000-10-01 23:47:00 21 2000-10-02 00:04:00 54 2000-10-02 00:21:00 24 Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", offset="23h30min")).sum() 2000-10-01 23:30:00 9 2000-10-01 23:47:00 21 2000-10-02 00:04:00 54 2000-10-02 00:21:00 24 Freq: 17min, dtype: int64
To replace the use of the deprecated base argument, you can now use offset, in this example it is equivalent to have base=2:
>>> ts.groupby(pd.Grouper(freq="17min", offset="2min")).sum() 2000-10-01 23:16:00 0 2000-10-01 23:33:00 9 2000-10-01 23:50:00 36 2000-10-02 00:07:00 39 2000-10-02 00:24:00 24 Freq: 17min, dtype: int64
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4