Showing content from https://stackoverflow.com/questions/15683588/iterating-through-a-pandas-dataframe below:

python - Iterating through a pandas dataframe

You will need pandas 0.11-dev for this. I think the following gives you what you are looking for. See http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas for more information; timedeltas are a newer dtype that pandas now supports.

Here's your data (I separated long/lat just for convenience; the key thing is that the condition column is a bool)

In [137]: df = pd.read_csv(StringIO.StringIO(data),index_col=0,parse_dates=True)

In [138]: df
Out[138]: 
                       date  long lat condition
2013-02-05 19:45:00  39.940   -86.159      True
2013-02-05 19:50:00  39.940   -86.159      True
2013-02-05 19:55:00  39.940   -86.159     False
2013-02-05 20:00:00  39.777   -85.995     False
2013-02-05 20:05:00  39.775   -85.978      True
2013-02-05 20:10:00  39.775   -85.978      True
2013-02-05 20:15:00  39.775   -85.978     False
2013-02-05 20:20:00  39.940   -86.159      True
2013-02-05 20:25:00  39.940   -86.159     False

In [139]: df.dtypes
Out[139]: 
date         float64
long lat     float64
condition       bool
dtype: object
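
On Python 3 the `StringIO` module no longer exists, and `io.StringIO` is the equivalent. The following is a minimal sketch of the loading step under that assumption. The CSV text is reconstructed from the printed frame, and the header names `lat` and `lon` are my own guesses, since the original input is not shown.

```python
import io

import pandas as pd

# CSV text reconstructed from the printed frame. The header names `lat` and
# `lon` are assumptions; the original input file is not shown.
data = """date,lat,lon,condition
2013-02-05 19:45:00,39.940,-86.159,True
2013-02-05 19:50:00,39.940,-86.159,True
2013-02-05 19:55:00,39.940,-86.159,False
2013-02-05 20:00:00,39.777,-85.995,False
2013-02-05 20:05:00,39.775,-85.978,True
2013-02-05 20:10:00,39.775,-85.978,True
2013-02-05 20:15:00,39.775,-85.978,False
2013-02-05 20:20:00,39.940,-86.159,True
2013-02-05 20:25:00,39.940,-86.159,False
"""

# io.StringIO wraps the string in a file-like object that read_csv accepts.
# index_col=0 plus parse_dates=True turns the first column into a DatetimeIndex.
df = pd.read_csv(io.StringIO(data), index_col=0, parse_dates=True)
print(df.dtypes)
```

The point to check is that `condition` comes back as a bool column, because the masking step relies on it.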

Create two date columns from the index (both are datetime64[ns] dtype)

In [140]: df['date'] = df.index   
In [141]: df['rdate'] = df.index

Set rdate to NaT in the rows where condition is False (np.nan is converted to NaT)

In [142]: df.loc[~df['condition'],'rdate'] = np.nan

Forward-fill the NaTs with the previous valid value

In [143]: df['rdate'] = df['rdate'].ffill()
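
The mask-then-fill steps can also be written as a single chain. This is a sketch on a stand-in frame that has only the `condition` column (the coordinates play no part in the calculation), and it uses `Series.where` instead of assigning np.nan, which is my substitution rather than what the session above does.

```python
import pandas as pd

# Stand-in frame: same timestamps and condition values as the printed data.
idx = pd.date_range("2013-02-05 19:45", periods=9, freq="5min")
condition = [True, True, False, False, True, True, False, True, False]
df = pd.DataFrame({"condition": condition}, index=idx)

# Keep the timestamp where condition is True and NaT elsewhere, then carry
# the most recent True timestamp forward over the False rows.
df["rdate"] = df.index.to_series().where(df["condition"]).ffill()
print(df)
```

Rows before the first True would stay NaT, since there is nothing to fill from. That does not happen here because the first row is True.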

Subtract rdate from date; this produces a timedelta64[ns] column of the time differences

In [144]: df['diff'] = df['date']-df['rdate']

In [151]: df
Out[151]: 
                                   date  long lat condition               rdate  \
2013-02-05 19:45:00 2013-02-05 19:45:00   -86.159      True 2013-02-05 19:45:00   
2013-02-05 19:50:00 2013-02-05 19:50:00   -86.159      True 2013-02-05 19:50:00   
2013-02-05 19:55:00 2013-02-05 19:55:00   -86.159     False 2013-02-05 19:50:00   
2013-02-05 20:00:00 2013-02-05 20:00:00   -85.995     False 2013-02-05 19:50:00   
2013-02-05 20:05:00 2013-02-05 20:05:00   -85.978      True 2013-02-05 20:05:00   
2013-02-05 20:10:00 2013-02-05 20:10:00   -85.978      True 2013-02-05 20:10:00   
2013-02-05 20:15:00 2013-02-05 20:15:00   -85.978     False 2013-02-05 20:10:00   
2013-02-05 20:20:00 2013-02-05 20:20:00   -86.159      True 2013-02-05 20:20:00   
2013-02-05 20:25:00 2013-02-05 20:25:00   -86.159     False 2013-02-05 20:20:00   

                        diff  
2013-02-05 19:45:00 00:00:00  
2013-02-05 19:50:00 00:00:00  
2013-02-05 19:55:00 00:05:00  
2013-02-05 20:00:00 00:10:00  
2013-02-05 20:05:00 00:00:00  
2013-02-05 20:10:00 00:00:00  
2013-02-05 20:15:00 00:05:00  
2013-02-05 20:20:00 00:00:00  
2013-02-05 20:25:00 00:05:00

The diff column is now timedelta64[ns], and you want integers in minutes (FYI this is a little clunky right now, as pandas doesn't have a scalar Timedelta type similar to Timestamp for dates)

(Also, you may have to do a shift() on the rdate series before you ffill; I think I am off by 1 somewhere, but this is the idea.)

In [175]: df['diff'].map(lambda x: x.item().seconds/60)
Out[175]: 
2013-02-05 19:45:00     0
2013-02-05 19:50:00     0
2013-02-05 19:55:00     5
2013-02-05 20:00:00    10
2013-02-05 20:05:00     0
2013-02-05 20:10:00     0
2013-02-05 20:15:00     5
2013-02-05 20:20:00     0
2013-02-05 20:25:00     5
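
Later pandas releases expose a `.dt` accessor on timedelta columns, so the conversion to minutes can skip the per-element lambda. Here is a sketch assuming such a release, again on a stand-in frame with only the `condition` column. Unlike `.seconds`, which is only the seconds component of a timedelta, `total_seconds()` also counts whole days, so gaps longer than a day would not wrap around.

```python
import pandas as pd

# Stand-in frame: same timestamps and condition values as the printed data.
idx = pd.date_range("2013-02-05 19:45", periods=9, freq="5min")
condition = [True, True, False, False, True, True, False, True, False]
df = pd.DataFrame({"condition": condition}, index=idx)

date = df.index.to_series()
rdate = date.where(df["condition"]).ffill()

# Timestamp minus last-True timestamp gives a timedelta64[ns] column.
df["diff"] = date - rdate

# Whole minutes as integers; safe here because diff contains no NaT.
df["minutes"] = (df["diff"].dt.total_seconds() // 60).astype(int)
print(df["minutes"].tolist())  # [0, 0, 5, 10, 0, 0, 5, 0, 5]
```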
