A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/pandas-dev/pandas/issues/11899 below:

allow even more flexible ISO 8601 datetime parsing · Issue #11899 · pandas-dev/pandas · GitHub

Hello,

I noticed that there is a huge code speed difference with to_datetime execution when format is not given and when it's given.

I wonder if there is not some room for improvements here!

In [1]: %time df=pd.read_csv("AUDUSD-2014-01.csv", names=['Symbol', 'Date', 'Bid', 'Ask'])
CPU times: user 3.31 s, sys: 481 ms, total: 3.79 s
Wall time: 4.13 s

In [2]: df
Out[274]:
          Symbol                   Date      Bid      Ask
0        AUD/USD  20140101 21:55:34.404  0.88796  0.88922
1        AUD/USD  20140101 21:55:34.444  0.88805  0.88914
2        AUD/USD  20140101 21:55:34.475  0.88809  0.88910
3        AUD/USD  20140101 21:55:48.962  0.88811  0.88908
4        AUD/USD  20140101 21:56:38.293  0.88808  0.88887
...          ...                    ...      ...      ...
1947101  AUD/USD  20140131 21:59:48.048  0.87525  0.87589
1947102  AUD/USD  20140131 21:59:54.599  0.87527  0.87589
1947103  AUD/USD  20140131 21:59:56.927  0.87531  0.87588
1947104  AUD/USD  20140131 21:59:59.365  0.87531  0.87574
1947105  AUD/USD  20140131 22:00:00.038  0.87531  0.87574

[1947106 rows x 4 columns]

In [3]: %time pd.to_datetime(df['Date'])
CPU times: user 11min 44s, sys: 19.4 s, total: 12min 4s
Wall time: 12min 46s
Out[3]:
0         2014-01-01 21:55:34.404
1         2014-01-01 21:55:34.444
2         2014-01-01 21:55:34.475
3         2014-01-01 21:55:48.962
4         2014-01-01 21:56:38.293
                    ...
1947101   2014-01-31 21:59:48.048
1947102   2014-01-31 21:59:54.599
1947103   2014-01-31 21:59:56.927
1947104   2014-01-31 21:59:59.365
1947105   2014-01-31 22:00:00.038
Name: Date, dtype: datetime64[ns]

In [4]: fmt='%Y%m%d %H:%M:%S.%f'

In [5]: %time pd.to_datetime(df['Date'], format=fmt)
CPU times: user 37.3 s, sys: 1.31 s, total: 38.6 s
Wall time: 40 s

In [6]: timedelta(minutes=12, seconds=46) / timedelta(seconds=40)
Out[6]: 19.15

There is x19.15 factor!!!

Sample data can be found here
https://drive.google.com/file/d/0B8iUtWjZOTqla3ZZTC1FS0pkZXc/view?usp=sharing

See also pydata/pandas-datareader#153


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4