When assigning a timedelta64 array to a subset of a new column of a DataFrame, missing data is not filled with NaT as expected; rather, the new column is cast to float64 and NaN is used instead. This cast does not usually occur when all values are present, except when there are already float64 columns but no timedelta64 columns in the DataFrame and indexing is done through .ix or .loc.
It's possible these should be two separate issues.
There are a lot of issues involving NaT in the issue tracker; I'm not 100% sure that this isn't a duplicate. (Nor am I 100% sure this isn't intended behavior, but if it is I'd expect it to be documented more prominently.)
    import numpy as np
    import pandas as pd

    one_hour = 60*60*10**9

    temp = pd.DataFrame({}, index=pd.date_range('2014-1-1', periods=4))
    temp['A'] = np.array([1*one_hour]*4, dtype='m8[ns]')
    temp.loc[:,'B'] = np.array([2*one_hour]*4, dtype='m8[ns]')
    temp.loc[:3,'C'] = np.array([3*one_hour]*3, dtype='m8[ns]')
    temp.ix[:,'D'] = np.array([4*one_hour]*4, dtype='m8[ns]')
    temp.ix[:3,'E'] = np.array([5*one_hour]*3, dtype='m8[ns]')
    temp['F'] = np.timedelta64('NaT')
    temp.ix[:-1,'F'] = np.array([6*one_hour]*3, dtype='m8[ns]')
    temp
    #                   A        B             C        D             E        F
    #2014-01-01  01:00:00 02:00:00  1.080000e+13 04:00:00  1.800000e+13 06:00:00
    #2014-01-02  01:00:00 02:00:00  1.080000e+13 04:00:00  1.800000e+13 06:00:00
    #2014-01-03  01:00:00 02:00:00  1.080000e+13 04:00:00  1.800000e+13 06:00:00
    #2014-01-04  01:00:00 02:00:00           NaN 04:00:00           NaN      NaT
    #
    # [4 rows x 6 columns]

    temp = pd.DataFrame({}, index=pd.date_range('2014-1-1', periods=4))
    # Partial assignment converts
    temp.ix[:-1,'A'] = np.array([1*one_hour]*3, dtype='m8[ns]')
    # DataFrame is all floats; converts
    temp.ix[:,'B'] = np.array([2*one_hour]*4, dtype='m8[ns]')
    # .ix and .loc behave the same
    temp.loc[:,'C'] = np.array([3*one_hour]*4, dtype='m8[ns]')
    # straight column assignment doesn't convert
    temp['D'] = np.array([4*one_hour]*4, dtype='m8[ns]')
    # Now there are timedeltas; doesn't convert
    temp.ix[:,'E'] = np.array([5*one_hour]*4, dtype='m8[ns]')
    # .ix and .loc still behave the same
    temp.loc[:,'F'] = np.array([6*one_hour]*4, dtype='m8[ns]')
    temp
    #                       A             B             C        D        E  \
    #2014-01-01  3.600000e+12  7.200000e+12  1.080000e+13 04:00:00 05:00:00
    #2014-01-02  3.600000e+12  7.200000e+12  1.080000e+13 04:00:00 05:00:00
    #2014-01-03  3.600000e+12  7.200000e+12  1.080000e+13 04:00:00 05:00:00
    #2014-01-04           NaN  7.200000e+12  1.080000e+13 04:00:00 05:00:00
    #
    #                   F
    #2014-01-01  06:00:00
    #2014-01-02  06:00:00
    #2014-01-03  06:00:00
    #2014-01-04  06:00:00
    #
    # [4 rows x 6 columns]

    temp = pd.DataFrame({}, index=pd.date_range('2014-1-1', periods=4))
    # No columns yet, no conversion
    temp.ix[:,'A'] = np.array([2*one_hour]*4, dtype='m8[ns]')
    #                   A
    #2014-01-01  02:00:00
    #2014-01-02  02:00:00
    #2014-01-03  02:00:00
    #2014-01-04  02:00:00
    #
    # [4 rows x 1 columns]
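A possible workaround, sketched here as an assumption rather than a confirmed fix: since straight column assignment does not convert in the examples above, the partial values can be wrapped in a Series indexed by only the rows that have data and then assigned as a whole column. pandas aligns the Series on the DataFrame's index, so the row without a value should come out as NaT rather than NaN. The names index and partial are only illustrative, and this avoids .ix so it should also run on pandas versions that no longer have it.

```python
import pandas as pd

index = pd.date_range('2014-01-01', periods=4)
temp = pd.DataFrame(index=index)

# Three-hour timedeltas for the first three rows only; the last row is left out.
partial = pd.Series(pd.to_timedelta([3, 3, 3], unit='h'), index=index[:3])

# Plain column assignment aligns on the index, so the missing row
# is filled in by reindexing instead of by a float64 cast.
temp['C'] = partial

print(temp.dtypes)
print(temp)
```

This only sidesteps the cast for new columns; it says nothing about whether the .ix/.loc behavior reported above is intended.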