pandas v0.14.0 (May 31, 2014) seems incapable of importing Stata 13 datasets, although according to http://pandas.pydata.org/pandas-docs/stable/whatsnew.html it should be able to. Stata 12 files can be imported without problems.
Running this script (`import_stata13.py`):

```python
import pandas

pandas.show_versions()

dta = pandas.io.stata.read_stata('D:\\Datos\\rferrer\\Desktop\\myauto.dta')
```

produces the following output:
```
%run D:/Datos/RFERRER/Desktop/import_stata13.py

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.0
Cython: 0.19.2
numpy: 1.8.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 1.2.1
sphinx: 1.2.2
patsy: 0.2.0
scikits.timeseries: 0.91.3
dateutil: 2.2
pytz: 2013.8
bottleneck: None
tables: 2.4.0
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.2.3
bs4: None
html5lib: 0.95-dev
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.8.3
pymysql: None
psycopg2: None
C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\openpyxl\__init__.py:31: UserWarning: The installed version of lxml is too old to be used with openpyxl
  warnings.warn("The installed version of lxml is too old to be used with openpyxl")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Users\rferrer\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\site-packages\IPython\utils\py3compat.pyc in execfile(fname, glob, loc)
    195             else:
    196                 filename = fname
--> 197             exec compile(scripttext, filename, 'exec') in glob, loc
    198     else:
    199         def execfile(fname, *where):

D:\Datos\RFERRER\Desktop\import_stata13.py in <module>()
      3 pandas.show_versions()
      4 
----> 5 dta = pandas.io.stata.read_stata('D:\\Datos\\rferrer\\Desktop\\myauto.dta')

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in read_stata(filepath_or_buffer, convert_dates, convert_categoricals, encoding, index)
     45         identifier of column that should be used as index of the DataFrame
     46     """
---> 47     reader = StataReader(filepath_or_buffer, encoding)
     48 
     49     return reader.data(convert_dates, convert_categoricals, index)

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in __init__(self, path_or_buf, encoding)
    455             self.path_or_buf = path_or_buf
    456 
--> 457         self._read_header()
    458 
    459     def _read_header(self):

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in _read_header(self)
    657 
    658         """Calculate size of a data record."""
--> 659         self.col_sizes = lmap(lambda x: self._calcsize(x), self.typlist)
    660 
    661     def _calcsize(self, fmt):

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in <lambda>(x)
    657 
    658         """Calculate size of a data record."""
--> 659         self.col_sizes = lmap(lambda x: self._calcsize(x), self.typlist)
    660 
    661     def _calcsize(self, fmt):

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in _calcsize(self, fmt)
    661     def _calcsize(self, fmt):
    662         return (type(fmt) is int and fmt
--> 663                 or struct.calcsize(self.byteorder + fmt))
    664 
    665     def _col_size(self, k=None):

TypeError: cannot concatenate 'str' and 'NoneType' objects
```
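The last frame boils down to a single expression. The standalone sketch below copies that expression into a hypothetical helper `calcsize` (it is not imported from pandas) to show when this particular TypeError appears: in Python 2, `'str' + None` raises "cannot concatenate 'str' and 'NoneType' objects", so the message suggests that some entry of `typlist` was `None` instead of a struct format code or an integer string width, i.e. the 0.14.0 reader apparently did not map at least one column type from the Stata 13 file.

```python
import struct

def calcsize(byteorder, fmt):
    # Same expression as StataReader._calcsize in the traceback above:
    # an int entry (a fixed string width) is returned unchanged, anything
    # else is treated as a struct format code prefixed by the byte order.
    return (type(fmt) is int and fmt
            or struct.calcsize(byteorder + fmt))

print(calcsize("<", "d"))   # double column -> 8 bytes
print(calcsize("<", 18))    # string column of width 18 -> 18

try:
    calcsize("<", None)     # a type code the reader failed to map
except TypeError as exc:
    print("TypeError:", exc)
```

The exact wording of the message differs under Python 3, but the exception type is the same.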
The dataset `myauto.dta` is just the `auto` dataset made available by running `sysuse auto` within Stata.
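Stata 12 saves `.dta` files in format 115, while Stata 13 saves them in the newer format 117, which opens with an XML-like `<stata_dta>` header instead of a single version byte. A minimal sketch for checking which format a file such as `myauto.dta` actually uses is below; the helper name `dta_format_version` is hypothetical (not part of pandas), and it assumes only the two header layouts just described.

```python
import struct

def dta_format_version(path):
    """Return the .dta format number of a Stata file (e.g. 115 or 117).

    Assumption: format 117 files start with b'<stata_dta>' and store the
    release number in '<release>...</release>'; older formats (114, 115)
    store the version number as the very first byte of the file.
    """
    with open(path, "rb") as f:
        head = f.read(64)  # enough bytes to reach the <release> tag
    if head.startswith(b"<stata_dta>"):
        start = head.index(b"<release>") + len(b"<release>")
        end = head.index(b"</release>", start)
        return int(head[start:end])  # e.g. b"117" -> 117
    # Pre-Stata 13 layout: first byte is the format version.
    return struct.unpack("B", head[:1])[0]
```

For a file saved by Stata 13 this should return 117, and for a Stata 12 file 115, which helps separate "format not supported" from other read errors.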
The problem is originally documented here: http://stackoverflow.com/questions/24053652/pandas-and-stata-13-files.
My Python is set up with Enthought Canopy 1.4.0 (64-bit).