read_sas doesn't work well with the chunksize or iterator parameters.
The following test data file in the repository has 32 rows.
sasfile = 'pandas/io/tests/sas/data/airline.sas7bdat'
pd.read_sas(sasfile).shape
Out[18]: (32, 6)
When we carefully read the file with chunksize/iterator, all's well:
reader = pd.read_sas(sasfile, chunksize=16)
df = reader.read()
df.shape
Out[31]: (16, 6)
df = reader.read()
df.shape
Out[33]: (16, 6)
or
reader = pd.read_sas(sasfile, iterator=True)
df = reader.read(30)
df.shape
Out[37]: (30, 6)
df = reader.read(2)
df.shape
Out[39]: (2, 6)
df = reader.read(2)
type(df)
Out[41]: NoneType
But if we don't know the number of rows in advance, we easily hit an exception and cannot read the whole file, which is painful with large files.
reader = pd.read_sas(sasfile, chunksize=20)
df = reader.read()
df.shape
Out[45]: (20, 6)
df = reader.read()
Traceback (most recent call last):
  File "/usr/local/lib64/python3.5/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-46-c5d811b93ac1>", line 1, in <module>
    df = reader.read()
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/sas/sas7bdat.py", line 604, in read
    rslt = self._chunk_to_dataframe()
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/sas/sas7bdat.py", line 646, in _chunk_to_dataframe
    dtype=self.byte_order + 'd')
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2419, in __setitem__
    self._set_item(key, value)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2485, in _set_item
    value = self._sanitize_column(key, value)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2656, in _sanitize_column
    value = _sanitize_index(value, self.index, copy=False)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/series.py", line 2793, in _sanitize_index
    raise ValueError('Length of values does not match length of ' 'index')
ValueError: Length of values does not match length of index
or
reader = pd.read_sas(sasfile, iterator=True)
reader.read(30).shape
Out[51]: (30, 6)
reader.read(30).shape
Traceback (most recent call last):
  File "/usr/local/lib64/python3.5/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-52-5d757f713808>", line 1, in <module>
    reader.read(30).shape
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/sas/sas7bdat.py", line 604, in read
    rslt = self._chunk_to_dataframe()
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/sas/sas7bdat.py", line 646, in _chunk_to_dataframe
    dtype=self.byte_order + 'd')
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2419, in __setitem__
    self._set_item(key, value)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2485, in _set_item
    value = self._sanitize_column(key, value)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/frame.py", line 2656, in _sanitize_column
    value = _sanitize_index(value, self.index, copy=False)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/core/series.py", line 2793, in _sanitize_index
    raise ValueError('Length of values does not match length of ' 'index')
ValueError: Length of values does not match length of index

Output of pd.show_versions()

pd.show_versions()
INSTALLED VERSIONS
commit: 75b606a
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)i5-2520M_CPU@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.0+112.g75b606a
nose: 1.3.7
pip: 9.0.1
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
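
For reference, the behavior I would expect is that the last read returns only the rows that remain (12 rows after a first chunk of 20, since the file has 32 rows) instead of raising. The sketch below is only an illustration of that arithmetic in plain Python. The helper name clamped_chunk_sizes is hypothetical and is not part of pandas, and this is not a claim about how the reader is or should be implemented internally.

```python
def clamped_chunk_sizes(total_rows, chunksize):
    """Yield how many rows each successive read should request.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    remaining = total_rows
    while remaining > 0:
        # Never request more rows than are left in the file.
        n = min(chunksize, remaining)
        yield n
        remaining -= n


# The test file used above has 32 rows.
print(list(clamped_chunk_sizes(32, 16)))  # [16, 16]
print(list(clamped_chunk_sizes(32, 20)))  # [20, 12]
```

When the total number of rows is known up front, passing these sizes one by one to reader.read(n) would match the calls that work in the examples above (30 rows followed by 2).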