Summary: read_excel cannot read a file using the same S3 URL syntax that read_csv accepts. read_excel should support accessing S3 data in the same manner as read_csv.
read_excel fails with the following error:
>>> import pandas as pd
>>> df = pd.read_excel("s3://my-bucket/my_file.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python2.6/site-packages/pandas/io/excel.py", line 163, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/usr/local/lib64/python2.6/site-packages/pandas/io/excel.py", line 206, in __init__
    self.book = xlrd.open_workbook(io)
  File "/usr/local/lib/python2.6/site-packages/xlrd/__init__.py", line 394, in open_workbook
    f = open(filename, "rb")
IOError: [Errno 2] No such file or directory: 's3://my-bucket/my_file.xlsx'
read_csv, on the other hand, successfully reads a CSV file in the same S3 bucket using the same URL syntax:
>>> import pandas as pd
>>> df = pd.read_csv("s3://my-bucket/my_file.csv")
>>> len(df.index)
1187
For the record, read_csv can also see the xlsx file in S3, but it returns parse errors when attempting to tokenize the data:
>>> import pandas as pd
>>> df = pd.read_csv("s3://my-bucket/my_file.xlsx")
Exception pandas.parser.CParserError: CParserError('Error tokenizing data. C error: Expected 9 fields in line 210, saw 10\n',) in 'pandas.parser.TextReader._tokenize_rows' ignored
read_excel successfully reads and parses a local copy of the xlsx file:
>>> import pandas as pd
>>> df = pd.read_excel("my_file.xlsx")
>>> len(df.index)
221
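Since read_excel handles the file once it is available locally, one possible workaround until S3 URLs are supported is to fetch the object's bytes first and hand read_excel a file-like object. The following is only a minimal sketch under stated assumptions, not the way pandas itself implements S3 access: the helper names split_s3_url and read_excel_from_s3 are hypothetical, the download uses the boto3 client (an assumption; any S3 library would do), and the code targets Python 3 rather than the Python 2.6 environment reported here.

```python
from io import BytesIO
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URL: %r" % (url,))
    # The bucket is the network location; the key is the path without
    # its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


def read_excel_from_s3(url, **kwargs):
    """Download an S3 object and parse it with pandas.read_excel.

    Hypothetical workaround: assumes boto3 is installed and AWS
    credentials are configured in the environment.
    """
    # Imported lazily so the URL helper above works without these packages.
    import boto3
    import pandas as pd

    bucket, key = split_s3_url(url)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    # read_excel accepts a file-like object, so wrap the bytes in memory.
    return pd.read_excel(BytesIO(body.read()), **kwargs)
```

With credentials in place, a call such as read_excel_from_s3("s3://my-bucket/my_file.xlsx") should return the same DataFrame as reading the local copy. The whole object is held in memory, which is reasonable for small workbooks but may not suit very large files.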
Pandas version string and dependencies:
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.6.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.14.48-33.39.amzn1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.0
nose: 1.3.4
pip: 6.1.1
setuptools: 12.2
Cython: None
numpy: 1.10.1
scipy: 0.16.0
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None