I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np

pd.DataFrame({'x': [1, np.inf]}).to_stata('test.dta')
pd.DataFrame({'x': [1, -np.inf]}).to_stata('test.dta')

Issue Description
If any column contains np.inf, an error is raised when trying to write the DataFrame to a dta file:
ValueError: Column x has a maximum value of infinity which is outside the range supported by Stata.
However, if the column contains -np.inf instead, no error is raised, but when opening the dta file (test.dta), -np.inf is displayed as -1.#INF in Stata:
+-----------------+
| index x |
|-----------------|
1. | 0 1 |
2. | 1 -1.#INF |
+-----------------+
Expected Behavior
I expect the same error to be raised if any column contains either np.inf or -np.inf.
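To make the expected behavior concrete, here is a minimal caller-side sketch that rejects both signs of infinity before writing. The helper name `check_finite_for_stata` is hypothetical and not part of pandas; it only illustrates the symmetric check and is not a proposal for how `to_stata` should be implemented internally.

```python
import numpy as np
import pandas as pd


def check_finite_for_stata(df: pd.DataFrame) -> None:
    """Raise ValueError if any float column holds +inf or -inf.

    Hypothetical helper for illustration only; not a pandas API.
    """
    for col in df.columns:
        if not pd.api.types.is_float_dtype(df[col]):
            continue  # integer, string, etc. columns cannot hold infinity
        values = df[col].to_numpy(dtype="float64", na_value=np.nan)
        # np.isinf is True for both +inf and -inf, and False for NaN
        if np.isinf(values).any():
            raise ValueError(
                f"Column {col} contains +/-infinity, which is outside "
                "the range supported by Stata."
            )


# Both frames from the reproducible example are rejected the same way.
for bad in (np.inf, -np.inf):
    try:
        check_finite_for_stata(pd.DataFrame({"x": [1, bad]}))
    except ValueError as err:
        print(err)
```

Calling such a check right before `to_stata` would work around the asymmetry until it is fixed in pandas itself.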
The problem seems to be related to the following lines:
When checking for infinity, line 607 picks up only the maximum value of the column, which will be np.inf if it exists. However, it will not pick up -np.inf, which is the minimum value of the column. Hence, an easy fix would be replacing the quoted lines with:
maxvalue = data[col].max()
minvalue = data[col].min()
if np.isinf(maxvalue) or np.isinf(minvalue):
    raise ValueError(
        f"Column {col} has a maximum value of infinity (or a minimum value of -infinity) which is outside "
        "the range supported by Stata."
    )

Installed Versions
INSTALLED VERSIONS
commit : 7c48ff4
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.utf8
pandas : 1.2.5
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 3.0.1
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.06.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.22
tables : None
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : None
numba : 0.53.1