Showing content from https://github.com/pandas-dev/pandas/issues/47276 below:

StataWriterUTF8 is needlessly strict when converting variable names · Issue #47276 · pandas-dev/pandas · GitHub

Pandas version checks

Reproducible Example
import pandas as pd
pd.DataFrame({"mø": [1, 2, 3, 4]}).to_stata("data.dta", version=119)

#.venv/lib/python3.9/site-packages/pandas/io/stata.py:2491: InvalidColumnName: 
#Not all pandas column names were valid Stata variable names.
#The following replacements have been made:
#
#    mø   ->   m_
#
#If this is not what you expect, please make sure you have Stata-compliant
#column names in your DataFrame (strings only, max 32 characters, only
#alphanumerics and underscores, no Stata reserved words)
#
#  warnings.warn(ws, InvalidColumnName)

pd.read_stata("data.dta")

#    index  m_
# 0      0   1
# 1      1   2
# 2      2   3
# 3      3   4
Issue Description

StataWriterUTF8 replaces every character in the range 128 <= ord(c) < 256 with an underscore, but only characters with (128 <= ord(c) < 192) or ord(c) in {215, 247} need to be removed. The rest (ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ) are perfectly valid in variable names in Stata version >= 118.
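The rule described above could be sketched roughly as follows. This is only an illustration of the character ranges named in this issue, not the code in `pandas/io/stata.py`; the function names `is_valid_latin1_name_char` and `sanitize_name` are hypothetical, and the sketch deliberately leaves characters outside 128 <= ord(c) < 256 untouched, since this issue says nothing about them.

```python
def is_valid_latin1_name_char(c: str) -> bool:
    """Return True if c lies in 192..255, excluding 215 (×) and 247 (÷).

    Hypothetical helper: it encodes only the ranges stated in this issue.
    """
    code = ord(c)
    return 192 <= code < 256 and code not in {215, 247}


def sanitize_name(name: str) -> str:
    """Replace only the 128..255 characters that the issue says must go.

    Characters below 128 or above 255 are passed through unchanged here;
    a real writer would still have to apply its other naming rules.
    """
    out = []
    for c in name:
        if 128 <= ord(c) < 256 and not is_valid_latin1_name_char(c):
            out.append("_")
        else:
            out.append(c)
    return "".join(out)


print(sanitize_name("mø"))   # ø is ord 248, so the name is kept as-is
print(sanitize_name("a×b"))  # × is ord 215, so it becomes an underscore
```

With a check along these lines, the column name from the reproducible example would survive the round trip instead of turning into `m_`.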

Expected Behavior

The characters ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ should not be removed from variable names when saving with StataWriterUTF8. Happy to submit a pull request.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 4bfe3d0
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-33-generic
Version : #34-Ubuntu SMP Wed May 18 13:34:26 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.2
numpy : 1.22.4
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.2
setuptools : 59.6.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.4.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : 1.1.6
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

