A Python library for R datasets.
The rdata package offers a lightweight way to import and export, from Python, R datasets and objects stored in the ".rda" and ".rds" formats.
The rdata package is on PyPI and can be installed using pip:
pip install rdata
The package is also available for conda using the conda-forge channel:
conda install -c conda-forge rdata

Installing a develop version
The current version from the develop branch can be installed as:
pip install git+https://github.com/vnmabus/rdata.git@develop
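After any of the installation routes above, it can be useful to confirm from Python which version ended up in the environment. The following is a minimal sketch that relies only on the standard library; the helper name `installed_version` is hypothetical and not part of rdata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of ``package``, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if rdata is installed, otherwise None.
print(installed_version("rdata"))
```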
The documentation of rdata is in ReadTheDocs.
Examples of use are available in ReadTheDocs.
If you find this software useful in your work, please cite the following paper:
@article{ramos-carreno+rossi_2024_rdata,
    author = {Ramos-Carreño, Carlos and Rossi, Tuomas},
    doi = {10.21105/joss.07540},
    journal = {Journal of Open Source Software},
    month = dec,
    number = {104},
    pages = {1--4},
    title = {{rdata: A Python library for R datasets}},
    url = {https://joss.theoj.org/papers/10.21105/joss.07540#},
    volume = {9},
    year = {2024}
}
You can additionally cite the software repository itself using:
@misc{ramos-carreno++_2024_rdata-repo,
    author = {The rdata developers},
    doi = {10.5281/zenodo.6382237},
    month = dec,
    title = {rdata: A Python library for R datasets},
    url = {https://github.com/vnmabus/rdata},
    year = {2024}
}
If you want to reference a particular version for reproducibility, check the version-specific DOIs available in Zenodo.
The common way of reading an rds file is:
import rdata

converted = rdata.read_rds(rdata.TESTDATA_PATH / "test_dataframe.rds")
print(converted)
which returns the read dataframe:
  class  value
1     a      1
2     b      2
3     b      3
The analogous rda file can be read in a similar way:
import rdata

converted = rdata.read_rda(rdata.TESTDATA_PATH / "test_dataframe.rda")
print(converted)
which returns a dictionary mapping the variable name defined in the file (test_dataframe) to the dataframe:
{'test_dataframe':   class  value
1     a      1
2     b      2
3     b      3}
Under the hood, these reading functions are equivalent to the following two-step code:
import rdata

parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_dataframe.rda")
converted = rdata.conversion.convert(parsed)
print(converted)
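This consists of two steps: first, rdata.parser.parse_file parses the file into a Python representation of its R objects; then, rdata.conversion.convert converts that representation into regular Python objects, such as the dataframe shown above.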
The common way of writing data to an rds file is:
import pandas as pd
import rdata

df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
print(df)
rdata.write_rds("data.rds", df)
which prints the dataframe and writes it to the file data.rds:
  class  value
0     a      1
1     b      2
2     b      3
Similarly, the dataframe can be written to an rda file with a given variable name:
import pandas as pd
import rdata

df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
data = {"my_dataframe": df}
print(data)
rdata.write_rda("data.rda", data)
which prints the name-dataframe dictionary and writes it to the file data.rda:
{'my_dataframe':   class  value
0     a      1
1     b      2
2     b      3}
Under the hood, these writing functions are equivalent to the following two-step code:
import pandas as pd
import rdata

df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
data = {"my_dataframe": df}
r_data = rdata.conversion.convert_python_to_r_data(data, file_type="rda")
rdata.unparser.unparse_file("data.rda", r_data, file_type="rda")
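This consists of two steps (the reverse of reading): first, rdata.conversion.convert_python_to_r_data converts the Python objects into their R data representation; then, rdata.unparser.unparse_file writes that representation to the file.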
Additional examples illustrating the functionalities of this package can be found in the ReadTheDocs documentation.