Showing content from https://github.com/vnmabus/rdata below:

vnmabus/rdata: Reader of R datasets in .rda format, in Python

A Python library for R datasets.

The package rdata offers a lightweight way in Python to import and export R datasets/objects stored in the ".rda" and ".rds" formats.

Installing a stable release

The rdata package is on PyPI and can be installed using pip:

pip install rdata

The package is also available for conda using the conda-forge channel:

conda install -c conda-forge rdata

Installing a develop version

The current version from the develop branch can be installed with:

pip install git+https://github.com/vnmabus/rdata.git@develop

The documentation of rdata is in ReadTheDocs.

Examples of use are available in ReadTheDocs.

If you find this software useful in your work, please cite the following paper:

@article{ramos-carreno+rossi_2024_rdata,
    author = {Ramos-Carreño, Carlos and Rossi, Tuomas},
    doi = {10.21105/joss.07540},
    journal = {Journal of Open Source Software},
    month = dec,
    number = {104},
    pages = {1--4},
    title = {{rdata: A Python library for R datasets}},
    url = {https://joss.theoj.org/papers/10.21105/joss.07540#},
    volume = {9},
    year = {2024}
}

You can additionally cite the software repository itself using:

@misc{ramos-carreno++_2024_rdata-repo,
  author = {The rdata developers},
  doi = {10.5281/zenodo.6382237},
  month = dec,
  title = {rdata: A Python library for R datasets},
  url = {https://github.com/vnmabus/rdata},
  year = {2024}
}

If you want to reference a particular version for reproducibility, check the version-specific DOIs available in Zenodo.

The common way of reading an rds file is:

import rdata

converted = rdata.read_rds(rdata.TESTDATA_PATH / "test_dataframe.rds")
print(converted)

which returns the read dataframe:

  class  value
1     a      1
2     b      2
3     b      3

The analogous rda file can be read in a similar way:

import rdata

converted = rdata.read_rda(rdata.TESTDATA_PATH / "test_dataframe.rda")
print(converted)

which returns a dictionary mapping the variable name defined in the file (test_dataframe) to the dataframe:

{'test_dataframe':   class  value
1     a      1
2     b      2
3     b      3}

Under the hood, these reading functions are equivalent to the following two-step code:

import rdata

parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_dataframe.rda")
converted = rdata.conversion.convert(parsed)
print(converted)

This consists of two steps:

  1. First, the file is parsed using the function rdata.parser.parse_file. This provides a literal description of the file contents as a hierarchy of Python objects representing the basic R objects. This step is unambiguous and always the same.
  2. Then, each object must be converted to an appropriate Python object. In this step there are several choices on which Python type is the most appropriate as the conversion for a given R object. Thus, we provide a default rdata.conversion.convert routine, which tries to select Python objects that preserve most information of the original R object. For custom R classes, it is also possible to specify conversion routines to Python objects as exemplified in the documentation.

The common way of writing data to an rds file is:

import pandas as pd
import rdata

df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
print(df)

rdata.write_rds("data.rds", df)

which prints the dataframe and writes it to the file data.rds:

  class  value
0     a      1
1     b      2
2     b      3

Similarly, the dataframe can be written to an rda file with a given variable name:

import pandas as pd
import rdata

df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
data = {"my_dataframe": df}
print(data)

rdata.write_rda("data.rda", data)

which prints the name-dataframe dictionary and writes it to the file data.rda:

{'my_dataframe':   class  value
0     a      1
1     b      2
2     b      3}

Under the hood, these writing functions are equivalent to the following two-step code:

import pandas as pd
import rdata

df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
data = {"my_dataframe": df}

r_data = rdata.conversion.convert_python_to_r_data(data, file_type="rda")
rdata.unparser.unparse_file("data.rda", r_data, file_type="rda")

This consists of two steps (reverse to reading):

  1. First, each Python object is converted to an appropriate R object. As in reading, there are several choices, and the default rdata.conversion.convert_python_to_r_data routine tries to select R objects that preserve most information of the original Python object. For Python classes, it is also possible to specify custom conversion routines to R classes as exemplified in the documentation.
  2. Then, the created RData representation is unparsed to a file using the function rdata.unparser.unparse_file.

Additional examples illustrating the functionalities of this package can be found in the ReadTheDocs documentation.

