Submitting Author: Niels Bantilan (@cosmicBboy)
All current maintainers: (@cosmicBboy)
Package Name: pandera
One-Line Description of Package: validate the types, properties, and statistics of pandas data structures
Repository Link: https://github.com/unionai-oss/pandera
Version submitted: 0.1.5
Editor: @lwasser
Reviewer 1: @mbjoseph
Reviewer 2: @xmnlab
Archive: https://github.com/pandera-dev/pandera/releases/tag/v0.2.3
Version accepted: v0.2.3
Date Accepted: 10/10/2019
`pandas` data structures can hide a lot of information, and explicitly
validating them at runtime in production-critical or reproducible research
settings is a good idea for building reliable data transformation pipelines.
`pandera` enables users to:

- check the types and properties of columns in a `DataFrame` or values in a `Series`.

`pandera` provides a flexible and expressive API for performing data validation
on tidy (long-form) and wide data to make data processing pipelines more
readable and robust.
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
Data munging: the package makes ETL, data analysis, and data processing
pipelines more robust and reliable by providing users with tools to validate
assumptions about the schema and statistical properties of datasets.
This package supports validation on long (tidy) data and wide data.
Reproducibility: This package enables users to validate `DataFrame` or `Series`
objects at runtime or as unit/integration tests, and it can easily be integrated
into existing pipelines using the `check_input` and `check_output` decorators.
It also supports collaboration and reproducible research by programmatically
enforcing assertions made about the statistical properties of a dataset in
addition to making it easier to review pandas code in production-critical
contexts.
The target audience of `pandera` consists of data scientists, data engineers,
machine learning engineers, and machine learning scientists who use `pandas` in
their data processing pipelines for various purposes, e.g., transforming data
for reporting, analytics, model training, and data visualization. This tool is
built on top of `pandas` and `scipy` to provide a user-friendly interface for
explicitly specifying the set of properties that a `DataFrame` or `Series` must
fulfill in order to be considered valid. Since `pandera` makes no assumptions
about the domain of study or the contents of these `pandas` data structures, it
could be used in a wide variety of quantitative fields that involve the
analysis of tabular data.
There are a few alternatives to pandera in the Python ecosystem, and here is
how they compare:
`Enforcer` and `Column` objects are very similar to pandera, but it's a

Key differentiators of pandera:
- Column data types, nullability, and uniqueness are first-class concepts.
- `check_input` and `check_output` decorators enable seamless integration with
  existing code.
- `Check` objects provide flexibility and performance by exposing the `pandas`
  API by design.
- The `Hypothesis` class provides a tidy-first interface for statistical
  hypothesis testing.
- `Check` and `Hypothesis` objects support both tidy and wide data validation.
- Comprehensive documentation on key functionality.
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
`paper.md` matching JOSS's requirements with a high-level description in the package root or in `inst/`.
Note: Do not submit your package separately to JOSS.
Are you OK with Reviewers Submitting Issues to your Repo Directly? This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates: editor and review templates can be found here.
Previous Repo: https://github.com/cosmicBboy/pandera