Pointblank is a powerful yet elegant data validation framework for Python that transforms how you ensure data quality. With its intuitive, chainable API, you can quickly validate your data against comprehensive quality checks and visualize results through stunning, interactive reports that make data issues immediately actionable.
Whether you're a data scientist, data engineer, or analyst, Pointblank helps you catch data quality issues before they impact your analyses or downstream systems.
Getting Started in 30 Seconds

```python
import pointblank as pb

validation = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"))
    .col_vals_gt(columns="d", value=100)         # Validate values > 100
    .col_vals_le(columns="c", value=5)           # Validate values <= 5
    .col_exists(columns=["date", "date_time"])   # Check columns exist
    .interrogate()                               # Execute and collect results
)

# Get the validation report from the REPL with:
validation.get_tabular_report().show()

# From a notebook simply use:
validation
```
```python
import pointblank as pb
import polars as pl

# Load your data
sales_data = pl.read_csv("sales_data.csv")

# Create a comprehensive validation
validation = (
    pb.Validate(
        data=sales_data,
        tbl_name="sales_data",           # Name of the table for reporting
        label="Real-world example.",     # Label for the validation, appears in reports
        thresholds=(0.01, 0.02, 0.05),   # Set thresholds for warnings, errors, and critical issues
        actions=pb.Actions(              # Define actions for any threshold exceedance
            critical="Major data quality issue found in step {step} ({time})."
        ),
        final_actions=pb.FinalActions(   # Define final actions for the entire validation
            pb.send_slack_notification(
                webhook_url="https://hooks.slack.com/services/your/webhook/url"
            )
        ),
        brief=True,                      # Add automatically-generated briefs for each step
    )
    .col_vals_between(                   # Check numeric ranges with precision
        columns=["price", "quantity"],
        left=0, right=1000
    )
    .col_vals_not_null(                  # Ensure that columns ending with '_id' don't have null values
        columns=pb.ends_with("_id")
    )
    .col_vals_regex(                     # Validate patterns with regex
        columns="email",
        pattern="^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    )
    .col_vals_in_set(                    # Check categorical values
        columns="status",
        set=["pending", "shipped", "delivered", "returned"]
    )
    .conjointly(                         # Combine multiple conditions
        lambda df: pb.expr_col("revenue") == pb.expr_col("price") * pb.expr_col("quantity"),
        lambda df: pb.expr_col("tax") >= pb.expr_col("revenue") * 0.05
    )
    .interrogate()
)
```
```
Major data quality issue found in step 7 (2025-04-16 15:03:04.685612+00:00).
```
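As an aside, the email pattern used in the `col_vals_regex()` step above can be sanity-checked on its own with Python's built-in `re` module (the sample addresses here are made up):

```python
import re

# Same pattern as in the col_vals_regex() step above
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"

assert re.match(pattern, "jane.doe@example.com")     # typical address
assert re.match(pattern, "sales+q3@sub.domain.org")  # '+' tags and subdomains are fine
assert not re.match(pattern, "not-an-email")         # no '@' at all
assert not re.match(pattern, "user@tld")             # domain needs a dot and a 2+ letter TLD
```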
```python
# Get an HTML report you can share with your team
validation.get_tabular_report().show("browser")
```
```python
# Get a report of failing records from a specific step
validation.get_step_report(i=3).show("browser")  # Get failing records from step 3
```
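The `thresholds=(0.01, 0.02, 0.05)` argument in the example above sets the warning, error, and critical levels as fractions of failing test units in a step. A rough plain-Python sketch of that idea (illustrative only, not Pointblank's internals; the function name is made up):

```python
# Illustrative only: how fractional thresholds map to severity levels
def severity(n_failed: int, n_total: int,
             warning: float = 0.01, error: float = 0.02, critical: float = 0.05) -> str:
    frac = n_failed / n_total
    if frac >= critical:
        return "critical"
    if frac >= error:
        return "error"
    if frac >= warning:
        return "warning"
    return "ok"

print(severity(1, 1000))   # 0.001 -> "ok"
print(severity(30, 1000))  # 0.03  -> "error"
print(severity(80, 1000))  # 0.08  -> "critical"
```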
For teams that need portable, version-controlled validation workflows, Pointblank supports YAML configuration files. This makes it easy to share validation logic across different environments and team members, ensuring everyone is on the same page.
validation.yaml
```yaml
validate:
  data: small_table
  tbl_name: "small_table"
  label: "Getting started validation"
  steps:
    - col_vals_gt:
        columns: "d"
        value: 100
    - col_vals_le:
        columns: "c"
        value: 5
    - col_exists:
        columns: ["date", "date_time"]
```
Execute the YAML validation
```python
import pointblank as pb

# Run validation from YAML configuration
validation = pb.yaml_interrogate("validation.yaml")

# Get the results just like any other validation
validation.get_tabular_report().show()
```
This approach is perfect for keeping validation logic under version control and sharing it consistently across environments and team members.
Pointblank includes a powerful CLI utility called pb that lets you run data validation workflows directly from the command line. It's perfect for CI/CD pipelines, scheduled data quality checks, or quick validation tasks.
Explore Your Data
```bash
# Get a quick preview of your data
pb preview small_table

# Preview data from GitHub URLs
pb preview "https://github.com/user/repo/blob/main/data.csv"

# Check for missing values in Parquet files
pb missing data.parquet

# Generate column summaries from database connections
pb scan "duckdb:///data/sales.ddb::customers"
```
Run Essential Validations
```bash
# Run validation from a YAML configuration file
pb run validation.yaml

# Run validation from a Python file
pb run validation.py

# Check for duplicate rows
pb validate small_table --check rows-distinct

# Validate data directly from GitHub
pb validate "https://github.com/user/repo/blob/main/sales.csv" --check col-vals-not-null --column customer_id

# Verify no null values in Parquet datasets
pb validate "data/*.parquet" --check col-vals-not-null --column a

# Extract failing data for debugging
pb validate small_table --check col-vals-gt --column a --value 5 --show-extract
```
Integrate with CI/CD
```bash
# Use exit codes for automation in one-liner validations (0 = pass, 1 = fail)
pb validate small_table --check rows-distinct --exit-code

# Run validation workflows with exit codes
pb run validation.yaml --exit-code
pb run validation.py --exit-code
```
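Because a failing validation returns a non-zero exit code, `pb` slots directly into CI. Here is a sketch of a hypothetical GitHub Actions job; the job name, file paths, and Python version are assumptions you would adapt to your project:

```yaml
# Hypothetical CI workflow; adjust names and paths to your repository
name: data-quality
on: [push]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install "pointblank[pl]"
      - run: pb run validation.yaml --exit-code   # non-zero exit code fails the job
```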
Click the following headings to see some video demonstrations of the CLI:
Getting Started with the Pointblank CLI
Doing Some Data Exploration
Validating Data with the CLI
Using Polars in the CLI
Integrating Pointblank with CI/CD

Features That Set Pointblank Apart

Visit our documentation site for:
We'd love to hear from you! Connect with us:
You can install Pointblank using pip:
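```bash
pip install pointblank
```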
You can also install Pointblank from Conda-Forge by using:
```bash
conda install conda-forge::pointblank
```
If you don't have Polars or Pandas installed, you'll need to install one of them to use Pointblank.
```bash
pip install "pointblank[pl]"   # Install Pointblank with Polars
pip install "pointblank[pd]"   # Install Pointblank with Pandas
```
To use Pointblank with DuckDB, MySQL, PostgreSQL, or SQLite, install Ibis with the appropriate backend:
```bash
pip install "pointblank[duckdb]"     # Install Pointblank with Ibis + DuckDB
pip install "pointblank[mysql]"      # Install Pointblank with Ibis + MySQL
pip install "pointblank[postgres]"   # Install Pointblank with Ibis + PostgreSQL
pip install "pointblank[sqlite]"     # Install Pointblank with Ibis + SQLite
```
Pointblank uses Narwhals to work with Polars and Pandas DataFrames, and integrates with Ibis for database and file format support. This architecture provides a consistent API for validating tabular data from various sources.
Contributing to Pointblank

There are many ways to contribute to the ongoing development of Pointblank. Some contributions can be simple (like fixing typos, improving documentation, filing issues for feature requests or problems, etc.) and others might take more time and care (like answering questions and submitting PRs with code changes). Just know that anything you can do to help would be very much appreciated!
Please read over the contributing guidelines for information on how to get started.
There's also a version of Pointblank for R, which has been around since 2017 and is widely used in the R community. You can find it at https://github.com/rstudio/pointblank.
We're actively working on enhancing Pointblank with:
If you have any ideas for features or improvements, don't hesitate to share them with us! We are always looking for ways to make Pointblank better.
Please note that the Pointblank project is released with a contributor code of conduct.
By participating in this project you agree to abide by its terms.
Pointblank is licensed under the MIT license.
© Posit Software, PBC.
This project is primarily maintained by Rich Iannone. Other authors may occasionally assist with some of these duties.