
rkissell/etl: Build Postgres replication apps in Rust

Note: Version 2 is currently under development in the /v2 directory. It includes a complete rework of the pipeline architecture for improved performance and scalability.

A Rust crate to quickly build replication solutions for Postgres. It provides building blocks for constructing data pipelines that continually copy data from Postgres to other systems. It builds abstractions on top of Postgres's logical streaming replication protocol and pushes users towards the pit of success without making them worry about low-level details of the protocol.
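To make the pipeline idea concrete, here is a minimal, self-contained Rust sketch of the shape such a system takes. It is not the etl crate's actual API (see the examples folder referenced below for real usage), and every name in it is a hypothetical stand-in for the crate's building blocks:

// Conceptual sketch only: a replication pipeline forwards a stream of
// change events from a source to a destination. The real etl crate
// additionally manages the logical replication protocol, slots, and
// initial table copies, all of which this toy model omits.

#[derive(Debug)]
enum Event {
    Insert { table: &'static str, row: &'static str },
    Delete { table: &'static str, row: &'static str },
}

// Anything that can receive replicated events.
trait Destination {
    fn apply(&mut self, event: &Event);
}

// A destination that prints events, like the stdout example below.
struct Stdout;

impl Destination for Stdout {
    fn apply(&mut self, event: &Event) {
        println!("{event:?}");
    }
}

fn main() {
    // In a real pipeline the events come from Postgres; here we fake a few.
    let events = [
        Event::Insert { table: "table1", row: "(1, 'a')" },
        Event::Delete { table: "table2", row: "(2, 'b')" },
    ];
    let mut destination = Stdout;
    for event in &events {
        destination.apply(event);
    }
}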

The etl crate supports several destinations, including stdout, DuckDB, and MotherDuck.

Note: The DuckDB and MotherDuck destinations do not use the batched pipeline, so they currently perform poorly. A batched pipeline version of these destinations is planned.

To use etl in your Rust project, add it via a git dependency in Cargo.toml:

[dependencies]
etl = { git = "https://github.com/supabase/etl", features = ["stdout"] }

Each destination is behind a feature of the same name, so remember to enable the right feature. The git dependency is needed for now because etl is not yet published on crates.io.
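For example, to use the DuckDB destination instead, enable the feature of the same name (this assumes the naming rule above holds; the crate's Cargo.toml is the authoritative list of features):

[dependencies]
etl = { git = "https://github.com/supabase/etl", features = ["duckdb"] }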

To quickly try out etl, you can run the stdout example, which will replicate the data to standard output. First, create a publication in Postgres which includes the tables you want to replicate:

create publication my_publication
for table table1, table2;
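To confirm which tables the publication covers, you can query Postgres's pg_publication_tables catalog view:

select *
from pg_publication_tables
where pubname = 'my_publication';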

Then run the stdout example:

cargo run -p etl --example stdout --features="stdout" -- \
    --db-host localhost \
    --db-port 5432 \
    --db-name postgres \
    --db-username postgres \
    --db-password password \
    cdc my_publication stdout_slot

In the above example, etl connects to a Postgres database named postgres running on localhost:5432 with username postgres and password password. The replication slot stdout_slot is created automatically by etl.
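Once the example is running, you can verify that the slot was created by querying Postgres's pg_replication_slots view:

select slot_name, plugin, active
from pg_replication_slots
where slot_name = 'stdout_slot';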

For code examples on how to use etl, please refer to the examples folder in the source.

Before running the examples, tests, or the api and replicator components, you'll need to set up a PostgreSQL database. We provide a convenient script for this; for detailed instructions, please refer to our Database Setup Guide.

To run the test suite:
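Assuming the standard Cargo workflow and a database configured as described above:

cargo test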

The repository includes Docker support for both the replicator and api components:

# Build the replicator image (the -t tag names below are arbitrary)
docker build -f ./replicator/Dockerfile -t etl-replicator .

# Build the api image
docker build -f ./api/Dockerfile -t etl-api .

For a detailed explanation of the ETL architecture and design decisions, please refer to our Design Document.

Too Many Open Files Error

If you see the following error when running tests on macOS:

called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }

Raise the limit of open files per process with:
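For example, from the shell session that runs the tests (the value is arbitrary; pick something comfortably above the default, and note that the change applies only to the current shell):

ulimit -n 10000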

Performance Considerations

Currently, the data source and destinations copy table rows and CDC events one at a time, which is expected to be slow. Batching and other strategies will likely improve performance drastically, but at this early stage the focus is on correctness rather than performance. There are also no benchmarks yet, so any commentary about performance is closer to speculation than measurement.
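To illustrate the kind of batching alluded to above, here is a minimal, self-contained Rust sketch (not the crate's implementation; all names are hypothetical). Events are buffered and written to the destination in groups once a size threshold is reached, amortizing per-write overhead:

// Minimal batching sketch: buffer incoming events and flush them in
// groups instead of writing one at a time.

struct BatchingWriter {
    buffer: Vec<String>,
    max_batch_size: usize,
}

impl BatchingWriter {
    fn new(max_batch_size: usize) -> Self {
        Self {
            buffer: Vec::with_capacity(max_batch_size),
            max_batch_size,
        }
    }

    // Queue one event; flush when the batch is full.
    fn push(&mut self, event: String) {
        self.buffer.push(event);
        if self.buffer.len() >= self.max_batch_size {
            self.flush();
        }
    }

    // Write the whole batch in one call to the (stand-in) destination.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        // A real destination would issue one bulk write here instead of
        // one write per event.
        println!("writing batch of {} events", self.buffer.len());
        self.buffer.clear();
    }
}

fn main() {
    let mut writer = BatchingWriter::new(3);
    for i in 0..7 {
        writer.push(format!("event {i}"));
    }
    writer.flush(); // flush the final partial batch
}

A production version would also flush on a timeout, so that a slow trickle of events does not sit in the buffer indefinitely.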

Distributed under the Apache-2.0 License. See LICENSE for more information.

