Showing content from http://magic.io/blog/asyncpg-1m-rows-from-postgres-to-python/ below:

1M rows/s from Postgres to Python — magicstack

asyncpg is a new fully-featured open-source Python client library for PostgreSQL. It is built specifically for asyncio and Python 3.5 async / await. asyncpg is the fastest driver among common Python, NodeJS and Go implementations.

Why asyncpg?

We are building EdgeDB—the next generation object database with PostgreSQL as a backing store. We need high-performance, low-latency access to the advanced features of PostgreSQL.

The most obvious option was psycopg2, the most popular Python driver for PostgreSQL. It is a well-supported, stable, proven technology. There is also aiopg, which provides an async interface on top of psycopg2. That raises an obvious question: why reinvent the wheel? The short answer is twofold: features and performance. We cover each in detail below.

Data Type Support

Our biggest gripe with psycopg2 is its mediocre support for PostgreSQL data types, especially arrays and composite types. A rich data type system is one of the hallmarks of PostgreSQL. And yet, out of the box, psycopg2 only supports simple built-in types like integers, strings, and timestamps, forcing users to write custom “typecasters” for everything else. This is cumbersome and inefficient.

The reason is fundamental: psycopg2 exchanges data with the database server in text format. This necessitates a non-trivial amount of parsing, especially so for complex types.

Unlike psycopg2, asyncpg implements the PostgreSQL binary I/O protocol, which, aside from its performance benefits, allows for generic support of container types (arrays, composites, and ranges).
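To make the text-versus-binary difference concrete, here is a minimal pure-Python sketch that decodes a one-dimensional int4 array both ways. It is not asyncpg's code (which is written in Cython); the function names are hypothetical, the binary layout follows our understanding of how PostgreSQL sends arrays in binary format, and the encoder exists only to build a sample payload.

```python
import struct

INT4_OID = 23  # PostgreSQL type OID for int4


def parse_int4_array_text(value: str) -> list:
    """Parse the text form of a one-dimensional int4 array, e.g. '{1,2,NULL}'."""
    inner = value.strip()[1:-1]
    if not inner:
        return []
    # Every element has to be split out of the string and converted.
    return [None if item == "NULL" else int(item) for item in inner.split(",")]


def decode_int4_array_binary(data: bytes) -> list:
    """Decode the binary form of a one-dimensional int4 array."""
    # Header: number of dimensions, has-null flag, element type OID.
    ndim, _has_null, elem_oid = struct.unpack_from("!iiI", data, 0)
    if elem_oid != INT4_OID:
        raise ValueError("this sketch only handles int4 elements")
    if ndim == 0:
        return []
    if ndim != 1:
        raise ValueError("this sketch only handles one dimension")
    # Per-dimension header: element count and lower bound.
    length, _lower_bound = struct.unpack_from("!ii", data, 12)
    offset = 20
    items = []
    for _ in range(length):
        # Each element is a 4-byte size (-1 means NULL) followed by its bytes.
        (size,) = struct.unpack_from("!i", data, offset)
        offset += 4
        if size == -1:
            items.append(None)
            continue
        (item,) = struct.unpack_from("!i", data, offset)
        items.append(item)
        offset += size
    return items


def encode_int4_array_binary(items: list) -> bytes:
    """Build a sample binary payload (helper for the demo only)."""
    parts = [struct.pack("!iiI", 1 if items else 0, int(None in items), INT4_OID)]
    if items:
        parts.append(struct.pack("!ii", len(items), 1))  # length, lower bound
    for item in items:
        if item is None:
            parts.append(struct.pack("!i", -1))
        else:
            parts.append(struct.pack("!ii", 4, item))
    return b"".join(parts)
```

The binary decoder never inspects the element values as text: the element type arrives in the header, so the same loop could dispatch to any element decoder. That is what makes generic container support practical.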

Prepared Statements

asyncpg makes extensive use of PostgreSQL prepared statements. This is an important optimization, as it avoids repeated parsing, analysis, and planning of queries. Additionally, asyncpg caches the data I/O pipeline for each prepared statement.

Prepared statements in asyncpg can be created and used explicitly. They provide an API to fetch and introspect query results. Most query methods are also exposed on the connection object directly, and asyncpg will create and cache a prepared statement implicitly.
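As a rough illustration of the two styles, the sketch below prepares a statement explicitly, introspects its result columns, and then runs the same query through the connection object. The function name, the query, and the DSN you pass in are assumptions for the example; running it requires asyncpg and a reachable PostgreSQL server.

```python
import asyncio


async def prepared_statement_demo(dsn: str) -> dict:
    """Run one query explicitly prepared and once via the connection."""
    # Imported here so the sketch can be loaded without the driver installed.
    import asyncpg

    query = "SELECT $1::int + $2::int AS total"
    conn = await asyncpg.connect(dsn)
    try:
        # Explicit: prepare once, then execute with different arguments.
        stmt = await conn.prepare(query)
        # Introspect the result columns before fetching anything.
        columns = [attr.name for attr in stmt.get_attributes()]
        totals = [await stmt.fetchval(n, 1) for n in range(3)]

        # Implicit: the connection prepares and caches the statement for us.
        rows = await conn.fetch(query, 10, 1)
        return {
            "columns": columns,
            "explicit": totals,
            "implicit": [row["total"] for row in rows],
        }
    finally:
        await conn.close()
```

A call would look like asyncio.run(prepared_statement_demo("postgresql://user@localhost/dbname")), where the DSN is a placeholder for your own server.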

Ease of Deployment

Another important feature of asyncpg is that it has zero dependencies. A direct implementation of the PostgreSQL protocol means that libpq does not need to be installed, and you can just pip install asyncpg. Additionally, we provide binary wheels for Linux and macOS (Windows support is planned).

It soon became evident that by implementing the PostgreSQL frontend/backend protocol directly, we could achieve significant speed improvements. Our earlier experience with uvloop has shown that Cython can be used to build very efficient libraries. asyncpg is written almost entirely in Cython, with careful buffer management and a highly optimized data decoding pipeline.

The result is that asyncpg is, on average, at least 3x faster than psycopg2 (or aiopg). This result is remarkable, as psycopg2, written in C and optimized, is not slow at all. See the benchmarks section below for more.

Similarly to uvloop, we created a standalone toolbench to measure and report the performance of asyncpg and other PostgreSQL driver implementations. We measured query throughput (in rows per second) and latency. The main purpose of this benchmark is to measure the driver overhead.

For fairness, all tests were run in a single thread (GOMAXPROCS=1 for Go code) in async mode. Python drivers were run under uvloop.
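For readers who want to reproduce the Python side of that setup, the following is a minimal sketch of running a coroutine on a single uvloop event loop in the current thread. The helper name is hypothetical, and the fallback to the standard asyncio loop is our addition so the sketch still runs where uvloop is not installed; it is not part of the benchmark.

```python
import asyncio

try:
    import uvloop  # third-party; optional for this sketch
except ImportError:
    uvloop = None


def run_single_threaded(coro):
    """Run `coro` to completion on one event loop in the current thread."""
    if uvloop is not None:
        loop = uvloop.new_event_loop()
    else:
        loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()
```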

The benchmark results featured in this post were obtained from a bare-metal server with the following setup:

Driver Implementations:

Each test consisted of running queries in a tight loop over 8 concurrent connections to the database server for 10 seconds, with a 5-second warmup.
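The shape of such a test can be sketched with plain asyncio. This is not the toolbench itself: the function names are hypothetical, fake_query stands in for a real driver call, a single callable is shared where the real test gives each worker its own connection, and we assume the warmup precedes the measured window.

```python
import asyncio
import time


async def fake_query():
    """Hypothetical stand-in for a driver call; returns 10 'rows'."""
    await asyncio.sleep(0)
    return [(i,) for i in range(10)]


async def worker(query, warmup_deadline, deadline, stats):
    """Issue queries back to back until the deadline passes."""
    while True:
        started = time.monotonic()
        if started >= deadline:
            break
        rows = await query()
        # Only count queries that started after the warmup period.
        if started >= warmup_deadline:
            stats["queries"] += 1
            stats["rows"] += len(rows)


async def run_benchmark(query, concurrency=8, duration=10.0, warmup=5.0):
    """Run `concurrency` workers and report throughput for the measured window."""
    warmup_deadline = time.monotonic() + warmup
    deadline = warmup_deadline + duration
    stats = {"queries": 0, "rows": 0}
    await asyncio.gather(
        *(worker(query, warmup_deadline, deadline, stats) for _ in range(concurrency))
    )
    return {
        "queries": stats["queries"],
        "rows": stats["rows"],
        "rows_per_second": stats["rows"] / duration,
    }
```

To try it quickly, shrink the timings: asyncio.run(run_benchmark(fake_query, concurrency=2, duration=0.5, warmup=0.1)).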

The charts show the geometric average of results obtained by running four types of queries:

We firmly believe that high-performance and scalable systems in Python are possible. For that we need to put maximum effort into making fast, high-quality drivers, event loops, and frameworks.

asyncpg is another step in that direction. It is the result of careful design fuelled by our experience creating uvloop and using Cython and asyncio efficiently.

Previous post: uvloop: Blazing fast Python networking.

magicstack is a Toronto-based team of software engineers. Feel free to drop us a line at hello@magic.io.

