Postgres partitioning as easy as pie. Works great for both new and existing tables, with zero downtime and minimal app changes. No need to install anything on your database server. Archive older data on a rolling basis to keep your database size under control.
🍊 Battle-tested at Instacart
pgslice is a command line tool. To install, run:

gem install pgslice

This will give you the pgslice command. If installation fails, you may need to install dependencies.
You can also install it with Homebrew or Docker.
Ensure the table you want to partition has been created. We’ll refer to this as <table>.
Specify your database credentials:
export PGSLICE_URL=postgres://localhost/myapp_development
pgslice prep <table> <column> <period>
The column should be a timestamp, timestamptz, or date column, and the period can be day, month, or year.
This creates a partitioned table named <table>_intermediate using range partitioning.
pgslice add_partitions <table> --intermediate --past 3 --future 3
Use the --past and --future options to control the number of partitions.
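The partition names and boundaries follow a simple scheme. As a rough Ruby sketch (not pgslice's actual code; the helper name is made up), a monthly partition is suffixed with YYYYMM and spans the first of one month to the first of the next:

```ruby
require "date"

# Hypothetical helper mirroring the naming scheme visible in pgslice's
# generated SQL: visits_202409 covers 2024-09-01 (inclusive) to
# 2024-10-01 (exclusive).
def month_partition(table, date)
  from = Date.new(date.year, date.month, 1)
  {name: "#{table}_#{from.strftime('%Y%m')}", from: from, to: from.next_month}
end

month_partition("visits", Date.new(2024, 9, 15))
# => {name: "visits_202409", from: 2024-09-01, to: 2024-10-01}
```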
Once the partitions are created, fill them with data from the original table:

pgslice fill <table>

Use the --batch-size and --sleep options to control the speed (defaults to 10000 and 0 respectively).
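As a rough illustration of the batching (a hypothetical helper, not pgslice internals): fill walks the primary key in fixed-size ranges, issuing one INSERT per range, as seen in the generated SQL later in this document:

```ruby
# Divide the id space into batches of batch_size, matching the
# WHERE "id" > x AND "id" <= y clauses pgslice emits.
def batch_ranges(max_id, batch_size)
  ranges = []
  start = 0
  while start < max_id
    ranges << [start, [start + batch_size, max_id].min]
    start += batch_size
  end
  ranges
end

batch_ranges(30000, 10000)
# => [[0, 10000], [10000, 20000], [20000, 30000]]
```

Sleeping between batches (the --sleep option) keeps the copy from monopolizing I/O on a busy database.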
To sync data across different databases, check out pgsync.
When the fill is complete, swap the intermediate table with the original table:

pgslice swap <table>

The original table is renamed <table>_retired and the intermediate table is renamed <table>.
Fill the rows that were inserted while the initial fill was running:

pgslice fill <table> --swapped
Back up and drop the retired table when you no longer need it:

pg_dump -c -Fc -t <table>_retired $PGSLICE_URL > <table>_retired.dump
psql -c "DROP TABLE <table>_retired" $PGSLICE_URL
pgslice prints the SQL commands that were executed on the server. To print without executing, use the --dry-run option.
pgslice prep visits created_at month
BEGIN;

CREATE TABLE "public"."visits_intermediate" (LIKE "public"."visits" INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING STORAGE INCLUDING COMMENTS INCLUDING STATISTICS INCLUDING GENERATED INCLUDING COMPRESSION) PARTITION BY RANGE ("created_at");

CREATE INDEX ON "public"."visits_intermediate" USING btree ("created_at");

COMMENT ON TABLE "public"."visits_intermediate" is 'column:created_at,period:month,cast:date,version:3';

COMMIT;
pgslice add_partitions visits --intermediate --past 1 --future 1
BEGIN;

CREATE TABLE "public"."visits_202408" PARTITION OF "public"."visits_intermediate" FOR VALUES FROM ('2024-08-01') TO ('2024-09-01');

ALTER TABLE "public"."visits_202408" ADD PRIMARY KEY ("id");

CREATE TABLE "public"."visits_202409" PARTITION OF "public"."visits_intermediate" FOR VALUES FROM ('2024-09-01') TO ('2024-10-01');

ALTER TABLE "public"."visits_202409" ADD PRIMARY KEY ("id");

CREATE TABLE "public"."visits_202410" PARTITION OF "public"."visits_intermediate" FOR VALUES FROM ('2024-10-01') TO ('2024-11-01');

ALTER TABLE "public"."visits_202410" ADD PRIMARY KEY ("id");

COMMIT;
/* 1 of 3 */
INSERT INTO "public"."visits_intermediate" ("id", "user_id", "ip", "created_at")
  SELECT "id", "user_id", "ip", "created_at" FROM "public"."visits"
  WHERE "id" > 0 AND "id" <= 10000 AND "created_at" >= '2024-08-01'::date AND "created_at" < '2024-11-01'::date

/* 2 of 3 */
INSERT INTO "public"."visits_intermediate" ("id", "user_id", "ip", "created_at")
  SELECT "id", "user_id", "ip", "created_at" FROM "public"."visits"
  WHERE "id" > 10000 AND "id" <= 20000 AND "created_at" >= '2024-08-01'::date AND "created_at" < '2024-11-01'::date

/* 3 of 3 */
INSERT INTO "public"."visits_intermediate" ("id", "user_id", "ip", "created_at")
  SELECT "id", "user_id", "ip", "created_at" FROM "public"."visits"
  WHERE "id" > 20000 AND "id" <= 30000 AND "created_at" >= '2024-08-01'::date AND "created_at" < '2024-11-01'::date
ANALYZE VERBOSE "public"."visits_202408";
ANALYZE VERBOSE "public"."visits_202409";
ANALYZE VERBOSE "public"."visits_202410";
ANALYZE VERBOSE "public"."visits_intermediate";
BEGIN;

SET LOCAL lock_timeout = '5s';

ALTER TABLE "public"."visits" RENAME TO "visits_retired";

ALTER TABLE "public"."visits_intermediate" RENAME TO "visits";

ALTER SEQUENCE "public"."visits_id_seq" OWNED BY "public"."visits"."id";

COMMIT;
To add partitions, use:
pgslice add_partitions <table> --future 3
Add this as a cron job to create a new partition each day, month, or year.
# day
0 0 * * * pgslice add_partitions <table> --future 3 --url ...

# month
0 0 1 * * pgslice add_partitions <table> --future 3 --url ...

# year
0 0 1 1 * pgslice add_partitions <table> --future 3 --url ...
Add a monitor to ensure partitions are being created.
SELECT 1 FROM pg_catalog.pg_class c
INNER JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE
  c.relkind = 'r' AND
  n.nspname = 'public' AND
  c.relname = '<table>_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD')
  -- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM')
  -- for years, use to_char(NOW() + INTERVAL '3 years', 'YYYY')
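The partition name the monitor looks for can also be computed outside the database. A hedged Ruby sketch (the helper name is invented) mirroring the to_char() expressions above:

```ruby
require "date"

# Name of the partition expected to exist 3 periods from now,
# matching the suffix formats YYYYMMDD, YYYYMM, and YYYY.
def expected_partition(table, period, today = Date.today)
  case period
  when :day   then "#{table}_#{(today + 3).strftime('%Y%m%d')}"
  when :month then "#{table}_#{(today >> 3).strftime('%Y%m')}"   # >> adds months
  when :year  then "#{table}_#{(today >> 36).strftime('%Y')}"    # 36 months = 3 years
  end
end

expected_partition("visits", :day, Date.new(2024, 9, 15))
# => "visits_20240918"
```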
Back up and drop older partitions each day, month, or year.
pg_dump -c -Fc -t <table>_202409 $PGSLICE_URL > <table>_202409.dump
psql -c "DROP TABLE <table>_202409" $PGSLICE_URL
If you use Amazon S3 for backups, s3cmd is a nice tool.
s3cmd put <table>_202409.dump s3://<s3-bucket>/<table>_202409.dump
Once a table is partitioned, make schema updates on the master table only (not partitions). This includes adding, removing, and modifying columns, as well as adding and removing indexes and foreign keys.
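For example, statements run against the parent table propagate to every partition (the visits table is from the example above; the referrer column is made up for illustration):

-- run against the partitioned (parent) table, not individual partitions
ALTER TABLE visits ADD COLUMN referrer text;
CREATE INDEX ON visits (user_id);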
To undo prep (which will delete partitions), use:

pgslice unprep <table>
To undo swap, use:

pgslice unswap <table>
Set the tablespace when adding partitions:
pgslice add_partitions <table> --tablespace fastspace
This setup allows you to read and write using the original table name with no knowledge that it’s partitioned. However, there are a few things to be aware of.
When possible, queries should include the column you partition on to limit the number of partitions the database needs to check. For instance, if you partition on created_at, try to include it in queries:
SELECT * FROM visits
WHERE
  user_id = 123 AND
  -- for performance
  created_at >= '2024-09-01' AND created_at < '2024-09-02'
For this to be effective, ensure constraint_exclusion is set to partition (the default value) or on.
SHOW constraint_exclusion;
Specify the primary key for partitioned models to ensure it’s returned.
class Visit < ApplicationRecord
  self.primary_key = "id"
end
Please submit a PR if additional configuration is needed.
You can also use pgslice to reduce the size of a table without partitioning by creating a new table, filling it with a subset of records, and swapping it in.
pgslice prep <table> --no-partition
pgslice fill <table> --where "id > 1000" # use any conditions
pgslice analyze <table>
pgslice swap <table>
pgslice fill <table> --where "id > 1000" --swapped
Triggers aren’t copied from the original table. You can set up triggers on the intermediate table if needed.
Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.
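For example, sslmode can be set as a connection parameter in the URL (host, user, and database below are placeholders):

```shell
export PGSLICE_URL=postgres://user:password@db.example.com:5432/myapp?sslmode=verify-full
```

verify-full checks both that the connection is encrypted and that the server certificate matches the hostname.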
With Homebrew, you can use:
Get the Docker image with:
docker pull ankane/pgslice
alias pgslice="docker run --rm -e PGSLICE_URL ankane/pgslice"
This will give you the pgslice command.
If installation fails, your system may be missing Ruby or libpq.
On Mac, run:
On Ubuntu, run:
sudo apt-get install ruby-dev libpq-dev build-essential
Run:

gem install pgslice
To use master, run:
gem install specific_install
gem specific_install https://github.com/ankane/pgslice.git
Also check out:
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
To get started with development:
git clone https://github.com/ankane/pgslice.git
cd pgslice
bundle install
createdb pgslice_test
bundle exec rake test
To test against different versions of Postgres with Docker, use:
docker run -p=8000:5432 postgres:16

TZ=Etc/UTC PGSLICE_URL=postgres://postgres@localhost:8000/postgres bundle exec rake
On Mac, you must use Docker Desktop for the port mapping to localhost to work.