Certain patterns of data access, and especially data updates, can exacerbate strain on the database. Avoid them if possible.
This document lists some patterns to avoid, with recommendations for alternatives.
High-frequency updates, especially to the same rowAvoid single database rows that are updated by many transactions at the same time.
This pattern often happens when an aggregate is too expensive to compute for each request, so a running tally is kept in the database. If you need such an aggregate, consider keeping a running total in a single row, plus a small working set of recently added data, such as individual increments:
PostgreSQL organizes rows into 8 KB pages, and operates on one page at a time. By minimizing the width of rows in a table, we improve the following:
Mitigating wide tables is one part of the database team’s 100 GB table initiative, as wider tables can fit fewer rows in 100 GB.
When adding columns to a table, consider if you intend to access the data in the new columns by itself, in a one-to-one relationship with the other columns of the table. If so, the new columns could be a good candidate for splitting to a new table.
Several tables have already been split in this way. For example:
search_data
is split from issues
.project_pages_metadata
is split from projects
.merge_request_diff_details
is split from merge_request_diffs
Certain tables, like users
, namespaces
, and projects
, can get very wide. These tables are usually central to the application, and used very often.
Why is this a problem?
A possible solution to this problem is to keep only the most important columns on the main table, and extract the rest into different tables, having one-to-one relationship with the main table. Good candidates are columns that are either very frequently updated, for example last_activity_at
, or columns that are rarely updated and/or used, like activation tokens.
The trade-off that comes with such extraction is that index-only scans are no longer possible. Instead, the application must either join to the new table or execute an additional query. The performance impacts of this should be weighed against the benefits of the vertical table split.
There is a very good episode on this topic on the PostgresFM podcast, where @NikolayS of PostgresAI and @michristofides of PgMustard discuss this topic in more depth - https://postgres.fm/episodes/data-model-trade-offs.
ExampleLets look at the users
table, which at of the time of writing has 75 columns. We can see a few groups of columns that match the above criteria, and are good candidates for extraction:
encrypted_otp_secret
, otp_secret_expires_at
, etc. There are few of these columns, and once populated they should not be updated often (if at all).confirmation_token
, confirmation_sent_at
, and confirmed_at
. Once populated these are most likely never updated.password_expires_at
, last_credential_check_at
, and admin_email_unsubscribed_at
. Such columns are either updated very often, or not at all. It will be better if they are in a separate table.unlock_token
, incoming_email_token
, and feed_token
.Let’s focus on users.incoming_email_token
- every user on GitLab.com has one set, and this token is rarely updated.
In order to extract it from users
into a new table, we’ll have to do the following:
While this is a lengthy process, it’s needed in order to do the extraction without disrupting the application. Once completed, the original column and the related index will no longer exists on the users
table, which will result in improved performance.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4