A customer needed to upgrade their Aurora RDS Postgres from v13 to v16 due to v13's EOL. This is a common requirement and, on its own, a straightforward process.
The complexity arose when they also needed to upgrade their application model, which took a long time to apply because most of the tables required a full rewrite.
The Django migration took several hours to run, which made a blue/green upgrade with a subsequent migration run infeasible due to the long downtime it would incur.
Even though it is not covered in the official documentation, it is possible to avoid storing archives in PGBackRest setups. In certain scenarios you may want to skip storing archives to reduce costs, especially if you are using cloud storage for your backups.
Those situations could be:
Test environments, where you run intensive tests but don't need or care about PITR (Point in Time Recovery).
Test environments where you want to exercise your backup configuration itself, rather than relying only on logical dumps.
Highly updated Postgres databases, where you can recover changes by other means, such as scraping or restoring from external sources.
⚠️
Not storing archives may lead to data loss if you don't have a proper strategy for handling the deltas.
You still need to store archives during backup execution if you want online backups; otherwise, they won't be recoverable.
If you don't want to store archives at all, you need to run backups offline. That is, stop your instance and execute pgbackrest backup.
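As a rough sketch (the stanza name and service commands are assumptions, adjust them to your setup), an offline full backup without archiving looks like this:

```sh
# Stop Postgres so the data directory is consistent, then take an
# offline full backup. --no-online tells pgBackRest the cluster is down.
sudo systemctl stop postgresql
sudo -u postgres pgbackrest --stanza=main --type=full --no-online backup
sudo systemctl start postgresql
```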
As of today, PGCat supports two sharding mechanisms through its extended syntax:
By setting the shard explicitly: SET SHARD TO '<index>';, which allows you to do deterministic sharding, where you choose
your shard according to a rule, such as a lookup, region, customer group, etc.
This works best if you have one of those well-delimited groupings or an even distribution, but it is still a nice and fairly scalable approach either way.
We won’t be focusing on this strategy in this post, as its implementation relies on custom requirements.
Setting sharding_function to use one of the available functions: pg_bigint_hash and sha1. The extended syntax SET SHARDING KEY TO '<value>'; will calculate the shard index for you. It is not very clear from the docs how the sha1 function is used, so this post will focus on the pg_bigint_hash case.
Sharding by hash is a bold strategy, particularly if you expect a large workload, as you need enough compute across all shards.
This extended syntax can also be issued through comments; see the pgcat Sharding documentation.
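As a quick sketch of both variants of the extended syntax (the table and key values are made up for illustration):

```sql
-- Deterministic sharding: pick the shard index yourself.
SET SHARD TO '1';
SELECT * FROM users WHERE customer_id = 42;

-- Hash-based sharding: let PGCat derive the shard from the key
-- using the configured sharding_function (pg_bigint_hash here).
SET SHARDING KEY TO '42';
SELECT * FROM users WHERE customer_id = 42;

-- The same hints can also travel as comments; check the pgcat docs
-- for the exact comment format.
```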
In this laboratory, we'll focus on the pg_bigint_hash function. It is not clear from PGCat's documentation how sha1 should be implemented, but I'll extend the laboratory to cover it – that is, if I overcome my skill issues :P .
At this point, you may be aware of the complexities of implementing sharding, and what limitations we expect from the hashing approach.
Keep in mind that this PGCat feature is tied to Postgres hash partitioning, as it relies on the same HASH_PARTITION_SEED.
See also pgcat hash seed.
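To make that concrete, here is a minimal sketch (table, column, and shard count are assumptions) of a hash-partitioned table whose layout lines up with pg_bigint_hash routing:

```sql
-- Parent table partitioned by hash on the sharding key.
CREATE TABLE users (
    customer_id bigint NOT NULL,
    payload     jsonb
) PARTITION BY HASH (customer_id);

-- One partition per PGCat shard (3 shards assumed here).
CREATE TABLE users_0 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE users_1 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE users_2 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 2);
```

Because PGCat uses the same hash seed as Postgres hash partitioning, the shard it computes for a bigint key matches the remainder Postgres itself would assign.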
ℹ️
The laboratory for this post can be found at babelfishpg-lab.
⚠️
In this post, we cover version 4.2.0. The logging settings and format differ slightly from 4.1.1.
Even though BabelfishPG is a Postgres flavor, its configuration is done through its extensions (babelfish_tds and babelfish_tsql). In this post, we focus on how to log query timings and enable showing query plans.
For vanilla Postgres, you can log slow queries by setting log_min_duration_statement, but this won't log T-SQL statements: those queries execute through a different backend path, so statements arriving through the TDS protocol won't get logged.
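For reference, the vanilla Postgres knob looks like this (the threshold value is illustrative):

```sql
-- Log any statement slower than 500 ms (regular Postgres backends only;
-- statements coming in through the TDS port won't show up here).
ALTER SYSTEM SET log_min_duration_statement = '500ms';
SELECT pg_reload_conf();
```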
In pgbouncer's documentation it is stated that query_timeout should be set slightly
higher than Postgres' statement_timeout. Although this applies in most cases, it ultimately depends
on the business requirements.
Generally, statement_timeout should be set to the 99th percentile of your statement durations.
However, there are cases where certain statements require larger timeouts, due to particularities like
a large customer set or larger fields, as in TOAST compression cases.
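As a sketch under those assumptions (the values and the reporting role are made up), you can set a global statement_timeout near your p99 and override it only where the longer-running work lives:

```sql
-- Global cap, roughly the p99 statement duration.
ALTER SYSTEM SET statement_timeout = '30s';
SELECT pg_reload_conf();

-- Override for a role whose statements legitimately run longer.
ALTER ROLE reporting SET statement_timeout = '5min';

-- In pgbouncer.ini, keep query_timeout slightly above the Postgres cap,
-- e.g. query_timeout = 35 (seconds).
```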
Some things happen once in a lifetime, and the story behind this one is quite particular.
A customer required something that at first sounded counterintuitive: migrating an existing Postgres database to BabelfishPG.
The application was a critical part of the business core, with a large amount of code that would take years to fully migrate to another storage strategy. But the real reason was that their customer didn't want to be tied to any proprietary licensing model and required open source solutions.
The next post will cover performance tests using tdspool.
BabelfishPG connection architecture
Inherited from Postgres' connection architecture, each connection through the TDS port will
spawn a Postgres backend. As in Postgres, BabelfishPG needs a middleware to funnel
connections coming through the TDS port, to avoid running out of connections and processing capacity
on the database server.
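Since every TDS session is just another Postgres backend, a plain query is enough to watch them accumulate (a small sketch, nothing Babelfish-specific):

```sql
-- Count live backends; TDS connections appear here like any other session.
SELECT count(*) AS backends FROM pg_stat_activity;
```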
For Postgres, we have plenty of options, like PGBouncer, Odyssey, pgcat, you name it.
For T-SQL (read: the MSSQL-compatible language), there aren't many open source solutions.