postgres_fdw estimated overhead
How much overhead is added by using postgres_fdw Foreign Data Wrappers?
By 3manuek in Postgres
March 6, 2017
Concept
In the current concept, we are going to combine Foreign tables inheritance with
the postgres_fdw
extension, both being already available features since 9.5 version.
Cross-node partitioning allows a better data locality and a more scalable model
than keeping local partitions. Being said, the data will be split into several
nodes and organized using a particular key, which will determine in which shard
data will be allocated. For the current POC, we are going to specify the shardKey
, which is a simple char(2)
type.
How this was done before
Until today, the only way to perform findings over this method, was from the application layer, by issuing queries directly to the nodes by keeping certain deterministic way as {1} or using a catalog table as {2} (NOTE: the bellow examples are using pseudo code).
{1}
query = "SELECT name,lastname FROM " + relation + partition " WHERE " id =" + person_id
{2}:
shard = query("SELECT shard FROM catalog WHERE key = " + person_id)
query = "SELECT name,lastname FROM " + relation + shard " WHERE " id =" + person_id
How we are going to implement this now
As foreign tables (FT) does not hold any data, it is possible to keep copies aroud all the databases involved and also in separated instances if this is necessary.
All the operations against the table will be done through the parent table of the FT tree tables and Postgres itself will determine the destination FT using the constraint exclusion feature, which will be detailed further.
For HA, you are limited on the data nodes to implement any other replication
solution available in the core version. To be fair, 9.6 supports streaming replication
and logical decoding, which is used by the pglogical
tool for providing advanced
logical replication per table basis.
Foreign tables
Foreign tables do not contain data by itselves and they only reference to a external
table on a different Postgres database. There are plenty of different extensions
allowing external tables on different data store solutions, but in this particular
article we are going to focus on postgres_fdw
as we want to explore more about
condition pushdowns, which makes queries against these tables more performant
on more complex queries.
A more extensive benchmark can be found at my next article.
The framework underlying for the Foreign Data Wrappers, support both reads and
write operations. postgres_fdw
is not the exception and does also support condition
pushdown for avoiding large scans on the source tables.
On each database holding the FT, you need to invoke the extension creation:
|
|
FT have two main elements,necessary to point correctly both in source as in user privileges. If you are paranoic enough, you’ll prefer to use unprivileged users with limited grants over the tables that you use.
- Server
- User Mapping
{1}
|
|
{2}
|
|
The FT definition is indeed pretty straightforward if we don’t want to do any further column filtering:
|
|
Writable FDWs
Even if I don’t recommend the following approach, it can be very easy to centralize the writes to the shards through the FT. Although, it requires to code a trigger for managing this. Currently, the minimum transaction level for foreign tables is REPEATABLE READ, but it will probably change in future versions.
A very simplistic approach for an INSERT trigger will be like bellow:
|
|
Data on shards
As shards contain data, the declaration ends up to be a common table within the necessary suffix for localization:
|
|
A simple test could be done by issuing:
|
|
You probably are intuiting that the above statement inserts data on both nodes, and the trigger will derive the row accordingly to the corresponding shard.
NOTE: the shard number is generated by
random()*1+1
which output rounds between 1 and 2.
Grab them from the hidden columns
Querying data can be nicely transparent, as shown bellow. The tableoid
in this
particular case can be misleading, as the oid
reported are those from the nodes,
not the local machine. It is used just to show that they’re effectively different
tables:
|
|
For example, retrieving a single row is easy as:
|
|
Behind the scenes, the pushed query to the remote servers contains the corresponding
filter ((key = 1500)
) and locally, the constraint exclusion allows to avoid further
scans into the other child FT.
|
|
Even if we don’t want to provide the shardKey, the key
filter will be pushed across
all the shard nodes. If your keys aren’t unique across shards, you’ll get a multi-row
result set.
|
|
Considerations
Foreign Data Wrappers for Postgres are such a great extension, but it comes at a price with a visible overhead in high intensive transactional workloads.
Hope you liked the article!
- Posted on:
- March 6, 2017
- Length:
- 6 minute read, 1242 words
- Categories:
- Postgres
- Series:
- Getting Started
- Tags:
- hugo-site