Import data from Redshift into Clickhouse in a single command.

Importing and explaining the process.

- 5 mins read

Series: Clickhouse

Scope: If you have heard about Clickhouse and are wondering how to test it with the data you already have in Redshift, here is a command, along with a few tips, to get you up to speed. Update (July 4th): There is a series of posts comparing Clickhouse and Redshift; the first post is this one. The standard way to move your data out of Redshift is the UNLOAD command, which writes the output to files in S3.

postgres_fdw estimated overhead

How much overhead is added by using postgres_fdw Foreign Data Wrappers?

- 6 mins read

Series: Postgres

Concept: Here we are going to combine foreign table inheritance with the postgres_fdw extension, both features available since version 9.5. Cross-node partitioning allows better data locality and a more scalable model than keeping local partitions. That said, the data will be split across several nodes and organized by a particular key, which determines the shard where the data will be allocated. For the current POC, we are going to specify the shardKey, which is a simple char(2) type.

Simple and manual sharding on PostgreSQL.

Foreign Data Wrappers inheritance.

- 6 mins read

Series: Postgres

Concept: Here we are going to combine foreign table inheritance with the postgres_fdw extension, both features available since version 9.5. Cross-node partitioning allows better data locality and a more scalable model than keeping local partitions. That said, the data will be split across several nodes and organized by a particular key, which determines the shard where the data will be allocated. For the current POC, we are going to specify the shardKey, which is a simple char(2) type.

Connecting Postgres and Kafka rawly

The dirty way using plain kafkacat

- 5 mins read

Series: Postgres

Apache Kafka and Postgres: Transaction and reporting capabilities. Apache Kafka is a well-known distributed streaming platform for data processing and consistent messaging. It allows you to consistently centralize data streams for several purposes by consuming and producing them. One example of a nice implementation is Mozilla’s Data pipeline, particularly as it shows Kafka as the entry point of the data flow. This allows you to plug new data stores below its stream, making it easy to use different data store formats (such as RDBMS or Document, etc.

Highlighting Postgres 10 new features: Logical Replication and Partitioning.

And playing with retention policies.

- 7 mins read

Series: Postgres

Heya! In this article we are going to explore two of the major features committed in the upcoming PostgreSQL release: Logical Replication and Partitioning. Needless to say, these features aren’t yet available in the stable release, so they are prone to be changed or extended. Advertising warning! The current article is just a sneak peek of the upcoming talk Demystifying Logical Replication on PostgreSQL at Percona Live Santa Clara 2017.

Go-Plus and Atom GOPATH fix

A fix for the unloaded GOPATH.

- 2 mins read

Series: Golang

The background: Golang is an awesome language, but I found it pretty unstable when it comes to environment variables (at least in macOS Sierra/El Capitan). gvm is your friend btw, and it helped me to fix some of the issues by installing the latest release candidate of the 1.7.1 series. Keep in mind that if you want to upgrade your macOS to Sierra, you’ll need to back up all of your environment variables and reinstall gvm.

PostgreSQL RDS pg-stat-ramdisk-size new feature and its calculations

If you are using RDS, you want to read this.

- 6 mins read

Series: Postgres

IMPORTANT NOTE: This has already been addressed in PostgreSQL core, but this option is still available in RDS. What does it change and why is it so important? Tracking counters for databases, and not just tables, in Postgres isn’t cheap, but for some time there have been workarounds that involve setting up a ramdisk to hold the directory pointed to by the stat_temp_directory GUC variable. That directory holds a global.stat file and per-database stat files named like db_<oidOfDB>.