Why sampling is important and what you need to be aware of? When dealing with very large amount of data, you probably want to run your queries only for a smaller dataset in your current tables. Specially if your dataset is not fitting in RAM. MergeTree is the first and more advanced engine on Clickhouse that you want to try. It supports indexing by Primary Key and it is mandatory to have a column of Date type (used for automatic partitioning).
Scope If you heard about Clickhouse and you are wondering how to test with your residing data in Redshift, here is a command that will show you a few tips to make you speed up. Update (July 4th): There is a serie of posts about Clickhouse vs Redshift comparisons, the first post is this one. The standard wat to move your data out of Redshift is by using UNLOAD command, which pushes the output into S3 files.
Concept In the current concept, we are going to combine Foreign tables inheritance with the postgres_fdw extension, both being already available features since 9.5 version. Cross-node partitioning allows a better data locality and a more scalable model than keeping local partitions. Being said, the data will be split into several nodes and organized using a particular key, which will determine in which shard data will be allocated. For the current POC, we are going to specify the shardKey , which is a simple char(2) type.