A handy tool for parallelization

If you are reading this post, it is probably because you have heard about parallel, a tool developed by GNU for parallelizing commands across sets of arguments. I'll be honest: I tend to forget about its existence until I need something quick and dirty for a task that requires parallelization, especially when time is a constraint.
It has plenty of options and arguments, but once you get up to speed with it, it turns out to be very handy in many situations.
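For a taste of how it works, here are two typical invocations (a minimal sketch; the file names are hypothetical):

    # Compress every .log file in the current directory, one gzip process per CPU core
    ls *.log | parallel gzip {}

    # Feed arguments explicitly with ::: and cap the number of concurrent jobs with -j
    parallel -j 4 "wc -l {}" ::: a.txt b.txt c.txt

By default parallel runs one job per CPU core, which is usually what you want for CPU-bound work like compression.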
This is my current macOS layout.
A few months ago, I bought the Ergodox ortholinear split keyboard. After a period of trying it, I found the QWERTY layout considerably uncomfortable to type on given the keyboard's physical arrangement. I wasn't the only one, of course, so I started the search for a more suitable layout. For a while I switched back and forth between three layouts (QWERTY, Dvorak, and Colemak), and in the end I settled on Colemak.
Background

There was a contest in an internet thread about writing a variable swap that is as low-level and minimal as possible, so I ended up with a small C function using inline assembly instructions for a modern amd64 platform.
    #include <stdio.h>
    #include <stdlib.h>

    /* Swap two ints via amd64 inline assembly. */
    void swapshit(int *arg1, int *arg2)
    {
        __asm__ __volatile__ (
            "movl %2, %%eax;"   /* eax = *arg1 (already there via the "a" constraint) */
            "movl %3, %%ebx;"   /* ebx = *arg2 (already there via the "b" constraint) */
            "movl %%eax, %0;"   /* *arg2 = eax */
            "movl %%ebx, %1;"   /* *arg1 = ebx */
            : "=g" (*arg2), "=g" (*arg1)
            : "a" (*arg1), "b" (*arg2)
        );
    }

    int main(int argc, char *argv[])
    {
        int arg1 = atoi(argv[1]);
        int arg2 = atoi(argv[2]);
        swapshit(&arg1, &arg2);
        printf("arg1 is %d, arg2 is %d\n", arg1, arg2);
        return 0;
    }

The explanation of the above is quite simple: the input constraints "a" and "b" place the two values in the eax and ebx registers, and the final two movl instructions write each register back to the opposite output operand, leaving the values swapped.
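To try it quickly (the file name is hypothetical):

    # Compile and run; the two arguments come back swapped
    gcc -o swapshit swapshit.c
    ./swapshit 3 7    # prints: arg1 is 7, arg2 is 3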
This post has no intention of discussing which layout might be better in terms of efficiency or effort. There are a considerable number of great articles analyzing the different layouts, and even tools that ingest a text and estimate the effort of typing it under several layouts.
You'll find this article most useful if you have already decided to switch to Colemak and make it your primary layout.
Why sampling is important, and what you need to be aware of

When dealing with very large amounts of data, you probably want to run your queries against only a smaller subset of your tables, especially if your dataset does not fit in RAM.
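As a minimal sketch (the hits table and its columns are hypothetical, and the table must declare a sampling expression, as in the MergeTree definition shown after the next paragraph), a sampled query run from the shell looks like this:

    # Run the aggregation over roughly 10% of the rows instead of the full table
    clickhouse-client --query "SELECT count() FROM hits SAMPLE 0.1"

    # SAMPLE also accepts an approximate absolute number of rows
    clickhouse-client --query "SELECT count() FROM hits SAMPLE 1000000"

Sampling in ClickHouse is deterministic, so repeated runs of the same query see the same subset of rows.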
MergeTree is the first and most advanced engine in ClickHouse that you will want to try. It supports indexing by primary key, and it is mandatory to have a column of Date type, which is used for automatic partitioning.
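As a sketch of that engine syntax (the table and column names are hypothetical), a MergeTree table with the mandatory Date column and a sampling expression could be created like this:

    clickhouse-client --query "
    CREATE TABLE hits (
        event_date Date,    -- mandatory Date column, used for partitioning
        user_id    UInt64,
        url        String
    ) ENGINE = MergeTree(
        event_date,                        -- partitioning column
        intHash32(user_id),                -- sampling expression
        (event_date, intHash32(user_id)),  -- primary key; must contain the sampling expression
        8192                               -- index granularity
    )"

Note that the sampling expression has to be part of the primary key, which is why intHash32(user_id) appears in both places.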
Scope

If you have heard about ClickHouse and are wondering how to test it with the data currently residing in Redshift, this post will show you a few tips to speed you up.
Update (July 4th): There is a series of posts comparing ClickHouse and Redshift; the first post is this one.
The standard way to move your data out of Redshift is the UNLOAD command, which pushes the output into files in S3.
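As a minimal sketch (the cluster endpoint, database, table, bucket, and credentials are all placeholders), an UNLOAD run through psql looks like this:

    # Dump a table to gzipped, pipe-delimited files under the given S3 prefix
    psql -h my-cluster.redshift.amazonaws.com -p 5439 -U admin -d analytics -c "
    UNLOAD ('SELECT * FROM events')
    TO 's3://my-bucket/events/part_'
    CREDENTIALS 'aws_access_key_id=<KEY>;aws_secret_access_key=<SECRET>'
    DELIMITER '|' GZIP ALLOWOVERWRITE;"

Redshift writes the output in parallel, one or more files per node slice, so expect a set of part files rather than a single dump.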