Open Labs

Database Laboratories During the last years, I’ve been working on several open labs for Database trainings and webinars for OnGres and CanalDBA community. Some of these laboratories may look out-of-date, but they should commit their purpose for references and examples on how to setup them with few lines of code and with the most popular technologies. If you are willing to start with Ansible, Terraform, Docker and others for Databases, you may found this resources interesting for you.

Ansible and Kubernetes

Install minikube or similar curl -Lo minikube https://storage.googleapis.com/minikube/releases/v1.2.0/minikube-darwin-amd64 &&\ chmod +x minikube && sudo cp minikube /usr/local/bin/ && rm minikube Getting the token Access Kubernetes API Ansible k8s K8S_AUTH_API_KEY in environment. Getting facts with k8s_fact

Google Cloud TCP Internal Load Balancing with HTTP Health Checks in Terraform for stateful services

Components and HCL code snippets to plug in iLB with your stateful services through HTTP API and TCP protocol.

Clickhouse sampling on MergeTree engine.

Why sampling is important and what you need to be aware of? When dealing with very large amount of data, you probably want to run your queries only for a smaller dataset in your current tables. Specially if your dataset is not fitting in RAM. MergeTree is the first and more advanced engine on Clickhouse that you want to try. It supports indexing by Primary Key and it is mandatory to have a column of Date type (used for automatic partitioning).

Import data from Redshift into Clickhouse in a single command.

Scope If you heard about Clickhouse and you are wondering how to test with your residing data in Redshift, here is a command that will show you a few tips to make you speed up. Update (July 4th): There is a serie of posts about Clickhouse vs Redshift comparisons, the first post is this one. The standard wat to move your data out of Redshift is by using UNLOAD command, which pushes the output into S3 files.

postgres_fdw estimated overhead

Concept In the current concept, we are going to combine Foreign tables inheritance with the postgres_fdw extension, both being already available features since 9.5 version. Cross-node partitioning allows a better data locality and a more scalable model than keeping local partitions. Being said, the data will be split into several nodes and organized using a particular key, which will determine in which shard data will be allocated. For the current POC, we are going to specify the shardKey , which is a simple char(2) type.

Simple and manual sharding on PostgreSQL.

Concept In the current concept, we are going to combine Foreign tables inheritance with the postgres_fdw extension, both being already available features since 9.5 version. Cross-node partitioning allows a better data locality and a more scalable model than keeping local partitions. Being said, the data will be split into several nodes and organized using a particular key, which will determine in which shard data will be allocated. For the current POC, we are going to specify the shardKey , which is a simple char(2) type.

Connecting Postgres and Kafka rawly

Apache Kafka and Postgres: Transaction and reporting capabilities Apache Kafka is a well known distributed streaming platform for data processing and consistent messaging. It allows you to consistently centralize data streams for several purposes by consuming and producing them. One of the examples of a nice implementation, is the Mozilla’s Data pipeline implementation, particularly as it shows Kafka as an entry point of the data flow. This allows you to plug new data stores bellow its stream, making it easy to use different data store formats ( such as DRBMS or Document, etc.

Highlighting Postgres 10 new features: Logical Replication and Partitioning.

Heya! I this article we are going to explore two of the major features commited in the upcoming PostgreSQL release: Logical Replication and Partitioning. Needeless to say that these features aren’t yet available in the stable release, so they are prune to change or extended. Advertising warning! The current article is just a sneak peak of the upcoming talk Demystifying Logical Replication on PostgreSQL at Percona Live Santa Clara 2017. Get your tickets!

binlogtop

Currently binlogtop is in status WIP and very alpha state, although there is a python script doing sort of the same thing is: binlogEventStats. Hey! Replication is delaying, don’t know what’s happening. binlogtop just do a real-time streaming statistics. What has inside? golang If you want to indicate issues, go for it at report issues.

MyRocks Views

MyRocks is an storage engine available also in MongoDB, focused on performance and space saving. It is a LSM tree, with Bloom filtering for unique keys, providing steady performance in limited amount of cache. Installing can be done through a 5.6 fork, repository here. Installing is easy as importing the sql file into your database. Repository can be found here. What has inside? SQL

Go-Plus and Atom GOPATH fix

The background Golang is an awesome language, but I found it pretty unstable within the environment variables (at least in macOS Sierra/El Capitan). gvm is your friend btw, and it helped me to fix some of the issues by installing the latest release candidate of the 1.7.1 series. Keep in mind that if you want to upgrade your macOS to Sierra, you’ll need to backup all of your environment variables and reinstall gvm.

PostgreSQL RDS pg-stat-ramdisk-size new feature and its calculations

IMPORTANT NOTE This has been already addressed in PostgreSQL core, but this option is still available in RDS. What does it change and why is so important? Tracking databases and not just tables counters in Postgres isn’t cheap, but since some time ago there were workarounds involving the setup of a ramdisk to place the directory pointed by stat_temp_directory GUC variable. That directory places a global.stat and a per-database stat files called like db_<oidOfDB>.

HOWTO Percona Server with Docker for DBAs

Before starting the container This article is not an introductory explanation of docker,however it’s scope if for docker’s beginners. You can consider it as an extension of the well documented Percona docker hub doc. For the source code of the image, the repository is at github. Here is the all what you need to do for start: 1 docker run --name percona57 -e MYSQL_ROOT_PASSWORD=<a_password> -d percona:5.7 For checking the container status log, you can execute docker logs percona57.

Random data for PostgreSQL

Usually, you want to test code, try new features or just need decent content when generating random data. Random databases for Postgres addresses this need by simple SQL scripts that can be entirely generated inside your database. It isn’t a sophisticated repository, although it is practical for quick usage. What has inside? Pure SQL If you want to indicate issues, go for it at report issues.

MySQL 5.7 InnoDB's Full Text Search overview.

Main application: InnoDB’s Full Text Search overview (with Shiny/R) Credits Author: Emanuel Calvo Company: Pythian Thanks to Valerie Parham-Thompson @ Pythian and Daniel Prince @ Oracle. Repository available at Github. For the whole article and the Shinyapp application is available here. Some initial thoughts A couple of days ago one of our customers came up with a question regarding FTS over InnoDB engine. Although the question is not answered in the current article, I came up with the conclusion that FTS is sometimes misunderstood.

Multi source data injection to Postgres RDS with encryption and FTS support

Sponsored: Pythian Inc. Note 1: All the of this presentation is published in this repository. You will find a lot of folders and information, probably part of a blog series. Note 2: All the work on this article is a POC (Proof of concept). Note 3: This is something that is related for HIPAA compliant. KMS/RDS The POC on this article was developed before the releasing of the Key Management service for RDS.

Alambre, ugly scripts to save time of your day

As a DBA we usually have to do dirty things under a running clock. Using that moto, I started a repository to start collecting those nasty things that save time a day. That’s what alambre does: small and complex things. What has inside? bash (preferred) python SQL If you want to indicate issues, go for it at report issues.

Posts in Getting Started