Introduction

I have often needed to move or sync data between two different systems: PostgreSQL and S3, Elasticsearch and RabbitMQ, MySQL and BigQuery, and so on. It is not always obvious which tool to use in these cases, so people resort to writing custom scripts, paying for proprietary and often expensive services, or deploying complex solutions for seemingly easy tasks. In this article, I will explain how Logstash from Elastic can be used for these tasks and walk through one practical example.

All configuration used in this blog post is also available at: https://github.com/xtruder/playground/tree/master/logstash-examples

Shipping the data

Having one central database is nice and can in many cases support many different tasks, but as the team grows, requirements soon become more complex. Exporting old database records for archival and exporting data to separate analytical databases are two common examples that we soon need to tackle. These can be solved in various ways, but people primarily resort to the following solutions:

Writing custom scripts can be the most straightforward solution, but it usually requires adding code to your application. It might look easy to implement, but edge cases can make it error-prone and hard to test.

Using well-established batch or stream processing systems is usually the most reliable way, but they can be quite costly to run and in many cases are too complex compared to simpler alternatives.

Other, simpler alternatives include lightweight stream processing tools such as Logstash.

These tools provide a middle ground for stream data processing. They are much simpler to deploy and maintain, but of course have their limitations, such as durability and scalability. We will discuss later how Logstash addresses durability using persistent queues.
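To make the point about edge cases in custom scripts a bit more concrete, here is a minimal sketch of one such pitfall. It is not taken from any real project: the `events` table, its columns, and all function names are hypothetical, and an in-memory SQLite database stands in for a real one so the example runs on its own. The naive exporter pages through rows using only a timestamp watermark, which silently skips rows when several of them share the timestamp at a batch boundary. The second variant pages on the `(created_at, id)` pair instead.

```python
import sqlite3


def make_demo_db():
    """Create a hypothetical in-memory `events` table with tied timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER)"
    )
    # Rows 1-3 share the same timestamp, which is what triggers the bug.
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, 100), (2, 100), (3, 100), (4, 200)],
    )
    return conn


def fetch_batch_naive(conn, last_ts, batch_size):
    """Page by timestamp only: rows tied with the watermark are never seen."""
    return conn.execute(
        "SELECT id, created_at FROM events "
        "WHERE created_at > ? "
        "ORDER BY created_at, id LIMIT ?",
        (last_ts, batch_size),
    ).fetchall()


def fetch_batch_keyset(conn, last_ts, last_id, batch_size):
    """Page by (timestamp, id): ties are resolved by the unique id."""
    return conn.execute(
        "SELECT id, created_at FROM events "
        "WHERE created_at > ? OR (created_at = ? AND id > ?) "
        "ORDER BY created_at, id LIMIT ?",
        (last_ts, last_ts, last_id, batch_size),
    ).fetchall()


def export_all(conn, batch_size, keyset):
    """Export every row id in batches and return the ids that were exported."""
    exported = []
    last_ts, last_id = -1, -1  # watermark: position of the last exported row
    while True:
        if keyset:
            rows = fetch_batch_keyset(conn, last_ts, last_id, batch_size)
        else:
            rows = fetch_batch_naive(conn, last_ts, batch_size)
        if not rows:
            break
        exported.extend(row_id for row_id, _ in rows)
        last_id, last_ts = rows[-1]  # advance the watermark
    return exported


if __name__ == "__main__":
    conn = make_demo_db()
    print("naive: ", export_all(conn, batch_size=2, keyset=False))
    print("keyset:", export_all(conn, batch_size=2, keyset=True))
```

With a batch size of 2, the naive version exports rows 1, 2 and 4 and never sees row 3, while the keyset version exports all four. A bug like this only shows up with the right combination of data and batch size, which is part of what makes such scripts hard to test.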

From PostgreSQL to S3

In this section, we will explore how we shipped data from PostgreSQL to S3 using Logstash.

The problem