Build a Real-time Data Pipeline during the weekend in Go — Part 1

Wei Huang
Oct 14, 2018

I have been lucky enough to work in different industries over the last ten years (startups, consulting, finance, retail, etc.), and I was part of the digital transformation at most of those companies.

When I worked on IT projects, we were always challenged with the same questions:

Why is IT so expensive and time-consuming? How can we make it better?


Last long weekend I had some free time, so I decided to build a real-time data pipeline over the weekend. Doing this also lets me explain why IT is expensive and time-consuming.

IT is not just Build; it is Design, Build, Support and Improve.


Design phase

Here is the pipeline stack: Kafka, InfluxDB and Grafana.

Architecture Design

This seems quite simple; however, design should always be done with production in mind, which means:

  1. Testing: unit tests (up to 80% test coverage), integration tests, etc.
  2. Deployment & Versioning: an automated CI/CD deployment pipeline
  3. Monitoring: traceability, real-time support, logging
  4. Low latency: because of the real-time requirement, low latency is a must
  5. Scalability: how can this pipeline handle a large amount of data? Can it?
  6. Security: make sure the pipeline operates without any security concerns.

These are all basic factors (we could even follow the Twelve-Factor App methodology) that IT should consider; otherwise we will end up building something really quickly that cannot stand up in production or that carries a lot of technical debt.

Wrong design, after all :)

Build phase

Prepare the Development Environment

First things first: set up the development environment. I still remember that when I started my journey as a developer, the most painful part was setting up the development environment, especially for a project like this one with more than three systems, where we need to download different services (jars, binaries, etc.) just to get everything running locally. Thanks to container technology, a simple docker-compose.yml file now lets us bootstrap all the key applications/components we need.

See how simple it is?

Development / Coding

Let's start the fun part. Go has recently become my primary development language, and I chose it for the data pipeline because of its simplicity, performance and type safety. I'm not going to explain the benefits of Go here; if you are interested, please check Golang.org.

As usual, if you have read my previous post,

How to build a MachineBox.io API with 100% unit testing coverage by using Go

you will know that I use the same application structure here.

1. Build the connections by creating an Env struct
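One common reading of the Env struct idea is to gather every external connection the pipeline needs into one value and pass it around, instead of relying on package-level globals. A minimal sketch under that reading: Consumer and Writer are hypothetical interfaces standing in for the real Kafka and InfluxDB clients, and sliceConsumer and sliceWriter are in-memory stand-ins, so none of these names come from the original code.

```go
package main

import (
	"errors"
	"fmt"
)

// Consumer and Writer are hypothetical interfaces for the Kafka consumer
// and the InfluxDB client.
type Consumer interface {
	Read() ([]byte, error)
}

type Writer interface {
	Write(line string) error
}

// Env bundles the pipeline's connections so functions receive them explicitly.
type Env struct {
	Source Consumer
	Sink   Writer
}

// NewEnv refuses to build an Env with a missing connection.
func NewEnv(src Consumer, sink Writer) (*Env, error) {
	if src == nil || sink == nil {
		return nil, errors.New("env: consumer and writer are required")
	}
	return &Env{Source: src, Sink: sink}, nil
}

// Forward moves one message from the source to the sink.
func (e *Env) Forward() error {
	msg, err := e.Source.Read()
	if err != nil {
		return fmt.Errorf("read: %w", err)
	}
	return e.Sink.Write(string(msg))
}

// sliceConsumer hands out queued messages and errors once it is empty.
type sliceConsumer struct{ msgs [][]byte }

func (s *sliceConsumer) Read() ([]byte, error) {
	if len(s.msgs) == 0 {
		return nil, errors.New("no messages")
	}
	m := s.msgs[0]
	s.msgs = s.msgs[1:]
	return m, nil
}

// sliceWriter records every line it is asked to write.
type sliceWriter struct{ lines []string }

func (w *sliceWriter) Write(line string) error {
	w.lines = append(w.lines, line)
	return nil
}
```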

2. Using interfaces

You may be familiar with interfaces from working through the Go tour or the official documentation. The beauty of interfaces in Go is that you define the set of methods a type (often a struct) must have to be considered an implementation of that interface.

When a type implements all the methods of an interface, the Go compiler automatically allows it to be used as that interface type; there is no explicit "implements" declaration.
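To make implicit satisfaction concrete: in the sketch below neither memWriter nor failingWriter mentions the Writer interface, yet both can be passed to writeAll because each has a Write(string) error method. All names are hypothetical. Note that memWriter's method has a pointer receiver, so only *memWriter (not memWriter) satisfies Writer; the var _ lines turn that into a compile-time check.

```go
package main

import "errors"

// Writer is a hypothetical sink interface for the pipeline.
type Writer interface {
	Write(line string) error
}

// memWriter keeps lines in memory. It never declares that it implements
// Writer; having a matching Write method is enough.
type memWriter struct{ lines []string }

func (m *memWriter) Write(line string) error {
	m.lines = append(m.lines, line)
	return nil
}

// failingWriter always reports an unavailable sink.
type failingWriter struct{}

func (failingWriter) Write(string) error { return errors.New("sink unavailable") }

// Compile-time checks: the build breaks if either type stops satisfying Writer.
var (
	_ Writer = (*memWriter)(nil)
	_ Writer = failingWriter{}
)

// writeAll depends only on the interface, so any implementation can be passed in.
func writeAll(w Writer, lines []string) error {
	for _, l := range lines {
		if err := w.Write(l); err != nil {
			return err
		}
	}
	return nil
}
```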

3. Logging

Logging is hard!

If you have ever worked in an application production support role, you will appreciate developers who log how the application runs, and when and how it failed: which file, which function. (A plus!)

Logging Example

I use Uber's Zap library, and I wrapped it in a helper that can be reused across projects and log levels. I also prefer to return lower-level errors up to the main implementation.

Logging implementation

4. Unit Tests

Writing unit tests ensures our code works as expected, and it is much less "expensive" than letting a bug or regression make it all the way to a release.

Remember the interfaces from earlier? Another benefit of programming against interfaces is that it makes your code easy to test.

One of the unit test examples

Unit test coverage is an interesting subject. I used to be crazy enough to chase 100% coverage, but I have learned my lesson; what matters most about unit tests is:

How confident you are in your code, and making sure the CI/CD automation catches as many code-level regression bugs as possible.

go test -cover

End Result


Support Phase

This pipeline was built over a single weekend: let's say 10 hours in total for design and development/coding. Thanks to the unit test coverage, logging and local integration tests, I'm pretty confident it is production ready, however…

Improvement Phase

The only thing missing here is the security implementation, at both the environment level and the code level. This is the most important part if you want to ship to production.

This article mostly focuses on Design and Build. The next post will focus more on Support and Improvement. :)

Part 2:
