Build a Real-time Data Pipeline during the weekend in Go — Part 1
I’ve been lucky enough to work in different industries over the last ten years (startups, consulting, finance, retail, and more), and I was part of the digital transformation at most of those companies.
When I worked on IT projects, we were always challenged with the same questions…
Why is IT so expensive and time-consuming? How can we make it better?
Last long weekend I had some free time, so I decided to build a real-time data pipeline. Doing so also lets me explain why IT is expensive and time-consuming.
IT is not just Build; it is Design, Build, Support and Improve.
Here is the pipeline stack: Kafka, InfluxDB and Grafana.
This seems quite simple. However, we should always think and design with production in mind, which means:
- Testing: Unit Test (up to 80% test coverage), Integration Test etc.
- Deployment & Versioning: CI/CD automatic deployment pipeline
- Monitoring: Traceability, real-time support, logging
- Low latency: Because of the real-time requirement, low latency is a must
- Scalability: How can this pipeline handle a large amount of data? Can it?
- Security: Make sure our pipeline operates without any security concerns.
Those are all basic factors that IT should consider (we could even follow the twelve-factor methodology). Otherwise we end up building something really quickly that either cannot stand up in production or carries a lot of technical debt.
Prepare the Development Environment
First things first: set up the development environment. I still remember that when I started my development journey, the most painful part was setting up the environment, especially for this kind of project (more than three systems), where we need to download different services (JARs, binaries, etc.) to get everything up and running locally. Thanks to container technology, however, a simple docker-compose.yml file can bootstrap all the key applications and components we need.
See how simple that is?
Development / Coding
Let’s start the fun part. Go has recently become my primary development language. Why did I choose Go for the data pipeline? Because of its simplicity, performance and type safety. I’m not going to explain the benefits of Go here; if you are interested, please check Golang.org.
As usual, if you have read my previous post, you will know that I use the same application structure.
1. Build the connection by creating the Env Struct
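As a rough illustration of the Env struct idea, here is a minimal sketch. The names `Env`, `MetricStore`, `memoryStore`, `NewEnv` and `Handle` are my own placeholders, and an in-memory store stands in for a real InfluxDB client so the example runs without any external service.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// MetricStore is a hypothetical abstraction over the time-series database.
type MetricStore interface {
	Write(measurement string, value float64) error
}

// memoryStore is an in-memory stand-in for a real database client (assumption).
type memoryStore struct {
	points map[string][]float64
}

func (m *memoryStore) Write(measurement string, value float64) error {
	m.points[measurement] = append(m.points[measurement], value)
	return nil
}

// Env bundles the shared dependencies so they are created once and passed around.
type Env struct {
	Store  MetricStore
	Logger *log.Logger
}

// NewEnv wires up the dependencies at startup.
func NewEnv() *Env {
	return &Env{
		Store:  &memoryStore{points: make(map[string][]float64)},
		Logger: log.New(os.Stdout, "pipeline ", log.LstdFlags),
	}
}

// Handle stores one reading and logs the outcome.
func (e *Env) Handle(measurement string, value float64) error {
	if err := e.Store.Write(measurement, value); err != nil {
		return fmt.Errorf("write %s: %w", measurement, err)
	}
	e.Logger.Printf("stored %s=%v", measurement, value)
	return nil
}

func main() {
	env := NewEnv()
	if err := env.Handle("cpu_load", 0.42); err != nil {
		log.Fatal(err)
	}
}
```

Because handlers hang off `Env`, swapping the in-memory store for a real client should only require changing `NewEnv`.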
2. Using interfaces
You may be familiar with interfaces from working through the Go walkthrough or from the official documentation. The beauty of interfaces in Go is that you can define a set of methods that a type (often a struct) must define to be considered an implementation of that interface.
When a type implements all the methods of an interface, the Go compiler automatically allows it to be used wherever that interface is expected.
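To make the implicit satisfaction concrete, here is a small sketch. `Consumer`, `sliceConsumer`, `ErrDone` and `Drain` are hypothetical names chosen for illustration; a slice stands in for a real Kafka consumer.

```go
package main

import (
	"errors"
	"fmt"
)

// Consumer is a hypothetical interface for anything that yields messages.
type Consumer interface {
	Next() (string, error)
}

// ErrDone signals that no messages remain.
var ErrDone = errors.New("no more messages")

// sliceConsumer serves messages from a slice. It never names Consumer;
// having a matching Next method is enough.
type sliceConsumer struct {
	msgs []string
}

func (s *sliceConsumer) Next() (string, error) {
	if len(s.msgs) == 0 {
		return "", ErrDone
	}
	m := s.msgs[0]
	s.msgs = s.msgs[1:]
	return m, nil
}

// Compile-time check that *sliceConsumer satisfies Consumer.
var _ Consumer = (*sliceConsumer)(nil)

// Drain reads until the consumer is exhausted and returns the messages.
func Drain(c Consumer) ([]string, error) {
	var out []string
	for {
		m, err := c.Next()
		if errors.Is(err, ErrDone) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, m)
	}
}

func main() {
	msgs, err := Drain(&sliceConsumer{msgs: []string{"a", "b"}})
	fmt.Println(msgs, err) // prints: [a b] <nil>
}
```

`Drain` only knows about the interface, so any other type with a `Next() (string, error)` method can be passed in without further changes.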
3. Logging is hard!
If you have ever worked in an application production support role, you will appreciate the developer who logs how the application runs, and when and how it failed, down to the file and function. (A plus!)
4. Unit Test
Writing unit tests helps ensure our code works as expected, and it is much less “expensive” than letting a bug or regression make it all the way to a release.
If you remember the interface implementation from #2, its other benefit is that it helps you write testable code easily.
Unit test coverage is an interesting subject. I used to be crazy enough to chase 100% coverage; however, I learned my lesson. The most important things about unit tests are:
How confident you are in your code, and making sure the CI/CD automation catches as many code-level regression bugs as possible.
This pipeline was built over just one weekend: let’s say 10 hours in total for design, development and coding. Because of the unit test coverage, logging and local integration tests, I’m pretty confident it’s production-ready, however…
The only thing missing here is the security implementation, at both the environment level and the code level. This is the most important part if you want to ship to production.
This article mostly focuses on Design and Build. The next post will focus more on Support and Improve. :)