TarsGo: A high-performance microservice framework in Golang with 5 times the performance of gRPC

Sep 25, 2018

Lead: Recently, the Tars open source project released its Golang version in Shanghai. Its performance is equivalent to that of the C++ version and 5 times that of gRPC. (Editor)

Tars is Tencent's open source microservices framework. It was open sourced in April 2017 and donated to the Linux Foundation in June 2018. Tars gives users a complete set of development and operations solutions, so that a product or service can be developed, released, deployed, brought online, and maintained rapidly. It integrates an extensible protocol codec, a high-performance RPC communication framework, name routing and discovery, monitoring, log statistics, and configuration management. With it, stable and reliable distributed applications can be built quickly and governed effectively as microservices. After more than a year of development, Tars is used by many companies, such as China Reading Group, Huya Live, Keda Xunfei, Youpin Fortune, Longtu Game, and Golden Sun Education.

On September 15th, Tencent officially open sourced the Golang version of Tars (Tars-Go). From the Tars open source announcement, the editor learned several things about the project:

· how Tars differs from other microservice frameworks
· its technical architecture
· its performance data
· related technical details

This article presents the details of TarsGo.

Project: TarsCloud/TarsGo (https://github.com/TarsCloud/TarsGo), a high performance microservice framework in golang and a Linux Foundation project.

Service governance and multi-language support

Microservices architecture has become especially popular over the past two years and is now the mainstream architectural model. When microservice frameworks come up, well-known projects such as Dubbo, gRPC, and Spring Cloud naturally come to mind. Microservice frameworks fall into the following four categories, depending on whether they support service governance and whether they support multiple programming languages:

1. Frameworks with RPC calls but no service governance. Typical representatives are gRPC and Thrift. They solve the problem of communication between services, and most of them also support multiple languages, but users must solve service governance problems themselves.

2. Frameworks with service governance but support for only one programming language. Typical representatives are Spring Cloud and Dubbo, both implemented in Java. Users integrate multiple open source projects to meet their service governance needs.

3. Service Mesh. It supports service governance and achieves multi-language support through the Sidecar model. However, developers need to encapsulate a communication component to handle communication and asynchronous calls, which increases the complexity of both the architecture and its maintenance.

4. Frameworks with both service governance and multi-language support. Few exist in the industry at present. Apart from Tars, no other representative framework has been found.

The analysis above shows that Tars is a microservices framework that supports service governance while also providing multi-language support. This combination is Tars' unique advantage.

Tars can run on physical machines, virtual machines, and containers. Its main protocol is the IDL-based Tars protocol, a binary protocol similar to Protocol Buffers. Tars can also be extended to support other protocols, including user-defined ones.

The calling mode is mainly RPC, with support for synchronous, asynchronous, and one-way calls. Tars covers well-known capabilities such as service registration, discovery, and governance. It also provides governance capabilities for massive traffic, such as the Set model, automatic zone awareness, and overload protection. Besides the new Golang support, Tars supports C++, Java, NodeJS, and PHP, and works well with DevOps.

Tars is divided into three parts: the Registry, service nodes, and the basic service cluster:

Registry:

The Registry is the management and control node of the microservice cluster, providing services such as service registration and discovery.

Service node:

A service node is the atomic unit on which Tars runs, and it can be a container, a virtual machine, or a physical machine. A logic service gains capacity and fault tolerance by being deployed on multiple service nodes. Each service node includes a node management service and one or more logic services. The node service manages all services on the node in a unified way: it starts, stops, and monitors logic servers. It also receives the heartbeats they report and forwards them to the Registry as a data source for service discovery.

Basic service cluster:

The basic service cluster is a set of servers dedicated to microservice governance. The number of servers is not fixed. For fault tolerance and disaster recovery, the basic services are generally deployed across multiple servers, and the exact number of nodes depends on the service scale. For example, if the online service is large and must record more logs, more log service nodes need to be deployed. The basic services mainly include monitoring statistics, a configuration center, log aggregation, authentication, and distributed call chains. Together they give Tars strong service governance capabilities.

Through collaboration among the Registry, service nodes, and the basic service cluster, Tars transparently handles service governance tasks such as service registration and discovery, load balancing, authentication, and distributed tracing. For example, once the framework registers xxxsvr with the Registry, a client obtains the address list of the called service from the Registry. It then invokes the service using whichever load balancing method it needs. Supported methods include round robin (RR), hash, weighted round robin (WRR), and more.
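The three load-balancing strategies named above can be sketched in Go as follows. The Node type and function names are hypothetical. For WRR we use nginx's smooth weighted round robin as one common variant; Tars' own WRR implementation may differ.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// Node is one server endpoint from the Registry's address list (hypothetical type).
type Node struct {
	Addr   string
	Weight int
}

// RoundRobin hands out nodes in turn.
type RoundRobin struct {
	nodes []Node
	next  int
}

func (r *RoundRobin) Pick() Node {
	n := r.nodes[r.next%len(r.nodes)]
	r.next++
	return n
}

// HashPick sends the same key to the same node while the node list is unchanged.
func HashPick(nodes []Node, key string) Node {
	return nodes[crc32.ChecksumIEEE([]byte(key))%uint32(len(nodes))]
}

// SmoothWRR is nginx-style smooth weighted round robin. Each pick adds every
// node's weight to its running score, chooses the highest score, and subtracts
// the total weight from the winner, so heavy nodes are spread out evenly.
type SmoothWRR struct {
	nodes []Node
	score []int
}

func NewSmoothWRR(nodes []Node) *SmoothWRR {
	return &SmoothWRR{nodes: nodes, score: make([]int, len(nodes))}
}

func (w *SmoothWRR) Pick() Node {
	total, best := 0, 0
	for i, n := range w.nodes {
		w.score[i] += n.Weight
		total += n.Weight
		if w.score[i] > w.score[best] {
			best = i
		}
	}
	w.score[best] -= total
	return w.nodes[best]
}

func main() {
	nodes := []Node{{"a", 5}, {"b", 1}, {"c", 1}}
	w := NewSmoothWRR(nodes)
	seq := ""
	for i := 0; i < 7; i++ {
		seq += w.Pick().Addr
	}
	fmt.Println(seq) // aabacaa

	rr := &RoundRobin{nodes: nodes}
	fmt.Println(rr.Pick().Addr, rr.Pick().Addr, rr.Pick().Addr, rr.Pick().Addr) // a b c a

	fmt.Println(HashPick(nodes, "user-42") == HashPick(nodes, "user-42")) // true
}
```

Smooth WRR yields `a a b a c a a` for weights 5:1:1. A naive weighted scheme would send five requests to `a` in a row.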

To shield faulty nodes promptly, the client detects failures from abnormal results when calling other services, so its fault-tolerance strategy can take effect quickly. Specifically, the client blocks a server node in two cases: when consecutive call timeouts to that server exceed a configured threshold, or when the proportion of timed-out calls exceeds a percentage threshold. Traffic is then distributed to the healthy nodes. Blocked nodes are retried at regular intervals, and once they respond normally, traffic is distributed to them again.

As a service grows, its deployment will inevitably span IDCs or regions. Under ordinary load balancing, calls between services deployed across IDCs or regions incur extra network latency. Tars provides automatic zone-aware service management for three reasons: to speed up access between services, to reduce the latency of cross-region and cross-IDC calls, and to limit the impact of network failures.

Automatic zone awareness, achieved by combining the Registry with the development framework, has the following advantages:

1. Simple operation and maintenance

2. Reduce latency and reduce bandwidth consumption

3. Stronger disaster tolerance
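Here is a minimal sketch of the client-side half of zone awareness, assuming each endpoint carries a zone label. The client prefers endpoints in its own zone and falls back to all endpoints when none are local. The Endpoint type, the preferZone function, and the zone names are hypothetical. In Tars the Registry and the framework cooperate on this, and their exact grouping rules are not shown here.

```go
package main

import "fmt"

// Endpoint is a server address tagged with its zone, such as an IDC or
// region label (the labeling scheme is hypothetical).
type Endpoint struct {
	Addr string
	Zone string
}

// preferZone keeps the endpoints in the caller's zone and falls back to the
// full list when that zone has none. Losing a zone therefore degrades to
// cross-zone calls instead of failed calls.
func preferZone(all []Endpoint, zone string) []Endpoint {
	var local []Endpoint
	for _, e := range all {
		if e.Zone == zone {
			local = append(local, e)
		}
	}
	if len(local) == 0 {
		return all
	}
	return local
}

func main() {
	eps := []Endpoint{{"10.1.0.1:8080", "sh"}, {"10.2.0.1:8080", "sz"}, {"10.1.0.2:8080", "sh"}}
	fmt.Println(len(preferZone(eps, "sh"))) // 2: stay in the local zone
	fmt.Println(len(preferZone(eps, "bj"))) // 3: no local nodes, use all
}
```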

In addition, Tars also provides a Set model.

The Set model standardizes deployments according to business function characteristics or capacity. Its advantages are:

Effectively prevent faults from spreading

Convenient for capacity management

For traffic control, service releases face two main problems: “how to change services without loss” and “how to perform grayscale verification”. In Tars, traffic control is applied on demand through the Registry and the development framework, enabling graceful releases and grayscale traffic.

In addition, Tars provides an OSS platform that makes operations visual and web-based.

It mainly contains the following features:

Logic server management: including deployed services, as well as service management, release management, server configuration, service monitoring, feature monitoring, etc.

Operation and maintenance management: including service deployment, capacity expansion, template management, etc.

Open API: for building your own customized OSS system

TarsGo, Tars Go!

Multi-language support is a major advantage of Tars, which has already released C++, Java, PHP, and NodeJS versions. Go's goroutine-based concurrency makes it well suited to developing large-scale, high-concurrency back-end servers. With the rapid development of containerization, projects such as Docker, Kubernetes, and Etcd have made Go increasingly popular, and it has become a preferred language for cloud native development. The Go version of Tars was created for this reason. Its arrival is significant as the whole environment gradually moves toward cloud native.

The new Go version Tars-Go’s overall architecture can be divided into three parts, as shown in the following figure:

On the left is the tars2go tool. tars2go is based on Backus-Naur Form (BNF), a formal notation for describing the syntax of programming languages. It performs lexical and syntax analysis of Tars files and generates the corresponding code for clients and servers. It also provides the codec for the Tars protocol's binary stream, converting binary packets into the corresponding Go data structures.

The right part is package tars, which contains both the Client and Server functionality:

· The client consists of logical structures such as Servantproxy, Communicator, ObjProxy, and adapterproxy. These structures manage the IP addresses and ports of the server nodes that correspond to an obj name. The underlying layer uses net.Conn to establish connections and a SendQueue chan to control concurrency. The client also runs several goroutines for property monitoring reports and stat monitoring reports.

· The server uses the listener from package net to manage TCP and UDP connections. It accepts connections in multiple goroutines and passes each accepted net.Conn to the back-end handler through a SendQueue chan. The handler consists of a group of worker goroutines. Each worker handles one net.Conn:
  - sends and receives packets over it
  - decodes the Tars protocol
  - calls the user's implementation through the dispatcher (generated by tars2go)
  - encodes the result into a binary stream that is returned to the client

The server also runs goroutines for tasks such as asynchronous remote log reporting, so that synchronous calls do not block requests.

The editors learned that the Tars open source team tuned performance in many areas while developing Tars-Go. Early versions focused on building and improving functionality, without systematic stress testing or performance analysis. After a period of production use, the TarsGo developers turned to performance optimization, starting with the tars2go tool. tars2go now generates type information while building the syntax tree, which avoids reflection-based type checks and doubles codec efficiency. The team then stress-tested the whole server again and analyzed it with CPU profiles:

Here are a few examples of these optimizations:

Timer performance issues:

Tars-Go creates a goroutine to process each incoming request. To handle call timeouts, it also creates a timer for each request and deletes it when the request finishes. Under high concurrency, timers are created and deleted so frequently that they take up a large share of the service's CPU time.

The R&D team found a Go issue showing that, on multi-CPU machines, a large number of timers greatly degrades performance. The fix in Go gives each P its own timers, which greatly improves overall concurrency, so Tars-Go upgraded its build environment to Go 1.10.3. The profiles showed a large performance improvement. The team also implemented its own timer based on a time-wheel polling algorithm, trading accuracy for efficiency.

Package net SetDeadline call performance issue

Tars-Go set read and write timeouts on network connections through SetReadDeadline/SetWriteDeadline in package net. However, profiling showed that under very high concurrency these two calls consumed a lot of CPU time. To avoid them, Tars-Go uses the Sysfd to set the socket read and write timeouts directly.

bytes.Buffer-related issue

As can be seen from the figure below, a considerable share of time is spent on slice-related operations. Encoding and decoding the Tars protocol uses a bytes.Buffer for temporary storage. When the bytes.Buffer runs out of space, its underlying slice is expanded, which allocates new memory. Frequent allocation is inefficient, so performance degrades noticeably for large packets.

Inspired by Redis' memory model and the Linux slab mechanism, TarsGo pre-creates and reuses objects that would otherwise be created and destroyed frequently. Go itself provides sync.Pool for reusing temporary objects to reduce GC pressure. Building on it, Tars-Go implements a buffer management scheme similar to the Linux slab mechanism to make allocation cheaper.

Other aspects of optimization

TCP connection optimization:

Use persistent connections, set large read and write buffers, and set TCP_NODELAY selectively

goroutine pool

Go's goroutines are lightweight and efficient, but creating and destroying them at high concurrency still costs performance.

Avoid multiple goroutines competing over a chan

Reading and writing a chan involves lock overhead, which hurts overall performance when concurrency is high.

Use atomic operations instead in individual scenarios

Use pointers instead of values as function parameters

Especially for large structures, designing functions to take pointers rather than values avoids copying the structure on every call and gives a good performance improvement.

Avoid using reflection

Reflection is sometimes useful, but it often becomes a performance killer.

Determine types in advance wherever possible
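A toy text encoding shows the difference between run-time reflection and types determined in advance. The same principle is why tars2go generates type information instead of using reflection. `Request` and its ';'-separated format are hypothetical and are neither the Tars wire format nor tars2go output.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Request is a hypothetical message type.
type Request struct {
	ID   int32
	Name string
}

// encodeReflect walks struct fields at run time, as a generic codec must.
// Every call pays for Kind checks and reflective field access.
func encodeReflect(v interface{}) []byte {
	rv := reflect.ValueOf(v)
	var out []byte
	for i := 0; i < rv.NumField(); i++ {
		f := rv.Field(i)
		switch f.Kind() {
		case reflect.Int32:
			out = strconv.AppendInt(out, f.Int(), 10)
		case reflect.String:
			out = append(out, f.String()...)
		}
		out = append(out, ';')
	}
	return out
}

// Encode is the kind of method a code generator can emit instead. Field
// types are known when the code is generated, so nothing is inspected at run time.
func (r *Request) Encode() []byte {
	out := strconv.AppendInt(nil, int64(r.ID), 10)
	out = append(out, ';')
	out = append(out, r.Name...)
	return append(out, ';')
}

func main() {
	req := Request{ID: 7, Name: "alice"}
	fmt.Println(string(encodeReflect(req))) // 7;alice;
	fmt.Println(string(req.Encode()))       // 7;alice;
}
```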

After the optimizations above, Tars-Go's performance with small packets improved 5 times.

Test machine: 4-core / 8-thread CPU at 3.3 GHz, 16 GB memory

Test logic: the client sends a certain amount of data to the server, and the server returns it unchanged.

Server: single process; clients: multiple.

Tars-Go programming example

The Tars protocol is a binary protocol defined in a language-independent IDL, from which tools automatically generate server and client code. Below is an example of the Tars protocol:

To program against it, you first define a Tars file, as shown below. It defines interface Mult, where a and b are input parameters and c is an output parameter, all of them integers.

Next, generate the interface code. Running tars2go JesseTest.tars automatically generates the framework code for package Prajna and the Mult method, so you don't need to worry about the implementation details:

Finally, implement the interface: multiply a and b, store the result in c, and return it to the client:

Then compile the code with go build.

The client only needs to care about the inputs and outputs. It imports the package generated from the Tars file to make an RPC call.

In the future, the Linux Foundation will strengthen community operations for the Tars project, helping Tars extend its influence from China to the international community.

· Tars：https://github.com/TarsCloud

· TarsGo：https://github.com/TarsCloud/TarsGo