Apache SkyWalking Architecture Designs
Author, Wu Sheng, SkyWalking creator
Apache SkyWalking provides a powerful yet lightweight backend. In this article, I explain why we designed it this way and how it works.
For APM, agents and SDKs are just technical details of how to instrument libraries. Whether instrumentation is manual or automatic has nothing to do with the architecture, so this article treats them simply as client libs and does not cover them further.
Basic Principles
The basic design principles of the SkyWalking architecture are that it should be easy to maintain, controllable, and streaming-based.
To achieve these goals, the SkyWalking backend provides the following designs.
- Modularization design.
- Multiple connection ways for client side.
- Collector cluster discovery mechanism.
- Streaming mode.
- Switchable storage implementors.
Modularization
The SkyWalking collector is based on a purely modular design. End users can switch or assemble collector features according to their own requirements.
Module
A module defines a collection of features, which could include technical implementors (such as gRPC/Jetty server management), trace analysis (such as a trace segment or Zipkin span parser), or aggregation features. What it covers is decided entirely by the module definition and its implementors.
Each module can define its services as Java interfaces, and every provider of the module must supply implementations of these services. A provider also declares the modules it depends on, based on its own implementation. This means that two different implementors of the same module can depend on different modules.
The collector modularization core also checks the startup sequence. If a cyclic dependency or a missing dependency is found, the core terminates the collector.
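The startup check described above can be pictured as a depth-first ordering of modules by their declared dependencies. The following is only a minimal sketch under that assumption: the class StartupSequence and its resolve method are hypothetical names for illustration and are not taken from the collector code base.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: orders modules so that every dependency starts before its dependents. */
final class StartupSequence {

    /**
     * @param requires module name -> names of the modules its chosen provider depends on
     * @return module names in a valid startup order
     * @throws IllegalStateException if a dependency is missing or the dependencies form a cycle
     */
    static List<String> resolve(Map<String, List<String>> requires) {
        List<String> order = new ArrayList<>();
        Set<String> started = new HashSet<>();
        Set<String> visiting = new HashSet<>();
        for (String module : requires.keySet()) {
            visit(module, requires, started, visiting, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> requires,
                              Set<String> started, Set<String> visiting, List<String> order) {
        if (started.contains(module)) {
            return; // already placed in the startup order
        }
        if (!requires.containsKey(module)) {
            throw new IllegalStateException("Dependency not found: " + module);
        }
        if (!visiting.add(module)) {
            // the module is already on the current path, so the path loops back on itself
            throw new IllegalStateException("Cyclic dependency at: " + module);
        }
        for (String dependency : requires.get(module)) {
            visit(dependency, requires, started, visiting, order);
        }
        visiting.remove(module);
        started.add(module);
        order.add(module);
    }
}
```

For instance, if a naming provider declared a dependency on the cluster module (an assumed relationship, used only for illustration), resolve would place cluster before naming, and it would throw instead of returning if cluster were not declared at all.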
The collector starts up all modules that are declared in application.yml. In this YAML file:
- The root level is the module name, such as cluster or naming.
- The second level is the implementor name of the module, such as zookeeper for the cluster module.
- The third level holds the attributes of the implementor. For example, hostPort and sessionTimeout are required attributes of zookeeper.
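To make the three levels concrete, the structure can be modelled as nested maps: module, then implementor, then attributes. This is a rough sketch only. ModuleSettings is a hypothetical class, and the attribute values used in the usage note are placeholders rather than values taken from a real application.yml.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory view of application.yml: module -> implementor -> attributes. */
final class ModuleSettings {
    private final Map<String, Map<String, Map<String, String>>> modules = new LinkedHashMap<>();

    /** Records one attribute (third level) of one implementor (second level) of one module (root level). */
    ModuleSettings set(String module, String implementor, String attribute, String value) {
        modules.computeIfAbsent(module, m -> new LinkedHashMap<>())
               .computeIfAbsent(implementor, i -> new LinkedHashMap<>())
               .put(attribute, value);
        return this;
    }

    /** Returns the attributes of an implementor, failing fast if a required one is absent. */
    Map<String, String> attributes(String module, String implementor, List<String> required) {
        Map<String, String> attrs = modules.getOrDefault(module, Map.of())
                                           .getOrDefault(implementor, Map.of());
        for (String name : required) {
            if (!attrs.containsKey(name)) {
                throw new IllegalArgumentException(
                    module + "/" + implementor + " is missing required attribute " + name);
            }
        }
        return attrs;
    }
}
```

With this model, set("cluster", "zookeeper", "hostPort", "localhost:2181") fills one third-level entry, and attributes("cluster", "zookeeper", List.of("hostPort", "sessionTimeout")) rejects the configuration until sessionTimeout is set as well.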
Example part of the YAML definition
Multiple connection ways
First of all, the collector provides two types of connections over two protocols (HTTP and gRPC). They are:
- Naming service over HTTP, which returns all available collectors in the backend cluster.
- Uplink service over gRPC (the primary protocol for SkyWalking native agents) and HTTP, which uplinks traces and metrics to the collector. Each client sends monitoring data (traces and metrics) to a single collector only, and attempts to connect to another one if the connected collector goes offline.
For example, in the SkyWalking Java agent:
- collector.servers means the naming service, which maps to naming/jetty/ip:port of the collector, over HTTP.
- collector.direct_servers means setting the uplink service directly and using gRPC to send monitoring data.
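The "naming first, then uplink to a single collector" behaviour, including the switch to another collector when the connected one goes offline, could be sketched on the client side as follows. This is an illustrative assumption rather than the agent's real code: CollectorSelector is a hypothetical class, the naming service is represented by a plain Supplier, and the connectivity check by a Predicate.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical client-side helper: 1. naming, then 2. uplink to a single collector. */
final class CollectorSelector {
    private final Supplier<List<String>> namingService; // stands in for the HTTP naming call
    private final Predicate<String> isOnline;           // stands in for a connectivity check
    private String connected;

    CollectorSelector(Supplier<List<String>> namingService, Predicate<String> isOnline) {
        this.namingService = namingService;
        this.isOnline = isOnline;
    }

    /** Returns the collector to send data to, switching only when the current one is offline. */
    String currentCollector() {
        if (connected != null && isOnline.test(connected)) {
            return connected; // keep sending to the same single collector
        }
        // ask the naming service again and take the first reachable collector
        for (String candidate : namingService.get()) {
            if (isOnline.test(candidate)) {
                connected = candidate;
                return connected;
            }
        }
        connected = null;
        throw new IllegalStateException("No collector is reachable");
    }
}
```

The point of the sketch is that the client stays on one collector for as long as it is reachable, and consults the naming service again only when it has to fail over.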
Example of the process flow between client lib and collector cluster
Collector Cluster Discovery
When collectors run in cluster mode, they must discover each other in some way. By default, SkyWalking uses Zookeeper for coordination and as the registry center for instance discovery.
As the section above (Multiple connection ways) shows, the client lib does not use Zookeeper to find the cluster, and we suggest that clients never do so. The cluster discovery mechanism is switchable, provided by the modularization core, so a client that relies on Zookeeper breaks that switchability.
We hope the community provides more implementors for cluster discovery, such as Eureka, Consul, and Kubernetes.
Streaming Mode
Streaming mode is like a lightweight Storm/Spark implementation. It allows using APIs to build a streaming process graph (DAG) and to define the input/output data contracts of each node.
A new module can find and extend an existing process graph.
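As a rough illustration of a process graph with typed input/output contracts, the sketch below chains nodes whose input type must match the output type of their parent, and lets another module look up a node by name to extend it. GraphNode, addNext, find, and process are hypothetical names chosen for this example and should not be read as the collector's actual graph API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical node of a process graph: consumes IN, produces OUT, forwards OUT downstream. */
final class GraphNode<IN, OUT> {
    private final String name;
    private final Function<IN, OUT> worker;
    private final List<GraphNode<OUT, ?>> next = new ArrayList<>();

    GraphNode(String name, Function<IN, OUT> worker) {
        this.name = name;
        this.worker = worker;
    }

    /** Appends a downstream node; the compiler enforces that its input matches this node's output. */
    <T> GraphNode<OUT, T> addNext(String nodeName, Function<OUT, T> nextWorker) {
        GraphNode<OUT, T> node = new GraphNode<>(nodeName, nextWorker);
        next.add(node);
        return node;
    }

    /**
     * Finds a node by name at or below this one so that another module can extend the graph.
     * outputType is only a compile-time type witness; it is not verified at runtime.
     */
    @SuppressWarnings("unchecked")
    <O> GraphNode<?, O> find(String nodeName, Class<O> outputType) {
        if (name.equals(nodeName)) {
            return (GraphNode<?, O>) this;
        }
        for (GraphNode<OUT, ?> child : next) {
            GraphNode<?, O> found = child.find(nodeName, outputType);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    /** Runs this node synchronously and pushes the result to every downstream node. */
    void process(IN input) {
        OUT output = worker.apply(input);
        for (GraphNode<OUT, ?> child : next) {
            child.process(output);
        }
    }
}
```

A second module could call find("length", Integer.class) on the root node and then addNext on the result, attaching its own processing step without touching the code that built the original graph.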
There are three cases in processing:
- Synchronous process. Traditional method invocation.
- Asynchronous process, a.k.a. batch process based on a queue buffer.
- Remote process. Aggregates metrics across collectors. In this case, a selector is defined in the node to decide how to find the collector in the cluster. (HashCode, Rolling, and ForeverFirst are the three supported ways.)
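The three selector names suggest the following behaviours, sketched here under my own assumptions about their semantics: HashCode routes by the hash of the metric id, Rolling rotates through the collectors, and ForeverFirst always picks the first one. The RemoteSelector interface and the three classes are hypothetical and only illustrate the idea.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical selector contract: picks the collector that should receive a metric. */
interface RemoteSelector {
    String select(List<String> collectors, String metricId);
}

/** HashCode: the same metric id always lands on the same collector. */
final class HashCodeSelector implements RemoteSelector {
    @Override
    public String select(List<String> collectors, String metricId) {
        // floorMod keeps the index non-negative even for negative hash codes
        return collectors.get(Math.floorMod(metricId.hashCode(), collectors.size()));
    }
}

/** Rolling: round-robin across collectors, ignoring the metric id. */
final class RollingSelector implements RemoteSelector {
    private final AtomicInteger index = new AtomicInteger();

    @Override
    public String select(List<String> collectors, String metricId) {
        return collectors.get(Math.floorMod(index.getAndIncrement(), collectors.size()));
    }
}

/** ForeverFirst: always the first collector in the list. */
final class ForeverFirstSelector implements RemoteSelector {
    @Override
    public String select(List<String> collectors, String metricId) {
        return collectors.get(0);
    }
}
```

Under these assumptions, HashCode and ForeverFirst only give a consistent target if every collector sees the cluster member list in the same order.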
With these features, the collector cluster runs as a streaming network that aggregates the metrics, and it does not rely on the storage implementor to support writing the same metric id concurrently.
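One way to see why storage does not need to handle concurrent writes of the same metric id: if all values for an id reach one collector and are merged in a queue-backed batch before being flushed, storage receives a single write per id per batch. The sketch below assumes a simple sum aggregation, and MetricBatchBuffer is a hypothetical class used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical queue buffer that merges values of the same metric id before they reach storage. */
final class MetricBatchBuffer {
    private final BlockingQueue<Map.Entry<String, Long>> queue = new LinkedBlockingQueue<>();

    /** Called by producers; safe to call from several threads. */
    void offer(String metricId, long value) {
        queue.add(Map.entry(metricId, value));
    }

    /** Called by one consumer: drains the queue and returns one summed value per metric id. */
    Map<String, Long> drainAndMerge() {
        List<Map.Entry<String, Long>> batch = new ArrayList<>();
        queue.drainTo(batch);
        Map<String, Long> merged = new HashMap<>();
        for (Map.Entry<String, Long> entry : batch) {
            merged.merge(entry.getKey(), entry.getValue(), Long::sum);
        }
        return merged;
    }
}
```

Each entry of the returned map corresponds to one storage write, so two updates for the same id within a batch never compete at the storage layer.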
Switchable Storage Implementors
Because streaming mode takes care of concurrency, the storage implementor's responsibilities are to provide high-speed writes and group queries.
Right now, we support ElasticSearch as the primary implementor, H2 for preview, and a MySQL relational database cluster managed by the ShardingSphere project.
Web UI
Besides the principles in collector design, the UI is another core component of SkyWalking. It is based on React, Antd, and a Zuul proxy to provide collector cluster discovery, query dispatch, and visualization.
The Web UI shares a similar process flow with the client's 1. naming then 2. uplink mechanism described in the Multiple connection ways section. The only difference is that the uplink is replaced by the GraphQL query protocol over HTTP, bound to the ui/jetty/host,port addresses.
In the default distribution package, the Web UI uses localhost:10800 to do the naming query for the collector cluster's ui/jetty/host,port addresses.