Open Source log analysis is dead; long live Open Source log analysis
Back in 2009, Damian Guy and I had the mad vision of building a liquid-compute, self-healing infrastructure upon which you could build and deploy apps. We wanted to do it at scale, across thousands of machines. We devised a technology called VScape. The problem with VScape was observability: it's a distributed system. Not only did we need to understand what it was doing, but also how the user's app was behaving (a user app might collect metrics from a database, an Nginx web server, etc.). The solution was simple: let's build an app on VScape that does log aggregation. We called it Logscape.
But first, a bit about VScape
VScape had various layers that built up into an elastic, scalable runtime. Any type of application could be hosted: a web server, an in-memory data grid, etc. VScape provided hooks so that demand- and time-based metrics could be used to provision additional nodes and to reduce contentious or idle nodes. For example, if my data grid was under severe load, VScape would start running additional processes while shutting down other app nodes with lower priority. Does this sound familiar? It should; we invested heavily in building pieces of technology that are now common:
- Service mesh, including network endpoints and service discovery over Netty or RabbitMQ. Endpoints are proxied and APIs are annotated; APIs are not one-way: they can emit notifications and can also stream to the client. Similar to Istio, etc.
- Reactive data storage: similar to an IMDG, but consumers can be notified of the type of event (RWD or Expire). Data is leased. Similar to RxJava.
- Data storage with an ORM layer: store objects in the space, then query and interact with them using a Domain Specific Language (DSL). Those objects are also allowed to evolve (additively).
- Service orchestration: deploy applications to run various services across agent nodes and orchestrate their startup, including service discovery, demand-based elasticity, time-based elasticity, failure and recovery detection, as well as lifecycle management (start, stop, deploy). Deployment includes zoning/regionalisation, prioritisation and security. Similar to K8s, Helm, Helm charts, etc.
- Replicating file system: peer-to-peer replication of specific directories between nodes to scale deployment activities without killing the network. Similar to BitTorrent.
- Users, groups and roles: security based on users and groups/roles for all entities and resources.
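To make the "reactive, leased" storage idea from the list above a little more concrete, here is a minimal Python sketch. It is not VScape's code: the class `LeasedSpace`, its method names and the event names (`WRITE`, `DELETE`, `EXPIRE`) are all made up for illustration, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List


class EventType(Enum):
    # Hypothetical event names; the original system's event model may differ.
    WRITE = "write"
    DELETE = "delete"
    EXPIRE = "expire"


@dataclass
class Entry:
    value: Any
    expires_at: float  # absolute time at which the lease runs out


class LeasedSpace:
    """Toy in-memory store whose entries vanish unless their lease is renewed."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}
        self._listeners: List[Callable[[EventType, str, Any], None]] = []

    def subscribe(self, listener: Callable[[EventType, str, Any], None]) -> None:
        self._listeners.append(listener)

    def _notify(self, event: EventType, key: str, value: Any) -> None:
        for listener in self._listeners:
            listener(event, key, value)

    def write(self, key: str, value: Any, lease_seconds: float, now: float) -> None:
        self._entries[key] = Entry(value, now + lease_seconds)
        self._notify(EventType.WRITE, key, value)

    def renew(self, key: str, lease_seconds: float, now: float) -> bool:
        """Extend a live lease; returns False if the entry is missing or lapsed."""
        entry = self._entries.get(key)
        if entry is None or entry.expires_at <= now:
            return False
        entry.expires_at = now + lease_seconds
        return True

    def delete(self, key: str) -> None:
        entry = self._entries.pop(key, None)
        if entry is not None:
            self._notify(EventType.DELETE, key, entry.value)

    def expire(self, now: float) -> None:
        """Evict every entry whose lease has lapsed and tell subscribers why."""
        lapsed = [k for k, e in self._entries.items() if e.expires_at <= now]
        for key in lapsed:
            entry = self._entries.pop(key)
            self._notify(EventType.EXPIRE, key, entry.value)


events = []
space = LeasedSpace()
space.subscribe(lambda event, key, value: events.append((event.name, key)))
space.write("agent-1", {"role": "indexer"}, lease_seconds=10, now=0)
space.write("agent-2", {"role": "tailer"}, lease_seconds=10, now=0)
space.renew("agent-1", lease_seconds=10, now=8)  # agent-1 heartbeats in time
space.expire(now=12)                             # agent-2 never renewed
print(events)
```

The point of the lease is that a crashed node does not need to clean up after itself: it simply stops renewing, and everyone subscribed to the space hears about the expiry.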
As with anything, timing is everything. We tried to sell VScape to some of the largest organisations in the world, but they were scared of a small two-person startup. They all said: this sounds exciting but risky; however, this visualisation tool looks interesting. How does it work?
Well, it's a log aggregation tool, dummy
Logscape is a bit like the kitchen sink: it does everything. It can be run centrally: ‘tailer’ processes discover, instrument and ship data to dedicated storage services, and indexing runs against the storage services to instrument, parse and build realtime, queryable views. A search is executed against the view and then dispatched to process each matching data source, reducing the results into a stream of log events and an analytics histogram.
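The search flow described above (pick the matching data sources, scan each one, then reduce to a list of events plus a histogram) can be sketched roughly as follows. This is an illustration only, not Logscape's implementation: the `SOURCES` data, the function names and the use of plain regular expressions as the search expression are all assumptions.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical indexed data: source name -> list of (epoch_seconds, log line).
SOURCES: Dict[str, List[Tuple[int, str]]] = {
    "web-1/access.log": [(0, "GET /index 200"), (65, "GET /cart 500"), (70, "GET /pay 500")],
    "web-2/access.log": [(10, "GET /index 200"), (61, "GET /cart 500"), (130, "GET /pay 500")],
    "db-1/mysql.log": [(5, "slow query 500ms")],
}


def search_source(events, pattern, bucket_seconds):
    """Scan one source; return its matching events and per-bucket hit counts."""
    hits = [(ts, line) for ts, line in events if pattern.search(line)]
    histogram = Counter((ts // bucket_seconds) * bucket_seconds for ts, _ in hits)
    return hits, histogram


def search(sources, source_expr, expression, bucket_seconds=60):
    """Dispatch the search to every matching source, then merge the results."""
    pattern = re.compile(expression)
    source_filter = re.compile(source_expr)
    all_hits = []
    total = Counter()
    for name, events in sources.items():
        if not source_filter.search(name):
            continue  # this source is not part of the search
        hits, histogram = search_source(events, pattern, bucket_seconds)
        all_hits.extend((ts, name, line) for ts, line in hits)
        total.update(histogram)
    all_hits.sort()  # merge into one time-ordered event stream
    return all_hits, dict(sorted(total.items()))


hits, histogram = search(SOURCES, r"access\.log$", r"\b500\b")
for ts, name, line in hits:
    print(ts, name, line)
print(histogram)
```

Because each source produces its own partial histogram, the per-source step could run wherever the data lives and only the small summaries need to travel back to be merged.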
There are a couple of key ingredients that make Logscape stand out from the crowd. I'm looking at you, Splunk and Elastic ;)…
- Autodetection of data patterns: key-value pairs, JSON or other semi-structured data are automatically queryable using the query expression.
- Tagging provides a mechanism to support a meta layer. The meta layer can be used to classify and identify data as part of the search expression. Supremely handy!
- Overlay search allows the output from multiple searches to be visualised together to correlate the relationship between different sources of data.
- Workspaces are dashboards populated with a collection of search visualisations. Panels also support HTML resources that teams can use to embed ticking, link to other workspaces, and drill down to the search page and then further to the raw data.
- Apps are used by VScape not only to deploy and run user processes (including Logscape) but also to build custom monitoring apps that collect metrics from JMX, REST APIs or other sources. There is a nice collection of apps; by default the WindowsApp or UnixApp will automatically run on whatever operating system the agent detects. Apps include MongoDB, MySQL, Nginx, Apache weblogs and more.
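As a rough idea of what autodetecting key-value pairs and JSON in a log line can look like, here is a small Python sketch. It is an assumption-laden toy rather than Logscape's actual detection logic: the function name `detect_fields`, the regular expression and the number coercion rules are invented for this example.

```python
import json
import re
from typing import Any, Dict

# key=value or key="quoted value"; a deliberately simple, hypothetical pattern.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def _coerce(value: str) -> Any:
    """Turn numeric-looking strings into numbers so they can be aggregated."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def detect_fields(line: str) -> Dict[str, Any]:
    """Extract queryable fields from one log line without any upfront schema."""
    stripped = line.strip()
    if stripped.startswith("{"):
        try:
            parsed = json.loads(stripped)
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass  # not valid JSON after all; fall through to key=value detection
    return {key: _coerce(raw.strip('"')) for key, raw in KV_PATTERN.findall(line)}


print(detect_fields('2019-01-01 12:00:00 INFO user=alice latency=42 msg="login ok"'))
print(detect_fields('{"level": "error", "cpu": 0.93}'))
```

Once every line yields a dictionary like this, a query such as "average latency by user" needs no parser configuration from the person searching.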
There is always a ‘but’
Now, I don’t want you to go and download Logscape. It is free, and if you want to, then by all means give it a spin. I have been hooking up CircleCI so there is always a green light, and the docs are decent. It works, and it is good for the enterprise.
BUT... technology has changed, and I want to build something new and different
Over the next few months I’m going to be building a serverless log analysis tool. It will be cloud-first: you run it within your own VPC, using your storage and your resources. It will take the learnings of the last 10 years and provide something simple and practical. There is some great open source technology out there, including cube.js, fluentd, beats, Logpacker (dead) and others. If you are a fan of Datadog, Lightstep, Splunk or Elastic, then this project will be worthy of your attention.
Stay tuned… ;)