AnyLog — A Speed Layer at the Edge

Published in AnyLog Network, Feb 20, 2024

AnyLog Edge Platform is an end-to-end, integrated software platform for managing and monitoring distributed data and resources at the edge.
With AnyLog, distributed edge nodes operate as a single machine: together they produce a single result set that is served to edge and cloud applications, while the data stays in place, locally at each node.

Managing the edge is simple because AnyLog delivers a Single System Image (SSI): distributed edge resources (sensors, switches, gateways and servers) appear as one system that provides a unified view of the data, as if the data resided in a centralized database.

Deploying AnyLog transforms the edge into a Virtual Data-Lake: distributed data becomes unified and self-describing, real-time insights can be drawn from it, less data is transferred to the cloud, and the edge integrates seamlessly with the cloud whenever needed.

This approach deploys a unified, end-to-end software stack at the edge quickly and securely, using pre-existing services and tooling that provide a pre-paved path to production.

The AnyLog Architecture Diagram

The diagram below shows a bottom-up architecture built on the AnyLog Edge Platform.

The bottom layer represents the PLCs and sensors that generate the data.
The AnyLog software is deployed on the edge nodes. A node can be as small as a switch, a gateway, or a Raspberry Pi, or as large as an edge server.
The Speed Layer contains the Virtual Data-Lake (VDL), formed by all nodes running the AnyLog software. New nodes deployed with AnyLog are added to the VDL dynamically, and the data they store locally becomes immediately available to all applications and services. Queries to the VDL are issued in SQL and written as if the data resided in a single database. Because the data is distributed, queries are satisfied in a MapReduce manner and return a complete result set that represents a unified, centralized view of the data, while the physical data remains in place at each node.
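Because the data stays on each node, an aggregate query has to be split into a per-node (map) part and a combining (reduce) part. The sketch below is a minimal illustration under assumptions, not AnyLog's implementation: in-memory SQLite databases stand in for node-local storage, and the `readings` table and helper names are hypothetical. It shows why an average is computed from per-node counts and sums rather than by averaging the per-node averages.

```python
import sqlite3

# Hypothetical setup: each edge node holds its own local "readings" table.
def make_node(values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", values)
    return conn

# Map step: each node computes partial aggregates over its local data only.
MAP_SQL = "SELECT COUNT(value), SUM(value), MIN(value), MAX(value) FROM readings"

def map_node(conn):
    return conn.execute(MAP_SQL).fetchone()

# Reduce step: combine partials into the result a single database would return.
def reduce_partials(partials):
    partials = [p for p in partials if p[0]]  # skip nodes that have no rows
    count = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return {
        "count": count,
        "avg": total / count if count else None,
        "min": min(p[2] for p in partials) if partials else None,
        "max": max(p[3] for p in partials) if partials else None,
    }

nodes = [
    make_node([("t1", 10.0), ("t1", 20.0)]),
    make_node([("t2", 30.0)]),
    make_node([("t3", 40.0), ("t3", 50.0), ("t3", 60.0)]),
]
result = reduce_partials([map_node(n) for n in nodes])
print(result)  # {'count': 6, 'avg': 35.0, 'min': 10.0, 'max': 60.0}
```

Averaging the three node averages (15, 30 and 50) would give about 31.7 instead of 35.0, which is why the map step returns counts and sums.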

The Speed Layer

The Virtual Data-Lake layer acts as a speed layer. This layer fills the gap between traditional centralized data warehouses (or lakes) and data at the edge. The speed layer is designed to handle high-velocity data streams that are generated continuously and require immediate processing. For high-velocity data, users define retention policies that specify which data is kept at the edge, for how long, and which data is sent to the cloud (see Archival and retention below).
The AnyLog speed layer bridges data in motion, from devices and edge nodes, and data at rest in the cloud, and it delivers a unified view of both real-time streaming data and historical edge data.

Decentralization, High Availability, and Scaling

AnyLog nodes form a decentralized network that keeps operating without depending on a centralized authority. Data is replicated across the nodes of the network so that if a node fails, its data is served by the surviving nodes.
AnyLog supports horizontal scaling by distributing the data to multiple nodes that process it as a single, unified machine. Users can add nodes to the network and shift data load to them whenever needed.
Adding AnyLog nodes increases the degree of parallelism, because queries are satisfied by multiple nodes concurrently; the network can therefore be sized to deliver the required performance.
AnyLog's virtualization and software approach makes scaling up and down straightforward: nodes are deployed (or removed) when needed, without any changes to the software infrastructure.
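The text does not say how AnyLog decides where data and its replicas are placed. As one well-known technique for the behavior described above, the sketch below uses rendezvous (highest-random-weight) hashing, swapped in for illustration only: each partition key ranks all nodes, the top-ranked nodes hold the replicas, reads fail over to the next live replica, and adding a node moves only the partitions that now rank the new node first. Node and partition names are hypothetical.

```python
import hashlib

def score(node, key):
    # Deterministic pseudo-random weight for a (node, partition key) pair.
    digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def placement(key, nodes, replicas=2):
    # Rendezvous hashing: rank nodes by score; the top `replicas` store the partition.
    return sorted(nodes, key=lambda n: score(n, key), reverse=True)[:replicas]

def serving_node(key, nodes, alive, replicas=2):
    # Serve from the highest-ranked replica that is still alive.
    for node in placement(key, nodes, replicas):
        if node in alive:
            return node
    return None  # every replica of this partition is down

nodes = ["edge-1", "edge-2", "edge-3"]
keys = [f"sensor-{i}" for i in range(1000)]

# Scaling out: add a fourth node and find partitions whose primary host changed.
before = {k: placement(k, nodes, replicas=1)[0] for k in keys}
after = {k: placement(k, nodes + ["edge-4"], replicas=1)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "partitions moved, all to edge-4:",
      all(after[k] == "edge-4" for k in moved))

# Failover: if the primary replica is down, the backup serves the data.
primary, backup = placement("sensor-7", nodes)
print(serving_node("sensor-7", nodes, alive={backup}) == backup)  # True
```

Existing nodes keep their scores when a node joins, so a partition can only move to the newcomer; roughly a quarter of the partitions move when going from three nodes to four.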

Data Sharing

The AnyLog network operates as a trustless, permissioned network. Data owners keep full control of their data, including access control rights. Other participants can join the network but can view only the data that data owners have permissioned to them.
Data sharing is accomplished through policies, granted by data owners, that associate the data they own with other members of the network and with specific permissions (for example: read/write, access restricted to a subset of tables or to date ranges, and access expiry).
When a query is issued, the SQL request is evaluated against the policies that determine the user's or application's permissions: the SQL specifies the needed data, and the permission policies restrict the subset of data that can be evaluated to satisfy the query. The query is therefore answered only from permitted data, without developing dedicated processes to replicate the data and apply external logic that joins it with the applicable permissions. This approach is automated and transparent, keeps data secure, and replaces the need to physically copy the data.
With this approach, data owners retain complete ownership of their data, including access rights and permissions, while leveraging AnyLog as a platform and infrastructure for data sharing.
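A minimal sketch of how a sharing policy can restrict the rows a query sees, assuming a hypothetical policy shape built from the examples above (grantee, permitted tables, a date range, and an expiry date). AnyLog's actual policy format and enforcement are not shown here; SQLite stands in for the node's local database.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class SharingPolicy:
    # Hypothetical policy fields, modeled on the examples in the text.
    owner: str
    grantee: str
    tables: frozenset  # owner-defined allow-list of table names
    start: str         # inclusive ISO date, e.g. "2024-02-01"
    end: str           # inclusive ISO date
    expires: date

def permitted_rows(conn, policies, user, table, today):
    # Use the first policy that grants `user` access to `table` and has not expired.
    for p in policies:
        if p.grantee == user and table in p.tables and today <= p.expires:
            # `table` matched the owner's allow-list, so interpolating it is safe here.
            sql = f"SELECT day, value FROM {table} WHERE day BETWEEN ? AND ? ORDER BY day"
            return conn.execute(sql, (p.start, p.end)).fetchall()
    raise PermissionError(f"{user} has no valid policy for {table}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("2024-01-05", 1.0), ("2024-02-10", 2.0), ("2024-03-15", 3.0)])
policy = SharingPolicy("acme", "partner-app", frozenset({"readings"}),
                       "2024-02-01", "2024-03-31", date(2024, 12, 31))
rows = permitted_rows(conn, [policy], "partner-app", "readings", date(2024, 6, 1))
print(rows)  # [('2024-02-10', 2.0), ('2024-03-15', 3.0)]
```

The data is never copied for the grantee; the policy only narrows what the query can read from the owner's table.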

Edge Security

AnyLog uses four layers to secure data at the edge:

Edge Nodes can be deployed inside a firewall.
An overlay network secures the network and encrypts data in transit.
The AnyLog protocol uses asymmetric cryptography:
— Participants are authenticated via public/private keys.
— Permissions (for data and services) are granted based on authorizations represented in policies hosted in a blockchain.
AnyLog nodes manage logs that monitor system events, node events, and device performance metrics. An example is a deployment delivered by Open Horizon using KubeArmor, with AnyLog as the data layer, which is detailed here.
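The public/private-key authentication and policy-based permissions listed above can be illustrated with a challenge-response sketch. Everything here is an assumption for illustration, not AnyLog's protocol: a dict stands in for the policies hosted on the blockchain, and the key is a TOY textbook-RSA key built from well-known Mersenne primes so that the example runs with the Python standard library alone. It is NOT secure; a real deployment would use a vetted cryptography library, proper key sizes and padding.

```python
import hashlib
import secrets

# TOY RSA KEY, NOT SECURE: both primes are well-known Mersenne primes, so anyone
# can factor N. They are used only so this sketch runs with the standard library.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def digest(message: bytes, n: int) -> int:
    # Hash the message and reduce it into the range of the RSA modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes, d: int, n: int) -> int:
    # Node side: sign with the private key.
    return pow(digest(message, n), d, n)

def verify(message: bytes, signature: int, e: int, n: int) -> bool:
    # Verifier side: check using only the public key.
    return pow(signature, e, n) == digest(message, n)

# Hypothetical stand-in for blockchain-hosted policies:
# node name -> registered public key and granted permissions.
POLICIES = {"edge-7": {"public_key": (E, N), "permissions": {"query"}}}

def authorize(node: str, challenge: bytes, signature: int, action: str) -> bool:
    policy = POLICIES.get(node)
    if policy is None:
        return False  # unknown participant
    e, n = policy["public_key"]
    return verify(challenge, signature, e, n) and action in policy["permissions"]

challenge = secrets.token_bytes(16)  # verifier sends a fresh random nonce
signature = sign(challenge, D, N)    # node answers with its signature
print(authorize("edge-7", challenge, signature, "query"))  # True
print(authorize("edge-7", challenge, signature, "write"))  # False
```

Authentication (is this really edge-7?) and authorization (may edge-7 do this?) are checked separately: a valid signature alone does not grant an action the policy does not list.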

Edge Data Services

Sensor data streamed to edge nodes is processed using AnyLog services on each edge node.

These services include the following:

Southbound connectors such as REST, MQTT and gRPC connect to data sources. Third-party software extends these connectors to protocols such as Modbus, BACnet, and EtherNet/IP.
Northbound connectors supporting queries. For example, an application issues a SQL query via REST to an AnyLog node that acts as an orchestrator. The orchestrator operates transparently as follows:
1. Determines the edge nodes that host the relevant data.
2. Pushes the query directly to each identified node.
3. Each queried node processes the query on its local database and returns a result to the orchestrator.
4. The orchestrator unifies the results and returns the unified result to the application.

This process is similar to MapReduce, but the participating nodes are determined dynamically. Additional information and use cases are detailed in the AnyLog Value Prop Doc.
Cloud integration services, including REST, Kafka, and SQL queries (ad hoc or repeatable) issued from a cloud process against edge data.
Services that create schemas from the ingested data and unify them across all participating edge nodes.
Local databases on each node that host structured and unstructured data (the type of database is configurable and can vary with the size of the node and the volume and type of data ingested). An AnyLog node is pre-configured with interfaces to PostgreSQL, SQLite, and MongoDB; other database types can be added if needed.
Note: DBAs are not needed to deploy and declare databases at the edge, because the node's services automate these processes.
Security services (see details in Edge Security above).
High Availability services replicate data among nodes such that if a node fails, the data is serviced from a replica. The number of replicas is a configuration parameter.
Archival and retention of data: users determine how long data is kept on each edge node (e.g., days, months, years), and the duration can depend on the type of data or node. Users also determine which data is transferred to the cloud and which transfer protocol to use.
The process of removal and archival of data is dynamic, automated and does not impact the ongoing operations of the node.
Rules Engines maintain and process rules that evaluate data or resource status to trigger a process, alert, or notification (e.g., signal a valve if the temperature is above a threshold, remove old data if disk space is reaching capacity, send latest ingested data to a destination by time interval, forward stale data to the cloud for long-term, historical storage).
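The four orchestrator steps listed under the northbound connectors can be sketched as a scatter-gather. This is an illustration under stated assumptions, not AnyLog's code: a dict stands in for the metadata directory that locates data, in-memory SQLite databases stand in for node-local storage, threads stand in for network calls, and node and table names are hypothetical. Merging by concatenation is only correct for row-returning queries; aggregates such as AVG would need per-node partial results combined at the orchestrator.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical directory: which nodes host which table.
DIRECTORY = {"temperature": ["edge-1", "edge-3"], "pressure": ["edge-2"]}

def make_node(rows_by_table):
    # check_same_thread=False lets a worker thread use this connection.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    for table, rows in rows_by_table.items():
        conn.execute(f"CREATE TABLE {table} (sensor TEXT, value REAL)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn

NODES = {
    "edge-1": make_node({"temperature": [("t1", 21.5), ("t2", 22.0)]}),
    "edge-2": make_node({"pressure": [("p1", 101.3)]}),
    "edge-3": make_node({"temperature": [("t3", 19.0)]}),
}

def run_local(node, sql):
    # Step 3: each node runs the query against its own local database.
    return NODES[node].execute(sql).fetchall()

def orchestrate(table, sql):
    targets = DIRECTORY.get(table, [])                       # Step 1: locate the data
    with ThreadPoolExecutor(max_workers=len(targets) or 1) as pool:
        partials = pool.map(lambda n: run_local(n, sql), targets)  # Step 2: push the query
        rows = [row for part in partials for row in part]   # Step 4: unify the results
    return sorted(rows)

print(orchestrate("temperature", "SELECT sensor, value FROM temperature"))
# [('t1', 21.5), ('t2', 22.0), ('t3', 19.0)]
```

Only the nodes listed for the table are queried (edge-2 is skipped above), which is how the participating nodes are chosen per query rather than fixed in advance.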
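The archival and retention service described above can be sketched as a periodic job that archives expired rows and then removes them. The retention settings, table layout and `upload` callback are hypothetical; in practice `upload` would push data to the cloud over whichever transfer protocol the user selected.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical per-table settings: how long rows stay on the node and
# whether expired rows are archived to the cloud before removal.
RETENTION = {"readings": {"keep": timedelta(days=30), "archive": True}}

def apply_retention(conn, table, now, upload):
    rule = RETENTION[table]  # also acts as an allow-list of table names
    cutoff = (now - rule["keep"]).isoformat()
    with conn:  # commits the DELETE on success; if upload() raises, nothing is deleted
        expired = conn.execute(
            f"SELECT ts, value FROM {table} WHERE ts < ? ORDER BY ts", (cutoff,)
        ).fetchall()
        if expired and rule["archive"]:
            upload(table, expired)  # e.g. push to cloud storage (stubbed below)
        conn.execute(f"DELETE FROM {table} WHERE ts < ?", (cutoff,))
    return len(expired)

# Usage with an in-memory table and a stub uploader that just collects rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, value REAL)")
now = datetime(2024, 2, 20)
conn.executemany("INSERT INTO readings VALUES (?, ?)", [
    ((now - timedelta(days=45)).isoformat(), 1.0),  # older than 30 days
    ((now - timedelta(days=5)).isoformat(), 2.0),   # still within retention
])
conn.commit()
archived = []
removed = apply_retention(conn, "readings", now, lambda t, rows: archived.extend(rows))
print(removed, archived)
```

Archiving before deleting means a failed upload leaves the data on the node, so the next run can retry instead of losing it.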
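The rules engine described above pairs a condition with an action. A minimal sketch, assuming each incoming event is a dict and each rule is a plain Python callable pair; the rule names, the 80-degree threshold and the 90% disk limit are hypothetical values, not AnyLog defaults.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # evaluated against each event
    action: Callable[[dict], None]     # triggered when the condition holds

def evaluate(rules, event):
    # Run every rule against the event; return the names of the rules that fired.
    fired = []
    for rule in rules:
        if rule.condition(event):
            rule.action(event)
            fired.append(rule.name)
    return fired

alerts = []
rules = [
    Rule("close-valve", lambda e: e.get("temperature", 0) > 80,
         lambda e: alerts.append(f"close valve {e['valve']}")),
    Rule("disk-cleanup", lambda e: e.get("disk_used_pct", 0) >= 90,
         lambda e: alerts.append("purge oldest partitions")),
]
print(evaluate(rules, {"temperature": 85, "valve": "V-12", "disk_used_pct": 40}))
# ['close-valve']
```

Time-based rules (for example, sending the latest data on an interval) would add a scheduler that feeds synthetic events into the same `evaluate` loop.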

Monitoring Functionalities

The deployed AnyLog network monitors all edge resources from a single point. As in a centralized cloud, AnyLog provides a real-time view of resources, including network and disk usage and CPU and memory availability, across the entire edge.
The same setup can be adjusted to monitor sensors, for example, to alert if a sensor did not generate data within an interval of time.
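The sensor check mentioned above (alert if a sensor produced no data within an interval) reduces to comparing each sensor's last-seen time with the current time. A minimal sketch, assuming the node keeps a hypothetical `last_seen` map of sensor ID to the Unix timestamp of its latest reading.

```python
import time

def stale_sensors(last_seen, max_silence, now=None):
    """Return sensors with no data within `max_silence` seconds, oldest first."""
    now = time.time() if now is None else now
    silent = [(ts, sensor) for sensor, ts in last_seen.items() if now - ts > max_silence]
    return [sensor for ts, sensor in sorted(silent)]

last_seen = {"t1": 1000.0, "t2": 1290.0, "t3": 900.0}
print(stale_sensors(last_seen, max_silence=60, now=1300.0))  # ['t3', 't1']
```

Passing `now` explicitly keeps the check deterministic for testing; in a running node it would default to the current time on each evaluation.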

The Blockchain

AnyLog uses a blockchain as a shared, transparent metadata layer so that all the nodes in the network operate in a synchronized way, as if the distributed nodes were a single machine. AnyLog offers an API to update the blockchain with policies that are used by the services on the edge nodes. These policies unify the schemas across the edge nodes, serve as a directory for locating where relevant data resides, publish security rules, and determine the configurations of different components.
With this layer, users control the edge components from a single point (by updating policies on the blockchain) rather than interacting with the individual edge components.
Policies can include any needed metadata, for example, description of nodes and sensors (to act as edge shadows), and queries can be joined with metadata, for example, query the status of sensors of version X, or data of sensors deployed in location Y.
The API to the blockchain is open and supports policies that are not AnyLog-related. For these use cases, AnyLog (transparently) serves the policies to third-party applications, services, or sensors at the edge.
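A minimal sketch of the metadata layer described above: an append-only, hash-chained list of JSON policies, with a lookup that answers directory-style questions such as "which node hosts the sensors in location Y". This covers only tamper evidence; a real blockchain also provides replication and consensus, which are omitted. The policy layout (one top-level key naming the policy type) and all field values are assumptions for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64

def block_hash(prev, policy):
    # Each block's hash covers the previous hash and the canonical policy JSON.
    body = json.dumps(policy, sort_keys=True)
    return hashlib.sha256((prev + body).encode()).hexdigest()

class PolicyLedger:
    """Append-only list of JSON policies, each linked to the previous by hash."""

    def __init__(self):
        self.blocks = []

    def add(self, policy):
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        digest = block_hash(prev, policy)
        self.blocks.append({"prev": prev, "policy": policy, "hash": digest})
        return digest

    def verify(self):
        # Recompute the chain; any edited policy breaks every later link.
        prev = GENESIS
        for block in self.blocks:
            if block["prev"] != prev or block["hash"] != block_hash(prev, block["policy"]):
                return False
            prev = block["hash"]
        return True

    def find(self, policy_type, **where):
        # Return policies of one type whose fields match all `where` values.
        return [b["policy"][policy_type] for b in self.blocks
                if policy_type in b["policy"]
                and all(b["policy"][policy_type].get(k) == v for k, v in where.items())]

ledger = PolicyLedger()
ledger.add({"sensor": {"id": "t1", "location": "plant-A", "version": "2.1", "node": "edge-1"}})
ledger.add({"sensor": {"id": "t2", "location": "plant-B", "version": "2.1", "node": "edge-2"}})
ledger.add({"table": {"name": "temperature", "columns": {"sensor": "TEXT", "value": "REAL"}}})

nodes_in_a = [s["node"] for s in ledger.find("sensor", location="plant-A")]
print(nodes_in_a)       # ['edge-1']
print(ledger.verify())  # True
```

The `table` policy shows how a shared schema could be published once and read by every node, while the `sensor` policies act as the directory that a query planner consults.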

Deployment

AnyLog can be deployed with Docker, Kubernetes, from a single point using an edge deployment platform like Open Horizon, or as a backend process.