Taraxa Echo: a Decentralized Social Data Network
In our previous article, we introduced the rationale behind building Taraxa Echo 👇👇.
Introducing Taraxa Echo — a social listening platform.
We call it ‘hype farming’: measuring and directing attention and reputation in Web3.
Here, we’ll outline a preliminary design for Echo’s decentralized architecture. Since it is under active development, expect changes down the road.
1️⃣ Echo Node
Echo Nodes are the core of the decentralized data collection network. They’re run by individual node runners, each collecting a subset of the overall social data from multiple public, open social platforms, such as Telegram, Twitter, and Discord.
For example, on Telegram, a node logs in with one account and listens to many chat groups and channels at a time through its Ingestor. The Ingestor organizes the data into a standardized format, runs a set of standardized Analytics pipelines on it (more on this in our next article), and then stores everything in the node’s local Storage.
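To make the idea of a standardized data format concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `SocialMessage` record, its field names, and the shape of the raw Telegram payload are hypothetical and not Echo’s actual schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class SocialMessage:
    """Hypothetical platform-neutral record; field names are illustrative."""
    platform: str   # e.g. "telegram"
    group_id: str   # chat group or channel identifier
    author_id: str  # platform-specific sender identifier
    text: str
    sent_at: str    # ISO 8601 timestamp in UTC


def normalize_telegram(raw: dict) -> SocialMessage:
    """Map an assumed raw Telegram payload onto the shared format."""
    sent = datetime.fromtimestamp(raw["date"], tz=timezone.utc)
    return SocialMessage(
        platform="telegram",
        group_id=str(raw["chat_id"]),
        author_id=str(raw["from_id"]),
        text=raw.get("text", "").strip(),
        sent_at=sent.isoformat(),
    )


def to_canonical_json(msg: SocialMessage) -> str:
    """Serialize with sorted keys so equal records yield equal bytes."""
    return json.dumps(asdict(msg), sort_keys=True, separators=(",", ":"))


# Assumed payload shape, for demonstration only.
raw = {"chat_id": -1001, "from_id": 42, "date": 0, "text": "  gm  "}
record = normalize_telegram(raw)
```

A canonical serialization matters here because nodes that observe the same messages should be able to produce byte-identical output, which is what makes comparing their hashes meaningful.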
Besides collecting, analyzing, and storing social data, the Echo Node also periodically communicates with two external entities: 2️⃣ IPFS and the 3️⃣ Echo Smart Contract.
2️⃣ IPFS
Echo Nodes will deposit their collected social data and standardized analytics results into the IPFS network and receive a hash for each uploaded file. DApps will later use these hashes to locate & access the data & analytics.
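The upload-then-locate flow can be pictured with a tiny content-addressed store. This is only a stand-in, not an IPFS client: real IPFS returns a content identifier (CID) rather than a plain SHA-256 hex digest, and the `ContentStore` class is a hypothetical name used for this sketch.

```python
import hashlib


class ContentStore:
    """In-memory stand-in for IPFS: content is addressed by its hash."""

    def __init__(self) -> None:
        self._blobs = {}

    def add(self, data: bytes) -> str:
        # Simplification: real IPFS returns a CID, not a hex digest.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        # Anyone holding the hash can retrieve exactly the uploaded bytes.
        return self._blobs[key]


store = ContentStore()
payload = b'{"group":"example","messages":[]}'
upload_hash = store.add(payload)   # the node keeps this hash
fetched = store.get(upload_hash)   # a DApp uses it to locate the data
```

Because the address is derived from the content, uploading the same bytes twice yields the same hash, and any change to the data yields a different one.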
3️⃣ Echo Smart Contract
The entire Echo network communicates & collaborates through the Echo Smart Contract that sits on the Taraxa Layer-1 network. The Echo Smart Contract performs several critical functions:
- Coordinates Echo Nodes: at random intervals, the Echo Smart Contract randomly assigns & shuffles which social groups / accounts each Echo Node is supposed to listen in on, and ensures there’s sufficient randomized redundancy (e.g., each Telegram group is listened in on by at least 5 Echo Nodes) so that the output can be verified.
- Validates & Pays for Social Data: the Echo Smart Contract receives hashes (note: these are hashes of the data itself, not the aforementioned IPFS hashes) intermittently submitted by Echo Nodes as proof that they’ve collected data from their assignments. The randomized redundancy provides a basis for checking whether different nodes submitted the same hashes for the same data set. Nodes whose hashes fall in the majority are rewarded: e.g., if 4 out of 5 nodes submitted identical hashes, those 4 are rewarded and the remaining 1 is not.
- Processes data requests from DApps: external apps (e.g., Hype, Trend Spotter) will need to request data from the Echo network. They will submit their requests to the Echo Smart Contract, which will route each request to the appropriate nodes using a deterministic mapping algorithm, so the contract doesn’t need to maintain a node<>data mapping list. Nodes could then submit encrypted (e.g., via hybrid cryptography) IPFS hashes to the requester, after which payment is released to the submitter.
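The coordination and validation steps above can be sketched off-chain in Python. This is a model of the logic under stated assumptions, not the contract itself: rendezvous hashing is swapped in as one possible deterministic mapping (the design does not name an algorithm), the per-epoch `seed` stands in for the contract’s source of randomness, and a strict majority is assumed as the reward rule. The function names are hypothetical.

```python
import hashlib
from collections import Counter


def _score(seed: str, node_id: str, group_id: str) -> int:
    """Pseudo-random but reproducible score for a (node, group) pair."""
    digest = hashlib.sha256(f"{seed}|{node_id}|{group_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assigned_nodes(group_id: str, nodes: list[str], seed: str, k: int = 5) -> list[str]:
    """Pick the k highest-scoring nodes for a group (rendezvous hashing).

    Anyone who knows the node list and the seed can recompute the result,
    so no node<>data table has to be stored. Changing the seed reshuffles.
    """
    ranked = sorted(nodes, key=lambda n: _score(seed, n, group_id), reverse=True)
    return ranked[:k]


def rewarded_nodes(submissions: dict[str, str]) -> set[str]:
    """Return nodes whose data hash is shared by a strict majority."""
    if not submissions:
        return set()
    top_hash, votes = Counter(submissions.values()).most_common(1)[0]
    if votes * 2 <= len(submissions):
        return set()  # assumption: without a strict majority, nobody is paid
    return {node for node, h in submissions.items() if h == top_hash}


nodes = [f"node-{i}" for i in range(20)]
listeners = assigned_nodes("telegram:example_group", nodes, seed="epoch-1")

# Four listeners report the same data hash, one reports a different one.
honest = hashlib.sha256(b"batch").hexdigest()
submissions = {n: honest for n in listeners[:4]}
submissions[listeners[4]] = hashlib.sha256(b"tampered").hexdigest()
winners = rewarded_nodes(submissions)
```

In this run, `winners` contains the four agreeing nodes and excludes the fifth, mirroring the 4-out-of-5 example. The same `assigned_nodes` call could also serve request routing, since a DApp request for a group resolves to the same nodes that were assigned to listen to it.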
🔎Current Focus: Echo Node
As of this writing, our focus is on refining the Echo Node and making sure it can stably gather social data from various social media platforms, with Telegram being the first network we’re focusing on.
Once the Echo Nodes can reliably collect data, we’ll turn to decentralized & randomized orchestration. After that, we’ll work out the decentralized economics.
We’ll end this brief intro with a parting thought: the economics of the Echo network need to be carefully designed to ensure that node runners have sufficient financial incentives to keep the network alive. Whenever data is being bought and sold, there’s the risk that the first buyer will resell the data while undercutting the original seller, since the marginal cost is almost zero. In our case, resale value is significantly lower because the use cases rely on time-sensitive data and the trustworthiness of the network is critical. We’re optimistic that this won’t be a significant impediment, as other networks (e.g., Chainlink) face similar issues but seem to be doing just fine. 😄