Retrieval Bot is Live!

Caro Cai
Published in Filecoin Plus
3 min read · Jun 14, 2023


We are thrilled to announce the launch of our Retrieval Bot on DataCap applications, which provides insight into the historic retrievability of datasets stored by Fil+ clients. Along with the bot, we have also published the Retrieval Bot sampling algorithm, designed to enhance the reliability and transparency of data stored in Fil+.

Filecoin Plus’s major push in 2023 is to improve the quality of data onboarded. A clear path to quality is increasing retrievability of open datasets and providing notaries with the necessary tools and guidance on their due diligence process.

Upon each DataCap application submission, a Retrieval Bot message will pop up (in the same message as the existing CID Checker Bot) as shown in the screenshot below. It shows the retrievability of past datasets stored by this data client, shedding light on the client's reputation.

Although the Retrieval Bot tests all three retrieval methods (Graphsync, HTTP, and Bitswap), the main focus here is HTTP retrievability (see the “Why HTTP?” section at the end). Data clients and notaries can click through to the full report, which is linked in the last section of the message.

A question that we’ve heard from the notary community is what counts as a good retrieval success rate. In the long term, we aim to have all open datasets retrievable, in line with the Fil+ mission. As the community evolves, we expect to see a steady increase in retrieval success rates. In the immediate term, a bad client is one with a retrieval success rate below 1%. Notaries are encouraged to ask clients to explain their past datasets’ retrievability as part of their due diligence.
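To make the 1% threshold concrete, here is a minimal sketch of how a notary might compute per-method success rates from a list of retrieval attempts and flag a client for follow-up. The function names, the `(method, succeeded)` record shape, and the choice to key the check on HTTP are assumptions made for illustration; they do not describe the Retrieval Bot's actual report format.

```python
from collections import Counter

# The "<1%" cutoff mentioned above, expressed as a fraction.
BAD_CLIENT_THRESHOLD = 0.01


def success_rates(attempts):
    """Return {method: success_rate} from (method, succeeded) tuples.

    The tuple shape is a hypothetical record format, not the bot's own.
    """
    totals, successes = Counter(), Counter()
    for method, succeeded in attempts:
        totals[method] += 1
        if succeeded:
            successes[method] += 1
    return {method: successes[method] / totals[method] for method in totals}


def needs_explanation(rates, method="http"):
    """True when the given method's success rate is below the threshold.

    A method with no recorded attempts is treated as a rate of 0.
    """
    return rates.get(method, 0.0) < BAD_CLIENT_THRESHOLD
```

A notary could call `needs_explanation(success_rates(attempts))` and, if it returns `True`, ask the client about their past datasets before signing.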

In addition, we have published the full Retrieval Bot sampling algorithm so that it is transparent to the community. Again, the emphasis is on HTTP, although the Retrieval Bot attempts retrieval over all three methods. The bot randomly samples datasets, giving newer deals a higher priority. We hope this gives more clarity on the retrievability scores shown on the application.
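As an illustration of what "random sampling with newer deals having a higher priority" can look like, the sketch below draws a weighted sample without replacement, where a deal's weight decays with its age. This is not the published Retrieval Bot algorithm: the exponential decay, the 30-day half-life, the `(deal_id, age_days)` record shape, and the function names are all assumptions chosen for the example.

```python
import random


def recency_weight(age_days, half_life_days=30.0):
    """Weight that halves every `half_life_days` (an assumed decay curve)."""
    # Clamp so extremely old deals keep a tiny, non-zero weight.
    return max(0.5 ** (age_days / half_life_days), 1e-12)


def sample_deals(deals, k, rng=None, half_life_days=30.0):
    """Pick up to k distinct deal IDs, favouring newer deals.

    `deals` is a list of (deal_id, age_days) pairs. Uses the
    Efraimidis-Spirakis method: each item gets the key u ** (1 / weight)
    with u uniform in [0, 1), and the k largest keys are selected.
    """
    rng = rng or random.Random()
    keyed = []
    for deal_id, age_days in deals:
        weight = recency_weight(age_days, half_life_days)
        keyed.append((rng.random() ** (1.0 / weight), deal_id))
    keyed.sort(reverse=True)
    return [deal_id for _, deal_id in keyed[:k]]
```

With this weighting, a deal made today is about a thousand times more likely to be drawn first than one made 300 days ago, yet older deals can still be sampled.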

This work will feed into the T&T WG's effort to develop risk/reputation scores to be displayed on the trust dashboard. Clients that indicate that their dataset is retrievable will earn reputation over time, and storage providers (SPs) that comply with retrieval requests will also be ranked higher on future dashboards that display SP reputation.

Lastly, the T&T WG requests all notaries to hold clients accountable based on the statistics generated by the Retrieval Bot.

Appendix — Why HTTP?

Filecoin’s native retrieval method, Graphsync, relies on the payload CID of a dataset for serving retrievals. The recent implementation of booster-bitswap, which enables Bitswap retrieval requests to access Filecoin through IPFS Gateway, also needs the payload CID to perform retrievals. However, Filecoin deals only record the piece CID and not the payload CID on chain. Therefore, serving retrievals via Bitswap or Graphsync needs an external mapping between the piece CIDs and payload CIDs. Retrieval Bots assume the `label` field on chain to be the payload CID, but this is not always accurate.

HTTP stands out as the shortest path to enabling retrievals and is thus the preferred retrieval method for Fil+ datasets. HTTP inherently supports retrieval by piece CID, so no external mapping is needed, and it is backed by plenty of open tooling. Storage providers can use booster-http and nginx to serve and monitor HTTP retrievals.
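As a rough sketch of why retrieval by piece CID is simple over HTTP, the snippet below builds a piece URL and probes it with a one-byte range request. It assumes the storage provider exposes pieces under a `/piece/<pieceCID>` path, as booster-http does by default to the best of our knowledge; the host, port, and helper names are hypothetical, and the path should be checked against the Boost version actually deployed.

```python
import urllib.error
import urllib.request


def piece_url(base_url, piece_cid):
    """Build the piece URL, assuming a `/piece/<pieceCID>` route."""
    return f"{base_url.rstrip('/')}/piece/{piece_cid}"


def is_http_retrievable(base_url, piece_cid, timeout=10.0):
    """Return True if the first byte of the piece can be fetched over HTTP.

    A Range header keeps the probe cheap; servers that ignore it reply
    with 200 instead of 206, which is also treated as success.
    """
    request = urllib.request.Request(
        piece_url(base_url, piece_cid),
        headers={"Range": "bytes=0-0"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status in (200, 206)
    except OSError:
        # Covers URLError, HTTPError, connection failures and timeouts.
        return False
```

Because the request is keyed on the piece CID recorded on chain, the probe does not depend on the `label` field holding a correct payload CID.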
