Run Your Own IPFS Search Engine With Lens
Lens is another of our open-source IPFS tools under the Temporal umbrella. It lets you take content from IPFS and index it so that it can be searched later. Currently Lens can index the following mime-types:
The one requirement is that all your data exists on IPFS and is discoverable by the running Lens instance. In the future we may add support for other distributed networks, such as Dat or Swarm. To interact with Lens, we provide a simple but robust gRPC API that supports both simple and complex queries.
How Does Indexing Work
We have a few different methods of analyzing data that we’ll chain together. When given PDFs, we first attempt to extract images and text from the pages. The text is fed into bleve, which is capable of handling simple and complex search queries. The images are also analyzed, using Tesseract for optical character recognition to extract searchable text and TensorFlow for rudimentary image classification. When analyzing other mime types such as image/*, we attempt to perform the same Tesseract and image classification analysis as we do with images extracted from PDFs. When analyzing mime types like text/*, we feed the text directly into bleve.
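As a rough sketch of the routing described above, the following Go snippet maps a mime type to the chain of analysis steps it would go through. This is not the Lens source code: the Step type, the step names, and the stepsFor function are hypothetical, and the sketch assumes that text recovered by OCR ends up in bleve like any other text.

```go
package main

import (
	"fmt"
	"strings"
)

// Step names one stage of the analysis chain.
// These names are illustrative and are not Lens identifiers.
type Step string

const (
	ExtractPDF    Step = "extract text and images from PDF"
	OCRImages     Step = "Tesseract OCR on images"
	ClassifyImage Step = "TensorFlow image classification"
	IndexText     Step = "index text with bleve"
)

// stepsFor returns the analysis steps for a mime type,
// or nil when the type is not one of the three described in the post.
func stepsFor(mimeType string) []Step {
	switch {
	case mimeType == "application/pdf":
		// PDFs are split into text and images; the images get the image treatment.
		return []Step{ExtractPDF, OCRImages, ClassifyImage, IndexText}
	case strings.HasPrefix(mimeType, "image/"):
		return []Step{OCRImages, ClassifyImage, IndexText}
	case strings.HasPrefix(mimeType, "text/"):
		// Plain text goes straight to the index.
		return []Step{IndexText}
	default:
		return nil
	}
}

func main() {
	for _, m := range []string{"application/pdf", "image/png", "text/plain", "video/mp4"} {
		fmt.Printf("%s: %d step(s) %v\n", m, len(stepsFor(m)), stepsFor(m))
	}
}
```

The point of the sketch is only that PDFs reuse the image path for their extracted images, while text/* skips straight to indexing.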
How Does Searching Work
Searching at the most basic level consists of taking a query, ranging from single words like "blockchain" all the way up to search phrases like "blockchain data storage". We also support more complex queries, such as filtering against specific tags, categories, mime types, and more; however, these are entirely optional.
The response to your query is an array of documents, each containing the IPFS hash of the content that matched your query, the mime type of the content, and a score indicating how relevant the content is to your search query.
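To make the response shape a little more concrete, here is a minimal Go sketch of a result document and of ordering results by relevance. The Document type, its field names, and the byRelevance helper are assumptions for illustration and are not taken from the Lens protocol buffers; the hashes are placeholders, and the sketch assumes that a higher score means a better match.

```go
package main

import (
	"fmt"
	"sort"
)

// Document mirrors the three pieces of information each result carries.
// Field names are illustrative, not the generated protobuf names.
type Document struct {
	Hash     string  // IPFS hash of the matching content
	MimeType string  // mime type of the content
	Score    float64 // relevance to the query (assumed: higher is better)
}

// byRelevance returns a copy of docs sorted from highest to lowest score.
func byRelevance(docs []Document) []Document {
	out := make([]Document, len(docs))
	copy(out, docs)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	// Placeholder results; these are not real IPFS hashes.
	results := []Document{
		{Hash: "QmExampleA", MimeType: "application/pdf", Score: 0.42},
		{Hash: "QmExampleB", MimeType: "text/plain", Score: 1.37},
	}
	for _, d := range byRelevance(results) {
		fmt.Println(d.Hash, d.MimeType, d.Score)
	}
}
```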
There are a few different ways to install Lens, the simplest being our prebuilt Lens docker image. By default, the docker image starts the gRPC server listening on 0.0.0.0:9998, without any encryption, and with a gRPC authentication key of blahblahblah. The docker container also needs a connection to an IPFS HTTP API, which defaults to 127.0.0.1:5001. To install this docker image, run the following command:
docker pull rtradetech/lens:latest
Alternatively, for those wanting a more hands-off setup, we have a docker-compose setup that also spins up the required IPFS node. To use this docker-compose file, run the following commands. They use the /tmp directory as the base directory for storing all files.
$> wget -O lens.yml https://raw.githubusercontent.com/RTradeLtd/Lens/master/lens.yml
$> LENS=latest BASE=/tmp docker-compose -f lens.yml up
Before we get started with how you can use Lens, note that we’ve published the existing Lens index, as seen on https://temporal.cloud/lens, via IPFS. It can be downloaded via the CID QmZqSYDQrtWg4LHnqT6DPqa1XUr7u4oeaGcyaTiGHJY3SR. It’s 1.2GB in size and contains a variety of research papers and crypto whitepapers that I have submitted, as well as other user-submitted documents.
All indexing and searching can be done via the gRPC API, for which we have published protocol buffers on GitHub. Using these, you can build an API for Lens in any language that supports protocol buffers!
For an example of how we use those protocol buffers to build the Lens API client that is in Temporal, you can check out our Golang example below:
To actually index data, once you have your gRPC client up and running, all you need to do is call the Index command and let Lens do its magic! Depending on where the content is in your network, this process can take some time. Generally speaking, if the content is locally available, index analysis shouldn't take more than a minute, and usually takes around 30 seconds. When submitting data for indexing, you must provide two parameters. The first is ObjectType, which should be set to IndexReq_IPLD as defined in the protocol buffers. The second is ObjectIdentifier, which should be the IPFS hash of the content you want indexed.
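The two required parameters can be sketched in Go as follows. This is a stand-alone illustration and not the generated gRPC client: the IndexRequest struct, the ObjectTypeIPLD constant (a stand-in for IndexReq_IPLD), and the Validate method are all hypothetical, and no network call is made.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectTypeIPLD is a hypothetical stand-in for the IndexReq_IPLD value
// defined in the published protocol buffers.
const ObjectTypeIPLD = "IPLD"

// IndexRequest holds the two parameters an Index call requires.
type IndexRequest struct {
	ObjectType       string // should be the IPLD object type
	ObjectIdentifier string // IPFS hash of the content to index
}

// Validate reports an error if a required parameter is missing or unexpected.
func (r IndexRequest) Validate() error {
	if r.ObjectType != ObjectTypeIPLD {
		return errors.New("ObjectType must be the IPLD type")
	}
	if r.ObjectIdentifier == "" {
		return errors.New("ObjectIdentifier must be an IPFS hash")
	}
	return nil
}

func main() {
	req := IndexRequest{
		ObjectType: ObjectTypeIPLD,
		// The CID published in this post, used here only as an example of a hash.
		ObjectIdentifier: "QmZqSYDQrtWg4LHnqT6DPqa1XUr7u4oeaGcyaTiGHJY3SR",
	}
	if err := req.Validate(); err != nil {
		fmt.Println("invalid request:", err)
		return
	}
	fmt.Println("ready to submit:", req.ObjectIdentifier)
}
```

Checking the request locally before sending it is a design choice of this sketch, not something the Lens API is known to require.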
Searching for data is extremely simple as well, and requires calling the Search command. The only required parameter is Query, which defines how you want to search the data. Optionally, you can narrow your search results further with filters like Hashes, to only match specific IPFS hashes, and MimeTypes, to only match specific mime types. The time it takes for this command to complete depends on a wide variety of factors, such as the size of your index, the number of objects matched, and the speed of the disk that the index resides on.
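To illustrate how the optional filters narrow a result set, here is a small in-memory Go sketch. It is not how Lens evaluates filters internally: the Result and Filters types and the apply function are hypothetical, the hashes are placeholders, and the sketch assumes that an empty filter means "match everything" and that multiple filters are combined with AND.

```go
package main

import "fmt"

// Result is a hypothetical, pared-down search result.
type Result struct {
	Hash     string
	MimeType string
}

// Filters mirrors the optional Hashes and MimeTypes filters.
type Filters struct {
	Hashes    []string
	MimeTypes []string
}

// allows reports whether value passes an allow-list.
// An empty list is treated as "no filter" (an assumption of this sketch).
func allows(list []string, value string) bool {
	if len(list) == 0 {
		return true
	}
	for _, v := range list {
		if v == value {
			return true
		}
	}
	return false
}

// apply keeps only the results that satisfy every non-empty filter.
func apply(f Filters, results []Result) []Result {
	var kept []Result
	for _, r := range results {
		if allows(f.Hashes, r.Hash) && allows(f.MimeTypes, r.MimeType) {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	// Placeholder results; these are not real IPFS hashes.
	results := []Result{
		{Hash: "QmExampleA", MimeType: "application/pdf"},
		{Hash: "QmExampleB", MimeType: "text/plain"},
	}
	onlyText := Filters{MimeTypes: []string{"text/plain"}}
	fmt.Println(apply(onlyText, results))
}
```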
Thank you, and a big shout-out to everyone contributing to IPFS and to all the great work that is being done by many different projects!
v2.1.0 of Temporal is out!
Highlights of release:
- go-ipfs v0.4.20
- ipfs-cluster v0.10.1
- gomod support
Temporal: A versatile, easy-to-use tool for companies with large amounts of data to secure, store, and track. The platform can be used as is, or custom built to manage and deploy blockchain-based applications and non-blockchain data-storage solutions for any enterprise.
If you don’t want to run your own Temporal installation, you can use our hosted version: a full-featured pinning service with 3GB free per month, 5 free IPNS record creations a month, 100 free pubsub messages a month, and 5 free IPFS keys.
Also, the Usages and Features section of the README.md on the GitHub repository covers using the docker-compose file to spin up the environment.