
Add search to your S3 bucket’s non-text files

You know your app’s users

Whether you build software for lawyers, doctors, real-estate agents, or teachers, your users share one challenge: they work with non-text files, and they need to find things in them.

A couple of examples:

How do we search these today?

Currently, you must use your object store’s (S3) tagging feature, like:
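As a minimal sketch of that tagging approach, assuming boto3 and a hypothetical bucket `my-bucket` holding a file `house-plans.pdf`: `build_tag_set` produces the payload that `put_object_tagging` expects, and `find_by_tag_value` filters tags you have already fetched, because `ListObjectsV2` does not filter by tag. The helper names and the tag keys are illustrative, not from any library.

```python
def build_tag_set(tags):
    """Build the Tagging payload for S3's PutObjectTagging call."""
    if len(tags) > 10:
        raise ValueError("S3 allows at most 10 tags per object")
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}


def tag_object(bucket, key, tags):
    """Attach tags to one object. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the pure helpers work without AWS

    s3 = boto3.client("s3")
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging=build_tag_set(tags))


def find_by_tag_value(tagged_objects, term):
    """Return the keys whose tag values contain `term`, ignoring case.

    `tagged_objects` maps object key -> {tag key: tag value}, i.e. tags
    you have already fetched for each object.
    """
    term = term.lower()
    return sorted(
        key
        for key, tags in tagged_objects.items()
        if any(term in value.lower() for value in tags.values())
    )


if __name__ == "__main__":
    # Hypothetical bucket, key, and tags for illustration only.
    tag_object("my-bucket", "house-plans.pdf",
               {"type": "blueprints", "client": "wallaby"})
```

Every search term has to be chosen and attached by hand ahead of time, which is the limitation the rest of this post is about.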

Now I can search for “blueprints” or “wallaby” and this file should be returned.

But what if we want to go deeper?

If you’ve heard of Lucene, you know it’s the standard for full-text search across the web.

Full-text search (FTS) is a technique for searching text in a collection of documents, and Lucene is the library behind pretty much every major application’s search bar. Heard of Solr? Elasticsearch? Both are built on Lucene.

Okay, so how do we build Lucene’s full-text search into our SaaS tool, so our users can search for PDFs?

Take the following pseudo-code:
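One way the indexing and search steps could look, as a minimal sketch: this is a toy inverted index in plain Python, not Lucene, and it assumes the text has already been extracted from each PDF by some parser or OCR step that is not shown. The file names and contents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index one document's extracted text under `doc_id`."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = set.intersection(
            *(self.postings.get(term, set()) for term in terms)
        )
        return sorted(matches)


if __name__ == "__main__":
    index = InvertedIndex()
    # Stand-ins for text extracted from files in the bucket.
    index.add("prescription.pdf", "Take amoxicillin 500 mg twice daily")
    index.add("lease.pdf", "Tenant shall pay rent monthly")
    print(index.search("amoxicillin daily"))
```

A real engine adds relevance scoring, stemming, phrase queries, and persistence on top of this idea, and you still need the extraction step and the infrastructure around it.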

And here’s our tech stack:

Managing full file search on your own

Does that seem like a lot to manage?

It is. That’s why users are flocking to Mixpeek, a file-search API for your software.

Using our Mixpeek Python library, you can upload your files and search them in a single line of code.

Extract the text of all objects in my bucket:

Or upload a single file, in this case, a prescription PDF:

Now, if I search for text that appears in the PDF:

My result is:

We can now use this highlights array to make a clean UI that shows exactly where in the files (and which files) this text appears.
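To illustrate the UI side, here is a minimal sketch that turns match positions into HTML. It does not use Mixpeek's actual response format: it assumes a hypothetical list of `(start, end)` character offsets into the extracted text, and `render_highlights` is an illustrative helper, not part of any library.

```python
import html


def render_highlights(text, spans):
    """Wrap each (start, end) span of `text` in <mark> tags.

    Text outside and inside the spans is HTML-escaped. Spans that
    overlap an earlier span are skipped.
    """
    parts = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            continue  # overlaps the previous highlight
        parts.append(html.escape(text[cursor:start]))
        parts.append("<mark>" + html.escape(text[start:end]) + "</mark>")
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical extracted text and match offsets.
    print(render_highlights("Take amoxicillin 500 mg", [(5, 16)]))
```

Rendering one such snippet per matching file gives users both the file name and the exact passage that matched.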

So our tech stack is now cut in half:

Using Mixpeek to search your files