Simple document and query processing for your browser

Published in norch, Aug 4, 2019

The volatile search engine is coming closer to reality, making search engines simpler to create and use, more useful, and cheaper. It will be based on a search engine running in your browser, keeping server costs to a minimum and making it a lot harder to snoop on your private data and queries.

What we have now is a document and query processor running in the browser. The next steps are data input and a Vue.js version of the search engine, search-index. But more about that at the end of this post.

Create a good search engine experience

With the daq-proc library we will be able to semi-automatically create good search engine experiences without any programming. Just add your data, define which fields are titles and body text and get stuff like autocomplete and filters on important keywords out of the box. In addition you’ll get a smaller index size and healthy OR-search result sets.

The daq-proc demo just gives you a hint of what you can do. There are some more bells and whistles under the hood.

Technically, what is daq-proc?

Daq-proc is just a wrapper / browser distribution of four underlying libraries that go well together in document and query processing.

Extracting words (and numbers) into arrays

words-n-numbers: For words to be analyzed, you first have to have them in an array. Usually, documents come in the form of strings containing words. The simple way is to do a string.split(' '), but that gives you a lot of noise. That’s why we’ve created words-n-numbers: to more easily get a good extraction of actual words.
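To make the noise problem concrete, here is a minimal sketch comparing a plain split with a regex-based extraction. The extractWords helper is hypothetical and only illustrates the idea; it is not the words-n-numbers API, and a simple regex like this will still split contractions such as "don't" into two tokens.

```javascript
// Hypothetical helper for illustration only, not the words-n-numbers API.
function extractWords(text) {
  // \p{L} matches any letter and \p{N} any number; the u flag enables Unicode classes.
  const matches = text.toLowerCase().match(/[\p{L}\p{N}]+/gu);
  return matches === null ? [] : matches;
}

const sentence = 'Hello, world! It costs 42 dollars (really).';

// Splitting on spaces keeps punctuation glued to the words.
console.log(sentence.split(' '));

// The regex keeps only runs of letters and digits.
console.log(extractWords(sentence));
```

The split version leaves tokens like "Hello," and "(really)." in the array, which would end up as separate, useless entries in an index.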

Removing stopwords

stopword: Stopwords are words that hold little or no informational value. All the small words you use to write a sentence and not sound like a robot. If words like “a”, “the”, “it”, “for”, “and”, etc. end up in your search index, there are some big disadvantages. Firstly: Your index gets really big. Secondly: If you do an OR-search and type a short sentence as your query, you’ll easily get every document in your index back in the result set.
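A minimal sketch of the filtering step might look like the following. The tiny STOPWORDS set and the dropStopwords helper are assumptions made for illustration; the stopword library ships much longer, language-specific lists and its own function names.

```javascript
// Tiny illustrative list; real stopword lists are far longer and per language.
const STOPWORDS = new Set(['a', 'the', 'it', 'for', 'and', 'is', 'of', 'to', 'in']);

// Hypothetical helper: keeps only the words that are not in the stopword set.
function dropStopwords(words, stopwords = STOPWORDS) {
  return words.filter((word) => !stopwords.has(word.toLowerCase()));
}

const queryWords = ['the', 'search', 'engine', 'for', 'the', 'browser'];
console.log(dropStopwords(queryWords)); // [ 'search', 'engine', 'browser' ]
```

With the stopwords gone, an OR-search on this query only matches documents containing "search", "engine" or "browser", instead of every document that contains "the".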

N-grams

ngraminator: N-grams are arrays of word sequences found in your text. You can choose e.g. 2-grams, 3-grams and 4-grams, as in the interactive demo. This is great for creating autocomplete functionality, suggesting word sequences based on the partial query you’ve typed already.
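As a rough sketch of the idea, word n-grams can be built with a sliding window and then matched against a partial query. Both ngrams and suggest are hypothetical helpers written for this post, not the ngraminator API, and the prefix matching is just one simple way to do autocomplete.

```javascript
// Hypothetical helper: slides a window of n words over the array.
function ngrams(words, n) {
  const result = [];
  for (let i = 0; i + n <= words.length; i++) {
    result.push(words.slice(i, i + n));
  }
  return result;
}

// Hypothetical helper: suggests n-gram phrases that start with the partial query.
function suggest(partial, words, sizes = [2, 3]) {
  const phrases = sizes.flatMap((n) => ngrams(words, n).map((gram) => gram.join(' ')));
  return phrases.filter((phrase) => phrase.startsWith(partial.toLowerCase()));
}

const docWords = ['search', 'engine', 'in', 'your', 'browser'];
console.log(ngrams(docWords, 2));
console.log(suggest('search e', docWords)); // [ 'search engine', 'search engine in' ]
```

In practice you would probably remove stopwords before building the n-grams, so that suggestions do not end in words like "in".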

Important keywords

eklem-headline-parser: Determines the most relevant keywords in a headline by matching it with article context, after first removing stopwords.
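One simple way to picture this is to keep the headline words that also occur in the article body and rank them by how often they occur there. This is only a sketch under that assumption; headlineKeywords is a hypothetical helper, and the actual scoring in eklem-headline-parser may well differ.

```javascript
// Hypothetical helper: ranks headline words by their frequency in the body text.
function headlineKeywords(headlineWords, bodyWords, stopwords) {
  // Count how often each word occurs in the body.
  const counts = new Map();
  for (const word of bodyWords) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  // Keep unique headline words that are not stopwords and do appear in the body.
  const candidates = [...new Set(headlineWords)].filter(
    (word) => !stopwords.has(word) && counts.has(word)
  );
  // Most frequent in the body first.
  return candidates.sort((a, b) => counts.get(b) - counts.get(a));
}

const headline = ['search', 'in', 'the', 'browser'];
const body = ['the', 'browser', 'runs', 'the', 'search', 'index', 'and', 'the', 'browser', 'stores', 'it'];
const smallStopwordList = new Set(['in', 'the', 'and', 'it']);

console.log(headlineKeywords(headline, body, smallStopwordList)); // [ 'browser', 'search' ]
```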

Next step: Data input through a bookmarklet

Any search engine needs data. The bookmarklet library nowcontent.xyz will make it possible to click on a bookmarklet (a small JavaScript program within a bookmark) to add the page you are viewing to one of your search engines, running in your browser.
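For readers who haven’t seen one, a bookmarklet is essentially a javascript: URL saved as a bookmark. The sketch below shows how such a URL could be built. The toBookmarklet helper and the captured fields are assumptions for illustration; nowcontent.xyz is not released yet and this is not its API.

```javascript
// Hypothetical helper: wraps source code in a function and encodes it as a javascript: URL.
function toBookmarklet(source) {
  const wrapped = `(function(){${source}})();`;
  return 'javascript:' + encodeURIComponent(wrapped);
}

// Browser-only code kept as a string; it runs when the bookmark is clicked.
// What happens to the captured page is an assumption: here it is only logged.
const capturePage = `
  var page = { title: document.title, url: location.href, body: document.body.innerText };
  console.log(page);
`;

const bookmarkletUrl = toBookmarklet(capturePage);
console.log(bookmarkletUrl.slice(0, 40));
```

Saving the resulting URL as a bookmark and clicking it on any page would run the captured code in the context of that page.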

After that a re-write of norch-vue is needed to make the actual search engine browser based and not server based. Then version 1.0.0 of nowsearch.xyz will be created, based on daq-proc, nowcontent.xyz and norch-vue.

Further into the future?

Truly serverless through torrent-like technology: hyperdb?