Abstract art

In response to Ayende’s code review of Resin, part I.

So wrong it’s right. https://img0.etsystatic.com/062/0/7847857/il_570xN.797405938_c5sa.jpg

My art teacher once told me that before you can create abstract art, you first need to train yourself on concrete art. If you don’t know what the rules are, how will you know which ones to break?

I’ve had trouble following this advice throughout my career because I believe it only to a certain degree. I believe much more in John Forbes Nash’s * stance:

Before diving into a new problem, you should prepare yourself by learning none of the rules that apply to that problem space and that people before you have abided by, because they will hold you back.

Put another way: don’t try to visualize where the borders of the box of the problem space are. There is no box.

My concrete art is sometimes off, though, which is why I do appreciate the occasional reality check from people who’ve been around.

Oren’s review is a straight path for me to follow for Resin 1 to become Resin 2. Much obliged.

The critique

All in the eyes of the observer of course but after hours of late night coding, stepping back and then squinting, shit is starting to look good! https://thumbs.dreamstime.com/t/broken-mechanical-heart-red-clock-like-open-clockwork-conceptual-metaphorical-d-illustration-isolated-white-49597254.jpg

A search engine is a system with many moving parts. Resin is one with all of the parts, most in their right position but many appear crafted in haste.

It’s a clockwork with many gears firmly in place but some of them have been molded from plastics when they should have been made from titanium.

That’s my view. Having said that, Resin already beats Lucene 4.8 on speed in many indexing and free-text querying scenarios, and one code base is written by some dude while the other is the result of ~ a million man (or woman) hours. So rest assured, anyone who reads this: I’m not too worried about the future of this project. I do have some work cut out for me, though.

Apart from straight-up fantastic advice on how to design around the problem of tokenization, there was some critique. Here’s what I heard:

  1. There are some problems with the architecture that need more than a little boyscouting to fix (mutability and too many allocations put too much pressure on the GC).
  2. File system operations are not optimal.
  3. Code readability can be improved.
  4. There are many ways of creating extension points. Virtual methods that get hammered at high frequency, when milliseconds count, are not among the better ones.

See you in the comments section.

*) That guy was something else.

One interesting thing about his name: as of this post, it appears one more time in the Google index, and that engine now knows of a fairly new document posting for that name with a count of 1. But since I didn’t anchor it with a hyperlink to the appropriate Wikipedia page, Google feels this post is not very important with regard to it. Which is interesting to me, because relevance has very little to do with hyperlinking.

Hyperlinking is meant to give humans the option of achieving context. It is information the Google engine shouldn’t even require. By now the engine should be able to reverse-engineer the context from the content we provide.

Given all this labeled training data that is the web, by now it should be a content ranking engine, not a page ranking one. Pages are for machines. Content that spans pages and sites is for people.