Code review of Universal UI script
I think it is interesting that no company has succeeded or even seriously attempted to build a browser that standardizes the look and feel of the web. Fundamentally the potential is always there. And the ability to reshape and remix online content is part of the web’s historic culture.
Maybe there is a legal argument against it. But the internet doesn’t care about intellectual property. The whole thing started with college kids phreaking the phone company. So.
The code being discussed below is an evening or weekend (I can’t remember) of noodling around with a prototype.
So the script in question is: https://github.com/nick227/htmling
A demo is available at: http://htmling.herokuapp.com/
First it scrapes a web address.
Then it extracts the HTML, strips the tags, and re-wraps the raw text in new tags.
Finally it outputs the result to the screen with a generic style sheet.
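Those three steps can be sketched in a few lines of plain JavaScript. This is not the code from the repo, just a minimal illustration: the function names (stripTags, rewrap, render), the 60-word paragraph size, and the /generic.css path are all made up here, and the regex stripper is a crude stand-in for the striptags package that the real script relies on.

```javascript
// Hypothetical helpers for illustration only; the real project hands the
// stripping step to the striptags package.

// Crude stand-in for striptags: drop script/style blocks, then every remaining tag.
// HTML entities such as &amp; pass through untouched.
function stripTags(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]*>/g, ' ');
}

// Re-wrap the raw text in fresh <p> tags, one per fixed-size chunk of words.
function rewrap(text, wordsPerParagraph = 60) {
  const words = text.split(/\s+/).filter(Boolean);
  const paragraphs = [];
  for (let i = 0; i < words.length; i += wordsPerParagraph) {
    const chunk = words.slice(i, i + wordsPerParagraph).join(' ');
    paragraphs.push(`<p>${chunk}</p>`);
  }
  return paragraphs.join('\n');
}

// Output the re-wrapped text under one generic style sheet.
function render(html, stylesheet = '/generic.css') {
  const body = rewrap(stripTags(html));
  return [
    '<!doctype html>',
    `<html><head><link rel="stylesheet" href="${stylesheet}"></head>`,
    `<body>\n${body}\n</body></html>`,
  ].join('\n');
}
```

Calling render() on any scraped page string returns a full page in which every site ends up with the same markup and the same style sheet, which is the whole point of the exercise.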
I got real fancy with this one by including tests and clustering.
The code base is:
public/
routes/
/includes/Htmling.js
/includes/Templates.js
test/
package.json
logging.js
server.js
worker.js
package.json requires:
node-bunyan bunyan
chai
forever
mocha
request
node-restify restify
striptags*
supertest
yaml-config-node yaml-config
So right there, striptags. That script is doing most of the heavy-lifting. If you were expecting a detailed blog post about back-tracing broken html tags, you are sorely mistaken.
Here is that code:
Alright keep in mind that script is at least a year old as I write this now, so who knows what we are about to find in there.
I have a class, which is always a classy move. The constructor is nice and tidy, initializing a few odds-and-ends. It also gins up a new Templates object from the externally required Templates class.
Then come a total of 9 functions in a modest 136 lines of code. The only method called publicly, build(), accepts the page info and returns a promise.
build()
sanity()
templatize()
charCheck()
tagCheck()
stylize()
loadUrl()
isValidUrl()
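To give a feel for how a promise-returning build() can tie the other methods together, here is a rough sketch. It borrows the method names from the list above, but the bodies are guesses written for illustration, not the code from the repo. The class name HtmlingSketch and the injected loader function are assumptions (the real script uses the request package), charCheck(), tagCheck() and stylize() are left out, and this sanity() strips every tag, anchors included.

```javascript
// Hypothetical sketch: method names follow the list above, bodies are guesses.
class HtmlingSketch {
  // `loader` is an assumed injected function (url) => Promise<string>,
  // used here so the sketch runs without touching the network.
  constructor(loader) {
    this.loader = loader;
  }

  // Accept only absolute http(s) addresses.
  isValidUrl(url) {
    try {
      const { protocol } = new URL(url);
      return protocol === 'http:' || protocol === 'https:';
    } catch (err) {
      return false;
    }
  }

  // Fetch the raw HTML for a URL.
  loadUrl(url) {
    return this.loader(url);
  }

  // Strip all tags and collapse whitespace (simplified: no anchor handling).
  sanity(html) {
    return html.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
  }

  // Wrap the cleaned text in new, uniform markup.
  templatize(text) {
    return `<article><p>${text}</p></article>`;
  }

  // The single public entry point: validate, load, sanitize, template.
  build(page) {
    if (!this.isValidUrl(page.url)) {
      return Promise.reject(new Error(`Invalid URL: ${page.url}`));
    }
    return this.loadUrl(page.url)
      .then((html) => this.sanity(html))
      .then((text) => this.templatize(text));
  }
}
```

Returning a rejected promise for a bad URL, rather than throwing, means the caller only has one error path to handle: the .catch() on build().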
The “sanity” function is my hipster way of saying sanitize, which is my arcane way of saying strip tags.
I decided that keeping the anchor tags is important. Damn my tragic luck. Dammit all to Hades. I am going to have to deal with tags after all. I want to chop that text at my buffer size, but the cut will probably land smack in the middle of an anchor tag. So rather than sink into regex madness, I wrote a backward-and-forward tag finder.
I only have one tag to deal with. I didn’t have to reinvent the wheel. Good old simple tagCheck() takes the current char position and searches using indexOf() and lastIndexOf() for open or close tags. It would be interesting to benchmark this against a regex and a for loop. My guess is that this is the fastest, but I have not measured it.
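The backward-and-forward search is easy to sketch. The version below is a reconstruction of the idea, not the actual tagCheck() from the repo: the name safeCut and its return value (an index that is safe to cut at) are assumptions. It also assumes anchors are the only tags left in the text, since a bare '<a' search would match other tags that start with "a" too.

```javascript
// Hypothetical reconstruction of the idea behind tagCheck(); assumes the only
// tags left in `text` are <a ...> and </a>.
const CLOSE = '</a>';

// Return an index at or after `size` where the text can be cut without
// splitting an anchor element or its tag markup.
function safeCut(text, size) {
  if (size >= text.length) return text.length;

  // Look backward for the nearest opening tag that starts before the cut.
  const open = text.lastIndexOf('<a', size - 1);
  if (open === -1) return size; // no anchor before the cut

  // Look backward for the nearest closing tag that ends at or before the cut.
  const close = text.lastIndexOf(CLOSE, size - CLOSE.length);
  if (close > open) return size; // that anchor is already closed

  // The cut lands inside an anchor: look forward for its closing tag.
  const end = text.indexOf(CLOSE, open);
  // If the anchor never closes, cut just before it instead.
  return end === -1 ? open : end + CLOSE.length;
}

const sample = 'See <a href="/x">the docs</a> for more.';
console.log(sample.slice(0, safeCut(sample, 10)));
```

With a requested cut at character 10, which falls inside the opening tag, the cut gets pushed forward to just after the closing </a>, so the chunk keeps the whole link. Pushing forward means a chunk can overshoot the buffer size by the length of one anchor; cutting at the start of the anchor instead would keep chunks under the limit at the cost of shorter ones.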
The Templates class is a list of template parameters; the only moving part is the parameter loading.
So why would I do this? Is it just wasting space on GitHub’s hard drives? That is a very distinct possibility. But I love the idea, and I think it could be done really well.
And to me, this seems like the kind of idea that could be hugely disruptive. The kind of obvious idea that’s been sitting out there forever, with nobody really trying.
Any coders who happen to read this post and agree there is something to the idea, give me a shout. And anyone should feel free to share an opinion on this script’s potential applications down in the comments.