Project update — parsing HTML

The last programming language I learnt before Go was JavaScript, and that was 5 years ago. By “learnt” I mean that I built a few apps and became really efficient with it. So I must say I have fallen out of the loop of continuous technical learning. That’s why learning Go is so painful.

Today I was finally working on something meaty: parsing HTML files that arrive in HTTP requests. I POST a few files and their content is parsed using some XPath. It’s not far from what the standard library covers, but it’s finally something you can’t easily achieve with the Go stdlib alone. I didn’t know that when I started, but I should have suspected it.

It’s quite easy to parse an HTML document and render it back to fix its content:

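import (
	"bytes"
	"log"
	"strings"

	// html.Parse and html.Render live here, not in the stdlib "html" package.
	"golang.org/x/net/html"
)

// fixHtml parses a possibly malformed document and renders it back as well-formed HTML.
func fixHtml(document string) string {
	reader := strings.NewReader(document)
	root, err := html.Parse(reader)
	if err != nil {
		log.Fatal(err)
	}
	var fixingBuffer bytes.Buffer
	// Render returns an error too, so check it rather than dropping it.
	if err := html.Render(&fixingBuffer, root); err != nil {
		log.Fatal(err)
	}
	return fixingBuffer.String()
}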

It’s not as trivial as it would be in Ruby, but I expected it to be more difficult. I won’t explain what’s happening, because it’s quite obvious. Why do we have to “fix” the document? Because otherwise the XPath library wouldn’t find the node in it.

Ok, now let’s jump to XPath:

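// Assumes the third-party xmlpath package is imported, e.g. "gopkg.in/xmlpath.v2".
// parseHtml returns the text of the first node matched by the XPath, or "" if nothing matches.
func parseHtml(document string) string {
	reader := strings.NewReader(document)
	root, err := xmlpath.ParseHTML(reader)
	if err != nil {
		log.Fatal(err)
	}
	xpath := xmlpath.MustCompile("//table[3]//tr[5]//table//tr")
	// String evaluates the path against root and returns the first match.
	value, ok := xpath.String(root)
	if ok {
		return value
	}
	return ""
}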

Although XPath was easy to use, the library’s support for it is very limited: e.g. you can’t use the `last()` function, and many others are missing too. You can iterate to the last element using `xmlpath.Iter`, but doing it in XPath would be more declarative.
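
The iterate-to-the-last-element workaround boils down to a loop that keeps overwriting a variable until the iterator runs dry. Here is a rough sketch of that pattern. To keep it self-contained, `rowIter` is a hypothetical stand-in that only mimics the "call Next, then read the value" shape of such an iterator; it is not the xmlpath API, and `lastRow` is a made-up helper name.

```go
package main

import "fmt"

// rowIter is a hypothetical stand-in for an XPath result iterator.
// It only mirrors the Next-then-read shape; it is not the xmlpath API.
type rowIter struct {
	rows []string
	pos  int // number of rows consumed so far
}

// Next advances to the next row and reports whether one exists.
func (it *rowIter) Next() bool {
	if it.pos >= len(it.rows) {
		return false
	}
	it.pos++
	return true
}

// Value returns the row the iterator currently points at.
// It must only be called after Next has returned true.
func (it *rowIter) Value() string {
	return it.rows[it.pos-1]
}

// lastRow walks the iterator to the end and keeps the most recent row.
// ok is false when the iterator yielded nothing.
func lastRow(it *rowIter) (value string, ok bool) {
	for it.Next() {
		value, ok = it.Value(), true
	}
	return value, ok
}

func main() {
	it := &rowIter{rows: []string{"first", "second", "third"}}
	if v, ok := lastRow(it); ok {
		fmt.Println(v) // prints "third"
	}
}
```

With the real library the loop would presumably look the same, just reading a node on each step instead of a string. It works, but it is a handful of imperative lines where a single `last()` in the expression would have done the job.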

What’s interesting about both code samples is that they are not “object-oriented”, but passing one of the arguments as a message receiver makes the code feel like it is. I really like this feature of Go.
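
To make the receiver idea concrete, here is a minimal sketch with a made-up `page` type (neither `page`, `titleOf` nor `Title` comes from the samples above). It shows the same logic written once as a plain function and once as a method, and that Go still lets you call the method with the receiver passed as an ordinary first argument.

```go
package main

import (
	"fmt"
	"strings"
)

// page is a hypothetical type used only for this illustration.
type page struct {
	body string
}

// titleOf is a plain function: the page is an ordinary argument.
func titleOf(p page) string {
	return strings.ToUpper(p.body)
}

// Title is the same logic as a method: the page moves in front of
// the function name and becomes the receiver.
func (p page) Title() string {
	return strings.ToUpper(p.body)
}

func main() {
	p := page{body: "hello"}
	fmt.Println(titleOf(p))    // plain function call
	fmt.Println(p.Title())     // method call on the receiver
	fmt.Println(page.Title(p)) // method expression: receiver passed as the first argument
}
```

All three calls print `HELLO`. There is no class anywhere, only a struct and functions attached to it, yet `p.Title()` reads just like a message sent to an object.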

Many things have changed since I learnt JavaScript. Although StackOverflow existed back then, I barely used it; I was much more keen to jump straight into the w3schools language reference or API documentation. But using StackOverflow has made my learning velocity higher this time: I didn’t have to find a library, because it was already mentioned in a post. I didn’t have to read the documentation, because I already had snippets that handled my case. You may laugh at StackOverflow-driven development, but there’s no faster way to learn a new language, especially when you’re already familiar with some basics.