Extraction of Information from the Disorganized Internet

John Lo
Published in Human Intelligence, Aug 8, 2018

Our strategy for extracting information from the disorganized internet is filtering. Filtering can be both content-based and collaborative, and it can be done both by us and by algorithms.

Content-based filtering

For content-based filtering, there are two approaches: rule-based and case-based.

Rule-based approach

In the rule-based approach, a piece of content is evaluated against several factors to determine its quality. The main topics of an article are a good hint: we use them to keep only the content that is relevant to us.
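As a minimal sketch of such a rule, assuming each article has already been tagged with its main topics, relevance can be checked as an overlap between the article's topics and our interests. The topic sets, the article data, and the `is_relevant` helper are hypothetical, and only the topic factor is modeled here:

```python
# Hypothetical interest profile: the topics we want to read about.
MY_TOPICS = {"machine learning", "recommender systems", "information retrieval"}


def is_relevant(article_topics: set[str], interests: set[str], min_overlap: int = 1) -> bool:
    """Rule: keep an article only if enough of its main topics match our interests."""
    return len(article_topics & interests) >= min_overlap


# Hypothetical articles, each already tagged with its main topics.
articles = [
    {"title": "Intro to collaborative filtering", "topics": {"recommender systems", "statistics"}},
    {"title": "Ten celebrity diets", "topics": {"celebrities", "diet"}},
]

relevant = [a["title"] for a in articles if is_relevant(a["topics"], MY_TOPICS)]
print(relevant)  # ['Intro to collaborative filtering']
```

Other quality factors could be added as further rules in the same style.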

Case-based approach

In the case-based approach, certain indicators predict content quality. For example, some headers often indicate unacceptable quality, so content carrying them can be avoided.
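A minimal sketch of the case-based idea, assuming we have collected a few header patterns that, in past cases, signaled poor quality. The patterns and the `looks_low_quality` helper are hypothetical examples, not a vetted list:

```python
import re

# Hypothetical header patterns that, from past experience, signal poor quality.
LOW_QUALITY_PATTERNS = [
    re.compile(r"you won't believe", re.IGNORECASE),
    re.compile(r"^\d+ (shocking|amazing) ", re.IGNORECASE),
]


def looks_low_quality(header: str) -> bool:
    """Return True if the header matches any known low-quality case."""
    return any(p.search(header) for p in LOW_QUALITY_PATTERNS)


headers = [
    "You Won't Believe What This Algorithm Did",
    "A Practical Guide to Content-Based Filtering",
    "7 Shocking Facts About Search Engines",
]

# Skip content whose header matches a known bad case.
kept = [h for h in headers if not looks_low_quality(h)]
print(kept)  # ['A Practical Guide to Content-Based Filtering']
```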

Collaborative filtering

For collaborative filtering, we draw on other people's evaluations of content quality. This, too, can be rule-based or case-based.

Rule-based approach

In the rule-based approach, there are mechanisms for giving feedback on content, such as votes or ratings, and this feedback indicates our collective evaluation of the content's quality.
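One simple way to turn such feedback into a collective score is the fraction of readers who approved. This sketch assumes feedback arrives as up- and downvote counts; the `collective_score` helper, the minimum-vote threshold, and the 0.5 cutoff are hypothetical choices:

```python
from typing import Optional


def collective_score(upvotes: int, downvotes: int, min_votes: int = 5) -> Optional[float]:
    """Fraction of readers who approved, or None if too few readers have voted."""
    total = upvotes + downvotes
    if total < min_votes:
        return None  # not enough feedback to judge yet
    return upvotes / total


# Hypothetical (upvotes, downvotes) per article.
feedback = {"article-a": (40, 10), "article-b": (3, 0), "article-c": (5, 15)}
scores = {aid: collective_score(up, down) for aid, (up, down) in feedback.items()}
print(scores)  # {'article-a': 0.8, 'article-b': None, 'article-c': 0.25}

# Keep articles the crowd mostly approved.
approved = [aid for aid, s in scores.items() if s is not None and s >= 0.5]
print(approved)  # ['article-a']
```

Requiring a minimum number of votes avoids trusting a perfect score from only a handful of readers; the threshold of 5 is arbitrary.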

Case-based approach

In the case-based approach, we follow others who share a common interest with us, since content they have approved is probably good for us as well. This works along two dimensions: writers and readers.

Writer dimension

Along the writer dimension, if we find some of the content published by a writer good, the writer's other content is probably good as well.
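The writer dimension can be sketched as follows, assuming we keep a record of which articles we liked and who wrote them. The history data, the `writer_approval` helper, and the 0.5 threshold are hypothetical:

```python
from collections import defaultdict

# Hypothetical record of our past judgements: (writer, article, liked?).
history = [
    ("alice", "a1", True), ("alice", "a2", True), ("alice", "a3", False),
    ("bob", "b1", False), ("bob", "b2", False),
]


def writer_approval(history):
    """Fraction of each writer's articles that we liked."""
    liked, seen = defaultdict(int), defaultdict(int)
    for writer, _article, was_liked in history:
        seen[writer] += 1
        liked[writer] += int(was_liked)
    return {w: liked[w] / seen[w] for w in seen}


approval = writer_approval(history)

# Recommend new articles from writers whose past work we mostly liked;
# writers we have never read default to 0.0 and are not recommended.
new_articles = [("alice", "a4"), ("bob", "b3"), ("carol", "c1")]
recommended = [art for writer, art in new_articles if approval.get(writer, 0.0) >= 0.5]
print(recommended)  # ['a4']
```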

Reader dimension

Along the reader dimension, if a large proportion of the content approved by other readers overlaps with the content we approve, the other content they approve is probably good for us as well.
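The reader dimension resembles user-based collaborative filtering. A minimal sketch follows, using Jaccard similarity as one possible measure of overlap; the reader data, helper names, and the 0.5 similarity threshold are assumptions for illustration:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of approved articles (0 = none, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_from_readers(mine: set[str], others: dict[str, set[str]], min_similarity: float = 0.5) -> list[str]:
    """Suggest articles approved by readers whose approvals largely overlap with ours."""
    suggestions: set[str] = set()
    for approved in others.values():
        if jaccard(mine, approved) >= min_similarity:
            suggestions |= approved - mine  # only articles we have not approved yet
    return sorted(suggestions)


mine = {"a1", "a2", "a3"}
others = {
    "reader_x": {"a1", "a2", "a3", "a7"},    # overlap 3/4 = 0.75
    "reader_y": {"a2", "a9", "a10", "a11"},  # overlap 1/6, about 0.17
}
print(recommend_from_readers(mine, others))  # ['a7']
```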

Collaboration between humans and algorithms

Provision of contents by algorithms

Algorithms on the internet use these characteristics to evaluate content quality and push content to us, and we can then filter it ourselves to extract content of good quality.
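This division of labor can be sketched as a two-stage pipeline, assuming the algorithm has already assigned each candidate a predicted quality score. The `push_top_k` and `human_filter` helpers and the scores are hypothetical, and the lambda merely stands in for our own reading:

```python
from typing import Callable


def push_top_k(candidates: dict[str, float], k: int) -> list[str]:
    """Algorithm side: push the k candidates with the highest predicted quality."""
    return sorted(candidates, key=lambda item: candidates[item], reverse=True)[:k]


def human_filter(pushed: list[str], keep: Callable[[str], bool]) -> list[str]:
    """Human side: keep only the pushed items we judge to be good."""
    return [item for item in pushed if keep(item)]


# Hypothetical predicted quality scores.
predicted = {"a1": 0.9, "a2": 0.7, "a3": 0.4, "a4": 0.8}

pushed = push_top_k(predicted, k=3)
print(pushed)  # ['a1', 'a4', 'a2']

final = human_filter(pushed, keep=lambda item: item != "a4")  # stand-in for our judgement
print(final)  # ['a1', 'a2']
```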

Feedback of contents by humans

Moreover, in addition to giving our evaluation of content quality, we can give feedback such as following topics and other people, so that we receive content of higher quality.
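As a hypothetical sketch of how such feedback could change what we receive, the function below adds a fixed boost to an item's predicted quality for each followed topic and for a followed writer. The boost value, the item fields, and the data are assumptions; real platforms rank content in their own ways:

```python
def rerank(candidates, followed_topics, followed_people, boost=0.2):
    """Order items by predicted quality plus a fixed boost per followed topic or writer."""

    def score(item):
        s = item["predicted"]
        s += boost * len(item["topics"] & followed_topics)  # one boost per followed topic
        if item["writer"] in followed_people:
            s += boost  # one boost for a followed writer
        return s

    return [item["id"] for item in sorted(candidates, key=score, reverse=True)]


# Hypothetical candidates with predicted quality, topics, and writer.
candidates = [
    {"id": "a1", "predicted": 0.6, "topics": {"sports"}, "writer": "dan"},
    {"id": "a2", "predicted": 0.5, "topics": {"recommender systems"}, "writer": "alice"},
    {"id": "a3", "predicted": 0.55, "topics": {"cooking"}, "writer": "erin"},
]

ranking = rerank(candidates, followed_topics={"recommender systems"}, followed_people={"alice"})
print(ranking)  # ['a2', 'a1', 'a3']
```

Without the follows, `a1` would rank first; following a topic and a writer lifts `a2` to the top.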


In short, filtering gives us high-quality content. It can be content-based or collaborative, each using rule-based and case-based approaches that consider different factors, and it is done collaboratively by algorithms and by us.

