There is no perfect way to use data

Qi He
2 min read · Mar 24, 2018

--

“Oh, why did Google show me this? These search results are not what I want!” I complained.

That made me think about how search engines work. I looked the question up online and learned that Google's search engine uses an algorithm called PageRank. The key ideas of this algorithm are:

  • If a web page is linked to by many other web pages, it means that the web page is more important, and its PageRank value will be relatively high.
  • If a page A with a high PageRank links to a page B, page B’s PageRank value will be relatively high as well.
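
To make these two ideas concrete for myself, here is a minimal sketch of a simplified PageRank in Python. It is only an illustration under my own assumptions, not Google's actual implementation: the four-page `toy_web`, the `pagerank` function name, the damping factor of 0.85 and the fixed 50 iterations are all choices made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links,
                # so a link from a high-ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B and D; nobody links to D.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web, page C collects the most incoming links and ends up with the highest score, while page D, which no page links to, ends up with the lowest. Page A has only one incoming link, but it comes from the high-ranked page C, so A still scores well.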

Before PageRank, search engines used a very simple idea to order their results: the more important a page is, the more visits it gets. However, this idea has two big problems. One is that visits can only be measured by collecting statistics, which are not necessarily accurate, and the number of visits fluctuates a lot. Obtaining accurate statistics takes a lot of time and manpower, and the results stay valid for only a very short time. The other is that visits do not necessarily reflect the “importance” of a web page.

To some extent, the PageRank algorithm improves the ordering of web pages. But it may not be a perfect solution either.

Thinking about this, I realized that using data to do research and solve problems can never be simple.

To be honest, I was confused by our homework, “How to use some data to tell users a story”. When our team was deciding what data we wanted, I fell into confusion. The data itself is interesting, but I don’t know what it is useful for.

While doing my assignment, I kept asking myself: Is all the data useful? How can I choose the useful data? Where does this data come from? Is it reliable? Will people be interested in it? Can it really solve the problem? Is there a better way to analyze it?

This article is full of questions; it’s just some thoughts about data. I will keep looking for the answers and for better ways to use data.
