Why Big Data Is Often A Big Zero!

23% of you will hate what I am about to say, 13% will be ambivalent, 55% will agree with all of it (and 6% of those will be violently passionate about it), and very few of you will have noticed that those numbers are made up and don’t add up to 100%…

Now that I have your attention for a few more seconds, let me tell you something: BIG DATA IS OFTEN CRAP!

It’s not because the data that is collected and fed into a system is rubbish; generally, it’s good. It’s the system that is wrong.

I lay awake this morning, riddled with insomnia, and heard on the radio that big data was messed up because of “people’s opinions on what came out of the data being biased”.

It was a face palm moment.

We are all biased in one way or another, but as an SEO consultant I don’t look at data with a particular bias. I look at data objectively. A data scientist at a university will do the same. Someone looking at millions of pieces of data from the UK Census will do the same. Objectivity is how you improve, and how you tell the real story of the data that comes out of the system.

What is biased, or rather what can be biased, is the systems we use to process the data we collect, and this is why big data is often a big zero.

Read what I said above again. People have opinions; we all do.

Those opinions and assumptions find their way into the code of the systems we create. No matter how big the system is, and no matter how many people have contributed code to it (which will likely dilute any single kind of bias), the bias is still there. All systems are biased.

All systems are fundamentally flawed, because the AI within them is human-based. It’s built on our own judgments: usually one or two people have said ‘yeah, that is the right call’ or ‘no, that can’t be right’, and that decision makes it into the code, where it can stay for years even if it’s entirely wrong. Code can be biased because of one person’s opinion out of five hundred coders, or, in Google’s case, perhaps thousands!

So big data, no matter how we look at it, is biased.

It isn’t because of the people who look at what comes out, though of course they play their part. It’s because the systems are biased.

It’s time we stopped thinking big data is always right and started questioning the output we see from our systems.

When it comes to data, nothing produced by man is ever 100% correct, so why should we believe a man-made machine or code base will be 100% correct?

And that’s why big data can be a big zero.

01101001 01110100 00100111 01110011 00100000 01101110 01101001 01100011 01100101 00100000 01110100 01101111 00100000 01100010 01100101 00100000 01101110 01101001 01100011 01100101