Should Market Analysis Die?

Does it do more harm than good?

Ethar Alali
Published in Bz Skits
9 min read · Feb 6, 2016


Startup Europe Week 2016 draws to a close with me reading another incredible result of bad experimentation and research! The beginning of the week saw me rant about a slick, but abhorrently wrong piece of… I don’t know what the f*** it is!?! It’s not research! That is the last thing it can be called.

Start Your Engines

The problem with the research at the start of the week was that the data was pulled from @nesta_uk [Twitter bio: an independent charity with a mission to help people and organisations bring great ideas to life]. That data has a fundamental flaw in its methodology, and I quote:

“The Index includes all capital cities in the EU28. The presence of cities from EU28 countries will allow all member states to use the Index to inform and assess their digital entrepreneurship policies. Additionally, it includes seven non-capital cities in the EU that are important hubs of digital entrepreneurship; these extra cities were chosen by reference to other indicators of digital activity or entrepreneurship. It is hoped that future versions of this Index will expand the number of non-capital cities.”

You can go directly to the methodology on this page:

https://digitalcityindex.eu/methodology

From a research perspective, it is also complete cr@p! But I’ll come to why later.

I got irate because that was the second time in three days I had received something research-related which didn’t feature Manchester as a digital startup hub. Believe it or not, the University of Reading, at one point one of the UK’s foremost technology leaders in robotics, sent me this cr@p!

After the second piece of bad research appeared from Nesta at the beginning of the week, I wondered what exactly was causing Manchester to disappear off the map, despite several independent, EU-wide results placing Manchester as the second city in the UK after London. Here are the EU results published for the current top EU cities on Startup Hubs EU.

Top startup cities EU-wide, from EU data collated through a mash-up of multiple sources

Indeed, even marketing analysis companies themselves came to the conclusion that Manchester was a strong UK-wide player. Time (Dec 2015)…

And time (Jun 2014)…

And time again!

http://www.managementtoday.co.uk/news/1290837/the-7-best-tech-hubs-outside-london/

What’s My Problem?… And we’re off!

The problem is that, in order for research to be considered independent and credible, it has to follow three main principles of science. They are:

  1. Randomisation

The research must randomly select cities for study from the total population of EU cities, both those which do and those which don’t manifest the phenomenon. When cross-sectional, which the Nesta research is, it also requires a large enough sample size to account for experimental or categorisation errors (statistical false positives or negatives).
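To make that concrete, here’s a minimal sketch of what randomised selection could look like, assuming you start from the full population of EU cities (the city list and sample size below are hypothetical, purely for illustration):

```python
import random

# Hypothetical sketch: sample at random from the FULL population of EU
# cities, rather than hand-picking capitals plus a few favourites.
# `all_eu_cities` stands in for the complete list (several hundred cities).
all_eu_cities = ["Manchester", "Lyon", "Gdansk", "Eindhoven", "Tallinn",
                 "Porto", "Leipzig"]  # ...and so on: the whole population

random.seed(2016)  # fixed seed, so the selection is reproducible and auditable
sample_size = 5    # in reality, sized to bound false positive/negative rates
sample = random.sample(all_eu_cities, k=sample_size)
print(sample)
```

The point isn’t the code; it’s that every city gets a known, equal chance of selection, which is exactly what a capitals-plus-seven shortlist throws away.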

Nesta research variables

The methodology lists a huge number of variables, which usually sounds great (“We raided the bin and found a chewing gum wrapper!”), but those variables include a number of factors with very high statistical error rates. And that’s aside from the City Level Data estimates. Again, I quote:

*City level data was estimated by multiplying national level data by the proportion of national GDP coming from that city

Which, from an analysis point of view, is f***ing total bo**ocks in its own right.
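For clarity, here is the estimation that quote describes, as a minimal sketch (all figures are hypothetical, purely to illustrate the shape of the calculation):

```python
# Hypothetical illustration of the quoted estimation method:
# city figure = national figure x (city share of national GDP).
national_digital_startups = 10_000  # hypothetical national-level count
city_share_of_national_gdp = 0.05   # hypothetical: city produces 5% of GDP

estimated_city_startups = national_digital_startups * city_share_of_national_gdp
print(estimated_city_startups)  # 500.0

# The flaw: this assumes digital startup activity is spread across the
# country in exact proportion to GDP. A city with 5% of GDP and a dense
# tech cluster gets the same estimate as one with 5% of GDP and no tech
# scene at all, so the per-city error is unbounded by anything measured.
```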

Study Fail:

The methodology in the Nesta study fails on not just one, but three counts. It fails to include all cities in the EU (as you can see from my tweet, the list only includes 35 EU cities: all the capital cities plus a non-random selection of other cities, which is wrong according to the method anyway).

In addition, one of the variables, “Engagement with digital startup ecosystem”, is measured as the “Number of tweets with selected entrepreneurship related hashtags”.

Tweet analysis is notoriously error-prone. Think about the number of typos you’ve made in tweets or mistyped hashtags. If you’ve ever typed a hashtag wrong, that tweet drops out of the set of ‘selected’ hashtags. Equally, bots can and do jump on popular(ish) hashtags as a bandwagon, which inflates the number of items in the set. These are false negatives and false positives respectively. Both widen the distribution, naturally increasing the range of positions any sampled mean could sit in and so decreasing the confidence in any result. In addition, the hashtag list changes all the time. The [lack of] reliability of hashtag systems means the sample size has to be huge to get over this experimental error.
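A minimal sketch of both error directions, assuming a simple exact-match count over a ‘selected’ hashtag set (the tweets and tags are made up for illustration):

```python
# Hypothetical sketch of why counting "selected hashtags" is error-prone.
SELECTED = {"#startup", "#entrepreneur"}

tweets = [
    "Loving the scene here #startup",  # counted, as intended
    "Great event #strtup",             # typo: silently dropped (false negative)
    "Buy followers now!!! #startup",   # bot spam: counted anyway (false positive)
]

count = sum(any(tag in tweet.split() for tag in SELECTED) for tweet in tweets)
print(count)  # 2; neither error direction is visible in the total
```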

In addition, the irony is that the study footnote on the website states:

1. Principal Component and Factor Analysis are not possible for multiple reasons, including (i) small-sample issues (more variables than cases, hence principal axis analysis is not possible); (ii) it would be sensitive to modifications in basic data, meaning data revisions and updates such as adding new cities would not be feasible in the future; (iii) outliers (of which we have several) would introduce spurious variability in the data; (iv) PCA and FA minimize the contribution of individual indicators which do not move in same fashion as other individual indicators

Aside from the cr@p about it not being possible to use PCA and FA (if you chose all cities, not select ones, you would have more than enough data), the method clearly acknowledges its own failings, especially (ii). So this is basically the statistical equivalent of saying “we want to use the method we’ve used, so we’ll limit the cases to yield results for just this data” (fitting the problem to the method, not the other way round).
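On point (i), the small-sample objection is real, but self-inflicted. A quick NumPy sketch (dimensions hypothetical, chosen to mirror roughly 35 cities against 40-odd indicators):

```python
import numpy as np

# Footnote point (i): with more variables than cases, the covariance
# matrix is rank-deficient, so PCA cannot recover a full set of components.
rng = np.random.default_rng(0)
n_cities, n_vars = 35, 45  # hypothetical shape, mirroring the study
X = rng.normal(size=(n_cities, n_vars))

cov = np.cov(X, rowvar=False)      # 45 x 45 covariance matrix
print(np.linalg.matrix_rank(cov))  # 34: at most n_cities - 1, never 45

# Sample ALL EU cities (several hundred) and n_cities >> n_vars, at which
# point the footnote's stated obstacle to PCA simply disappears.
```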

2. Blinding

When studying any phenomenon scientifically, including A/B testing, one needs to take care not to influence the results during experiments. That influence happens due to unconscious cognitive biases. There are generally single-blind studies, where the subjects don’t know which group they are part of, and double-blind studies, where neither the subject nor the researcher knows which group the subject has been assigned to. This prevents several possible experimental contaminants, which in themselves introduce covariates into the results, or selection bias. It has implications far outside traditional scientific research.

For example, in the case of mystery shoppers, one thing the research body shouldn’t do is tell the company (or their own stores) that a mystery shopper is coming. Similarly, if a well-known food critic walks into a restaurant, their presence is likely to influence the behaviour of the restaurant staff. Kind of like school inspections.

Study Fail:

The method obviously wasn’t double-blind. In addition, it openly admits to selection bias. There isn’t much more to be said about this here, as it also breaks principle number…

3. Controlled Conditions

This principle aims to isolate the study from the effects of contamination, which keeps the effects free from covariates and hence keeps the results credible. You keep variables constant in both control and study groups, changing only one factor between them. However, it is in the nature of cities to interact with a number of factors. Digital activity does not stand alone, and the effects of cultural contamination cannot be accounted for. In a way, this is OK, as people meld technology into their daily lives, but crucially, it doesn’t eradicate covariates.
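Here’s a small simulation of the covariate problem, assuming (hypothetically) that a city’s digital activity responds both to the factor under study and to an uncontrolled cultural factor:

```python
import random

random.seed(1)

# Hypothetical simulation: 'digital activity' depends on both the factor
# under study (policy) and a covariate left uncontrolled (culture).
def activity(policy_boost, culture_boost):
    return 50 + policy_boost + culture_boost + random.gauss(0, 2)

# The control and study groups differ in policy AND culture...
control = [activity(policy_boost=0, culture_boost=0) for _ in range(100)]
study = [activity(policy_boost=5, culture_boost=10) for _ in range(100)]

diff = sum(study) / len(study) - sum(control) / len(control)
print(round(diff, 1))  # ~15.0, yet policy alone contributes only 5 of it
```

The measured difference blends both effects; nothing in the data itself tells you how much came from the factor you claimed to be studying.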

Study Fail:

As we’ve established, the study selected the capital cities plus a hand-picked set of other cities which performed well on particular variables. It is also not clear what other variables were used to select those cities, nor whether the same variables were used in assessing the capital cities, since it’s not beyond the bounds of possibility that non-capital hubs outclass the capital cities of other countries.

OK, So Where is the Harm?

The harm in releasing such rubbish isn’t in the release per se. Indeed, some argue that the responsibility lies solely with the company releasing it and that the impact falls only on them. In this case especially, however, that is not true.

During #SEWeek16, this research study was incorrectly shared dozens of times, perhaps making 100,000 impressions or more, never mind the number of visits and clicks it’s taken (including mine, to rebut it). Two things can happen with this:

  1. Other studies or sites base their marketing on this poor quality work and attribute it.
  2. People make decisions on this poor quality work. This is especially important for investment in areas that are currently under-represented but punching well above their weight.

Propagating this sort of result is akin to propagating hype or gossip. The difficulty is that as soon as something enters the world of numbers, you cut out 70% of the UK population; and if we go into credible numbers, requiring the skill to understand a methodology’s strengths and weaknesses, that cuts out another 25% or so. That leaves about 5% of people who understand that this is rubbish. That 5% must also have [some influence over] money to invest in the cities using proper, good quality research. However, with the UK not rewarding science, technology, mathematics and engineering as well as other parts of the world relative to the cost of living, this isn’t likely to be the case. Plus, technically, it doesn’t matter: as long as you sell to the 95% that’s out there, the 5% who can see through your proverbial fraud don’t matter. That’s how politics works. One person, one vote, one pound.

Conclusion

As we’ve seen, the results simply aren’t credible. I’d love nothing more than for this research to die, as it is likely to harm cities in countries such as the UK, cities which even their own governments publicly acknowledge as powerhouses. However, these are typical of the results produced by the practices currently employed by marketing analysts, who are not statisticians, top-level researchers or data scientists. Indeed, many of them are glorified journalists, so the practice of science is rare in that industry sector. The results only sound credible because they’re presented well enough for people to believe them.

Basically, it’s akin to the fiasco with B.o.B. (the rapper who insisted the world was flat in January 2016).

Also, we can’t get it right all the time. Granted. I’ve taken my frustration out on Manchester City Council before now.

So let’s close off with some tabloid headlines on Manchester’s real performance relative to some other cities, just to make sure we’ve not missed anything.

Manchester Startup Revenue bigger than Brussels and Munich

Manchester Startups Contribute More to UK Employment than Parisian Startups

Shocking: Manchester, Part of UK’s Northern Powerhouse, sold a dummy by Central Government

The truth is Manchester is doing a serious amount of stuff with one of the worst investment totals among major EU cities. Compare its £138 million total investment with the £31.3bn in revenue it generates. That is an ROI of 22,581%…

...I think that number is worth repeating. 22,581% Return on Investment.

From this data, London has £8.32bn of investment and revenue of £207bn: an ROI of 2,388% (one tenth that of Manchester). Berlin has £3.13bn of investment and revenue of £45.5bn, an ROI of 1,354%. Munich has £630 million of investment and revenue of £26.5bn, an ROI of 4,106%.
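For anyone who wants to check the arithmetic, here’s the same calculation as a short Python sketch, assuming ROI is computed as gain over cost, i.e. (revenue minus investment) divided by investment:

```python
# Reproducing the ROI arithmetic from the figures above.
cities = {
    "Manchester": (0.138, 31.3),  # (investment in £bn, revenue in £bn)
    "London":     (8.32, 207.0),
    "Berlin":     (3.13, 45.5),
    "Munich":     (0.63, 26.5),
}

for city, (invested, revenue) in cities.items():
    roi = (revenue - invested) / invested * 100
    print(f"{city}: {roi:,.0f}%")

# Manchester: 22,581%
# London: 2,388%
# Berlin: 1,354%
# Munich: 4,106%
```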

So the truth of the matter is that even London isn’t close to being as effective with the money as Munich is, and it can’t touch Manchester in that regard. The panel discussion that kicked off #SEWeek16 wanted to make the case that it shouldn’t be necessary for startups to move to London, and it’s true. The cost-of-living hike will wipe you out anyway, but even without that, the real ROI is not really in London. Only the people are.

My company and I will be looking at this over the next few weeks. If you want credible results, I invite you to come and talk to us, as I for one don’t want unjustified hype to ruin things for companies or, worse, the market at large. Eventually, companies churning out this level of rubbish will find their competition comes along and wipes them out with real, valuable insight. That market is ripe for disruption. We don’t need nice-looking paintwork on a car with structural damage. We all know how much that costs us!
