Heuristics and Biases made when Forecasting: Part 1

Dec 31, 2019

I shall write a three-part series in which I use lessons learnt from some fantastic books to evaluate the probability of success of a crypto project called Polyswarm Network.

This is obviously not financial advice. This is a theory-based academic exercise looking at biases and how they may affect the thinking of an analyst when evaluating a startup project. Don’t use this information as part of any financial decision.

Part 1: Background research on Polyswarm Network
Part 2: Heuristics and Biases made when Forecasting Polyswarm Network Outcomes
Part 3: Forecasting like one of Philip Tetlock’s Superforecasters.

Part 1: Background research on Polyswarm Network

TL;DR/Key findings:

Polyswarm Network (PN), built by a private company (Swarm Technologies), provides the very specific function of better aligning the incentives of existing stakeholders in the cyber security space. The PN service does not aim to replace or compete with existing products or services, only to improve them. PN allows tricky artifacts to be checked against specialized anti-virus (AV) engines, allows existing AV engines to be tested and improved, and provides a high confidence database of attested artifacts.

Its target market is thus the demand from AV companies to improve and test their products and for protecting against cyber security related damage which still occurs despite the presence of current AV products and services. The enterprise market for additional protection is approximately $1.1bn (as measured by insurance premiums paid for this) and government sector demand is unknown.

PN’s ability to deliver its service is highly dependent on its platform of economic incentives, which have not yet been proven in the wild. More than anything else, the project is an experiment which may or may not work. Although the testnet is running and providing a service, no clients have been confirmed other than in hearsay conversation, which is concerning (however, Swarm Technologies may not be able to provide info on clients if they do exist). There are also concerns about the financial state of the company if it does not become profitable before the end of 2020.

When compared to other VC funded startups in the cybersecurity market, the project is slightly undervalued at about $3m, as opposed to the average $5m.

There is a strong team behind the project, with an executive team holding good, focused experience in government threat intelligence; however, PN appears to be the largest project they have undertaken. The PN project has gained some traction, with about 25 significant AV engines running on its platform and several interested clients.

Aim of article

Recently some fantastic books have been written regarding behavioral economics/psychology and forecasting. The two intertwine a lot because humans seem to make systematic psychological errors in judgement when forecasting the future. These errors arise mainly because our brains are built for the days of the caveman, and the same mental tricks which were useful then cause us to make predictable errors today.

Meta-cognition can be annoying. Picture from www.whatsyourgrief.com

Part 1: I shall begin with an attempted objective analysis of the Polyswarm project and the cyber threat space as it may relate to the project.

Part 2: I shall then dive into making use of the lessons of the various books to see where I may have made systematic errors in my analysis and I shall attempt to correct the errors to provide a more objective analysis.

By the end of the article series I hope to obtain a forecast on the probability of the project becoming a top 10 marketcap crypto project within the next 3 years and provide a confidence interval for the forecast (this satisfies the requirements of a proper forecast; probability within a certain range of confidence of a defined outcome within a defined time-frame).
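To make those requirements concrete, a forecast can be written down as a small record. The sketch below is only an illustration: the class name, the field names and the numbers in the example are placeholders and are not the forecast this series will arrive at. The deadline assumes "the next 3 years" is counted from the date of this article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Forecast:
    """A forecast record; all names here are illustrative assumptions."""
    outcome: str        # the defined outcome being forecast
    deadline: date      # end of the defined time-frame
    probability: float  # point estimate, between 0 and 1
    lower: float        # lower bound of the confidence range
    upper: float        # upper bound of the confidence range

    def __post_init__(self):
        # A well-formed forecast needs 0 <= lower <= probability <= upper <= 1.
        if not 0.0 <= self.lower <= self.probability <= self.upper <= 1.0:
            raise ValueError("need 0 <= lower <= probability <= upper <= 1")

    def is_open(self, today: date) -> bool:
        """True while the deadline has not yet passed."""
        return today <= self.deadline


# Placeholder numbers only, NOT an actual forecast for the project.
example = Forecast(
    outcome="PN is a top 10 marketcap crypto project",
    deadline=date(2022, 12, 31),  # assumed: 3 years from Dec 31, 2019
    probability=0.5,
    lower=0.25,
    upper=0.75,
)
```

If any of the four parts is missing, the record cannot even be constructed, which is the point of the definition: a forecast without an outcome, a deadline or a range cannot be scored later.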

Polyswarm Network Project

Briefly, Polyswarm Network (PN) is an open source software programme run on a blockchain which serves to bring together multiple parties and incentivises them to provide accurate attestations on malware samples (files and URLs). These attestation results are sold to interested parties, who generally provide the samples. The programme is currently running on testnet and is expected to be deployed on the public Ethereum blockchain when ready.

Swarm Technologies/NARF

Swarm Technologies is a staffed, centralized company producing the PN open source software programme. Swarm Technologies intends to earn future revenue by providing value-added services on top of PN. Swarm Technologies therefore has an incentive to produce and improve the PN and to promote its use. During the early stages of the PN, the fates of PN and Swarm Technologies will be very much intertwined.

Swarm Technologies’ (ST) four executives are also NARF employees. From this we could say that Swarm Technologies is a project being conducted by NARF, so the ability of ST to execute on PN may to some extent be assessed by NARF’s ability to execute on previous projects. I see it as a positive sign that PN is made up of a team with several years of experience working together and that the PN project is not an academic project turned company to take advantage of the ICO craze, as is the case with many other projects.

Analysis of NARF projects: NARF Industries provides cyber threat and protection services such as malware reverse engineering, penetration testing and checking code for errors. They have also hosted white hat hacking courses and patented a firewall method. Some of the more development-based projects (difficult to understand) they were involved with included a programme to securely obfuscate code to make it harder to reverse engineer, something about protecting embedded systems, adaptation of an open source diagnostics programme to Japanese, a system to obtain information from an untrusted network (I think) and something to do with Android security.

Clearly this went over my head. It looks like they have done some work for the Department of Defense, which makes sense given their work experience, and have regular income from enterprise, as employees have been employed full time.

PN, however, would be their largest project from what I can tell.

In addition to the NARF executive suite, 7 engineers, 7 sales and growth staff, a data analyst, a temporary economics consultant and 3 community managers are involved. I assume the 4 executives are somewhat involved in code development, which makes for a large number of engineers and a very large sales and growth team.

I don’t see why the team would not be able to pull off the project, aside from technical challenges which may be unsolvable at this time. To play devil’s advocate, it is very difficult to verify previous work completed by NARF Industries, and this would be the largest project they have taken on, one where they are developing a software product for large scale adoption.

PN’s market

The segment of the market being addressed by Polyswarm is specific, and growth in the general market does not directly mean growth in PN’s market. The use of a decentralized network will, due to processing bottlenecks (even with side chain implementations), prevent Polyswarm from competing with traditional AV companies who cheaply process millions of artifacts per day (Symantec processes 1bn URLs per day). Polyswarm will always need to focus on rare, high security, fringe vulnerabilities and serve users who need extremely high levels of security.

It appears the market Polyswarm is addressing consists of participants in the threat detection market and enterprises with high security demands (who may obtain Polyswarm services through their existing service provider).

Alternatively, Polyswarm could also provide end point security and sell to consumer end users. This is unlikely given the existing bottleneck on blockchain throughput and the fact that such security would most likely be overkill for personal consumers.

Threat intelligence sharing landscape

Threat intelligence sharing and group efforts become important when AV services are unable to identify a threat. When stuck, it is in the best interest of the AV service to share its problem with a network of other AV services. The pooled knowledge assists all participants. Threat intelligence sharing also seems to be ‘in fashion’, or is part of a very recent shift in direction. A 2015 US law, the Cybersecurity Information Sharing Act (https://en.wikipedia.org/wiki/Cybersecurity_Information_Sharing_Act), was also introduced to reduce the risk of sharing between parties. Sharing networks and services allow users to obtain the wisdom of the crowd. In some systems, various algorithms incentivise curation and sharing.

Sharing is caring. Image taken from: https://www.ariasystems.com/blog/sharing-way-better-iot/

Existing sharing networks and services available to AV companies include:

1. Cyber Threat Alliance “The CTA is a not-for-profit organization that is working to improve the cybersecurity of our global digital ecosystem by enabling near real-time, high-quality cyber threat information sharing among companies and organizations in the cybersecurity field.” They currently have a total of 26 members, including Cisco, McAfee, Palo Alto Networks and Symantec. The aim is to share intelligence and collaborate on problems. Members submit structured threat intelligence data to the platform. An algorithm scores the data for relevance and scores the submitting member. Members with good scores can extract intelligence data. CTA processes about 4 million artifacts per month, 66% files, 33% network. Provides a possibly cheap way to share data. Looks like access is restricted. Does not incentivise specific/fringe threat intelligence. Does not provide a quick verdict on an unknown artifact. Allows members to improve their services. Threat intelligence is structured and curated, making it useful for members. https://www.cyberthreatalliance.org/

2. MISP — Open Source Threat Intelligence Platform & Open Standards For Threat Information Sharing: Provides software for standardizing threat intelligence so that it may be stored and shared in a structured manner. There are many such languages and standards. Countries tend to adopt their own.

MISP is an open source software and it is also a large community of MISP users creating, maintaining and operating communities of users or organizations sharing information about threats or cyber security indicators worldwide. The MISP project doesn’t maintain an exhaustive list of all communities relying on MISP especially that some communities use MISP internally or privately.

One of their largest communities has about 800 organizations as members and the service is free (https://www.circl.lu/services/misp-malware-information-sharing-platform/). It also allows for sharing of indicators which Polyswarm does not allow for.

3. Information Sharing and Analysis Centers for various countries provide standards for collecting and sharing data and facilitate 2-way sharing (e.g. the Cyber Security Information Sharing Partnership in the UK).

https://en.wikipedia.org/wiki/Information_Sharing_and_Analysis_Center

4. There are several threat intelligence sharing platforms hosted by various enterprises such as AlienVault, Facebook, CrowdStrike and IBM.

5. Palo Alto Networks and other platforms provide a system for management and analysis of threat intelligence and detection and integrate many scanners and threat intelligence systems. These systems aim to provide a comprehensive solution and run playbooks, security orchestration, automation and incident response.

6. VirusTotal. A little closer to the mark, focusing on threat detection. VirusTotal provides attestations on an artifact in realtime from numerous AV engines and records the results. Engines and enterprises may use this data to inform on potential threats as they occur in real time.

When researching the current state of business practice in the space, I observe that enterprise has mostly realized the advantages of sharing data, with about 50% of enterprises making use of some form of data sharing (I assume only large businesses are looked at here). Data sharing is especially useful for enterprise security teams and security product providers to better understand the landscape, update and test their products, and respond better. So the benefits of collaborative work in the threat detection space seem to be understood, and there is a possible recent business trend in the direction of working together.

The problem of single AV product coverage is understood and there are several service platform providers which provide multi-scanner services to increase coverage. So it seems the problems of shared data and limited single AV product coverage are largely solved.

Problems that still remain in the threat landscape:

1. The ability to detect new malicious artifacts where each AV company takes on the challenge in isolation: each siloed company cannot specialize. This results in duplicated effort with common coverage.

2. The ability to detect new malicious artifacts quickly where each AV company is incentivised to wait on others to identify and share their data.

3. No significant incentive to share threat data other than co-operative agreements, quid-pro-quo and goodwill.

4. No incentive by AV companies to share indicators. Enterprises may share indicators with each other.

5. Preparing data for sharing (removing sensitive personal data) is an expensive process that is difficult to automate. Further, due to the lack of global standards, automating the preparation and ingestion of data is difficult.

6. Data sharing does not improve false negatives where a team failed to detect a threat and therefore did not report it.

7. Lack of calibration in the industry (how accurate each attestation is when compared to ground truth) where it makes more sense to increase false positives.

8. There is a privacy and legal concern around sharing threat intelligence data.

9. There is no up to date source of ground truth for novel artifacts.

PN aims to solve problems 1, 2, 7 and 9. It would be nice if PN could implement a system for solving problems 3 and 4, as is being attempted by the Cyber Threat Alliance. It is possible that PN may exacerbate problem 8. To solve problem 9, PN has a built-in payment mechanism and reputation mechanism for arbiters to determine ground truth on all submitted artifacts. Current methods to establish ground truth involve using assumption models (such as: if more than 50% of engines say it’s malicious, then it is), which would cause missed positives. Delayed response methods (waiting some time after an artifact is found and rescanning once more engines know more) introduce delays. A continually updated source of ground truth would be extremely useful for machine learning AV software and would allow AVs in general to continually update their products.
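The weakness of the 50% assumption model can be shown in a few lines. This is a minimal sketch with made-up vote counts; the function name and the two scans are hypothetical and are not taken from any real multi-scanner service.

```python
def majority_verdict(votes):
    """Assumption model: malicious only if more than 50% of engines say so."""
    malicious = sum(1 for v in votes if v)
    return malicious > len(votes) / 2


# Hypothetical scan: ten engines, only one specialised engine flags a novel sample.
novel_sample_votes = [True] + [False] * 9
# Hypothetical scan: a well-known sample that most engines recognise.
known_sample_votes = [True] * 8 + [False] * 2

print(majority_verdict(novel_sample_votes))  # False: the malicious novel sample is missed
print(majority_verdict(known_sample_votes))  # True
```

The one engine that was right about the novel sample is outvoted, which is exactly the missed positive described above.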

From the above we may conclude that PN’s market is the current unmet demand for improving on problems 1, 2, 7 and 9. PN addresses these problems by keeping records of attestation results and applying financial rewards and punishments, thus introducing financial and reputational incentives to provide accurate attestations and ground truth determination on tricky artifacts. PN takes on problem 1 by incentivising engines to specialize and problem 2 by incentivising engines to spot something others do not. Problem 7 is taken on by penalising mistakes (financial and reputation loss). Problem 9 is solved as a result of PN record keeping.

Current AV solutions do not provide 100% coverage and protection (if they did, every enterprise with AV software would be 100% protected). The demand for improved cyber protection services would therefore be the market’s willingness to pay to avoid the current costs of cyber attacks. PN can be assumed to gain some percentage of that market.

PN’s possible market share

The enterprise market can be estimated by looking at the size of the global cyber security insurance market. Specifically, this would exclude social engineering cyber breaches, which PN cannot protect against. It is assumed that government would not obtain insurance against cyber security attacks; for this we might look at coverage under the federal terrorism reinsurance law, but I assume government just prints money when it needs it and doesn’t pay insurance premiums.

The most reputable information source I could find (AM Best’s market segment report, 2019) showed the direct cyber insurance market to be $1.1bn as of 2018, as measured by the value of direct premiums. An additional $0.9bn was paid as part of package claims and may not be directly related to malware, and is therefore excluded. When looking at cyber specific claims from 2015 to 2018, it looks like the number of claims per year is largely unchanged while claims on packaged cyber insurance are increasing. This could mean an increase in errors and omissions insurance or it could mean something else.

Currently, cyber insurance underwriting is a high margin, profitable business, indicating that businesses are overestimating their risk. However, insurers are concerned about what the claims environment may look like in the near future due to the lack of data in the space.

I shall assume the current market to be unchanged going forward. Polyswarm may therefore capture some percentage of the $1.1bn enterprise is willing to pay to protect themselves from cyber security incidents.
Using PN will not protect against all types of cyber issues and will not protect 100%, so PN would never take 100% of the cyber insurance market.
The revenue attributable to PN in the future may be somewhere between $0 and $1bn.
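The arithmetic behind that range is simple enough to sketch. The market size is the AM Best figure quoted above; the market shares in the loop are hypothetical values chosen only to show the spread of outcomes, not estimates of what PN will achieve.

```python
ENTERPRISE_MARKET_USD = 1.1e9  # direct cyber insurance premiums, 2018 (AM Best figure quoted above)


def pn_revenue(market_share):
    """Revenue if PN captured `market_share` (a fraction from 0 to 1) of the market."""
    if not 0.0 <= market_share <= 1.0:
        raise ValueError("market_share must be between 0 and 1")
    return ENTERPRISE_MARKET_USD * market_share


# Hypothetical shares, for illustration only.
for share in (0.0, 0.01, 0.05, 0.25):
    print(f"{share:.0%} share -> ${pn_revenue(share) / 1e6:,.0f}m")
```

Even small differences in the assumed share move the result by tens of millions of dollars, which is why the range above is left so wide.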

Reasons demand for better or more cyber security may increase:

Need for better protection of user data.
Strict regulations from US and Europe regarding the reporting of breaches and the protection of user data. Policies are also implementing minimum cyber security requirements.
More data hosted by enterprise.
Cryptocurrencies allowing for liquid capital to easily and anonymously move over the internet.
General trend toward digitization.

And reasons it may decrease:

Technology changes resulting in better protection of client data so breaches are not as problematic.
Users holding their own personal data and not having it held by enterprise.
Loss of innovation by cyber criminal groups.
Technology users becoming more aware of cyber threats and avoiding them.

PN’s market incentives

We can see that PN’s ability to provide solutions is fundamentally reliant on its ability to engineer market incentives and for those incentives to function in the real world.

The economic incentives are listed below:

1. AV engines/cyber threat intelligence services/enterprise purchase the ability to submit queries to the PN. These clients must be willing to pay for the value-added service. The fee paid by a client to obtain an attestation result must be outweighed by the benefit of obtaining the result at the level of accuracy PN generally provides. Higher levels of accuracy would warrant higher fees.

2. AV engines running on the PN are incentivised by the fees to run their engines and keep them relevant. The fees obtained should cover the network costs and costs to maintain the engine as well as provide some worthwhile margin. It becomes clear that areas of overlapping coverage by engines would result in the attestation fee being split among many participants thus reducing its profitability. Greater profitability can be obtained by focusing on artifacts few others would confidently provide attestations on or that others would get wrong. Thus the system is structured for specialization.

3. AV engines are confident enough in their attestations that they are willing to risk some capital in order to obtain a share of the amount paid by a client for an attestation.

4. Arbiters are sufficiently incentivised by a portion of the fee to provide a verdict. Status may provide an incentive for arbiters to conduct their work, although the network should function without status being a motivating factor.
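How incentives 2 and 3 push engines toward specialization can be illustrated with a toy settlement. This is a deliberately simplified sketch and not PolySwarm’s actual payout rule: it assumes every engine stakes the same hypothetical amount, that engines matching the ground truth split the client fee plus the forfeited stakes equally, and that wrong engines lose their stake. The arbiter fee and network costs are ignored, and all numbers are made up.

```python
def settle(fee, stake, assertions, ground_truth):
    """Return each engine's profit or loss for one artifact.

    assertions maps engine name -> True (malicious) or False (benign).
    Engines that stay silent are simply left out of the dict.
    """
    winners = [e for e, verdict in assertions.items() if verdict == ground_truth]
    losers = [e for e in assertions if e not in winners]
    pot = fee + stake * len(losers)  # client fee plus forfeited stakes
    payouts = {e: -stake for e in losers}
    for e in winners:
        payouts[e] = pot / len(winners)
    return payouts


# Common sample: four engines are all right and split the fee four ways.
print(settle(fee=8.0, stake=1.0,
             assertions={"a": True, "b": True, "c": True, "d": True},
             ground_truth=True))
# Tricky sample: only a specialist is right, so it takes the fee and the lost stake.
print(settle(fee=8.0, stake=1.0,
             assertions={"a": False, "specialist": True},
             ground_truth=True))
```

Under these assumptions the specialist earns far more on the tricky sample than any engine earns on the common one, which is the pull toward specialization described in incentive 2.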

Given the above, the following are probable use cases for PN in the existing market:

1. Enterprise services who are already using multiple scanners to provide broader coverage and better insights to their customers can integrate PN for the more tricky samples in order to further improve their coverage.

2. AV companies can test their products for accuracy against ground truth by running products as microengines on PN.

3. AV companies can use the microengine response data and ground truth big data to improve the accuracy of their products.

4. Over time PN may obtain a large vulnerability database of artifacts with high confidence attestations and may function as a gold standard for attestation of artifacts.

This means the current market demand for PN services is demand from enterprise services to improve their coverage and demand from AV companies to test and improve their products. There is existing demand for such services so PN, if able to deliver and if used, will have room to grow into the existing demand. Future growth in such demand would in turn be proportional to increases in new malware.

A database of high confidence attestations on artifacts may be useful for investigative research and intelligence building.

I am not sure what it will mean if PN becomes a gold standard for attestation accuracy (as arbiters provide a clear verdict on each artifact). Perhaps this will function as a measuring tool for status and promotion of AV tools by their developers.

It seems that PN will ‘plug-into’ the existing AV and cyber intelligence space to improve it by better aligning incentives of participants in the space to be able to tackle and provide verdicts on some of the more complex threats. It is not an all encompassing solution (as is provided by the likes of Palo Alto Networks), but may be able to provide benefit to various stakeholders in the space.

That’s part 1. Please feel free to comment and criticise below. I welcome feedback as this broadens my understanding.

Part 2: Heuristics and Biases made when Forecasting Polyswarm Network Outcomes