Enterprise Security Frameworks for EOS Block Producers

Published in Coinmonks, May 14, 2018

by Blockgenic Security team

— — — — — — — — — — — — —

Our team members have authored seven Internet standards (IETF RFCs) on authentication, identity and cryptography, hold seven granted patents focused on security, have built and managed key consumer and enterprise security features in widely used products such as Windows, and have managed the security of critical cloud infrastructure including Bing and Azure.

— — — — — — — — — — — — — -

Even though the cryptocurrency community has been the target of many high-profile hacks, none of the crypto platforms today has a comprehensive security framework.

While security issues have been discussed in the EOS community, the discussion has generally focused on one or two specific issues. A well-designed security framework allows us to identify the top threats to the EOS infrastructure and focus on those.

With this post Blockgenic is detailing the process and framework we are planning to implement. The corresponding security plan is attached as a separate document at the bottom of this post. We encourage all BPs who have a security plan to share theirs, so the entire community can benefit from our best practices.

Security Frameworks

There are several security frameworks available to protect information assets, and while they differ in specifics, the broad approach is very similar. The first step is to identify, classify and prioritize the information assets and the associated threats to these assets. The second is to implement a set of security controls to protect these assets from the associated threats. The third is to detect attacks designed to circumvent these controls. The fourth is to respond to these attacks to minimize damage, and the fifth is to plan effectively for recovery from such attacks.

NIST 800–53, ISO 27000 series, COBIT 5, RFC 2196 etc. are all examples of such security frameworks. While no security framework is perfect, implementing one greatly reduces the likelihood and risk of attacks.

At Blockgenic we have chosen to use the NIST Cybersecurity framework because it is simple and intuitively laid out and the most widely used enterprise security framework in the US where Blockgenic is based. There is pending legislation that would provide a safe harbor to businesses against liability if they comply with the NIST Cybersecurity framework[1][2]. The NIST Cybersecurity framework lays out 5 broad functions across which cyber security activities are organized: Identify, Protect, Detect, Respond and Recover.

We encourage you to read the rest of this paper, which details the process, but if you want to go directly to the detailed framework-specific recommendations, you can find them here.

Identify

The first step in an effective security framework is to identify the assets that are valuable, classify them and prioritize them. Part of the identify function is also identifying, classifying and prioritizing the threats associated with these assets. We would be remiss if we did not point out the number of occasions where we have seen an organization not even have a comprehensive view of all the assets under management. Obviously if one is not even aware of an asset then it is hard to protect it.

Often security implementations are a patchwork of technology buzzwords such as Firewall, HSM and DDoS protection, without any clear thought as to the assets being protected and all the associated threats. The security of an asset is only as good as the weakest link in the protection chain. For example, strong offline encryption for a key that is easily compromised through weak user credentials does not add much to the overall security of the system. At Blockgenic we prioritize our security investments based on security risk, where

Security Risk = Likelihood of security event x Impact to EOS platform

For this simplified example we will categorize likelihood, impact and risk as High, Medium or Low. We have chosen three specific assets: 1) the accounts used by Blockgenic team members for node management, 2) the EOS end user APIs exposed by the Blockgenic BP nodes for users to submit transactions and 3) the Blockgenic website. We also look at one specific threat against each of the first two assets: 1) credentials compromise for the accounts and 2) a Distributed Denial of Service (DDoS) attack against the API.

We will use these examples to walk through the rest of the framework.

Credentials compromise

For the purposes of this discussion we determine the likelihood of the credentials compromise attack occurring as High and the impact as High resulting in an overall risk level of High.

Likelihood: High

Credentials compromise attacks are frequent because they are relatively straightforward and highly susceptible to social engineering. While the classical image of the hacker is one of employing sophisticated techniques, the reality is that over 50% of attacks are social engineering attacks[3].

The recent NiceHash attack that siphoned $78 million started with the compromise of one of their engineer’s VPN credentials[4].

Impact: High

Compromised credentials allow the attacker to do everything that the legitimate user can do, and the chance of detection is low until the attacker takes steps that are significant and noticeable.

Overall Risk: High

The matrix below is our risk mapping matrix, where the columns represent the impact and the rows the likelihood.

DDoS

Likelihood: High

Most attacks (not all) are primarily motivated by the ability to make money. Traditionally it is much harder to make money by bringing down a service than by obtaining unrestricted access to it. One way is to short the underlying asset, which means taking a risky adversarial position[5]. Of course, these days hackers are creative and also embed ransom notes in the DDoS traffic[6].

Impact: Medium

Even if attackers succeed in bringing down a single BP, they cannot inject arbitrary transactions as they could with compromised credentials, and with several standby producers the impact to the network itself would not be high.

Also, the EOS.IO software itself has built-in protections that make it resistant to the higher-level resource consumption attacks common to other crypto networks[7].

Overall Risk: Medium
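
The two determinations above can be reproduced with a small lookup. This is a minimal sketch only: the numeric scores (Low = 1, Medium = 2, High = 3) and the thresholds in `overall_risk` are assumptions, chosen so that the result agrees with the two examples in this post (High x High = High, High x Medium = Medium). The actual risk mapping matrix may assign other cells differently.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed numeric scores; the post only names the three categories.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def overall_risk(likelihood: Level, impact: Level) -> Level:
    """Security Risk = Likelihood x Impact, bucketed back into a category."""
    score = likelihood * impact  # possible values: 1, 2, 3, 4, 6, 9
    if score >= 9:   # assumed threshold: only High x High is High
        return Level.HIGH
    if score >= 3:   # assumed threshold for Medium
        return Level.MEDIUM
    return Level.LOW


# The two example threats from this post: (likelihood, impact).
threats = {
    "Credentials compromise": (Level.HIGH, Level.HIGH),
    "DDoS": (Level.HIGH, Level.MEDIUM),
}

# List threats from highest to lowest overall risk.
for name, (likelihood, impact) in sorted(
    threats.items(), key=lambda item: overall_risk(*item[1]), reverse=True
):
    print(f"{name}: {overall_risk(likelihood, impact).name}")
```

Sorting by the computed risk gives the prioritized list that the Protect function works through.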

Note: This is a very small sample set of threats, and BPs are encouraged to develop a comprehensive asset and threat list as part of their risk determination. Further, these determinations must be continuously evaluated and may change as conditions change.

With these risk determinations out of the way we next examine the steps that can be used to protect against these risks.

Protect

Ideally, we would work through the prioritized list of threats spending resources appropriately to protect against these threats.

Credentials compromise:

Smartcards: Credentials may be compromised either accidentally or deliberately through attacks such as online password cracking, offline password cracking, phishing, keyboard sniffing etc. A good protection solution should be effective against most, if not all, of these attacks. Our preferred solution is a USB smartcard such as the YubiKey, where the private key is generated on the smartcard and never leaves it, along with a strong PIN that protects the smartcard[8][9]. This ensures that the smartcard is unlocked and used only for authorized signing and decryption operations.

In the absence of smartcard support in applications the private key may be stored on an encrypted USB drive, but this exposes the key during operations.

Passwords: If the above are not possible then passwords may be used with appropriate password management policies for complexity, re-use prevention, rotation etc. We would recommend a truly random password of at least 16 characters. Unfortunately, even with the best management passwords can be revealed via phishing, keyboard sniffing malware etc. and truly random passwords are hard to remember.
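
As an illustration of what "truly random" means in practice, here is a minimal sketch that draws a password from the operating system's cryptographic random source using Python's standard `secrets` module. The function name `random_password` and the choice of alphabet are assumptions made for this example, not part of the Blockgenic plan.

```python
import secrets
import string

# Assumed alphabet: ASCII letters, digits and punctuation (94 symbols).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 16) -> str:
    """Return a password drawn uniformly at random from ALPHABET."""
    if length < 16:
        raise ValueError("use at least 16 characters")
    # secrets.choice uses a cryptographically secure random source,
    # unlike the general-purpose random module.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_password())
```

A password like this is not meant to be memorized, which is the trade-off noted above.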

A note on password recovery management: As we noted above, the security of the system is only as strong as the weakest link. In addition to human operators, who can be phished or tricked into running malware, the other weak link that is often exploited but less well known is the password recovery system. There is no use having a strong random password if the recovery is via a weakly protected email account. Even if the recovery email account itself has a strong password, the recovery for the email may be via SMS or another system that can be compromised by social engineering. We strongly encourage setting up two-factor authentication on all accounts using an authenticator app like Google Authenticator, as SMS-based two-factor authentication has been subject to a lot of social engineering hacks recently[10].
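
Authenticator apps of this kind typically implement time-based one-time passwords (TOTP, RFC 6238): the app and the server share a secret, and both derive a short code from the secret and the current time, so no code travels over SMS. The sketch below uses only the Python standard library to show how such a code is derived. The function name `totp` and its parameters are chosen for this example, and real apps usually exchange the secret in base32 form, which is omitted here.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, for_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Derive an RFC 6238 time-based one-time password (HMAC-SHA1)."""
    if for_time is None:
        for_time = time.time()
    # Number of whole time steps since the Unix epoch.
    counter = int(for_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte
    # select a 4-byte window, whose top bit is then masked off.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Test vector from RFC 6238, Appendix B (SHA-1, time = 59 seconds).
print(totp(b"12345678901234567890", for_time=59, digits=8))  # prints 94287082
```

Because the code depends on a secret held on the device, an attacker who socially engineers a phone number transfer does not obtain it.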

DDoS:

DDoS attacks are conducted by attackers who control large-scale botnets (typically hundreds of thousands of compromised machines) and use them to generate attack traffic. The traffic can be anything and could be as simple as legitimate traffic, but most DDoS attacks take advantage of weaknesses in the application or protocol stack to amplify their effectiveness. Preventing DDoS attacks against services that need to be publicly available is especially difficult.

DDoS attacks can occur at any network layer from L3 to L7 and a good solution must protect against all attacks at all layers and not just one or two.

Layer 3 DDoS attacks such as ping floods generally try to take advantage of ICMP protocol weaknesses. Another common avenue is exploiting weaknesses in routing protocols, but these have become more resilient over time.

The most common layer 4 attacks are SYN floods and UDP packet floods, but others may use combinations of ACKs, FINs and RSTs to attack the target.

Layer 7 attacks exploit weaknesses in HTTP or within the application payload, such as SQL queries.

Protection measures should be effective against a wide variety of attacks against different layers.

There are two high-level ways of protecting against DDoS attacks:

Offloading to an external DDoS protection service: This may be done at the application layer or lower and works by using Anycast to route traffic through the protection provider’s edge network and scrubbing centers. Some of the most common DDoS prevention services (such as Cloudflare, Akamai Kona/Prolexic, Arbor Networks, Incapsula etc.) use this method at the application layer. The service provider is responsible for filtering the traffic at the scrubbing center, and automated algorithms are often supplemented by human operators. This combination is generally effective since the traffic is intercepted at multiple points close to where it originates, rather than waiting for it to be concentrated at the destination network where it may overwhelm any filtering service.

This does require that the customer/BP outsource the DNS name servers to the protection provider. Further, it requires that the end server limit communication to the service provider and/or hide its actual address, since otherwise the server may be attacked directly, bypassing the protection service. Even with a traffic whitelist (implemented at layer 3), attacks at layer 3 may still be launched if the true server endpoint becomes known to the attacker[11]. Also, an outage at the DDoS protection provider[12][13] will result in an outage for the BP or for multiple BPs, which creates an unhealthy dependence of the entire EOS ecosystem on one or two of these providers.
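
The whitelist idea can be sketched as a simple source-address check. This is illustrative only: the address ranges below are reserved documentation ranges (RFC 5737) standing in for a provider's published egress ranges, and `PROVIDER_RANGES` and `is_from_provider` are hypothetical names. In practice the rule would normally be enforced in a firewall or cloud security group rather than in application code.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT real provider ranges.
PROVIDER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_from_provider(source_ip: str) -> bool:
    """Return True if source_ip falls inside one of the allowed ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in PROVIDER_RANGES)


print(is_from_provider("192.0.2.17"))   # inside the first placeholder range
print(is_from_provider("203.0.113.5"))  # outside both placeholder ranges
```

As noted above, a check like this only drops unwanted packets once they have arrived; it does not stop an attacker who knows the real endpoint from saturating the link.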

Signing up for DDoS protection from cloud/datacenter vendor: The major cloud vendors such as AWS, Azure and Google Cloud all provide DDoS protection services. One of the advantages if your service is hosted on the cloud is that your cloud vendor already controls all the layers of the network stack and has more than enough capacity to handle surges. The disadvantage usually is that waiting till the traffic is concentrated at the destination means the algorithms are not as good at scrubbing DDoS traffic especially if a long-term profile of your legitimate traffic has not yet been built.

Automated scale out: One of the benefits of this approach is the cost credits for automated scale out from your cloud vendor during a DDoS attack. But this requires that your service can be scaled out to multiple machines automatically. Thus, while this can be useful for non-producing nodes, producing nodes are limited by the single threaded nature of the block production code.

Testing DDoS protection: Often DDoS protection measures are tested little or not at all, or they are tested with only one specific tool or type of attack. The first real test then comes during an actual attack, when to everyone’s surprise the measures that worked so well against one type of traffic turn out to be useless. We recommend using comprehensive DDoS testing services such as Breaking Point Cloud or the Nimbus DDoS attack platform, which perform a range of attacks across all the layers of the network stack, rather than getting a false sense of security from testing with one or two tools.

The DDoS protection services above generally cost $5000+/month. Blockgenic plans to use one of the providers above and is still evaluating multiple services before finalizing one. We encourage smaller BPs who may not be able to afford these services right now to think of security not only in terms of the Protect function (as most people do) but also in terms of the Detect, Respond and Recover functions, which help mitigate weaknesses in the Protect function. In practice this means having an effective strategy to bring up an alternate producer node and to communicate with other BPs in case of an attack. For BPs lower down on the standby list this may be good enough. Note, however, that automated failover or traffic redirection will just redirect the DDoS traffic to the new node, so a manual method would be needed. Due to lack of time we will detail the Detect, Respond and Recover functions in a future post, but the published framework already contains the procedures for all functions. We will also publish a detailed post at a later time with our thoughts on how the BP community can cooperate in this regard to improve the resilience of the entire blockchain network.