This paper offers new thinking toward radically improving the security and privacy of information systems. Currently, in the wake of the OPM failure to secure sensitive personnel records of millions of Americans, federal administrators are scrambling to improve information systems that government increasingly depends on. In addition to all the demands and standards legislated by Congress, federal agencies such as OPM are trying to catch up to basic industry best practices.
Yet industry is not doing a great job at security either: recent news brings stories of the NYSE suspending all trading as well as United Airlines grounding all flights, both due to technical glitches. Massive data breaches at major corporations are becoming routine. Major security bugs continue to plague all commercial software with no end in sight.
Even if federal agencies succeed in catching up, the industry state of the art isn’t good enough, I claim, for critical government functions. Put more bluntly, how do we know if the best we know how to do (i.e., following industry) is good enough, and if it isn’t, what do we do instead?
Federal computer security regulations and standards border on incomprehensibility to industry professionals unless they have studied and trained toward the appropriate certifications, and let me clearly state that I have no such training or experience. However, reading a NIST overview with the preceding question in mind reveals critical unstated assumptions that I believe lie at the core of the problem:
- No actual bar is stated as to level of security that must be achieved.
- It’s all about prevention with no consideration of how to respond to failures.
- Systems must be built using available commercial IT systems.
- Standards such as FISMA dictate practice yet those standards are never tested.
- Details of actual compliance and performance are non-transparent.
I claim that all these assumptions are fatal flaws in a system that will never reliably ensure the levels of information security government deserves. That is, whoever architected the entire plan for securing federal information systems assumed it must be built on commercial systems and compiled a regimen attempting to secure that as best they could via remote control through regulation and standards. Apparently nobody asked how well that approach could be expected to work and whether that was really good enough.
Any competent security engineer will tell you never to try to secure a system by patching up something known to be insecure, but that’s exactly what we keep doing. Even a casual read through the license agreement (details in Addendum) for any commercial operating system makes it clear the maker offers zero assurance of quality, much less security. How does it make any sense to build critical systems on top of this technology? That it is the best available is not, I claim, a good answer. It is the best we have, but if it isn’t good enough I suggest we need to open the discussion to new approaches, radically new ones. This is tantamount to trying to remodel a cheap broken down house into a quality building by fixing it up somehow: instead we need to tear down and rebuild anew.
Suppose a model federal agency builds its digital infrastructure fully to FISMA standards using the best IT personnel and following best practices conscientiously. What attack vectors remain, what are the odds of a major compromise, and how confident are we that the damage levels from intrusions will be acceptable? Given past performance (which isn’t public information so I can’t cite it here) are we happy with this?
Federal regulations and standards such as FISMA do a great disservice — and commercial security products are no different — by purporting to be a complete solution. This fails to acknowledge the possibility of unaddressed threats, insufficient mitigations, and the risks of flawed implementation and operation. Fundamentally better security begins with a brutally honest look at as many vulnerabilities as possible and the serious limitations on our ability to manage them.
Real change will take years and involve great effort but most importantly it will require changing our thinking and renewing the culture of software and IT. I believe I have some starting ideas.
- rethink authorization and data storage
- relentless focus on least privilege throughout the system
- realizing secure systems from insecure components
- design simple and secure over general and complex
- we likely need to give up convenience to raise the level of security
- anticipate exceptions to standard policy with auditable ways to handle special situations
- design for failure with strong accountability
- embrace transparency over the security-by-obscurity instinct
Each one of the ideas above (to be elaborated below) is somewhat vague and speculative, and to be useful will require careful design and implementation, and it certainly won't be cheap or easy. These are offered as general examples of the kind of approaches we need, without benefit of any detailed knowledge of actual federal systems requirements, much less any detail about past security failures. The following descriptions overlap, with some repetition for completeness, since many of the ideas are interrelated. I believe these ideas, and many more that would emerge from a serious redo of the kind of systems the federal government depends on, have the potential to vastly increase the level of security that can be achieved.
Summarizing this introductory challenge, the premise of this paper comes down to two points:
- Do we really believe that doing more of the same for federal information systems security is going to work well?
- If not, we need some revolutionary thinking to innovate and build more robust systems in very different ways.
In the following sections I want to just sketch these ideas briefly, providing one commonplace example. Each section touches on topics that deserve detailed treatment that’s well beyond the scope of this document, and realizing a real system with demonstrated benefits is of course a major undertaking. However, the goal here is motivating such future efforts by showing the potential benefits.
Storage and access permission models
Typical information systems contain large tables of detail records with many fields; a given user of the system has read and/or write privilege to a prescribed set of fields depending on what access they need for the union of job functions they are authorized to perform. Since the system cannot know a priori which records will need to be accessed, access is usually granted to all records even though in normal use only a tiny fraction of records may be accessed on any given day.
Potential problems are readily apparent with this system: there is no accounting of, or limit on, the number of records accessed or the rate of access. A compromised client could access large volumes of records to facilitate a breach even though there is no valid reason to do so. Once that data gets to a compromised client it’s easy to exfiltrate it.
Ideas: authorization policy needs richer primitives such as access volume and rate limits; authorize access in context of the function actually being performed (e.g. in a call center, access the calling customer’s record only during the duration of the call); where feasible provide tokens instead of raw data.
It’s worth explaining that last idea of tokens with an example: sales representatives need the ability to phone customers but not necessarily to know actual phone numbers. A token can enable placing calls through a communication system that does the actual call routing. Disclosure of the token is relatively harmless since it is meaningless without access to the system that accepts it. Other examples include email or mailing addresses and credit card numbers used for charging.
The most generalized form of the concept being applied is that what we traditionally think of as a simple datum can be represented by an array of capabilities that are (in a manner of speaking) actually rolled up within the raw data itself. A phone number contains, for example, calling rights (which can be limited to use during certain hours, prefixed by a recording in the way calls from prisons are identified, and so forth); information about geolocation; comparison to incoming CallerId or testing if someone has knowledge of, say, the last four digits as an authentication test; and more.
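As a minimal sketch of the phone number token idea, the following assumes a hypothetical call-routing service (the CallGateway class, its method names, and the 09:00 to 17:00 calling window are all invented here for illustration). The sales client holds only an opaque token; the real number stays inside the gateway, which exposes a few narrow capabilities instead of the raw datum.

```python
import secrets


class CallGateway:
    """Hypothetical call-routing service that holds the real phone numbers."""

    def __init__(self):
        self._numbers = {}  # token -> real phone number; never leaves the gateway

    def issue_token(self, phone_number: str) -> str:
        # The token is random, so it reveals nothing about the number.
        token = secrets.token_urlsafe(16)
        self._numbers[token] = phone_number
        return token

    def place_call(self, token: str, hour: int) -> str:
        # Assumed capability limit: calls are allowed only from 09:00 to 17:00.
        if not 9 <= hour < 17:
            return "refused: outside calling hours"
        if token not in self._numbers:
            return "refused: unknown token"
        # A real system would route the call here; the caller never sees the number.
        return "connected"

    def last_four_match(self, token: str, digits: str) -> bool:
        # Authentication test that does not disclose the number itself.
        number = self._numbers.get(token, "")
        return len(digits) == 4 and number.endswith(digits)


gateway = CallGateway()
token = gateway.issue_token("202-555-0143")  # fictional number
print(gateway.place_call(token, hour=10))    # connected
print(gateway.place_call(token, hour=22))    # refused: outside calling hours
```

A leaked token is useless to anyone who cannot reach the gateway, and the gateway is the single place where limits such as hours or call volume would be enforced.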
Least privilege everywhere, all the time
Least privilege is perhaps both the best known and least rigorously applied principle of information security. Diligent application requires effort at the requirements, design, and implementation stages, and is apparently considered a waste of time by most developers. That traditional OS authentication and authorization primitives are not nearly rich enough to build in least privilege properly only makes it less likely to be taken seriously. Designers of APIs and network protocols in turn rarely consider least privilege implications in their interfaces, giving developers plenty of excuses for not even trying very hard. Fixing this state of affairs isn’t difficult if you start at the lowest levels and build up anew.
We need to start thinking of all the capabilities conferred any time that data access is granted and apply least privilege accordingly. Access to multiple pieces of related information may confer more capabilities since combinations of attributes can potentially identify specific individuals or enable answering authentication challenges.
As the phone number example above illustrates, working with raw data implicitly violates the principle of least privilege since it implicitly grants numerous unintended capabilities. This effect is enhanced when multiple pieces of data are provided, either together in a complete customer information record for example, or when an attacker can glean separate pieces and join them together as a powerful identifier or for impersonation (such as identity theft).
Using a tightened access permission model as described above, every machine, user, connection, and software component should operate under a restricted set of authorizations appropriate to the function and environment (including factors such as time of day, location, and so forth). For example, the client used by a customer support person should only have access during working hours, with restricted scope appropriate to each customer call handled, using minimal information for the task at hand, and limited to a reasonable volume of access over time. By contrast, legacy architecture would make all customer records accessible in any volume (only arbitrarily limited by software design, bandwidth, and server capacity), representing a much larger risk of unauthorized disclosure in the case of compromise or even operational mistakes.
Still further limitations toward the goal of serious least privilege to consider include:
- instead of full records of all details selectively disclose fields only as needed in context of use
- log all accesses combined with need-to-know context for audit (real-time automated and forensic)
- present human readable data visually in a graphic presentation not amenable to copying
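The customer support example and the limitations listed above can be sketched as a single policy check. This is only an illustration under stated assumptions: the AccessPolicy class, its field names, the working hours, and the daily limit are hypothetical, and the daily counter is never reset in this simplified version.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AccessPolicy:
    """Hypothetical policy for one support agent's client."""
    allowed_fields: frozenset            # fields needed for the task at hand
    work_start_hour: int = 9             # assumed working hours
    work_end_hour: int = 17
    max_records_per_day: int = 50        # assumed volume limit
    audit_log: list = field(default_factory=list)
    _seen_today: set = field(default_factory=set)

    def read(self, record: dict, active_call_customer: str, now: datetime) -> dict:
        customer = record["customer_id"]
        reason = None
        if not self.work_start_hour <= now.hour < self.work_end_hour:
            reason = "outside working hours"
        elif customer != active_call_customer:
            reason = "record not in scope of the active call"
        elif (customer not in self._seen_today
              and len(self._seen_today) >= self.max_records_per_day):
            reason = "daily volume limit reached"
        # Every attempt is logged with its context, whether allowed or not.
        self.audit_log.append(
            (now.isoformat(), customer, active_call_customer, reason or "allowed"))
        if reason:
            raise PermissionError(reason)
        self._seen_today.add(customer)
        # Disclose only the fields this function needs, not the full record.
        return {k: v for k, v in record.items() if k in self.allowed_fields}


policy = AccessPolicy(allowed_fields=frozenset({"customer_id", "name"}))
record = {"customer_id": "c1", "name": "A. Example", "ssn": "000-00-0000"}
noon = datetime(2015, 7, 15, 12, 0)
print(policy.read(record, active_call_customer="c1", now=noon))
```

The point of the sketch is that time of day, call context, volume, and field selection are all decided in one small place that also produces the audit trail, rather than being scattered through application code.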
Secure computing with insecure components
Commercial operating systems, software applications and libraries, and networks are simply not secure by any kind of rigorous standard. Software license agreements painfully spell out, often in all caps, that absolutely no warranty whatsoever is provided. Nobody would build a house on a foundation provided under such terms, so why do we persist in building information systems on this stuff?
Modern datacenters have evolved into a very robust architecture where cheap machines are networked into distributed systems that achieve very high reliability despite being built from components that individually are not very reliable. In a nutshell, by decomposing a service response into a request across numerous redundant worker machines, high availability can be achieved; downtime of major commercial services is quite rare despite persistent and frequent failures and slowdowns within individual components. At risk of oversimplification, key principles of this architecture include:
- numerous front end instances route requests to worker machines but are as simple and robust as possible and do little actual work
- pools of workers distribute subtasks as requests to specialized workers, then combine results
- storage systems are also designed to be simple and redundantly store data at remote locations
- all messaging is load-balanced and automatically retries subtasks in case of failure or delay
- all activity is thoroughly monitored, errors and degraded performance are proactively investigated, and failed components are identified and fixed or replaced
It should be possible (though I have neither found research nor undertaken the effort yet myself) to design distributed systems that, taken as a whole, have higher levels of security and privacy assurance than the individual components they will be built from. The high level principles detailed above provide hints as to how this can be accomplished.
- rigorous least privilege based on rich access permission model can be enforced and monitored by a separate, extremely simple component, independent of actual processing and storage of the data
- one important specific case where least privilege avoids a vulnerability is eliminating remote root access completely (re-imaging from trusted source and reboot is the only intervention allowed)
- strictly limit network communications by servers to the needs of servicing requests
- data and tokens pass through the system encrypted such that they are only usable by the component requesting (so a compromise of an intermediary does not directly lead to breach)
- since personal data is most potent when joined together, storage of records can be split and only joined when strictly required
- wherever possible, data and sensitive computation are never allowed on end user clients; that work is performed by well protected and isolated machines in a datacenter under better control
- monitoring such a system allows unusual requests or unexpected high volume access to be identified and if necessary shut down
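One way to make the idea of split storage concrete is XOR secret splitting, a standard technique that I am substituting here for illustration; it is not a design proposed above, and the function names split and join are my own. A value is divided into n shares held by n separate stores, and any single share on its own is indistinguishable from random bytes, so compromising one store discloses nothing.

```python
import secrets


def split(secret: bytes, n: int) -> list:
    """Split `secret` into n shares; all n are needed to recover it."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n - 1 shares are pure randomness of the same length as the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # The last share is the secret XORed with every random share.
    last = secret
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]


def join(shares: list) -> bytes:
    """XOR all shares together to recover the original secret."""
    result = bytes(len(shares[0]))  # all-zero bytes of the right length
    for share in shares:
        result = bytes(a ^ b for a, b in zip(result, share))
    return result


shares = split(b"SSN 000-00-0000", 3)   # each share goes to a different store
assert join(shares) == b"SSN 000-00-0000"
```

The trade-off is the one described elsewhere in this paper: every store must be available and must cooperate for a join, which costs efficiency, and this simple scheme provides no integrity check on its own.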
Overcoming the complexity of modern general purpose systems
Handling sensitive information securely on a general purpose PC operated by staff not thoroughly knowledgeable about technical as well as data privacy issues is a recipe for disaster. Modern operating systems are too easy to compromise, and once malware gets a foothold the potential for abuse is almost immediately total. These systems are designed to make it easy to use the web and install applications, and the many extensive efforts to “lock down” systems, supposedly making it impossible for users to “shoot themselves in the foot”, routinely fail and result in poor usability and many support problems.
It’s beyond my imagination that we can run secure systems with off-the-shelf PC systems.
- Too many processes doing who knows what. I am currently writing on a laptop running 128 processes (expr `ps aux | wc -l` - 1). Recently I saw a very technically savvy user ask, somewhat ominously, “Anyone know what the OkCMbE process does, or why it would be using so much network bandwidth?” (It’s a good question.)
- While the kernel/user boundary is somewhat solid, this only protects the integrity of the OS; typically private data or other targets are still fully exposed if access permission exists.
- Applications once installed have full privileges in userland.
- Libraries also have full privileges and potentially see all kinds of data from any application.
- Fully locking down an operating system, while retaining the ability to maintain and support it, has never been a priority for operating system makers and has likely never been satisfactorily achieved.
- Zero-days and targeted attacks like spear phishing continue to be effective enough that it’s always a question of when, never of if, attackers will gain unauthorized access.
It won't be easy but in order to build truly secure systems I suggest we retreat from general purpose computing. The Russian response to NSA surveillance capabilities of reverting to typewritten paper documents actually makes sense, but of course we can't go back to all paper record keeping because the volume of records and the demand for access would be overwhelming.
Exactly how to engineer the right solution depends on two main factors unknown outside government:
- The specific applications and data handling required.
- Detailed analysis of threat model as well as forensic review of past failures for an attack profile.
However, by making some basic general assumptions, here is one sketch of an example fleshing out the approach. Federal agencies could pool joint requirements for data record management to build and deploy special purpose systems providing access for a wide range of database applications. Every agency in the past has had the luxury of independently setting requirements, designing, building, deploying, and operating their own custom systems, but I suggest this luxury is a formula for disaster and must stop. Instead, I suggest designing a special purpose database access appliance with very minimal customization and features. The design effort of hundreds of independent projects should instead be focused on building one very secure, very simple product that will have wide application. If staff require web access or other things a PC can do, use a separate network and machine (possibly switchable via KVM) to protect critical systems. The resulting system could be orders of magnitude more secure and handle perhaps 80% of routine data access at a truly higher level of security and privacy. Working from a strong foundation we could incrementally expand the capabilities and use cases from there.
Security comes at a cost
Conceptually speaking, every machine, every user granted access, every connection to the network, and each data access event incurs an implicit cost of additional risk of breach of data or compromise. Strong defensive policies will trade off access risk as a cost-of-doing-business to perform services and manage data. Often administrative efficiency will be at odds with minimizing risk through more limited authorizations, and depending on the level of risk and the sensitivity of the data, additional steps and overhead will be required to bring risk down to acceptable levels.
If we are serious about better securing federal information systems, realizing that goal begins with explicit declarations of what we are willing to give up in order to get it. There is a lot of wisdom in the proverbial sign in an auto repair shop — Fast, cheap, or quality service: pick any two! — and to build truly more secure systems will require hard choices giving up a lot. Most likely, the more we are willing to give up elsewhere the more successful the resulting systems will be.
Here are some desirable requirements to consider giving up in the interest of a new level of security.
- Fast: the first full production service as described herein will likely be a multi-year effort.
- Cheap: successfully completing an unprecedented project of this scope and scale will require the very best software engineers and security specialists, and they demand good compensation.
- Compatible: assuming we do build a more secure system, connecting it to legacy systems will immediately compromise it; integrating with existing systems will be a whole new challenge.
- Conventional: the implementation, operation, and usability of these new systems will likely be new and unfamiliar, requiring specialized training to use and operate securely.
- Efficient: more secure systems will have more redundant authorization systems, more audit data, and communications will be heavily encrypted and highly distributed — which all adds overhead.
Failure is an option
Conventional systems are extremely fragile and as a result, vulnerabilities too easily result in massive if not total compromise. Deployment, configuration, and operational errors (not to mention software flaws) all represent a massive list of possible errors that are not allowed to ever happen for the system to be secure. As we all know, it’s only a matter of time until something does happen, and usually then it’s game over very quickly.
Instead, we need to acknowledge that mistakes will be made and design our systems to be resilient and, if compromised, ensure that the fallout is proportionate, limited, detected, and easily addressed. This is far more easily said than done. Frankly, I am not certain that we can achieve this fully, but I do believe we can get part way there and iterate better and better from there.
The goal is to build components, and from them systems, that are not susceptible to full compromise by one single action or flaw. Configuration options need to be drastically cut back as near to zero as possible. Deployment must be as near to fully automated as possible. Software must be built openly from a trusted tool chain and delivered securely. End user applications must be designed to reduce the possibility of operator error, ensure data is accurately recorded, and facilitate recovery in the event of the inevitable error. (As one obvious example, applications can never present the familiar dialog, “Are you sure? This operation cannot be undone.”)
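To illustrate the point about recovery from operator error, here is a minimal sketch of a record store in which every change can be reversed, so that no operation ever needs the "cannot be undone" dialog. The UndoableStore class and its methods are hypothetical, and a real system would also need to persist and protect the history.

```python
_MISSING = object()  # marker meaning "the key did not exist before this change"


class UndoableStore:
    """Hypothetical record store that remembers the prior value of every change."""

    def __init__(self):
        self.records = {}
        self._history = []  # list of (key, previous value or _MISSING)

    def set(self, key, value):
        # Remember what was there before overwriting it.
        self._history.append((key, self.records.get(key, _MISSING)))
        self.records[key] = value

    def delete(self, key):
        # The removed value is kept in the history, so deletion is reversible.
        self._history.append((key, self.records.pop(key)))

    def undo(self):
        # Reverse the most recent change, whatever kind it was.
        key, previous = self._history.pop()
        if previous is _MISSING:
            self.records.pop(key, None)
        else:
            self.records[key] = previous


store = UndoableStore()
store.set("emp-1", {"grade": "GS-12"})
store.delete("emp-1")      # operator error
store.undo()               # the record is back
assert store.records["emp-1"] == {"grade": "GS-12"}
```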
Commitment to transparency
I believe that systems can only be truly secure when there is a strong commitment to transparency. System design behind closed doors makes it too easy to miss or wilfully ignore weak points. Because information about how federal systems are designed, and about how compromises and breaches occur, is pervasively suppressed, this paper can offer nothing more than hints at solutions for improving the security and privacy assurances of our systems. While there may be clear political and legal justifications for keeping all the details under wraps, in the end it is “security by obscurity” pure and simple, and we must change in order to improve.
If we had truly secure systems there would be no reason not to disclose the details. (Some might argue that better security technology should be kept secret as a strategic advantage, but I would argue that stronger security everywhere floats all boats.) Only when all details are out in the open can we have a realistic and meaningful dialog about strengths and weaknesses, accurately assign fault where it is due, and take corrective action. (Note that this is not the familiar but questionable “given enough eyeballs, all bugs are shallow” argument.) Everyone designing, operating, and using the system can best contribute to its success only when they can understand how all the parts work.
Other breach mitigations
- All accesses are fully logged with monitoring (virtual over-the-shoulder screen share) by managers and peers for unusual accesses as well as random sampling.
- Numerous basic technical requirements need to be met precisely in order to achieve a secure result. Avoid redesigning these in each agency and move to very well designed showcase designs for networks, authentication, authorization, crypto, data protection, and so forth.
- Standardize and automate everything possible: the fewer moving parts the better.
- Components should be designed as standard appliances for use with zero to minimal configuration options because it has proved too difficult for deployments to securely configure systems. Customizations should never allow less than fully secure operation.
- Many information assessment and similar tasks can be performed with enhanced privacy by separating content from personal information. For example, an assessment of personnel based on reported medical history, past addresses and affiliations, criminal record and so forth can be performed (to some degree, with some loss of fidelity, but satisfactory for some cases) with a pseudo-identifier instead of full name/address/DOB/SSN; separate parts of data can be independently assessed; these assessments could be joined for final review (and in some cases, but not all, it may be necessary for the investigator to have the full set of information). While this inherently compromises the assessment, in many cases it could be sufficient, and when it can be done this way it greatly mitigates privacy impact and lowers the risk of a total breach. This is in principle how TSA screens body scans at a remote location anonymously.
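The pseudo-identifier approach in the last item can be sketched as follows. Everything here is an assumption made for illustration: the set of identifying fields, the function names, and especially the placeholder assessment rule, which stands in for whatever a real assessor would do.

```python
import secrets

# Assumed set of directly identifying fields.
IDENTITY_FIELDS = {"name", "address", "dob", "ssn"}


def pseudonymize(record: dict):
    """Split a record into identity and content parts linked by a random pseudo-identifier."""
    pseudo_id = secrets.token_hex(8)
    identity = {k: v for k, v in record.items() if k in IDENTITY_FIELDS}
    content = {k: v for k, v in record.items() if k not in IDENTITY_FIELDS}
    content["pseudo_id"] = pseudo_id
    return pseudo_id, identity, content


def assess(content: dict) -> dict:
    # Placeholder rule, purely illustrative; the assessor never sees who this is.
    flagged = bool(content.get("criminal_record"))
    return {"pseudo_id": content["pseudo_id"], "flagged": flagged}


def final_review(assessment: dict, identity_vault: dict) -> dict:
    # Only this step, run by an authorized reviewer, rejoins identity and result.
    identity = identity_vault[assessment["pseudo_id"]]
    return {**identity, "flagged": assessment["flagged"]}


record = {"name": "A. Example", "ssn": "000-00-0000", "criminal_record": []}
pseudo_id, identity, content = pseudonymize(record)
identity_vault = {pseudo_id: identity}   # held separately from the content
result = final_review(assess(content), identity_vault)
assert result == {"name": "A. Example", "ssn": "000-00-0000", "flagged": False}
```

A breach of the assessment side alone yields content without names, and a breach of the vault alone yields names without content; only the final review holds both.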
This paper is a preliminary exploration of new possibilities for the prospect of building truly secure systems. As such it is intended to question the status quo and consider the basic question of how secure federal systems must be, while suggesting that what we are doing may not be good enough. In an effort to demonstrate that better approaches are possible, a number of speculative new ideas are presented. Each of these represents a major research and development effort, with significant projects required to put into practice and validate new kinds of systems as better future solutions. Not all of these early ideas will necessarily pan out; in fact many may prove difficult or be dead ends. However, the goal here is to open up thinking to new possibilities so that better minds with more real context can find still better directions.
Finally, let me state the challenge I see ahead as directly as possible:
- Have we designed critical systems for maximal security and privacy? Not nearly so far.
- Is there a serious need for more robust and reliable security than we have now? Absolutely.
- Are we out of ideas how to build more secure systems? No: see above for a start.
- Will it be easy to design and deploy more secure systems? No.
- Will we have to give up other things in order to get more secure systems? Definitely.
- Will we make the effort now, or if not now, when? The answer remains to be seen.
Addendum: typical EULA merchantability clauses
That we continue building information systems, with any expectation they will be secure, on top of commercial operating systems with modern EULA (end user license agreement) terms I find absolutely unimaginable, and nobody even talks about it as being a problem. While I understand why software company lawyers insert this language, I still find it an incredible level of hubris to disclaim any warranty or suitability for any purpose. Customers continue to accept this and feel the pain: that we put up with this lack of quality for the high tech software that so much depends on is absolutely incredible and I suspect has no precedent in history.
Red Hat Enterprise Linux EULA: … to the maximum extent permitted under applicable law, the Programs and the components are provided and licensed “as is” without warranty of any kind, expressed or implied, including the implied warranties of merchantability, non-infringement or fitness for a particular purpose.
Microsoft Windows 8 EULA: … exclude all implied warranties, including those of merchantability, fitness for a particular purpose, and non-infringement.
Mac OS X (10.10) EULA: … disclaim all warranties and conditions with respect to the Apple software and services, either express, implied or statutory, including, but not limited to, the implied warranties and/or conditions of merchantability, satisfactory quality, fitness for a particular purpose, accuracy, quiet enjoyment, and non-infringement of third party rights.