The EU’s General Data Protection Regulation (GDPR) is the most dramatic change in data protection and governance in the last 20 years. With additional impending privacy regulations, increased data breach frequency, and potential reputational damage, data security has become a top priority for security leaders. There is a significant opportunity for data security vendors due to the market size; manual task/services automation; ML-enabled differentiation; and alignment with the data-as-a-service trend. We cataloged ~100 data security businesses to provide a comprehensive landscape. We are excited about innovation in the space and look forward to speaking with startups offering data security solutions.
Over the past two years, companies have become more concerned about data privacy due to regulation, data breach frequency, and reputational damage. GDPR is only the beginning. California, Japan, China, India, Australia, and New Zealand, among others, are developing similar privacy regulations.
During our channel checks, we repeatedly hear security leaders worry about the California Consumer Privacy Act, which goes into effect at the start of 2020. The law affords California residents the right to be informed about the types of personal data companies collect and why it is collected. Consumers also have the right to request deletion of personal information, to opt out of the sale of their personal information, and to access the information in a “readily useable format.” Regulation will be a nice tailwind for data security businesses.
The frequency and scope of data breaches have increased over the past few years. Here you can find a nice visualization of data breaches over the last decade. According to CyberScout, there were 1,579 data breaches in 2017 with at least 179M exposed records (chart below). From the Marriott hack that exposed data of up to 500 million guests to Twitter’s 330 million unmasked user passwords to Equifax, the reach of data security breaches is astounding.
Reputational damage from data breaches, not just regulatory penalties, affects businesses. Through the Cambridge Analytica and Facebook incident, it became apparent that blaming a third party or claiming ignorance is a good recipe for a public backlash. After the breach, Facebook’s stock plunged close to 20%, active user numbers fell in Europe, and the “#DeleteFacebook” campaign emerged.
In addition to these macrotrends, there are a few other reasons we like the data security category: 1) market size, 2) automation opportunity, 3) ML-enablement, and 4) alignment with self-serve data practices.
The large data security market has many components. We break the data security market into five main categories: 1) encryption ($4B in 2019), 2) data catalog ($0.3B), 3) data governance/access ($1.6B), 4) compliance assessment ($0.5B), and 5) Data Loss Prevention (DLP) ($2.2B).
We believe data cataloging, governance/access, and compliance assessment will converge over time and metadata management will be the foundational layer. Basic data cataloging will be commoditized since it only tells you where data is located. The best catalogs will provide data context, identify Personally Identifiable Information (PII), reconcile data points, suggest the best data structures like joins, and offer natural language search. Manual compliance assessments will disappear. Combining these categories achieves a compelling data management and governance market of $2.4B in 2019 growing to $4.1B in 2022, a 19% CAGR.
According to Mordor Intelligence, DLP is also growing quickly. The third-party research firm stated that the market was valued at $2.2B in 2019 and forecast DLP to grow at a 24% CAGR to reach $5.2B in 2023.
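The market figures above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this post; the helper names `cagr` and `project` are ours, for illustration, and the calculation assumes simple annual compounding.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding at a fixed annual rate."""
    return start_value * (1 + rate) ** years


# 2019 segment sizes in $B, as quoted above.
segments_2019 = {
    "data catalog": 0.3,
    "data governance/access": 1.6,
    "compliance assessment": 0.5,
}

# Combined data management and governance market in 2019.
combined_2019 = sum(segments_2019.values())

# Implied growth rate from the 2019 total to the $4.1B 2022 figure.
governance_cagr = cagr(combined_2019, 4.1, 2022 - 2019)

# DLP: $2.2B in 2019 compounding at 24% a year through 2023.
dlp_2023 = project(2.2, 0.24, 2023 - 2019)

print(f"Combined 2019 market: ${combined_2019:.1f}B")
print(f"Implied 2019-2022 CAGR: {governance_cagr:.1%}")
print(f"Projected 2023 DLP market: ${dlp_2023:.1f}B")
```

The three segments sum to the $2.4B total, the implied growth rate comes out a little above 19%, and the DLP projection lands at roughly the $5.2B forecast.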
Software is replacing services. Previously, services and consulting firms would manually identify data. We still see the heritage of this approach with compliance assessment tools. Software can now provide the same value proposition as services firms’ efforts that often took many months.
ML-enabled differentiation. Solutions that leverage ML to discover PII and track data movement are superior to manual data tagging or compliance surveys. Not only can ML be more accurate than a person, it also allows for continuous identification and assessment.
Alignment with data-as-a-service. During channel checks we often hear that businesses want to empower their employees to access data themselves instead of relying on storage admins or DBAs. Strong data privacy sets the foundation for this capability as it ensures the right people are able to access the correct data at the appropriate time.
We’ve heard there are a few key product needs for data management and governance vendors. First, data mapping is the foundational layer. The product must be able to work across different data stores and databases in a multi-cloud environment. We’ve come across various implementations to do this including agent-based, agentless, and data-centric tracing. Our channel checks find that businesses should choose the implementation that is best for their environment. For example, microservices environments with polyglot backends may benefit from tracing. For more traditional monolithic app scenarios an agentless approach could be a better fit.
Second, PII can often be specific to an industry. For example, in the auto insurance industry a vehicle identification number (VIN) could be considered PII. It is important for a vendor to not only provide basic classifiers (e.g. SSN, first name, etc.), but also allow the customer to add industry-specific PII.
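To make the idea of extensible classifiers concrete, here is a minimal sketch, assuming a simple regex-based approach. The `PIIClassifier` class and its methods are hypothetical and do not reflect any vendor's API; real products would layer ML and validation on top of pattern matching. The SSN and VIN patterns are simplified format checks, not full validators.

```python
import re


class PIIClassifier:
    """Hypothetical classifier: built-in patterns plus customer-defined ones."""

    def __init__(self):
        # Basic built-in classifier: US Social Security number format.
        self._patterns = {"ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}

    def register(self, label, pattern):
        """Add an industry-specific PII pattern under the given label."""
        self._patterns[label] = re.compile(pattern)

    def classify(self, text):
        """Return a dict mapping each matching label to the values found."""
        hits = {}
        for label, pattern in self._patterns.items():
            matches = pattern.findall(text)
            if matches:
                hits[label] = matches
        return hits


classifier = PIIClassifier()

# An auto insurer adds a VIN classifier: 17 characters, excluding I, O, and Q.
classifier.register("vin", r"\b[A-HJ-NPR-Z0-9]{17}\b")

record = "Policy holder SSN 123-45-6789, vehicle VIN 1HGCM82633A004352"
found = classifier.classify(record)
print(found)
```

The point of the design is that the built-in classifiers and the customer's additions go through the same code path, so industry-specific PII is treated no differently from an SSN.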
Once the mapping is complete, customers check how accurately PII has been identified. According to one security leader, ML can help improve accuracy: “Accuracy is always one of the most important aspects since there are edge cases. This is where the ML and differentiation comes in to play. Supervised learning online will be huge and lets businesses be sophisticated for each customer use case.”
Finally, buyers posit data lineage is the hardest problem to solve today. One head of security at a unicorn startup stated, “PII is like a virus. Everything it touches it corrupts. We need to see what it is touching to decrease our risk.” Tracking data as it moves throughout a system and chronicling how it is processed is salient for GDPR Article 30 compliance.
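One simple way to picture lineage is as a directed graph of data flows, where the question "what has this PII touched?" becomes a graph traversal. The sketch below is an illustration under that assumption; the `LineageGraph` class and the dataset names are made up, and real products must first discover these flows, which is the hard part.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage store: directed edges from source to destination."""

    def __init__(self):
        self._edges = defaultdict(set)

    def record_flow(self, source, destination):
        """Record that data moved from source to destination."""
        self._edges[source].add(destination)

    def downstream(self, origin):
        """Return every dataset reachable from origin (breadth-first search)."""
        seen = set()
        queue = deque([origin])
        while queue:
            node = queue.popleft()
            for nxt in self._edges.get(node, ()):
                if nxt not in seen and nxt != origin:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


lineage = LineageGraph()
lineage.record_flow("crm.customers.email", "etl.staging.contacts")
lineage.record_flow("etl.staging.contacts", "warehouse.marketing.leads")
lineage.record_flow("warehouse.marketing.leads", "dashboard.campaign_report")
lineage.record_flow("billing.invoices.amount", "warehouse.finance.revenue")

# Everything the email field has "touched", directly or indirectly.
touched = lineage.downstream("crm.customers.email")
print(sorted(touched))
```

Here the email field reaches the staging table, the marketing warehouse table, and the dashboard, while the unrelated billing flow is left out.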
DLP is also undergoing advancements due to NLP. We’ve heard it is important to not only identify key words in documents and emails, but the material’s context. This can be achieved by using NLP to assess phrases, cluster documents based on semantic inferences, and tag the documents for particular categories or sensitivities.
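The tagging step can be sketched as scoring a document against each sensitivity category and assigning the closest one. The example below is a deliberately crude stand-in: it uses bag-of-words cosine similarity from the standard library, whereas the NLP approaches described above would rely on learned semantic representations. The category seeds, threshold, and function names are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical seed vocabulary per sensitivity category.
CATEGORY_SEEDS = {
    "financial": "quarterly revenue forecast earnings margin budget invoice",
    "health": "patient diagnosis prescription treatment medical insurance claim",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag_document(text, threshold=0.1):
    """Tag a document with its most similar category, if similar enough."""
    doc = Counter(tokenize(text))
    scores = {
        label: cosine(doc, Counter(tokenize(seed)))
        for label, seed in CATEGORY_SEEDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unclassified"


print(tag_document("Attached is the revenue forecast and budget for next quarter"))
print(tag_document("The patient received a new prescription after diagnosis"))
print(tag_document("Lunch is at noon"))
```

Swapping the word-count vectors for sentence embeddings would keep the same flow while capturing context rather than shared vocabulary alone.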
Below we present two data security category landscapes: data management and governance and DLP. We acknowledge it is challenging to put many of the businesses in only one category as they are rapidly broadening their scope to include additional spaces.
We cataloged ~80 companies addressing data management and governance, from publicly traded corporations like SAP to growth companies like Unifi Software to startups like Privitar. Over the past three years we’ve seen numerous startups enter the space, so we wanted to highlight some below. Last year during RSA, BigID won the Sandbox competition, a signal that customers are hungry for new solutions in the space.
It is also important to note there has been a lot of activity in the DLP space. Over the past few weeks we’ve seen tier one VCs fund new DLP solutions including Armorblox (General Catalyst), Concentric (Engineering Capital), and Tessian (Sequoia).
Concerns about data security and privacy are only increasing. Regulation, breaches, and brand integrity catalyze purchasing third-party solutions. The data security category is attractive because it has a large and growing TAM; software is automating manual tasks; ML can improve accuracy and continuous assessment; and it is a requisite for data-as-a-service. There is a cornucopia of businesses addressing data security, and we believe there is an opportunity to build a large, enduring company in the space.