A SOC-MSSP guide (1 of 4)

21 min read · Nov 12, 2023

From reporting malicious activity to catching it as it happens

Part 2: https://medium.com/@7rm1ef8/a-soc-mssp-guide-2-of-4-f4fb93be2422
Part 3: https://medium.com/@7rm1ef8/a-soc-mssp-guide-3-of-4-658e0cf99745
Part 4: https://medium.com/@7rm1ef8/a-soc-mssp-guide-4-of-4-a78779d830dd

Summary
1. Introduction
1.1. Definitions
1.2. SOC/MSSP vs CERT/CSIRT
2. Human aspects
2.1. Managing a SOC
2.2. SOC analysts’ jobs
2.3. SOC analysts’ needs
2.4. Understanding the profiles
2.5. Conclusion

1. Introduction

The idea for this guide came out of the deep frustration of knowing what a SOC could do while witnessing what SOCs and MSSPs actually do, seeing what they claim to do — internally or to their customers, respectively — and hearing the ideas for improvement actually approved by the management.
When its needs are met and all the aspects of the SOC are correctly built, the SOC is an incredibly powerful defensive entity that can identify and stop — directly or indirectly — internal and external threats from damaging the environment.
However, if these needs are not met or if the SOC is poorly built in some way, then it can appear to be very ineffective and a waste of money from an external — executive, for example — point of view.
From an internal point of view, this leads to high frustration for the staff, because they know that they could do better, and how, but are limited to unattractive work. Most of the time they try hard until they give up and leave for a more attractive job, such as in a CERT or CSIRT, where if someone talks to them, something actually happens, so they feel valued.

Many SOCs and MSSPs as they are currently built focus only on the time constraint for detection and response. This is indeed the most important one, because the more time passes as an attack goes undetected, the more damage will come to the company. However, these SOCs were built based on other SOCs from a time with fewer security needs, or at least concerns, and to stay competitive in a market with an economic model from that time. Times have changed, technologies and people as well, and now these SOCs and MSSPs struggle to stay relevant from a security point of view. Even more concerning, they have a hard time hiring people, and an even harder time keeping skilled analysts, as the way they are built creates unattractive work. Some of them try to distract analysts with task rotation, or some interesting project in parallel to their main missions. This isn’t helping the analysts, who can only stand it for so long, and isn’t helping these SOCs or MSSPs either, because they are stuck in a model belonging to the past.

Most of the current guides for building SOCs explain what to do and what not to do to achieve a SOC or MSSP as they exist today. While they usually cover many of the aspects (especially the bigger guides out there), they fail to address all of them in sufficient detail to build something with a base strong enough to go all the way.

One of the main goals of this guide is to explain in a simple — i.e. not deeply technical — way to executives what a SOC actually does and how to improve it so that all — executives, management and SOC staff — can benefit from it.
In the faint hope of general improvement and at least for another reference to which Blue Team members could point, this guide was born.

This guide explains the following:

The differences between a SOC and a MSSP across human, financial, operational and technical aspects
The goals, needs, strengths and weaknesses of the current models of MSSP
What is needed to build a SOC or MSSP that can scale up over time
What to improve to maximize a SOC’s detection efficiency and its capacity to respond to threats

1.1. Definitions

As the security landscape is constantly evolving, new teams, jobs, services and tools are created and the definitions change, sometimes to the point where differences become almost philosophical.
Therefore, writing some definitions at the beginning of this document will make it easier to understand as it won’t be anchored in time.

1.1.1. SOC

A Security Operations Center (SOC) is a combination of people, skills, tools and processes that monitors an environment for cybersecurity threats.

Its main goal is to detect anomalies occurring in the environment in order to respond to them in the proper way. The response comprises:

an investigation to determine whether the anomaly was a threat and, if so, its extent
a remediation, if there was indeed a threat and if this task has been assigned to the SOC

The SOC needs to know and understand the environment it monitors well in order to correctly detect anomalies and to respond to them.
This knowledge and understanding includes everything from the buildings in which a device is connected to the environment, to the people in IT teams, and to the actual business of the company.

In order to detect anomalies, the SOC uses artifacts produced by the different devices in the environment it monitors. The majority of these artifacts are logs created every time an event happens by the applications running on the devices.
Every source generating artifacts useful for the SOC is called a sensor. Therefore, a sensor can represent a piece of hardware, such as a physical firewall filtering network traffic, or a piece of software, such as a web server hosting websites.
In order to speed up and facilitate the detection of anomalies, the logs created by the sensors should be gathered and centralized into a SOC’s most powerful tool: a SIEM. The SIEM is, in a way, a database storing the logs, on which specific queries are run to single out anomalies.
When an anomaly is detected, the SIEM creates an alert, and the SOC’s response starts. An analyst investigates the anomaly to determine whether there is a security incident, and if the actions are malicious. The analyst logs all the findings in the SIRP for traceability.
The SIEM and the SIRP are the two main tools of a SOC, and have been for quite some time. Once these tools are in place, the next step is to use a DevOps platform, and then a SOAR. The former focuses on infrastructure and detection improvements, while the latter enables automated response.
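The detection flow described above (sensors produce logs, logs are centralized, a query singles out an anomaly, an alert is created) can be pictured with a minimal sketch. It assumes a toy in-memory log store and a single hypothetical rule, repeated failed logins from one source within a time window. The field names, the threshold, the window and the rule itself are illustrative assumptions and do not reflect how any particular SIEM product works.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEvent:
    timestamp: datetime
    sensor: str      # hypothetical sensor name, e.g. a firewall or a web server
    source_ip: str
    action: str      # hypothetical values such as "login_failed" or "login_ok"


@dataclass
class Alert:
    rule: str
    source_ip: str
    count: int


def detect_failed_logins(events, threshold=5, window=timedelta(minutes=10)):
    """Return one alert per source IP with at least `threshold` failed logins inside `window`."""
    failures = defaultdict(list)
    for event in events:
        if event.action == "login_failed":
            failures[event.source_ip].append(event.timestamp)

    alerts = []
    for source_ip, times in failures.items():
        times.sort()
        start = 0
        best = 0
        # Sliding window over the sorted timestamps of this source.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= threshold:
            alerts.append(Alert("repeated_failed_logins", source_ip, best))
    return alerts


# Hypothetical centralized logs coming from two sensors.
base = datetime(2023, 11, 12, 9, 0)
events = [
    LogEvent(base + timedelta(minutes=i), "web-server", "203.0.113.7", "login_failed")
    for i in range(6)
]
events.append(LogEvent(base, "web-server", "198.51.100.2", "login_failed"))
events.append(LogEvent(base, "firewall", "198.51.100.2", "login_ok"))

alerts = detect_failed_logins(events)
```

In this sketch, each returned alert would be the starting point of an analyst’s investigation, whose findings would then be recorded in the SIRP.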

The tools and the inner workings of a SOC are discussed at length later in the document.

1.1.2. MSSP

A Managed Security Services Provider (MSSP) is, as per Gartner’s definition, an entity that sells different security services to its customers; a SOC is usually included in these services.
Now, Gartner’s definition of MSS uses the customer’s point of view, whereas this document presents the MSSP from the provider’s point of view — while of course taking into account the customers’ needs.

Focusing on the SOC service — which is the biggest part in all aspects — of an MSSP, the goals, needs, strengths and weaknesses of the MSSP are more or less the same as those of a SOC, multiplied by the number of customers to which the MSSP provides services.

The main goal of the MSSP is — like a SOC — to detect anomalies occurring in the environment, but most of the time the MSSP will not have any means of remediation in its customers’ environments.
Therefore its response will be limited to communicating to its customers an incident ticket that comprises:

a context — i.e. how and why the anomaly was detected in the first place
an investigation with facts, analysis and conclusion
remediation recommendations — where an internal SOC would be able to remedy on its own
impacts explanations — to help the customer understand the risks and prioritize their actions
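As an illustration only, the incident ticket described above could be modeled as a small data structure. The class name, field names and the `missing_sections` helper are hypothetical and chosen for this sketch; they do not come from any specific SIRP or ticketing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentTicket:
    customer: str
    context: str                    # how and why the anomaly was detected
    facts: List[str]                # investigation: what was observed
    analysis: str                   # investigation: interpretation of the facts
    conclusion: str                 # investigation: verdict
    remediation_recommendations: List[str] = field(default_factory=list)
    impacts: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """List the sections that are still empty, e.g. before sending the ticket."""
        sections = {
            "context": self.context,
            "facts": self.facts,
            "analysis": self.analysis,
            "conclusion": self.conclusion,
            "remediation_recommendations": self.remediation_recommendations,
            "impacts": self.impacts,
        }
        return [name for name, value in sections.items() if not value]


# Hypothetical ticket where the impacts have not been written yet.
ticket = IncidentTicket(
    customer="example-customer",
    context="Rule 'repeated_failed_logins' fired on the customer's web server.",
    facts=["6 failed logins from 203.0.113.7 within 10 minutes"],
    analysis="The pattern is consistent with password guessing.",
    conclusion="Suspicious activity, no successful login observed.",
    remediation_recommendations=["Block the source IP", "Review the targeted account"],
)
incomplete = ticket.missing_sections()
```

A check of this kind is one possible way to make sure that every ticket sent to a customer carries all four parts listed above.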

The needs of an MSSP are the same as those of a SOC in terms of understanding the environments it monitors, but to minimize cost and maximize efficiency, the MSSP tries to mutualize as much as it can across its customers (people, hardware, software, templates, etc.).
Of course, this is quite a lot harder to achieve as an MSSP because strong knowledge management is mandatory. Indeed, poor knowledge management results in time loss that grows faster than the number of customers: if the data is unstructured, the difficulty of finding a specific piece of information grows faster than the volume of data.
This time loss will, in turn, impact the efficiency of the analysts and this will result either in a loss of productivity or quality of the work done, or even both — not to mention the mental impacts on these analysts.
Finally, this will show up in the finances one way or another: lower margins because the number of analysts needed grows faster than the number of customers for the same quantity/quality level, customer loss due to dissatisfaction with the quantity, quality or price of the service, etc.
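The claim that time loss grows faster than the number of customers can be pictured with a deliberately crude model. It assumes that each customer adds a fixed number of documents and triggers a fixed number of lookups per day, that a lookup in an unstructured repository skims about half of all documents, and that a lookup in a structured repository has a constant cost. Every number and parameter name below is a hypothetical assumption made for the sake of the arithmetic.

```python
def daily_search_seconds(customers, structured,
                         docs_per_customer=100,
                         lookups_per_customer=10,
                         seconds_per_doc_skimmed=1,
                         seconds_per_structured_lookup=60):
    """Total seconds spent per day searching the knowledge repository (toy model)."""
    lookups = customers * lookups_per_customer
    if structured:
        # Assumption: a well-structured repository gives a constant cost per lookup.
        return lookups * seconds_per_structured_lookup
    total_docs = customers * docs_per_customer
    # Assumption: an unstructured search skims about half of all documents.
    return lookups * (total_docs // 2) * seconds_per_doc_skimmed


for n in (5, 10, 20):
    print(n,
          daily_search_seconds(n, structured=False),
          daily_search_seconds(n, structured=True))
```

Under these assumptions, doubling the number of customers doubles the search time with a structured repository but quadruples it with an unstructured one, because both the number of lookups and the cost of each lookup grow with the customer count.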

The “bad news” about the needs of an MSSP is matched by equal, arguably greater, “good news” about its strengths. Where a correctly built SOC is a powerful defensive entity, a correctly built MSSP can simply be the best defensive entity, period.
From a detection perspective, having multiple, different environments with different sensor types and vendors is a gold mine for detection use-case ideas and implementation: when built correctly, any detection upgrade made by the MSSP can instantly be applied to all its customers, therefore mutualizing all the R&D efforts, which leads to better-skilled analysts, lower overall R&D costs, customer satisfaction, etc.
From an investigation perspective, a higher number of environments is also beneficial. The R&D mentioned above will translate into better tuned detection rules which will fire fewer alerts that end up being False Positives (FP, i.e. alerts that shouldn’t have been fired in the first place). This directly lessens the time spent on “pointless” investigations, so the analysts work on more pertinent ones, increasing their skills and motivation. This also means that the analyst-to-customer ratio goes down, as there are fewer alerts per customer.

If these needs are not met or if the MSSP is poorly built, then the cost for the MSSP will be higher and its attractiveness to analysts lower, to the point where it would be more cost-efficient for the MSSP to maintain one SOC per customer and be, in the end, a mere juxtaposition of SOCs. This juxtaposition is the complete opposite of the mutualization an MSSP aims for.
In this worst case scenario, the MSSP would appear ineffective and a waste of money internally, but also for its customers. Its skilled analysts, like in a SOC, would eventually give up to find a more attractive job in a CERT or CSIRT.

1.2. SOC/MSSP vs CERT/CSIRT

While today’s SOC and MSSP are mostly focused on detection and investigation — with sometimes means of remediation — the incident response is the main job of a Computer Emergency Response Team (CERT) or a Computer Security Incident Response Team (CSIRT).

Because of the struggles of today’s SOCs and MSSPs, briefly mentioned before, a SOC analyst job is less attractive than that of a CERT analyst: from a technical perspective, it is both more challenging and more rewarding to work on an actual incident caused by a malicious actor than to chase False Positive alerts from misconfigured detection rules.
Moreover, CERT analysts have a real sense of purpose in their job as, again, they work on actual, tangible issues, whereas SOC analysts perform repetitive tasks without much hope for improvement.
Add to all of this the fact that the mindset and skills needed for a SOC or CERT analyst are very similar, and the result is a vicious circle:

A CERT is more attractive to analysts than a SOC job-wise, so higher skilled analysts tend to end up in a CERT.
Higher skilled analysts leave SOCs, therefore lowering the overall skill level of SOCs, and hence their capacity to properly perform their tasks.
SOCs do not perform as well as executives had hoped, so a highly skilled CERT is needed as “last resort” when — and not if — the SOC does not detect and react in time and a crisis arises.
Security funds are put into the tools, training/certifications and salaries of CERT analysts to make sure they are up to the task when it comes.
Since the funds allocated to security are finite and prioritized towards the CERT, the SOC budget is that much smaller, therefore the tools, training/certifications and/or salaries of its analysts are lower than what they could be.
A lower-budget SOC means a less mature SOC, directly translating into less attractive tasks for its analysts, which, at last, deepens the attractiveness gap between SOC and CERT.

From this observation, the — somewhat controversial — statement that good CERTs exist and are needed only because bad SOCs and MSSPs exist could be made.
It could therefore be argued, and more or less is in this guide, that a good SOC or MSSP would not only reduce the need for a good CERT, but also minimize the consequences of actual incidents: the earlier they are caught and dealt with, the less they directly and indirectly damage the environment, and hence the company.

In order to avoid any misunderstandings: there always will be a need for security detection and for incident response, but it would be better for everyone if the entities tasked with defensive security could be more proactive and not only post-mortem reactive.
It has long been clear that it is better for everyone to detect, catch and manage the fires as they start rather than waiting until they have burned everything to the ground to investigate the cause and rebuild everything.
Why not do the same with security incidents?

2. Human aspects

This chapter tries to explain as simply and clearly as possible the tasks of SOC analysts, the skills required (both hard and soft), their needs and the most common mindset of this particular population.

In order for the SOC to consistently attain the goals set by the executives and since the actual intelligence and effectiveness of the SOC is defined by its crew, understanding the humans behind the job titles or employee numbers is of utmost importance.
Doing so at all times, whether when recruiting, managing or planning mid/long term strategies, is the only reliable way to ensure the best efficiency possible.

2.1. Managing a SOC

Every company, private or public, has goals to achieve. Whatever the motivations behind them, and however frequent or pertinent they are, these goals define a desired state and the time available to reach it from the current state.

Once this is clear, one or more strategies are imagined and one of them is adopted. Each strategy requires actions to be done in a timely fashion and those actions and timetable can be broken down into resources. The human resources each have a set of skills and a workload capacity that will, according to the strategy, enable the company to reach its desired state in time, therefore fulfilling its goals.
This is all very high level because it is not the point of this document but the important thing is that company goals ultimately translate into human resources (and other material needs).

From a high level — executive/HR — perspective, a SOC (or MSSP) is, like any other department, viewed in terms of goals and resources.
However, as explained in the following subchapters, one SOC analyst is not equal to another. The job combines the hard skills required for most SOC tasks, the soft skills needed to work efficiently as a team and the mindset to keep calm under pressure. Add to this the current (most of the time) amount of “busy work” compared to “interesting work” and the lack of trained professionals, and analysts have plenty of opportunities that literally present themselves to them on a daily basis.
This means that it may be easy-ish to attract SOC analysts from other companies because they may not be happy with their current situation, but it is actually really hard to keep them.

Turnover is crucial in a SOC or MSSP because both need strong teamwork in order to perform well, or at least to the level expected by executives, and the overall skill of a SOC is far from being the mere sum of the skills of its members.
Therefore, to maintain a strong cohesion, there must be a constant clear majority of members that form a stable core which can absorb newcomers and remain as-is even when it loses people.
In other words, if you replace one player of a sports team without the team having time to train as a whole, a well-trained team’s performance may not vary much (even if you take the best player). Now, if you replace 20–30% of the players (the more, the worse) over a short period, then it is foolish to expect the team to stay at the same level.

This is why good management is needed. In this context, the human component has to be taken into account, as simply managing human resources will not cut it.
Each member of the SOC — analyst or otherwise — has to be understood or at least heard and compromises must be made so that both parties win — that is to say the company and the employee.
In the best case scenario, the SOC managers are actually leaders, the difference being that a leader leads by example and inspires their subordinates, whereas a manager leads by distributing resources wherever needed to attain the objectives set. These leaders would naturally take into account the human component in their job and come to HR/upper management with solutions in the form of compromises.
Most likely, the SOC managers are “regular” managers and they would inevitably be facing issues by not hearing or acting on their teams’ needs and complaints.
In both cases, HR has to hear either the compromises or the issues and help the SOC managers find and settle compromises that would benefit everyone — this means that the employees also need to see an actual gain; it does not have to necessarily be a big one, but they need to see and know that the company is willing to move a step in their direction.

All in all, the most important thing to remember is that upper management and HR have to clearly identify the type of managers the SOC has and help them build and maintain a strong cohesion by listening to and hearing the people to find compromises.

This may be a bit of a catchy phrase but it completely applies in this case: it’s time to put the “H” back in “HR”.

2.2. SOC analysts’ jobs

Depending on the SOC or MSSP size and the missions it is expected to perform, the exact jobs and tasks may vary. The table below shows an example of the most common tasks, the skills they require and jobs that can exist by assembling these tasks.

Table: Example of SOC jobs with associated tasks and skills

The jobs and tasks detailed in the table are merely examples of those that can exist in a SOC or MSSP. The tasks that add up to a job are always based on the needs, which depend on the context — and so does the number of people needed per job.
The hard skills shown in the table are not exhaustive; they are limited to the “security” aspects, which are the essence of a SOC. Keep in mind that skills such as “Digital investigation” require both the knowledge and know-how of a digital investigation and the knowledge of how the underlying environment actually works. For example, to determine if a behavior is legitimate on a Windows system, one must know how Windows works to know what to look for and, once the proper artifacts are gathered, whether such a chain of events is more likely the result of malicious or legitimate intent.

Ultimately, the jobs and tasks have to be tailored in a realistic way, meaning that there is actually a team member who can take up these tasks and has sufficient time to complete all their assignments. This is especially true for the technical skills needed in the long run: a “does it all” position will result in mediocre performance from the person holding it and/or a high amount of time (and money) spent on training.

2.3. SOC analysts’ needs

The tasks and jobs presented above require diverse skills, including highly technical ones.
This is due to the fact that in order to find anomalies, qualify incidents and perform remediation actions, a SOC (or CERT) analyst has to have a deep understanding of “normal operations” to single out the “abnormal operations” and cut out the malicious activity, preferably without interrupting production.
This is true for every asset monitored by the SOC, hardware or software, systems, networks, Cloud based activity, IoT, OT…
Just to be clear: it is nowadays obvious for everyone that each item of the previous list would need a different person (or team) with special skills to deploy or maintain, but everybody expects the SOC to monitor all of these and promptly and correctly respond to any and all malicious activity.

Well, if one person (or team) is not expected to manage multiple scopes because each one has its own specificities, then how can one person (or team) be expected to understand multiple scopes (meaning knowing why this or that was done this way and how it should normally behave) and be able to restore or fix whatever may have been damaged or misconfigured?
The simple answer is that it is not humanly possible for one person to have, at all times, an understanding deep enough of all those things to perform the job expected of a SOC.

This is why a SOC is mainly teamwork and why SOC analysts need to spend most of their time reading through documentation, understanding what they are looking at and comparing the behavior (mainly a sequence of events) of what they are investigating to a baseline of “normal” behavior.
For the bigger SOCs (MSSPs) out there, there can be people specialized in one specific field who are called whenever needed, but for the majority of SOCs and MSSPs, the analysts have to be able to do most of their investigations and remediations by themselves with the occasional support of other colleagues.

Therefore, the SOC analysts have some needs in order for them to perform at the expected level:

A well organized, up-to-date, very accessible knowledge repository enables all SOC personnel to minimize the time and effort put into open source research. For this to work, every analyst needs to contribute — this would only happen if the repository is well structured and accessible and if the SOC managers keep showing and reminding everyone that it actually helps a lot. For bigger SOCs/MSSPs, it is strongly advised to use the services of a knowledge manager whose actual job is to make information easier to access and update.
Regular and recurrent security training, internal or external, is very important to maintain a high level of awareness among analysts and to build up cohesion and individual skills. The more repetitive and “basic” (not technically advanced) the tasks given to the analysts, the more regular training is needed.
Security training is fine, but as stated before, the security part is only the tip of the iceberg. The SOC analysts also need training for every scope they are tasked to monitor in order to understand them and, more importantly, differentiate normal from abnormal. Not all SOC analysts need to follow all training, but there have to be enough analysts trained for every scope at all times so that the knowledge is there, both up-to-date in the knowledge repository and in the minds of some analysts that are currently working. This ensures that the SOC responds correctly and in a timely fashion, whatever the security issue may be.
Finally, there need to be regular crisis simulations (both advertised and not advertised as an exercise) so that every analyst has experienced the pressure and the atmosphere of a crisis and that everyone knows their place and tasks in “crisis mode”. This is the only way to make sure that when an actual crisis happens, there is only a cybersecurity issue and not also a simultaneous “oh my god what are we supposed to do”/“headless chickens running around” issue.

Usually, the further down the list, the less likely it is that the point is being well addressed. Therefore it is important for upper management, HR and SOC managers to understand and keep in mind that the SOC analysts need all of this to correctly do their jobs.

2.4. Understanding the profiles

The SOC managers have to know the profiles of their analysts in order to make the best of their teams.
Knowing what to aim for with the SOC as a whole, what the needs of the analysts are, what tasks can be given, what skills they require and how they can be added up to make coherent jobs is good, arguably better than what exists in some places today, but it is not enough.

To truly maximize the efficiency of the team members and, in the end, that of the SOC itself, there has to be a job and a place for everyone in the team and everyone in the team has to have a job and a place.
In other words, the company and the employee both need to be happy with the tasks assigned and the job done by the employee — this often means compromise.

A good way to reach an acceptable compromise is for the SOC managers and HR to understand the profile of the analyst (i.e. what makes them tick), explain to the analyst what the SOC needs and figure out together how to join both ends. If there is actually no common ground between what the analyst wants and what the SOC needs, then it may be time to reassign the analyst elsewhere.
This may come across as crude, but motivation is key for the analyst to keep their skills at the expected level and for them to work well with the rest of the team. A SOC simply cannot afford internal conflicts or people that let themselves get carried by the flow, as the former simply destroys the cohesion and the latter is just dead weight that needs to be carried by the rest.

There are mostly two types of technical profiles for which a SOC or MSSP should look and one management profile. This of course covers only the SOC analysts and managers; other profiles may be needed for other jobs in the SOC.

Before talking about the management profile, here is a quick reminder that a great technician — i.e. person with high technical skills — rarely makes for a great manager or even a mediocre one, for that matter. Indeed, this cannot be said nor emphasized enough: management is not, never has been nor ever will be, an evolution of technical expertise.
A good SOC manager is typically someone curious, willing to try and understand technical subjects (at least enough to grasp the issues and possible solutions), with a good strategic vision and a capacity for compromise. Basically, someone who is able to find a working compromise between the objectives set by the executives and the technical issues that the SOC is facing so that the SOC moves forward towards the objectives, even if the pace is slowed by technical issues.
In other words, a good SOC manager is a “Yes, but” person — meaning that everything could be achieved, but at some cost.

There are two main types of technical profiles that can be of use in a SOC.
The first one is the motivated and curious type: they most likely went into cybersecurity out of passion or interest for learning. The people with this profile are always asking “why” and “how” questions about every subject so they can truly understand what is happening. These make for good, even great, analysts because when they are responding to an incident, they will not rest until they are satisfied that they have understood exactly what happened and how, so they can properly remedy the situation. They are also the type of people a SOC needs to better itself. Because they like to learn and keep learning, they easily lose interest in repetitive actions: for them, any action they have to repeat without any added value is superfluous and therefore can and should be automated. In other words, they will try and optimize everything around them so that they can focus on what they want: learning.
The downside is that they need to be kept in check and reminded that there are objectives and production priorities. They indeed tend to deep dive into subjects and sometimes the dive is too deep, taking too much of their time compared to what would have been acceptable from a production cost perspective.

The second type of technical profile is disciplined and hardworking in the sense that they don’t mind, or in some cases even enjoy, repetitive tasks in a structured context with procedures to follow to the letter. For bigger SOCs and MSSPs, they can make up the bulk of the teams tasked with investigations and remediation or even MRO because the job they do and the results they produce are very stable and reliable; that is exactly how a SOC needs to be. Also, if they are assigned tasks that suit them, they tend to be quiet and generally happy. That’s why it is extremely important to listen to and hear them whenever they make remarks or propose some kind of improvement about these tasks.
The people matching this profile often have less advanced technical skills than the other profile because they tend to dislike change, either in the tasks they are assigned or if they have to learn new things. It usually takes time and energy to make them pick up new skills or change their routine.

Both technical profiles are very important for a stable, reliable SOC that keeps on improving both the quality of its detection and response and the quantity of incidents per unit of time it can handle.
Of course and again, good management is key to put the right profiles on the right jobs, have all the profiles synergize well within the SOC and accurately convey the impacts of technical issues up the chain while explaining the “bigger picture” to the analysts.

2.5. Conclusion

A SOC or MSSP is a complex entity that requires a variety of people with specific profiles to perform as expected. All the profiles have to be correctly identified and positioned on the correct job, from the SOC analysts to the SOC managers. Therefore, it is important that HR helps the SOC managers manage their human resources and that upper management and HR together pick the right kind of profile for the SOC managers.
As a reminder: there is such a thing as technical expertise and it needs to be acknowledged — both in the missions and in the salary — and said technical expertise is the only way to improve technically.

Some of the struggles of current SOCs and MSSPs are not having the right people in the right places, putting technical experts in management positions, leaving the MSSP managers battling the customers alone and/or having internal battles between the executives, HR and the SOC or MSSP managers instead of the former helping the latter.
This leads to frustration, SOC managers leaving for another position with less pressure or higher pay for the same amount of stress, SOC analysts corresponding to the first profile leaving for offensive or CERT/CSIRT positions and, finally, SOC analysts of the second profile leaving because, having stayed the longest, they are asked to do tasks they neither want nor like.