Expert’s Corner with Carolina Christofoletti
Carolina Christofoletti is a Child Sexual Abuse Material (CSAM) subject matter expert, with vast experience both in CSAM Intelligence and CSAM Trust & Safety. She currently leads the CSAM Blockchain Intelligence efforts at TRM Labs as part of the Threat Intelligence Team.
Her academic work on CSAM began in 2017 at the University of São Paulo, where she earned her law degree and received an Alumni Excellence Award. She holds an LL.M. in International Crime, Transitional Justice and Peace Procedures from Universidad Católica de Colombia (Colombia), a master's degree in Criminal Compliance from Universidad de Castilla-La Mancha (Spain) and a master's degree in Cybercrime from Universidad Antonio de Nebrija (Spain), and she is an LL.M. candidate in Digital Forensics at IPOG (Brazil).
Carolina Christofoletti is currently a CSAM researcher at the IGCAC Working Group, where she studies CSAM networks in depth in partnership with national and international law enforcement agencies, and at the Institute for Advanced Studies at the University of São Paulo (USP), where she researches CSAM Trust & Safety. She also serves as a consultant on CSAM Trust & Safety matters.
1. In a recent LinkedIn post, you said “Image databases will not solve the CSAM problem.” Could you explain more about what you mean by that?
CSAM Image Databases are bound by a logical contradiction.
Let’s suppose that:
- ID1 is a CSAM database (‘known’ is defined as existing in ID1).
- Image A is a CSAM file that is part of ID1 (a ‘known’ CSAM image).
- Image B and Image C are CSAM files that are not part of ID1 (‘unknown’ CSAM images).
- Video A is a CSAM video (CSAM media we know nothing about).
(1) For Image A to be added to ID1, it must first be found, and found in a way that meets certain requirements: Image A must be found (directly or indirectly) by someone who has access to ID1 and who is willing to add it to the database. That someone is either a law enforcement agency (in which case Image A is part of a seizure) or a CSAM hotline (in which case Image A has usually been found by someone else and reported to the organization).
(2) It quickly becomes clear that ID1 is only useful if Image A is found in the first place, and only as long as the number of such images does not exceed what ID1 can scale to. And here lies the bottleneck:
The mere existence of ID1 has the harmful, unintended effect of obscuring its main dependency: a proactive approach to CSAM detection.
Instead of focusing on Image A (CSAM files flagged by ID1), we should be focusing on Image B and Image C (CSAM files that easily bypass it).
(3) Without constant updates, ID1 goes around in circles: without further additions, a CSAM image database can only identify known CSAM files (Image A) and has no power over unknown CSAM files (Images B and C).
Image A can only help uncover Images B and C if:
- Images B and C appeared before Image A, in the same digital environment, and are still discoverable at the time Image A is detected (the linear condition); and
- Image A is used as a review flagger (a trigger for reviewing what surrounds it), rather than as a tool for simply locating and taking down known CSAM files.
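To make the model above concrete, here is a minimal Python sketch, assuming ID1 can be represented as a set of fingerprints and that each item records the environment (channel, folder, group) it was found in. `Item`, `triage` and the fingerprint strings are hypothetical. It contrasts plain lookup with using a match as a review flagger: unmatched items that share an environment with a match are queued for human review, while item D, in an environment with no match, stays invisible to ID1.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    environment: str  # where it was found, e.g. a channel, folder or group
    fingerprint: str  # hypothetical stand-in for a real content hash


@dataclass
class Triage:
    known: list[str] = field(default_factory=list)         # ID1 matches (Image A)
    needs_review: list[str] = field(default_factory=list)  # co-located unknowns (B, C)


def triage(items: list[Item], id1: set[str]) -> Triage:
    result = Triage()
    # Environments where at least one item matched ID1.
    hit_envs = {it.environment for it in items if it.fingerprint in id1}
    for it in items:
        if it.fingerprint in id1:
            result.known.append(it.item_id)
        elif it.environment in hit_envs:
            # Not in ID1, but found next to a known match: send to a human
            # reviewer instead of stopping at the takedown of Image A.
            result.needs_review.append(it.item_id)
        # Anything else is invisible to ID1, which is exactly the blind spot.
    return result


id1 = {"fp-A"}
items = [
    Item("A", "channel-1", "fp-A"),
    Item("B", "channel-1", "fp-B"),
    Item("C", "channel-1", "fp-C"),
    Item("D", "channel-2", "fp-D"),
]
result = triage(items, id1)
print(result.known)         # ['A']
print(result.needs_review)  # ['B', 'C']
```

Note that the review queue only exists because a known match happened first, which is the linear condition described above.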
(4) Because CSAM criminals do not report their sources, and because research on proactive detection remains underdeveloped, Images B and C will hardly ever enter ID1.
Moreover, Images B and C may never be reported to, or found by, a CSAM hotline or a law enforcement agency with access to ID1.
Even if Images B and C are actively circulating on the Internet, several factors can often eliminate any chance of their being added to ID1: they may circulate only among closed groups of offenders who never report them; they may be so severe that whoever comes across them fears being prosecuted for reporting them; and CSAM hotlines lack a ‘proactive’ approach of actually ‘looking for those files’.
(5) Even if Images A, B and C could potentially be added to ID1, the same is not true for Video A, since ID1s are image databases that do not include CSAM videos.
One possible solution would be to extract still images (CSAM scenes, i.e. an ‘Image A’) from Video A and enable similarity-based detection. Although the technology for this already exists, it unfortunately remains poorly explored.
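The frame-level similarity idea can be sketched in a few lines of Python. This is a minimal illustration, assuming frames have already been decoded from the video into 2D lists of grayscale values; it uses a basic average hash (aHash) as a stand-in and is not PhotoDNA, PDQ, or whatever algorithm a given ID1 actually uses. `matching_frames` and its 5-bit threshold are hypothetical, and the frames in the usage example are synthetic gradients.

```python
def average_hash(gray: list[list[int]], size: int = 8) -> int:
    """Simple average hash (aHash) of a grayscale frame.

    The frame is block-averaged down to size x size cells; each cell
    contributes one bit: 1 if brighter than the mean of all cells.
    Assumes the frame is at least size x size pixels.
    """
    h, w = len(gray), len(gray[0])
    cells = []
    for r in range(size):
        for c in range(size):
            rows = range(r * h // size, (r + 1) * h // size)
            cols = range(c * w // size, (c + 1) * w // size)
            block = [gray[y][x] for y in rows for x in cols]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def matching_frames(frames, known_hashes, max_distance=5):
    """Indices of frames whose hash is within max_distance bits of a known hash."""
    hits = []
    for i, frame in enumerate(frames):
        h = average_hash(frame)
        if any(hamming(h, k) <= max_distance for k in known_hashes):
            hits.append(i)
    return hits


frame_a = [[x * 16 for x in range(16)] for _ in range(16)]  # synthetic gradient
frame_b = [[v + 3 for v in row] for row in frame_a]         # slightly brighter copy
frame_c = [[255 - v for v in row] for row in frame_a]       # unrelated (inverted)
known = {average_hash(frame_a)}
print(matching_frames([frame_a, frame_b, frame_c], known))  # [0, 1]
```

The brightened copy still matches because the hash encodes relative brightness, which is why similarity-based hashing can survive re-encoding where exact file hashes cannot.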
(6) Even though online platforms often expect Image A to be uploaded directly to their platforms (where it would trigger an ID1 match), this hardly ever happens.
A quick look at CSAM networks shows that Image A commonly enters online platforms through an external link. Finding Image A therefore depends on opening AND scanning those links.
This is where the problem begins: CSAM criminals tend to seek out file-hosting platforms with little or no integration with ID1, and CSAM Trust & Safety teams are not equipped with ID1-based tools, or with any tools, that enable automatic detection in this case.
This shows how dependent CSAM Trust & Safety still is on ‘opening the links’ and on ‘human reviewers’. One possible solution would be to develop ‘risk scoring’ metrics for unknown links.
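One way to picture such a metric is an additive score over signals a platform can observe without opening the link. The minimal Python sketch below assumes three hypothetical signal types (whether the destination host is known to lack ID1 scanning or is a URL shortener, whether an ID1 match occurred in the same channel, and whether the poster was previously actioned); the host lists and weights are placeholders, not real data. It only orders links for human review and decides nothing on its own.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# HYPOTHETICAL signal lists and weights: placeholders, not real data.
HOSTS_WITHOUT_ID1_SCANNING = {"files.example.net"}
URL_SHORTENERS = {"short.example"}
WEIGHTS = {
    "unscanned_host": 40,
    "shortener": 10,
    "known_match_nearby": 30,
    "poster_actioned_before": 20,
}


@dataclass
class Link:
    url: str
    known_match_nearby: bool      # an ID1 match occurred in the same channel/thread
    poster_actioned_before: bool  # the posting account had prior enforcement


def risk_score(link: Link) -> int:
    """Additive 0-100 score; higher means the link should be reviewed sooner."""
    host = (urlparse(link.url).hostname or "").lower()
    score = 0
    if host in HOSTS_WITHOUT_ID1_SCANNING:
        score += WEIGHTS["unscanned_host"]
    if host in URL_SHORTENERS:
        score += WEIGHTS["shortener"]
    if link.known_match_nearby:
        score += WEIGHTS["known_match_nearby"]
    if link.poster_actioned_before:
        score += WEIGHTS["poster_actioned_before"]
    return min(score, 100)


def review_queue(links: list[Link]) -> list[Link]:
    # Highest risk first; sorted() is stable, so ties keep arrival order.
    return sorted(links, key=risk_score, reverse=True)


links = [
    Link("https://example.org/page", False, False),
    Link("https://files.example.net/f/123", True, False),
    Link("https://short.example/x", False, True),
]
for link in review_queue(links):
    print(risk_score(link), link.url)
```

Integer points are used instead of fractional weights so that equal scores compare exactly; in practice the weights would have to be derived from a platform's own review outcomes.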
2. The EU recently beefed up its regulations to ensure online child safety. How important, if at all, do you think government regulation is for online child safety?
The main risk of any regulation is getting the “what to regulate” problem wrong. This is also true for the upcoming EU CSAM regulation.
Once regulation sets an industry standard that online platforms must comply with, CSAM Trust & Safety managers become extremely reluctant to greenlight different approaches to CSAM threats from the teams they coordinate. And so the contradiction goes around in circles, revealing the real origin of Trust & Safety NDAs.
Under punitive CSAM regulations, online platforms risk very expensive fines if new CSAM content is actually found on their sites (and detection of ‘unknown CSAM files’ always happens after those files have been uploaded to the platform). As a result, efficient CSAM approaches are sacrificed for the sake of compliance with the regulation.
This shows how the proposed EU CSAM regulation could fail miserably, and how it becomes entangled with the ID1 paradox.
After all, as long as ID1 remains a CSAM Trust & Safety Industry Standard, proactive CSAM detection will have no place in CSAM Trust & Safety teams and CSAM numbers will keep being artificially generated (“Hey, that’s what’s in the database”).
Even though CSAM networks are highly specialized (for example, Image A may never appear together with Image B, even though on the surface that pairing is “imaginable”), Image A is hardly ever analyzed in the context of its CSAM network, and that context is not captured by ID1 either.
Because Images A, B and C are added to ID1 simply and independently of one another, a second layer of blindness is created around CSAM databases. This forces CSAM Trust & Safety teams to require ever higher levels of training; otherwise, CSAM Threat Actors will easily bypass human eyes too.
In particular, when it comes to something as complex as the interaction between CSAM and online platforms, we must be extremely careful with any ‘harmonized’ regulation. CSAM networks are not organized the same way on Instagram as they are on Twitter. TikTok and Snapchat present CSAM threats that are very different from each other, and so on. This is a problem for any regulation that aims to establish “common Trust & Safety practices” across platforms.
CSAM networks have also shown a remarkable ability to change faster than laws do. Faced with a mandatory practice, Trust & Safety teams therefore risk being forced to comply with rules that will soon no longer make sense for detecting the latest violating content.
Personally, I think ID1 solutions (the very core of the new EU proposal) belong in the Trust & Safety toolbox: helpful, but no longer deserving of the Unicorn’s Throne (the “magical solution”).
Even though platforms such as Facebook, Twitter, TikTok and others constantly scan their content against ID1, the fact that CSAM networks still operate on them under the radar shows that ID1 has become outdated as a “magical solution”.
By claiming that the CSAM problem can be solved by scanning the remaining, non-Big Tech platforms and matching their content against a database of known CSAM files, the EU indirectly suggests that Big Tech is not the main target of this regulation, but rather some small “Platform X”, whatever that may be.
The specific CSAM threat posed by a “Platform X” that MUST be under EU jurisdiction has yet to be demonstrated. Meanwhile, a quick analysis of CSAM networks points to Asia, Oceania and Arabia as the preferred ‘jurisdictions’ for CSAM hosting, yet CSAM hotline reports point primarily to the EU and the United States.
This is just one example of the contradictory data we need to resolve, and it highlights once more the urgent need to actually research the data before proposing solutions.
From a compliance perspective, the new EU proposal gives the strange impression of a compliance program written without any risk assessment: a logical contradiction in which conclusions are drawn before any premises have been established. For example, the EU Center (the premise generator) and the CSAM scanners are meant to be created simultaneously, when the former should have been used, above all, to validate the adequacy of the latter.
So far, the risk assessment documentation (the document that says ‘what is really going on’ with the targets the EU intends to regulate, e.g. what the real problem with Instagram is) is missing. The fact that we keep justifying CSAM regulations with numbers we know nothing about should make everyone uncomfortable.
3. If Web3 is the next big thing, what do you think of the metaverse with regard to online child safety? Is there a greater risk around child grooming?
The metaverse will only surface the fact that child safety is an issue much bigger than just ‘known CSAM files’.
The hyperrealistic avenues of the metaverse give us the false impression that we are facing a new, or even greater, child safety threat. Instead, what we are facing is “old wine in a new bottle.” Because the label is now more colorful and more visual, we tend to pay more attention to it. But that is all it is: a new label.
The bad news is that the prototype for the metaverse child safety threat already exists on online platforms. Only the lack of research and of proper detection tools keeps this threat in “silent mode”.
A quick look at online platforms’ channels shows that, on some of these platforms, CSAM Threat Actors are already explicitly organized, jointly managing their CSAM networks close to children’s profiles. This is exactly what child safety advocates should expect in the metaverse.
What is somewhat unsettling about the metaverse is not the graphical world it brings with it, but the fact that “CSAM networks moving in tandem” have been, and still are (with ID1 as the only industry standard), well below the radar of CSAM Trust & Safety teams.
The fact that metaverse worlds will allow us to actually see CSAM perpetrators walking their avatars around children’s accounts once, twice, twenty times a day doesn’t make CSAM Threat Actors more threatening — it actually makes children safer.
What used to be an internal log visible only to Trust & Safety teams (e.g. threat actors’ movements) now becomes something anyone nearby can easily see. The question then becomes how efficient the ‘reporting buttons’ and ‘review metrics’ are.
Contrary to popular opinion, I believe it is precisely the metaverse, because of the transparency afforded by its visual nature, that will take the reins of child safety over the next decade. Metaverse visuals might be what finally prompts “minimal duty of care” standards, and I would be very happy if that happened.
4. Given your experience as a CSAM researcher, what advice would you give to online platforms?
Child Sexual Abuse Material (CSAM) networks must be understood for what they are: a social science phenomenon.
From a CSAM research perspective, this highlights the hidden potential of qualitative methods (“hows” instead of “how many”) such as network analysis for Trust & Safety governance.
Blindly reviewing numbers has not helped so far. That is why a partnership with CSAM researchers (both external and internal) is always advisable, in order to understand what the right research questions are.
If I could offer online platforms a single CSAM research insight, it would be that the way CSAM Threat Actors interact with each other tends to yield more useful insights, and a better basis for detection, than their isolated pieces of CSAM content.
My advice, then, would be to research more closely the birth, development, and death of the CSAM networks operating inside online platforms’ own channels, and thus move towards a proactive CSAM Trust & Safety detection approach based on predictive metrics derived precisely from these social science scenarios.
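As a rough illustration of studying how threat actors interact rather than isolated files, here is a minimal Python sketch of a network-analysis step. It assumes interactions (replies, mentions, shared links and the like) have already been reduced to hypothetical account-to-account edges per time window, and that `seeds` are accounts already actioned; `build_graph`, `network_of`, `review_candidates` and `lifecycle` are illustrative names, not an established tool. `lifecycle` tracks the size of the seeds’ network across windows, a crude proxy for a network’s birth, development and death.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected account graph from (account, account) interaction pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def network_of(graph, seeds):
    """Every account reachable from any seed account (breadth-first search)."""
    start = [s for s in seeds if s in graph]
    seen, queue = set(start), deque(start)
    while queue:
        for neighbour in graph[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def review_candidates(graph, seeds):
    """Non-seed members of the seeds' network, most tied to seeds first."""
    seed_set = set(seeds)
    members = network_of(graph, seed_set) - seed_set
    return sorted(
        members,
        key=lambda acct: (-len(graph[acct] & seed_set), -len(graph[acct]), acct),
    )


def lifecycle(windows, seeds):
    """Size of the seeds' network in each time window (0 = no longer present)."""
    return [len(network_of(build_graph(edges), seeds)) for edges in windows]


windows = [
    [("u1", "u2")],                              # window 1: network appears
    [("u1", "u2"), ("u2", "u3"), ("u3", "u4")],  # window 2: network grows
    [("u3", "u4")],                              # window 3: seed's network dissolved
]
seeds = {"u1"}
print(lifecycle(windows, seeds))                          # [2, 4, 0]
print(review_candidates(build_graph(windows[1]), seeds))  # ['u2', 'u3', 'u4']
```

Ranking by ties to already-actioned accounts is only one possible heuristic; which interaction signals actually predict harm is exactly the research question the answer above calls for.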
5. Organizations like IWF and Thorn have been created to protect children from online sexual abuse. However, for a cause that could be a global initiative, these organizations can sometimes take a proprietary posture towards their tools and data. With the elimination of online child abuse being such an important problem, do you think we can get to more collaborative and distributed solutions?
In the case of NGOs, proprietary postures towards CSAM tools and data are better explained by a proprietary interest with regard to financial sponsors (hereafter “Data Sensitivity”) than by a wise, thoughtful decision to protect children.
In the case of online platforms, proprietary postures towards CSAM tools and data are better explained by the legal interest of compliance (hereafter “Data Privacy”) than by a wise, thoughtful decision to protect children.
As with the proposed EU regulation, “proprietary” tools that are subject to no external review also risk getting the problem wrong.
For example, known identifiers used by CSAM offenders and associated with CSAM victims would be extremely helpful for content moderators trying to identify CSAM proactively, since Threat Actors can bypass online platforms’ traditional “image-based CSAM detectors”. Yet “CSAM context” is hardly ever the focus of those proprietary tools.
Here we can see the issue going around in circles once again. Because CSAM databases are such a sensitive topic, those tools become not only proprietary but “Law Enforcement only”. But where, and how, does law enforcement work start? We absolutely need this context, and we absolutely need to bring content moderators into this discussion. As mentioned above, you might agree with me that with an ID1 alone, law enforcement does not get much further. And this is how the second bottleneck is created:
Content moderators cannot be properly trained to recognize a CSAM threat, because the tools are ‘highly sensitive’, and law enforcement cannot proactively navigate online platforms for leads, because it does not know how to find the networks. It is a lose-lose scenario that clearly shows the harm caused by three parallel worlds (NGOs, law enforcement agencies and Trust & Safety groups) that should never have started working against each other.
Having Trust & Safety teams interact with CSAM NGOs only as ‘anonymous readers of public (and thus filtered) research reports’ has the unfortunate effect of turning what could have been a highly impactful CSAM “Trust & Safety Trends & Issues” research report into one that hardly ever meets Trust & Safety teams’ actual research needs.
Even though I agree that CSAM is a sensitive topic, I disagree that Trust & Safety teams must be trained, for example, with blind hash databases whose context they know nothing about, or with redacted, filtered, blurred CSAM research reports.
A more transparent approach to CSAM Trust & Safety training, one that benefits from those NGOs’ qualitative findings, is therefore advisable.
In the era of artificial intelligence, we will learn how to manage and further develop synthetic datasets. So my answer is “yes”, and the “how” I would add is a cooperative approach to training and a distributed, synthetic approach to CSAM data.