Best AI Content Detectors for 2024

Figure out what tools can detect AI (if any)

Artturi Jalli
37 min read · Dec 16, 2023

This is a comprehensive review of the best AI detectors on the internet. I’ve tested every tool, and here are my insights, experiments, and evaluation.

In these reviews, I assess each detector’s quality, features, accuracy, and more.

If you want a quicker 5-minute recap of this post, make sure to check out the summary post.

Top Picks

Here are the top 3 picks for your convenience:

  1. Originality.AI
  2. Content at Scale
  3. Crossplag AI Detector

Disclaimer: This post contains affiliate links, at no cost to you. Also, I don’t believe in AI detection. But this is the best list I could put together, featuring the tools that worked best when I tried them. :)

1. Originality AI

Originality.ai is an AI writing software that helps you detect if your content:

  • Is written by AI.
  • Has plagiarism.

These are both important aspects of using AI in your writing.

Even though these days AI-written content looks human-written, some programs can still detect whether the content is written by AI or not. Originality.ai demonstrates this well.

Also, even though AI mostly produces unique and original content, there’s always a chance of plagiarism. This is why it makes sense to run AI-written content through a plagiarism checker to be on the safe side.

Let’s test Originality.ai to see how it detects AI-written and human-written content.

Try Originality.ai

Performance

To get an idea of the performance of Originality.ai’s AI content detector, I fed the tool:

  • 10 human-written text samples.
  • 10 AI-written text samples.

Then I calculated the tool’s accuracy based on these inputs.

In addition, I tried to trick the AI content detector with some simple tricks into thinking AI-written content was human-written.
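This evaluation procedure can be sketched in a few lines of Python. Note that `detect_ai` below is a hypothetical placeholder for whichever detector you test, not Originality.ai’s real API:

```python
# Sketch of the evaluation: run labeled samples through a detector
# and tally how often its verdict matches the true label.
# `detect_ai` is a hypothetical stand-in, NOT the real Originality.ai API.

def detect_ai(text: str) -> float:
    """Pretend detector: returns an 'AI-written' score from 0.0 to 1.0."""
    return 1.0 if "as an ai language model" in text.lower() else 0.0

def accuracy(samples: list[tuple[str, bool]]) -> float:
    """samples: (text, is_ai) pairs. A sample counts as correct when the
    detector's AI score crosses 0.5 on AI text and stays under it on human text."""
    correct = sum((detect_ai(text) > 0.5) == is_ai for text, is_ai in samples)
    return correct / len(samples)

samples = [
    ("I wrote this on my porch with coffee in hand.", False),
    ("As an AI language model, I can summarize this topic.", True),
]
print(accuracy(samples))  # 1.0 for this toy pair
```

In my actual tests, the "samples" were 10 human-written and 10 AI-written texts, and the scores were read off the tool’s web interface.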

1. Human-Generated Content

As the first test, let’s try to analyze human-written content with Originality.ai.

To pull this off, I’ve taken 10 test samples from my blog posts. These pieces of content are 100% human-written. In an ideal world, the AI detector should score them all 100% original and 0% AI.

Example 1

Here’s the input:

Output:

76% original — mission successful.

Example 2

Here’s the input:

And here’s the output from the AI detector:

4% original — mission failed.

Example 3

Input:

Output:

2% original — mission failed.

Example 4

Input:

Output:

83% original — mission successful.

Example 5

Input:

Output:

8% original — mission failed.

Example 6

Input:

Output:

82% original — mission successful.

Example 7

Input:

Output:

23% original — mission failed.

Example 8

Input:

Output:

99% original — mission successful.

Example 9

Input:

Output:

0% original — mission failed.

Example 10

Input:

Output:

2% original — mission failed.

Based on these results, it’s clear that Originality.ai isn’t the best tool for recognizing human-written content.

In the above tests, Originality.ai correctly recognized only 4/10 human-written pieces of text.

But this is not a big problem. The tool is supposed to detect AI-written content. Most people will use it on content they already know is AI-written, aiming to make it sound less like AI.

2. AI-Written Content

As another test, let’s feed Originality.ai some AI-written text samples. I’ve generated these samples using ChatGPT.

In an ideal situation, Originality.ai would give a 0% original and 100% AI score. But detecting AI with 100% accuracy is hard.

For our purposes, let’s count all successful missions where the AI score exceeds the original score.
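That success criterion boils down to a simple comparison of the two percentages the tool reports (the numbers here are readings from the tool’s UI, not an API):

```python
def mission_successful(ai_score: float, original_score: float) -> bool:
    """Count a detection as successful when the AI score
    exceeds the original (human) score."""
    return ai_score > original_score

# Example readings: 100% AI vs 0% original succeeds,
# 1% AI vs 99% original fails.
print(mission_successful(100, 0))  # True
print(mission_successful(1, 99))   # False
```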

Example 1

Input:

Output

100% AI written — mission successful.

Example 2

Input:

Output

99% AI written — mission successful.

Example 3

Input:

Output

61% AI written — mission successful.

Example 4

Input:

Output

66% AI written — mission successful.

Example 5

Input:

Output

1% AI written — mission failed.

Example 6

Input:

Output

98% AI written — mission successful.

Example 7

Input:

Output

100% AI written — mission successful.

Example 8

Input:

Output

100% AI written — mission successful.

Example 9

Input:

Output

70% AI written — mission successful.

Example 10

Input:

Output

59% AI written — mission successful.

Pretty impressive! The AI detector was able to successfully identify 90% of the inputs as AI-written.

Try Originality.ai

Last but not least, let’s see if we can fool the AI detector by making a small change and get a big shift in the score.

Can You Fool Originality AI Content Detector?

I’ve tried a bunch of AI content detector tools. I’ve noticed that sometimes even the tiniest change in the input completely changes the score.

So to make 100% AI-generated content look 100% human-generated, it might be enough to just remove a single letter or add one extra word in the mix.

Let’s run some cheap tricks to see if this is also the case in Originality.ai.

Test 1: Remove a Comma

➡️ TLDR; Adding a small grammatical error did not change the output of the Originality.ai AI detector significantly.

As a first test, let’s introduce a small grammatical error into the Originality.ai input. More specifically, I’ll remove the comma that follows the word “Additionally”:

This didn’t change the originality score significantly. It seems the detector doesn’t care about small changes and can see the big picture.

Test 2: Make a Typo

➡️ TLDR; Adding a small typo to the content did not significantly change the AI detector output.

Next, let’s try to misspell one of the words to see if such a small change would alter the AI detection score.

I’m going to intentionally write “their” as “ther” without “i” and scan the content:

Once again, this only slightly moved the score. As expected, such a small change shouldn’t alter the score by much.

Test 3: Use an AI Paraphraser

➡️ TLDR; Rephrasing the AI-written content did not fool the Originality.ai detector.

Based on a couple of tests, it seems small input changes aren’t able to change the score.

Now, let’s try something more significant.

These days, you can also use AI to paraphrase your content. One example of such a rewording tool is called QuillBot.

Make sure to read my complete QuillBot review.

Here I’ve reworded one of my AI-generated inputs. With QuillBot, making this change took only a second.

Now, let’s try to input this AI-generated rephrased sample text into the Originality.ai detector to see what happens:

Amazing! It still recognizes the content to be AI-generated. Some other tools failed this test miserably.

Keep in mind that even though I ran quite a few tests, the sample size is still small. I highly recommend you experiment with tests like this: change words, change punctuation, add a sentence, or remove a sentence.

Try Originality.ai

Plagiarism Checker

Notice that Originality.ai is not only an AI content detector. There’s a plagiarism checker too!

For example, here I’ve copy-pasted a part of a blog post from my other blog.

The Originality.ai plagiarism score is correctly 100%. The sample I checked is indeed 100% copied from an existing blog post.

It seems to be a powerful plagiarism checker based on the few examples of duplicates I ran through it.

If you already tried Originality.ai, you probably noticed that the plagiarism checker is automatically enabled when you detect AI content.

You can uncheck the plagiarism checker if you’re only interested in checking the content for AI.

Pros

  • Accurate. Originality.AI is an accurate tool when it comes to detecting AI-written content. In my tests, it correctly spotted 90% of AI-generated content.
  • Detects plagiarism. There’s also a powerful plagiarism checker tool that you can use to ensure your writing is truly unique.
  • Hard to fool. You can’t just change a word or punctuation to fool the AI detector to give a better score. This seems obvious but it’s not the case for many AI detector tools.

Cons

  • No free trial. Originality is a paid tool without a free trial. But luckily, the pricing is affordable at $1 for 10,000 scanned words.
  • False positives. Originality.ai isn’t good at telling human-written content apart from AI-written text. Out of my 10 human-written samples, it claimed 6 to be AI-generated.

Final Verdict

I recommend experimenting with a tool like Originality.ai. It can help you write less AI-like content as well as be sure you’re not copying someone else’s work.

Originality.ai works best when you already know that the input text is written by AI and you want to make it sound less like AI.

But if you’re given a random piece of text, you can’t rely on Originality.ai to tell whether it’s AI-written or not.

However, an AI detector like Originality.ai (or any other publicly available tool) unfortunately can’t tell you whether Google considers your content AI-written. Google most likely uses a different approach.

So if you are a blogger and get a 100% original score from Originality, it does not mean Google wouldn’t still be able to tell it’s written by AI.

Try Originality.AI

2. Content at Scale

Content at Scale is a complete AI writing tool with long-form content creation capabilities.

One of the cool features of this tool is the free AI content detector.

The idea of the AI detector is simple:

  1. Take a piece of text.
  2. Copy-paste it into the AI detector.
  3. See how likely it is that your text was written by AI.

The Content at Scale AI detector gives you a human-written score between 0% and 100%. This score reveals how confident the tool is that your content is human-written rather than AI-written.

A score of 0% means the tool thinks your content is entirely written by AI. A score of 100% means the tool thinks you wrote it all yourself.

Next, let’s put the Content at Scale AI content detector to the test.

Performance

I extensively tested the AI content detector by feeding it two types of data:

  • 10 pieces of human-written text from my blog.
  • 10 samples of AI-generated text (using ChatGPT).

1. Human-Generated Content

Let’s start with human-generated inputs.

These inputs are random samples from blog posts I’ve written myself. In other words, all of them should score close to 100%.

Example 1

Falsely detected as AI-written — mission failed.

Example 2

Falsely detected as AI-written — mission failed.

Example 3

Correctly detected as human-written — mission successful.

Example 4

Falsely detected as AI-written — mission failed.

Example 5

Correctly detected as human-written — mission successful.

Example 6

Correctly detected as human-written — mission successful.

Example 7

Falsely detected as AI-written — mission failed.

Example 8

Falsely detected as AI-written — mission failed.

Example 9

Falsely detected as AI-written — mission failed.

Example 10

Falsely detected as AI-written — mission failed.

By tallying up the scores, the Content at Scale AI detector only detected 30% of the human-written samples to be human-written.

So when it comes to detecting human-written content, Content at Scale did a really poor job.

Next, let’s try the AI detection tool by feeding it some AI-generated content. This is the main way content creators will use AI detectors anyway: to find out whether their content looks natural.

2. AI-Written Content

I generated 10 pieces of text using ChatGPT — the revolutionary AI writing model released by OpenAI.

If the AI content detector works, it should give a score close to 0% for all the following pieces of text.

Example 1

Correctly detected as AI-written — mission successful.

Example 2

Correctly detected as AI-written — mission successful.

Example 3

Falsely detected as human-written — mission failed.

Example 4

Correctly detected as AI-written — mission successful.

Example 5

Correctly detected as AI-written — mission successful.

Example 6

Correctly detected as AI-written — mission successful.

Example 7

Correctly detected as AI-written — mission successful.

Example 8

Correctly detected as AI-written — mission successful.

Example 9

Correctly detected as AI-written — mission successful.

Example 10

Correctly detected as AI-written — mission successful.

Summing up the results, it appears that the Content at Scale AI detector is quite good at detecting AI content.

In total, it correctly identified 90% of the AI-written articles as AI-written.

However, the problem is that it didn’t reliably identify human-written pieces as human-written. In total, the tool was successful 12/20 times, which is only 60%.
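The combined tally works out like this, using the results of the two test rounds above:

```python
# Tally from my Content at Scale tests: 3/10 human samples and
# 9/10 AI samples were classified correctly.
human_correct, human_total = 3, 10
ai_correct, ai_total = 9, 10

overall = (human_correct + ai_correct) / (human_total + ai_total)
print(f"{overall:.0%}")  # 60%
```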

So if you don’t know in advance whether a piece of text was written by AI, you can’t detect it reliably with Content at Scale.

Can You Trick the AI Content Detector?

Now, let’s play some tricks with the Content at Scale AI detector to see if it’s easy to fool.

I’ve reviewed other AI detectors and noticed that small changes in the input can have a big impact on the output.

This means that, in the eyes of an AI detector, text with a 100% AI-written score can become 100% human-written after a tiny change in the input. That’s obviously not good! Small changes should have little to no impact on the score.

Let’s see what happens when I try to fool the Content at Scale AI detector.

Test 1: Remove a Comma

➡️ TLDR; Removing a comma didn’t change the results significantly.

As a first step, I tried to fool Content at Scale by tweaking the output by a tiny bit. More specifically, I removed a single comma that follows the word “Additionally”.

This is a mistake AI these days would never make, so perhaps the tool now thinks I’ve written the content…

But here’s what happened:

The AI detector score changed a tiny bit. However, it was unfazed by the removal of the comma.

Test 2: Make a Typo

➡️ TLDR; Removing a single letter changed the AI detector score by quite a bit.

I’m trying another similar “trick”. This time, I’ll put the comma back and instead introduce a typo by removing the letter “i” from “their”.

Here’s what happened this time:

Now the score changed quite a bit. The tool still says the content is written by AI but with less confidence. I can imagine making one or two of these changes could completely change its mind.

Based on these two tests, it seems it’s quite easy to fool the Content at Scale AI detector. Given a 100% AI-written score, just change a character or two and your content might get a 100% human text score.

Test 3: Use an AI Paraphraser

➡️ TLDR; AI-paraphrased content did not fool the tool.

Last but not least, let’s try fooling the AI content detector by rephrasing the content with AI.

I used a tool called QuillBot to rewrite the AI-generated input, in hopes of making it look less AI-generated in the eyes of AI detectors. In the image below, the right-hand side is the AI-rephrased version of the AI-generated input text.

Now, let’s feed this rephrased version to the Content at Scale AI detector:

Amazing! It did not get fooled by the paraphrasing of the input. It still knows the content is written by AI.

Based on these tests, it still seems you can fool the AI detector fairly easily. Sometimes the result remains unchanged even when you use a paraphrasing tool like QuillBot. On the other hand, sometimes the tool is fazed by a small change like an extra word, comma, or typo.

Pros

  • Free to use. Content at Scale offers the AI detector for free. This makes it easy to use and worthwhile to try.
  • Decent accuracy. With AI-generated content, the AI content detector identified 90% of the inputs as AI-written. With human-generated content, the result was less impressive, though.
  • Other AI-writing features. The AI content detector is just one of the features of Content at Scale.

Cons

  • Inaccurate with human text. The Content at Scale AI Detector couldn’t quite detect human-written content.
  • Pretty easy to mislead. By making small changes in the input, you can fool the AI content detector into believing your content is human-generated.

Final Verdict

Because the Content at Scale AI detector is a free tool, it won’t hurt to give it a try!

The accuracy was pretty impressive with AI-generated content. But because it didn’t detect human-written text as human-written, you can’t rely on it.

Based on the bunch of AI detector tools I’ve tested, there’s no fully reliable AI content detector. You certainly cannot use a tool like this to detect whether homework or essays were written by AI.

3. Crossplag AI Detector

CrossPlag offers an AI-detecting service that is still in the testing phase. Notice that this tool only works for English content!

The tool works by following these simple steps:

  1. Copy-paste text into the CrossPlag AI content detector.
  2. Wait for 1–2 seconds.
  3. See a score between 0–100%.

The reason such an AI detector exists is to help content creators get an idea of how bot-like their content is. Once these tools become more accurate, they could be applied to detecting AI use in homework and similar tasks.

Performance

Let’s run some tests to see how the CrossPlag AI detector performs.

The idea of the following test is simple. I will:

  1. Input 10 human-written text samples to CrossPlag’s AI detector.
  2. Input 10 AI-written text samples to CrossPlag’s AI detector.

Based on the outcomes, I’ll calculate the accuracy and analyze the results.

Lastly, I’m going to perform three easy tricks with which I try to fool the AI detector.

Let’s get started!

1. Human-Generated Content

First, let’s input 10 human-written text samples into CrossPlag’s AI detector.

In an ideal world, CrossPlag should show a 0% AI Content Index for each of these samples.

Example 1

99% AI written — mission failed.

Example 2

99% AI written — mission failed.

Example 3

6% AI written — mission succeeded.

Example 4

11% AI written — mission succeeded.

Example 5

8% AI written — mission succeeded.

Example 6

84% AI written — mission failed.

Example 7

92% AI written — mission failed.

Example 8

100% AI written — mission failed.

Example 9

95% AI written — mission failed.

Example 10

72% AI written — mission failed.

As a result, CrossPlag only recognized 3/10 of the human-written articles as human-written. It falsely claimed that 70% of my text was produced by AI.

But because we’re talking about an AI content detector, perhaps we shouldn’t use it to detect human-written content.

2. AI-Written Content

Let’s see how CrossPlag recognizes AI-written content from the 10 AI-written samples I give it. I’ve generated these text samples using ChatGPT.

Example 1

10% AI written — mission failed.

Example 2

100% AI written — mission succeeded.

Example 3

100% AI written — mission succeeded.

Example 4

88% AI written — mission succeeded.

Example 5

100% AI written — mission succeeded.

Example 6

100% AI written — mission succeeded.

Example 7

69% AI written — mission succeeded.

Example 8

100% AI written — mission succeeded.

Example 9

93% AI written — mission succeeded.

Example 10

98% AI written — mission succeeded.

In total, CrossPlag recognized 9/10 of the AI-written samples as AI-written. This is quite a good score, to be honest!

The tool performs well when it needs to detect AI-written content. But it doesn’t work when you’re using it to recognize human-written content.

If you know you’ve written the content with AI, this is not a big problem. But if you’re using CrossPlag to check content whose origin you don’t know in advance, you can’t trust it.
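A rough back-of-the-envelope Bayes calculation shows why. Using the rates measured above (9/10 AI samples flagged, 7/10 human samples falsely flagged) and assuming, hypothetically, that half of the texts you check are AI-written:

```python
# Bayes' rule with the rates measured above (small sample, so rough):
tpr = 9 / 10   # AI text correctly flagged as AI
fpr = 7 / 10   # human text falsely flagged as AI
p_ai = 0.5     # assumed share of AI text in what you check

# Probability that a flagged text really is AI-written.
p_ai_given_flag = (tpr * p_ai) / (tpr * p_ai + fpr * (1 - p_ai))
print(f"{p_ai_given_flag:.0%}")  # 56%
```

In other words, with these error rates a flag from CrossPlag is barely better than a coin flip as evidence that the text is AI-written.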

Can You Trick the AI Content Detector?

Let’s see if we can trick the CrossPlag AI content detector into passing AI content detection.

Test 1: Remove a Comma

➡️ TLDR; Introducing a small grammatical error didn’t affect the outcome of the AI detector much.

Let’s make an ever-so-slight change to the text by introducing a grammar mistake in it and see what happens.

I will remove the comma after the word “Additionally”:

Awesome! The tool didn’t worry about this change too much. It still believes the content is mostly written by AI — which is indeed the case.

The reason I wanted to test this is that some other AI content detectors went from 0% human-written to 98% human-written by removing a single comma from the content.

Test 2: Make a Typo

➡️ TLDR; Removing one character from the text dropped the AI Content Index from 100% to 58%.

Removing a comma didn’t affect the AI detector’s outcome by much. But for good measure, let’s run another similar test.

This time, let’s introduce an intentional typo in the content.

For example, let’s remove the letter “i” from “their”:

This time the AI index changed quite drastically.

This change makes some sense: AI writers these days don’t make these types of mistakes. But the tool weights this kind of typo too heavily, in my opinion. I would expect the score to drop from 100% to 95% at most, not to 58%.

Based on this simple test, it’s quite easy to trick the CrossPlag AI detector.

On the other hand, now the text has a typo. In other words, this is not the most practical way to pass an AI content detector test.

Test 3: Use an AI Paraphraser

➡️ TLDR; Paraphrasing the AI-written text did not change the CrossPlag AI content detector’s mind.

Let’s run one more test to try to fool the AI checker.

These days there are quite powerful paraphrasing tools that are also powered by AI. An AI paraphraser takes your text as input and spits out a reworded version of it.

One example of such a paraphrasing tool is QuillBot.

I’ve written an entire review of QuillBot. Make sure to check it out in case this sounds interesting.

Anyway, when testing other AI content detectors, I’ve been able to fool them with this trick. Let’s see what happens with CrossPlag’s AI detector.

First, here’s a paraphrased version of one of the AI-written samples.

If the detector works, this should be marked as AI-written content.

Great, CrossPlag still flags it as AI-written content!

Based on these simple tests, it seems to be more difficult to trick the CrossPlag AI-checker than e.g. writer.com’s AI detector.

CrossPlag was also confused by the removal of a single letter from the AI-written text, which reduced the AI-generated score quite a bit.

However, more rigorous testing is needed to get a better idea of the performance.

Pros

  • Free to use. CrossPlag AI content detector is free to use. After 2 detections you need to sign up for the service, but you can continue using it for free.
  • Accurate in detecting AI-written content. Based on the 10 AI-written samples, CrossPlag did a stellar job as it detected 90% of the content to be written by AI.

Cons

  • Makes mistakes. CrossPlag cannot tell if human-written content is written by humans or AI. This is problematic if you don’t know who has written the content in advance.
  • Doesn’t reflect Google’s AI detector. This is important for bloggers. If Google uses an AI detector, it probably works entirely differently than free online tools. So don’t use AI detectors and think “Now Google won’t detect my content as AI-written”.

Final Verdict

I think CrossPlag’s AI content detector is a great showcase of how seemingly well-written AI-generated content can easily be detected as AI-written.

You can use this type of tool to get an idea of the content. But CrossPlag is by no means accurate. If you don’t know whether the content is written by humans or AI in advance, you can’t trust the tool’s score.

For bloggers who generate blog posts with AI, this is a good reminder. Google will probably be able to spot thin, AI-written posts quite easily, although it probably uses an entirely different method.

4. Writer.com AI Detector

In short, the writer.com AI content detector checks whether your text was written by AI. It gives you a human-written score between 0% and 100%.

If the text is 100% human-generated, then you’re good.

If the text is less than 80% human-generated, the tool suggests you edit the text until you get a score of 80%+.

The tool is free to use. All you need to do is copy-paste text content to the writer.com text editor and click Analyze text. Then, in a matter of seconds, the tool shows you the human-written score.
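The 80% rule is just a threshold check. Here’s a minimal sketch, assuming (as the tool’s guidance suggests) that a score of 80% or above counts as green:

```python
THRESHOLD = 80  # writer.com suggests editing until the human score is 80%+

def passes(human_score: float) -> bool:
    """True when the score would show as green; the >= 80 cutoff
    is my reading of the tool's guidance, not a documented API."""
    return human_score >= THRESHOLD

print(passes(100))  # True
print(passes(71))   # False
```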

Now, let’s see how the tool performs.

Performance

Let’s try writer.com’s AI content detector by feeding it 10 human-written samples and 10 AI-written samples. Furthermore, I’m also going to analyze the results to see whether you should use this tool or not.

Let’s start by feeding the human-written content to the AI detector.

1. Human-Generated Content

The following 10 examples of writing are sections randomly taken from my coding/tech blog. These are 100% human-written pieces of content.

So in an ideal world, the AI content detector should give a 100% score for each input.

Let’s see how it plays out!

Example 1

Example 2

Example 3

Example 4

Example 5

Example 6

Example 7

Example 8

Example 9

Example 10

In total, the AI detector classified only 1 of the 10 human-written pieces as entirely written by a human.

But that’s just the number of 100% human-written predictions. We can’t be that strict when assessing this tool’s performance, because it’s very hard to tell whether content is exactly 100% human-generated. A more relaxed limit like 80% is good enough.

On writer.com, if the score is more than 80%, you’ll see a green color!

But in my tests, we only saw 4/10 green lights. This means the tool still thinks 60% of the samples I provided it with are written by AI, even though they’re written by me. So it’s not accurate in detecting human-written content.

But hey, it’s an AI detector, not a human detector. As another test, let’s feed writer.com AI content detector a bunch of AI-written samples.

2. AI-Generated Content

Now, let’s see how the AI detector performs with AI-generated content. Ideally, we should see a 0% score for all the text samples that follow.

But because absolute predictions about writing are tricky, let’s only worry about the scores that aren’t green!

Example 1

Example 2

Example 3

Example 4

Example 5

Example 6

Example 7

Example 8

Example 9

Example 10

I think the AI detector did much better this time. It detected 8/10 of the samples as written by AI. It also gave some really low scores for several samples, which is good!

But it also falsely identified two samples to be almost entirely written by humans. This makes me think it’s quite easy to trick the algorithm behind this tool.

Can You Trick the AI Content Detector?

Writer.com’s AI detector does a decent job of detecting AI-written content. In my tests, it flagged 80% of the AI-written samples as written by AI. Not perfect, but it at least gives you some direction.

Now, let’s try to cheat the system with a bunch of easy tricks.

I’m going to take this sample that the AI detector correctly marked as 0% human-generated:

Test 1: Remove a Comma

➡️ TLDR; Removing a single comma changed the AI content detector’s mind from 0% human-generated to 71% human-generated content.

As a first test, let’s introduce a small grammatical incorrectness in the content by removing a comma:

Wow…

Removing a single character from the text makes the AI detector see the content in a completely different light. Now the tool says almost all the content is written by a human, even though I just made a single change in it.

Test 2: Make a Typo

➡️ TLDR; Removing a single character changed the AI content detector’s mind from 0% human-generated to 98% human-generated content.

Now, let’s make an intentional typo to see how it affects the output of the AI content detector. I’m going to remove the character “i” from “their”:

This time removing one character completely changed the AI detector’s mind about the content. Now it claims that almost the entire piece of text was written by a human even though we just changed one character.

But making the text grammatically incorrect is not the best way to fool an AI detector. You get a good score but are left with incorrectly written content.

Let’s try something more “clever”.

Test 3: Use an AI Paraphraser

➡️ TLDR; Rephrasing the content changed the AI content detector’s mind from 0% human-generated to 98% human-generated content.

You can use AI to trick an AI content detector.

These days, there are lots of paraphrasing tools, such as QuillBot. The idea is simple: Input some text to the tool, click “Paraphrase”, and let the AI re-write the text for you.

For example, let’s input the 0% human-generated content to QuillBot:

Now, let’s enter the paraphrased version in the AI detector.

And here we go!

A 0% human-written sample that the AI detector now claims is 98% human-written. All I had to do was generate the content and paraphrase it with AI. In total, it took 5 seconds to trick the AI detector. The new content also looks okay (except for the first sentence, which sounds odd).

Based on the tests, let’s take a look at the pros and cons of writer.com’s AI content detector.

Pros

  • Free. The writer.com AI content detector is free to use. You don’t even need an account to start testing the content.
  • Easy to use. Because the AI content detector is just a browser app, all you need to do is enter writer.com’s website to start using it. No installations are needed!
  • Decent accuracy. Based on my tests, the tool was able to spot 8/10 AI-written pieces.

Cons

  • Easy to trick. It’s easy to make the content pass the AI detector with flying colors by rephrasing the AI-written text with a tool like QuillBot. Also, very small changes in the input can completely change the detector’s mind.
  • Unreliable. Even though the AI content detector was able to point out some AI-written samples, it falsely claimed 2/10 of my tests to be human-written even though they were not.
  • Word limit. Last but not least, there’s a 350-word word limit in the writer.com AI content detector.

Final Verdict

Writer.com’s AI content detector can give you an idea of AI-generated content. But it’s not sophisticated enough to be even close to detecting all the AI-written content.

Even though my sample size in testing was quite small, I could already get an idea of how unreliable the tool is.

  • 60% of human-written samples were falsely detected as AI-written content.
  • 20% of AI-written samples were falsely identified as human-written content.

Also, it only took 2 seconds to make a change to the input that completely fooled the detector.

I would say this tool gives you some idea of whether the content is AI-written or not. But I wouldn’t use it!

5. Copyleaks AI Detector

CopyLeaks AI is an AI content detector that recognizes AI-written content with decent accuracy.

The working principle is simple:

  1. Copy-paste text to CopyLeaks editor.
  2. Run the scan.
  3. See the originality score.

However, given how high-quality AI-written content is these days, it might seem impossible for a tool to classify text as AI-written. Let’s put the CopyLeaks AI detector to the test to see if it works.

Performance

To test the CopyLeaks AI performance, I’ll feed the tool 10 human-written text samples and 10 AI-written samples. More specifically, I took:

  • 10 random human-written text samples from my blog.
  • 10 AI-generated samples of text that I produced with ChatGPT.

Let’s start by feeding the tool some human-generated text.

1. Human-Generated Content

In this section, I input parts of my blog posts into the CopyLeaks AI detector. In each section, I’ve highlighted whether the tool detected the content successfully or not.

In an ideal world, CopyLeaks AI should identify all of the pieces as human-written.

Let’s start the tests.

Example 1

Falsely identified as AI-written — mission failed.

Example 2

Falsely identified as AI-written — mission failed.

Example 3

Correctly identified as human-written — mission succeeded.

Example 4

Correctly identified as human-written — mission succeeded.

Example 5

Correctly identified as human-written — mission succeeded.

Example 6

Correctly identified as human-written — mission succeeded.

Example 7

Correctly identified as human-written — mission succeeded.

Example 8

Correctly identified as human-written — mission succeeded.

Example 9

Correctly identified as human-written — mission succeeded.

Example 10

Correctly identified as human-written — mission succeeded.

CopyLeaks AI correctly detected 8/10 human-written samples as human-written.

This is impressive, although the sample size of 10 is small. To get more trustworthy results, you’d have to run an order of magnitude more tests.

But with this data already, you can tell the AI detector makes some mistakes and is by no means reliable.

So if you’re given a chapter of text or a blog post, you can’t truly rely on CopyLeaks AI.

But depending on your use case, this might not be an issue. Many people use AI detectors to edit their content until it no longer looks AI-written.

2. AI-Written Content

I believe most authors use AI content detectors to help edit their AI-written content until it appears human-written.

This is why it’s more important to see how the CopyLeaks AI Detector performs with AI-written text samples.

Here are the results of inputting 10 AI-written text samples into CopyLeaks AI.

Example 1

Falsely identified as human-written — mission failed.

Example 2

Correctly identified as AI-written — mission succeeded.

Example 3

Correctly identified as AI-written — mission succeeded.

Example 4

Falsely identified as human-written — mission failed.

Example 5

Correctly identified as AI-written — mission succeeded.

Example 6

Falsely identified as human-written — mission failed.

Example 7

Falsely identified as human-written — mission failed.

Example 8

Correctly identified as AI-written — mission succeeded.

Example 9

Falsely identified as human-written — mission failed.

Example 10

Correctly identified as AI-written — mission succeeded.

That didn’t go so well.

The AI only detected 5/10 AI-written pieces as AI-written.

So if you use ChatGPT, there’s only about a 50% chance of getting caught by CopyLeaks AI. Keep in mind the sample size is small at only 10 pieces of text, so you’d need more rigorous testing to get a reliable accuracy estimate.

However, if 5 out of 10 tests fail, you can already conclude that the AI detector is unreliable.
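To make the arithmetic explicit, here’s a minimal Python sketch that computes these error rates from the verdicts above (the boolean lists simply encode the 20 test outcomes listed in this section):

```python
# Verdicts from the 10 human-written samples: True = correctly called human.
human_results = [False, False] + [True] * 8

# Verdicts from the 10 AI-written samples: True = correctly called AI.
ai_results = [False, True, True, False, True, False, False, True, False, True]

# False positive rate: human text wrongly flagged as AI.
false_positive_rate = human_results.count(False) / len(human_results)

# False negative rate: AI text wrongly passed off as human.
false_negative_rate = ai_results.count(False) / len(ai_results)

print(f"False positives: {false_positive_rate:.0%}")  # 20%
print(f"False negatives: {false_negative_rate:.0%}")  # 50%
```

For a detector used to catch AI content, the false negative rate is the one that matters most, and here it’s a coin flip.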

Can You Trick the AI Content Detector?

Last but not least, let’s play some cheap tricks on the CopyLeaks AI detector to see if we can fool it.

I’m going to use this piece of AI-written text as the input and make some small changes to it.

Test 1: Remove a Comma

➡️ TL;DR: Removing 1 comma fooled the detector completely.

When testing AI content detectors, I’ve noticed that some of the tools are quite easy to fool. With a single character change or typo, the tool might change its opinion from 0% human-generated to 100% human-generated.

This is not ideal: changing a single character or adding a single word shouldn’t flip the result.

Let’s try removing the comma that follows the word “Additionally” and see what happens:

As a result, removing a single comma made the tool think this AI-written text sample is human text.
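The edit itself is trivial. As a sketch (the sentence below is a hypothetical stand-in for the actual sample), this is the entire “attack”:

```python
sample = "Additionally, cats communicate through a wide range of vocalizations."

# Remove only the comma after "Additionally" -- a one-character change.
perturbed = sample.replace("Additionally,", "Additionally", 1)

print(perturbed)  # "Additionally cats communicate through a wide range of vocalizations."
```

A robust detector should score `sample` and `perturbed` almost identically; CopyLeaks flipped its verdict entirely.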

Test 2: Make a Typo

➡️ TL;DR: Removing a single letter completely changed the detector’s mind.

As another test, let’s see what happens when I remove a single character from the AI-written text.

For example, I’ll remove the letter “i” from “their”:

Once again, the AI detector changed its mind from AI-written to human text.

Test 3: Use an AI Paraphraser

➡️ TL;DR: Rewording the AI-written text fooled the AI detector badly.

Last but not least, let’s try rewording the AI-written text. I’m using an AI paraphrasing tool called QuillBot. With QuillBot, it takes about 2–3 seconds to reword the entire chapter of text.

If you pay attention to the reworded text, it looks decent. But parts of it read very bot-like, and you don’t even need an AI detector to get suspicious about the content.

Anyway, let’s input this reworded AI-paraphrased sample into CopyLeaks AI:

And once more, these quick changes completely turned the AI detector around.

Pros

  • Free. CopyLeaks AI is free to use. This makes it accessible and effortless to give it a try. Also, the UI is super simple.
  • Detects human-written text decently. Based on my experience, CopyLeaks can decently classify human-written text as human-written. It makes some mistakes, though.

Cons

  • Doesn’t detect AI-written content. When it comes to detecting AI-written content, CopyLeaks doesn’t work. In my case, it detected only 50% of the AI-written samples as AI-written.
  • Easy to fool. You can fool the detector by removing or replacing a single character.

Final Verdict

CopyLeaks AI doesn’t detect modern AI-written content. I’m sure this tool was great when models like ChatGPT didn’t exist.

But it doesn’t seem to pick up ChatGPT-generated content, which is a big minus, as most AI-written text these days is generated with ChatGPT.

And to be fair, none of the AI detectors I’ve tried work that well. Some reach about 90% accuracy, but even that is far too low for a completely reliable tool.

6. GPTZero

GPTZero is an online AI detector specifically designed to differentiate between human and AI-generated text.

It’s a powerful tool that dives into the world of AI and brings clarity to the blurred lines between artificial and human intellect.

GPTZero operates on an accessible, user-friendly platform where you simply input any written text.

The tool then processes your submission and provides a likelihood that the text was generated by AI.

The scoring system used by GPTZero is an intricate algorithm that scrutinizes the subtleties in language use, patterns, and structure often found in AI-generated content.

It employs advanced machine learning techniques to recognize these characteristics, ultimately providing a score on a defined scale.

Now, let’s see how it performs.

Performance

To test GPTZero, the natural first step is to feed it some AI-written content to see if it flags it as “AI-written”.

If it performs well, we can then test how well it recognizes human-written content.

I’ll start by inputting ChatGPT-written content and seeing if it flags the text as “AI written”.

The tool asks for at least 250 characters, so let’s go well past that by entering 150–400-word blog posts into the system.

Example 1

Input:

Output:

GPTZero got it right. This post is indeed written by AI.

Example 2

Input:

Output:

GPTZero got it wrong. This post is entirely written with AI.

Example 3

Input:

Output:

GPTZero got it wrong. The post is entirely written by AI.

Example 4

Input:

Output:

GPTZero got it wrong. The post is entirely written by AI.

Example 5

Input:

Output:

GPTZero got it wrong. The post is entirely written by AI.

I think we can stop right here.

Out of the 5 ChatGPT-generated inputs, GPTZero failed to detect AI in 4 of those.

That’s an 80% false negative rate.
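That rate comes straight from the counts; here’s the quick sanity check in Python:

```python
total_ai_samples = 5
detected = 1  # only Example 1 was flagged as AI-written

# False negatives: AI-written samples that slipped through as "human".
false_negatives = total_ai_samples - detected
false_negative_rate = false_negatives / total_ai_samples

print(f"{false_negative_rate:.0%}")  # 80%
```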

Thus, it’s clear to me that a tool like this is completely unreliable. I didn’t even craft my prompts in any particular way, and I used the GPT-3.5 model. I can only imagine what would’ve happened had I used GPT-4…

Nonetheless, let’s try something else with GPTZero to see if it’s robust on those rare occasions when it gets the AI detection score right.

Can You Easily Fool GPTZero?

Let’s try to fool GPTZero with a bunch of easy tests.

To do this, let’s choose the text chapter I used in the first example where GPTZero was able to point out that the content is likely entirely written by AI.

At the end of this post (not visible in the above screenshot) there’s a final sentence that reads: “Until next time, keep cuddling those kitties and embracing the magic of the meow!”

Let’s make an ever-so-slight change to the content and type: “Until next time, keep cuddling those kitties and embracing the magic of MEOW!”

You see, I changed “magic of the meow!” to “magic of MEOW!”.

Surprisingly, GPTZero completely changed its mind about this content.

Instead of “Entirely written by AI” it said that it’s “Entirely written by human” with such a small change…

Based on this, it’s clearly easy to fool GPTZero by changing one word or even one character.

Results

GPTZero does not work.

Given ChatGPT inputs, it gets them wrong most of the time and claims them to be human-written.

I used the weakest version of ChatGPT, that is, GPT-3.5, in this post. Still, out of the 5 inputs, GPTZero flagged only 1 as AI-written…

Also, if GPTZero marks something AI-written, you can just change one word or character to make it “completely human-written”.

I wouldn’t use an AI detector like this. It simply isn’t accurate.

In my eyes, ChatGPT is so clever that it produces text like humans. Thus, it might not follow a specific pattern that would be easy to detect.

This is just my anecdotal experience, though. I have not researched this topic.

However, I must tell you first that there are no tools that would be even close to 90% accurate. There are some decently performing ones, but almost all AI detectors are easy to fool!

Wrapping Up

And that’s a wrap.

GPTZero has a ton of users and a lot of hype behind it. But as it turns out, it’s not capable of detecting AI in ChatGPT-written content.

I think ChatGPT is just too clever to be detected.

It probably doesn’t follow a simple hard-coded pattern but mimics human writing behavior when coming up with content.

Of course, these detectors might improve over time. But for now, GPTZero is not worth using in my eyes.

Thanks for reading. Happy writing!

Artturi Jalli

Check out my YouTube channel @jalliartturi to take your blog to the next level.