Seeking Guidance on an AI-Generated vs. Human-Generated Text Classification Model

Sep 8, 2023

Project Overview

I’m currently working on an AI content detection tool. The tool takes a piece of text as input and estimates what percentage of that text is human-written. It is similar to what Originality.ai offers.

Seeking Guidance:

I’m seeking guidance on how to train this model to achieve the highest accuracy possible, and on which model I should select for the task. I’m aware that training a model with limited data can be challenging, and I’m also considering perplexity as a factor in the training process.
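For reference, perplexity is the exponential of the negative mean log-probability that a language model assigns to each token of a text. The snippet below is a minimal sketch of that calculation only. The per-token log-probabilities are made-up values standing in for the output of whatever scoring model is eventually chosen, and the function name `perplexity` is illustrative.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical values: in practice these would come from a scoring language model.
predictable = [math.log(0.5)] * 4    # every token was assigned probability 0.5
surprising = [math.log(0.05)] * 4    # every token was assigned probability 0.05

ppl_predictable = perplexity(predictable)
ppl_surprising = perplexity(surprising)
print(ppl_predictable, ppl_surprising)
```

A lower value means the scoring model found the text more predictable. Whether that separates human from AI text well enough on its own is something the training data would have to show, so one option is to pass it to the classifier as an extra input feature rather than rely on it directly.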

Proposed Approach: My plan is to start with a pre-trained language model and modify its last layer to make it capable of estimating the percentage of human-written content in a given text.
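One way to picture the replaced last layer is as a single linear unit followed by a sigmoid, trained on top of features from the frozen pre-trained model. The sketch below uses only the standard library so it runs anywhere. The three-dimensional feature vectors are invented stand-ins for real encoder outputs, and `HumanScoreHead` is a hypothetical name, not part of any library.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class HumanScoreHead:
    """Linear layer + sigmoid that maps encoder features to a score in (0, 1)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
        self.b = 0.0

    def predict(self, features):
        z = sum(w * x for w, x in zip(self.w, features)) + self.b
        return sigmoid(z)

    def train_step(self, features, target, lr=0.5):
        # For binary cross-entropy, the gradient w.r.t. the logit is (prediction - target).
        error = self.predict(features) - target
        self.w = [w - lr * error * x for w, x in zip(self.w, features)]
        self.b -= lr * error


# Assumption: made-up feature vectors in place of a frozen pre-trained encoder's output.
# Target 1.0 = human-written, 0.0 = AI-generated.
examples = [
    ([0.9, 0.1, 0.3], 1.0),
    ([0.8, 0.2, 0.4], 1.0),
    ([0.1, 0.9, 0.3], 0.0),
    ([0.2, 0.8, 0.4], 0.0),
]

head = HumanScoreHead(dim=3)
for _ in range(200):
    for features, target in examples:
        head.train_step(features, target)

human_pct = 100 * head.predict([0.85, 0.15, 0.35])
ai_pct = 100 * head.predict([0.15, 0.85, 0.35])
print(f"{human_pct:.1f}% vs {ai_pct:.1f}% human-written")
```

Training only a small head like this while keeping the encoder frozen is a common choice when data is limited, since far fewer parameters have to be learned. Note that the sigmoid output is a per-text score. Reporting a percentage for a longer document would probably require scoring sentences or chunks separately and averaging the results.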

Data Collection: To train this model, I intend to curate a dataset that consists of pairs of human-written text and AI-generated text, all on the same topic. This dataset will be the foundation for training the model to distinguish between human and AI-generated content.
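A possible layout for such a paired dataset is sketched below, under the assumption that each pair is stored with its topic. The class `TextPair`, the helper names, and the sample texts are all hypothetical placeholders. The split is done by topic so that both texts of a pair always land on the same side, which keeps the test set from sharing topics with the training set.

```python
import random
from dataclasses import dataclass


@dataclass
class TextPair:
    topic: str
    human_text: str
    ai_text: str


def to_labeled_examples(pairs):
    """Flatten pairs into (text, label) rows: 1 = human-written, 0 = AI-generated."""
    rows = []
    for pair in pairs:
        rows.append((pair.human_text, 1))
        rows.append((pair.ai_text, 0))
    return rows


def split_by_topic(pairs, test_fraction=0.25, seed=0):
    """Hold out whole topics so no topic appears in both train and test."""
    topics = sorted({pair.topic for pair in pairs})
    random.Random(seed).shuffle(topics)
    n_test = max(1, round(len(topics) * test_fraction))
    test_topics = set(topics[:n_test])
    train = [p for p in pairs if p.topic not in test_topics]
    test = [p for p in pairs if p.topic in test_topics]
    return train, test


# Placeholder data for illustration only.
pairs = [
    TextPair("gardening", "human text about gardening", "ai text about gardening"),
    TextPair("cooking", "human text about cooking", "ai text about cooking"),
    TextPair("travel", "human text about travel", "ai text about travel"),
    TextPair("finance", "human text about finance", "ai text about finance"),
]

train_pairs, test_pairs = split_by_topic(pairs)
train_rows = to_labeled_examples(train_pairs)
test_rows = to_labeled_examples(test_pairs)
print(len(train_rows), len(test_rows))
```

Because every pair contributes one row per class, the resulting dataset is balanced by construction.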

I would greatly appreciate any advice or insights from the community on how to best proceed with this project, especially when it comes to fine-tuning the model with limited data and integrating perplexity into the training process. Your expertise and suggestions would be highly valuable. Thank you!

Written by Chhabrasamarjeetsingh