How to Detect Code Plagiarism
A short guide to using Stanford’s MOSS
When classes moved online in 2020, academic institutions across the country watched as the rate of cheating soared. It's never fun dealing with plagiarism, but it is important to detect it, wherever we stand in the debate on how best to handle this type of cheating, whether during a pandemic or in general.
The best free tool for this job is Stanford's MOSS. This tutorial provides a quick route to setting it up, so you can hit the ground running.
Table of Contents:
- What is MOSS?
- How to register with MOSS
- How to set up MOSS on your computer
- How to check an assignment for plagiarism
- A note of caution
1. What is MOSS?
The name MOSS stands for Measure of Software Similarity. MOSS does exactly what its name says: it measures the similarity between pairs of files from a list. You supply the list, and MOSS does the rest, very effectively. When it finishes, it returns the results as a web page that you can examine.
MOSS uses a document fingerprinting algorithm called winnowing. Winnowing ignores whitespace, suppresses noise, and does not depend on where in a file a passage appears. In other words, if a student tries to cheat by changing whitespace, renaming variables, sprinkling extra statements here and there, or reordering statements, MOSS will still flag the files for possible code plagiarism.
To cheat successfully without MOSS noticing, a student would need to go to great lengths to modify their code until its style completely changes. Doing this takes just as much effort as genuinely doing the work!
The paper "Winnowing: Local Algorithms for Document Fingerprinting" (Schleimer, Wilkerson, and Aiken, 2003) explains in more depth how the winnowing algorithm works.
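The toy shell sketch below shows how winnowing picks fingerprints. It assumes bash and the standard tr, awk, sort, and comm tools. Everything in it is my own illustrative choice and not how MOSS works internally: the function names winnow_fingerprints and winnow_similarity, the k-gram length of 5, the window size of 4, and the simple hash. The sketch only lowercases the text and strips whitespace. MOSS also processes each file according to its language, so unlike MOSS this toy will not see through renamed variables.

```bash
# winnow.sh: toy sketch of winnowing (Schleimer, Wilkerson, and Aiken, 2003).
# Hypothetical helper for illustration only; it is NOT MOSS's implementation.
# Usage: source winnow.sh; winnow_similarity A.java B.java

WINNOW_K=${WINNOW_K:-5}   # k-gram length (assumed value)
WINNOW_W=${WINNOW_W:-4}   # window size (assumed value)

# Print the sorted, de-duplicated fingerprints (selected k-gram hashes) of a file.
winnow_fingerprints() {
  # Drop all whitespace and lowercase the text, so formatting changes do not matter.
  LC_ALL=C tr -d '[:space:]' < "$1" | LC_ALL=C tr '[:upper:]' '[:lower:]' |
    LC_ALL=C awk -v k="$WINNOW_K" -v w="$WINNOW_W" '
      BEGIN { for (i = 32; i < 127; i++) ord[sprintf("%c", i)] = i }
      { text = text $0 }
      END {
        n = length(text) - k + 1
        # Hash every k-gram with a simple polynomial hash modulo a prime.
        for (i = 1; i <= n; i++) {
          h = 0
          for (j = 0; j < k; j++) h = (h * 31 + ord[substr(text, i + j, 1)]) % 1000003
          hash[i] = h
        }
        # In each window of w consecutive hashes, select the minimum
        # (the rightmost one on ties) and record it once per position.
        last = 0
        for (s = 1; s + w - 1 <= n; s++) {
          m = s
          for (t = s; t < s + w; t++) if (hash[t] <= hash[m]) m = t
          if (m != last) { print hash[m]; last = m }
        }
      }' | LC_ALL=C sort -u
}

# Print the share of fingerprints two files have in common (Jaccard index).
winnow_similarity() {
  local a b shared total
  a=$(mktemp) && b=$(mktemp) || return 1
  winnow_fingerprints "$1" > "$a"
  winnow_fingerprints "$2" > "$b"
  shared=$(LC_ALL=C comm -12 "$a" "$b" | wc -l)
  total=$(LC_ALL=C sort -u "$a" "$b" | wc -l)
  rm -f "$a" "$b"
  awk -v s="$shared" -v t="$total" 'BEGIN { if (t == 0) print "0.00"; else printf "%.2f\n", s / t }'
}
```

Two files that differ only in spacing, line breaks, or letter case get a score of 1.00 here. The fingerprints are compared as a set, so moving whole blocks of code around should lower the score only slightly. A line-by-line comparison would penalize the same change much more.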
2. How to register with MOSS
Send an email to moss@moss.stanford.edu with a body that looks exactly like this:
registeruser
mail username@domain
Replace username@domain with your e-mail address. Do not change anything else. In my first attempt I had a blank line between the two lines, and their server ignored my email.
3. How to set up MOSS on your computer
After you email the registration request, you will receive a response that contains a Perl script and your unique user id, along with instructions on how to set up the software on your computer.
Steps to set up MOSS, based on that response email:
- Highlight and copy the Perl script, which is all the lines below the dashed line where it says to “cut here.”
- Open Terminal and create a new folder wherever you want. I created mine in my Applications directory and named it “MOSS.”
- In this folder, create a file named moss.pl and paste the Perl script into it. Save and close.
- Grant this file execution permission:
chmod ug+x moss.pl
me@Briennas-MBP /Applications % mkdir MOSS
me@Briennas-MBP /Applications % cd MOSS
me@Briennas-MBP MOSS % nano moss.pl
me@Briennas-MBP MOSS % ls -l
-rw-r--r-- 1 me admin 11154 Dec 22 14:39 moss.pl
me@Briennas-MBP MOSS % chmod ug+x moss.pl
me@Briennas-MBP MOSS % ls -l
-rwxr-xr-- 1 me admin 11154 Dec 22 14:39 moss.pl
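I did this on a Mac. The workflow should be similar on Linux and on Windows, although on Windows you need a Perl interpreter (or the Windows Subsystem for Linux) to run moss.pl.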
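If you would rather not copy and paste the script by hand, the setup steps above can be sketched as a small bash function. This is a sketch under assumptions. It assumes you saved the response email as a plain-text file, and that the Perl script starts on the line right after the first line containing "cut here". The function name install_moss and any file paths you pass it are hypothetical.

```bash
# install_moss: hypothetical helper that repeats the manual setup steps.
# Usage: install_moss SAVED_EMAIL.txt DEST_DIR
# Assumes the script begins on the line after the first "cut here" marker.
install_moss() {
  local email_file=$1 dest_dir=$2
  mkdir -p "$dest_dir"
  # Print every line after the first line that contains "cut here".
  awk 'found { print } /cut here/ && !found { found = 1 }' "$email_file" > "$dest_dir/moss.pl"
  # Stop if nothing was extracted, e.g. because the marker was missing.
  if [ ! -s "$dest_dir/moss.pl" ]; then
    echo "install_moss: no 'cut here' marker found in $email_file" >&2
    return 1
  fi
  # Same permission change as in the transcript above.
  chmod ug+x "$dest_dir/moss.pl"
}
```

For example, install_moss ~/Downloads/moss_email.txt /Applications/MOSS would leave an executable moss.pl in the same place as in my transcript. Check the first lines of the extracted file before you run it.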
4. How to check an assignment for plagiarism
Let’s consider my midterm exam. To check this exam for plagiarism, we just need to 1) organize the files and 2) submit the request to the server.
Organize the files.
Each exam submission consists of one .java file. All these files should be in the same directory, according to MOSS requirements. We can create a directory at /Applications/MOSS/Midterm_Exam/ and put all the submissions inside it.
(base) me@Briennas-MBP MOSS % ls
(base) me@Briennas-MBP MOSS % cd Midterm_Exam
(base) me@Briennas-MBP Midterm_Exam % ls
A.java C.java E.java G.java I.java K.java
B.java D.java F.java H.java J.java L.java
For confidentiality, I changed each student’s name to an arbitrary letter ID, but I usually use their real names, e.g. jane_doe.java.
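Submissions downloaded from a learning management system often arrive as one folder per student rather than one flat directory. The bash sketch below flattens that layout into the single directory MOSS expects, naming each file after its folder (e.g. jane_doe.java). It assumes the folder-per-student layout shown in the comment. The function name collect_submissions is hypothetical.

```bash
# collect_submissions: hypothetical helper for the "organize the files" step.
# Assumes each student's work sits in its own folder, for example
#   Downloads/Midterm/jane_doe/Midterm.java
# and copies it to DEST/jane_doe.java, so all submissions share one directory.
collect_submissions() {
  local src_dir=$1 dest_dir=$2 student_dir name
  local -a files
  mkdir -p "$dest_dir"
  for student_dir in "$src_dir"/*/; do
    [ -d "$student_dir" ] || continue          # the glob matched nothing
    name=$(basename "$student_dir")
    files=("$student_dir"*.java)
    if [ ! -e "${files[0]}" ]; then
      echo "skipping $name: no .java file found" >&2
      continue
    fi
    # Each exam submission is a single .java file, so copy the first one found.
    cp "${files[0]}" "$dest_dir/$name.java"
  done
}
```

Running collect_submissions ~/Downloads/Midterm /Applications/MOSS/Midterm_Exam would produce a directory like the one listed above, with real names in place of letter IDs.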
Submit the request to the server.
From the Midterm_Exam directory, we can submit the request:
../moss.pl -l java -c "Midterm Exam" ./*.java
The general form of the command is:
[path/to/moss/executable] -l [language] -c [name] [path/to/files]
The moss script contains detailed usage instructions that explain each possible option, but, as shown here, -l and -c do everything that I need.
Specify a language with -l. This option tells MOSS which language the files are in, so it can process them appropriately. Since the exam submissions are written in Java, we specify -l java.
MOSS supports many languages. The usage notes in the moss script list every accepted value for -l, including c, cc, java, and python.
Provide a label with -c. This option attaches a comment string (here, "Midterm Exam") to the generated report, for our own records.
(base) me@Briennas-MBP MOSS % ./moss.pl -l java -c "Midterm Exam" ./Midterm_Exam/*.java
Checking files . . .
Uploading ./Midterm_Exam/A.java ...done.
Uploading ./Midterm_Exam/B.java ...done.
Uploading ./Midterm_Exam/C.java ...done.
Uploading ./Midterm_Exam/D.java ...done.
Uploading ./Midterm_Exam/E.java ...done.
Uploading ./Midterm_Exam/F.java ...done.
Uploading ./Midterm_Exam/G.java ...done.
Uploading ./Midterm_Exam/H.java ...done.
Uploading ./Midterm_Exam/I.java ...done.
Uploading ./Midterm_Exam/J.java ...done.
Uploading ./Midterm_Exam/K.java ...done.
Uploading ./Midterm_Exam/L.java ...done.
Query submitted. Waiting for the server's response.
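If you run MOSS for several assignments, a small bash wrapper around the command above can save some typing. The sketch below adds one option I don't use in my exam, -b. According to the moss script's usage notes, -b names a "base file" (such as starter code you handed out), and code that also appears in that file is not counted as a match. The function name moss_submit, the MOSS_SCRIPT, MOSS_BASE, and DRY_RUN variables, and the default script path are my own assumptions. With DRY_RUN=1 the function only prints the arguments it would pass, one per line, without contacting the server.

```bash
# moss_submit: hypothetical wrapper that assembles the moss.pl command line.
# Usage: moss_submit LANGUAGE COMMENT FILE...
#   MOSS_SCRIPT  path to moss.pl (defaults to the location used in this guide)
#   MOSS_BASE    optional base file, passed as -b (e.g. provided starter code)
#   DRY_RUN=1    print the arguments one per line instead of running moss.pl
moss_submit() {
  local lang=$1 comment=$2
  shift 2
  local script=${MOSS_SCRIPT:-/Applications/MOSS/moss.pl}
  local -a cmd
  cmd=("$script" -l "$lang" -c "$comment")
  # With -b, code that also appears in the base file is not counted as a match.
  if [ -n "${MOSS_BASE:-}" ]; then
    cmd+=(-b "$MOSS_BASE")
  fi
  cmd+=("$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[@]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, MOSS_BASE=starter/Midterm.java moss_submit java "Midterm Exam" ./Midterm_Exam/*.java would submit the same files as above while ignoring code copied from a (hypothetical) starter file.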
Once MOSS finishes and returns the result as a URL, copy and paste it into a browser. The URL is valid for 14 days. After that, you will need to resubmit the query to see the results again.
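Because the results URL expires, it helps to keep a record of when each report was created. The bash sketch below is meant to sit at the end of a pipeline after moss.pl. It echoes the script's output and appends the date, a label, and the last URL printed to a log file. The function name log_moss_url and the default log file name moss_reports.log are hypothetical. It also assumes the results URL appears in the output as a whitespace-separated http:// or https:// link.

```bash
# log_moss_url: hypothetical helper. Pipe the output of moss.pl into it; it
# echoes that output and appends "date<TAB>label<TAB>url" to a log file
# (moss_reports.log by default), so you can tell when a report will expire.
log_moss_url() {
  local label=$1 log_file=${2:-moss_reports.log} output url
  output=$(cat)
  printf '%s\n' "$output"
  # Keep the last whitespace-separated field that starts with http:// or https://.
  url=$(printf '%s\n' "$output" |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^https?:\/\//) u = $i } END { print u }')
  if [ -z "$url" ]; then
    echo "log_moss_url: no URL found in the moss.pl output" >&2
    return 1
  fi
  printf '%s\t%s\t%s\n' "$(date +%Y-%m-%d)" "$label" "$url" >> "$log_file"
}
```

Usage: ./moss.pl -l java -c "Midterm Exam" ./Midterm_Exam/*.java | log_moss_url "Midterm Exam"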
The report lists the pairwise comparisons, and in my case they show that quite a few of the submissions are highly similar. Yikes. Let's look at Students F and J.
Students F and J exemplify a strong case of code plagiarism, all the way down to the same mistakes in logic. MOSS is actually quite conservative about its similarity determinations. If it says that two files look similar, then they look quite similar.
Also, we can see that one student attempted to change the variable names and change the comments, but unfortunately for them this was not enough to fool the winnowing algorithm.
5. A note of caution
MOSS shouldn’t be given supreme authority on what constitutes plagiarism. While it detects code similarity, it doesn’t know why the code is similar. To decide whether or not there was plagiarism, a human needs to go and look at the flagged sections. Stanford also emphasizes that regardless of who first discovered the flagged code—the human or MOSS—the case that code plagiarism happened should stand on its own.
If you’d like to read more of my articles or explore millions of other articles, you can sign up for Medium membership:
Join Medium with my referral link — Brienna Herold
You can also subscribe to my email list to get notified whenever I publish a new article:
Get an email whenever Brienna Herold publishes.
Some other stories from me that might interest you:
How to bulk access arXiv full-text preprints
With Python3 and the MacOS X command line