Published in

CodeX

How to Detect Code Plagiarism

A short guide to using Stanford’s MOSS

Source: Sielan, via iStock

When classes moved online in 2020, academic institutions across the country watched the rate of cheating soar. Dealing with plagiarism is never fun, but detecting it matters, wherever we stand in the academic debate on how best to handle this type of cheating, during a pandemic or in general.

The best free tool for the job is Stanford’s MOSS. This tutorial provides a quick route to setting it up so that you can hit the ground running.

Table of Contents:

  1. What is MOSS?
  2. How to register with MOSS
  3. How to set up MOSS on your computer
  4. How to check an assignment for plagiarism
  5. A note of caution

1. What is MOSS?

MOSS stands for Measure of Software Similarity, and it does exactly what the name says: it measures the similarity between pairs of files from a list. You supply the list, and MOSS does the rest, very effectively. When it finishes, it returns the results as a web page that you can examine.

MOSS uses a document fingerprinting algorithm called winnowing, which is robust to whitespace, noise, and order. In other words, if a student tries to cheat by changing whitespace, renaming variables, sprinkling in extra statements, or reordering statements, MOSS will still flag the code for possible plagiarism.

To cheat successfully against MOSS, a student would need to go to great lengths and modify the code to the point where its style changes completely. Doing this requires just as much effort as genuinely doing the work!

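The paper “Winnowing: Local Algorithms for Document Fingerprinting” (Schleimer, Wilkerson, and Aiken, SIGMOD 2003) explains in more depth how the winnowing algorithm works.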
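To make the idea concrete, here is a minimal Python sketch of winnowing. It is not MOSS's actual implementation: the function names, the CRC32 hash, the k-gram length of 5, the window size of 4, and the Jaccard-style similarity score are all assumptions chosen for illustration. It also strips only whitespace and case, so it demonstrates robustness to reformatting but not to renamed variables, which would need language-aware preprocessing.

```python
import zlib


def normalize(source: str) -> str:
    """Remove all whitespace and lowercase the text (a crude stand-in for
    the language-aware preprocessing a real tool would do)."""
    return "".join(source.split()).lower()


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every run of k consecutive characters with CRC32."""
    return [zlib.crc32(text[i:i + k].encode("utf-8"))
            for i in range(len(text) - k + 1)]


def winnow(hashes: list[int], window: int) -> set[tuple[int, int]]:
    """Select the minimum hash in each window, preferring the rightmost
    occurrence on ties, and record it together with its position."""
    if not hashes:
        return set()
    window = min(window, len(hashes))
    selected = set()
    for start in range(len(hashes) - window + 1):
        chunk = hashes[start:start + window]
        smallest = min(chunk)
        # Index of the rightmost occurrence of the minimum inside the window.
        offset = window - 1 - chunk[::-1].index(smallest)
        selected.add((smallest, start + offset))
    return selected


def fingerprint(source: str, k: int = 5, window: int = 4) -> set[int]:
    """Return the set of winnowed hash values for a piece of source text."""
    hashes = kgram_hashes(normalize(source), k)
    return {value for value, _position in winnow(hashes, window)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of two fingerprint sets (an assumed scoring rule,
    not the score MOSS reports)."""
    fa, fb = fingerprint(a), fingerprint(b)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Calling similarity on a file and a copy that differs only in spacing and line breaks gives identical fingerprints, because the whitespace is gone before any hashing happens.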

2. How to register with MOSS

Send an email to moss@moss.stanford.edu that looks exactly like this:

registeruser
mail username@domain

Change username@domain to your email address. Do not change anything else. I had a space between the two lines in my first attempt and their server ignored my email.
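If you would rather generate the message body than type it, a tiny helper like the one below keeps the two lines exact. The function name and the address check are my own assumptions for illustration, and the sketch only builds the text; sending it is left to your mail client.

```python
def registration_body(address: str) -> str:
    """Build the two-line registration message, with nothing between the lines."""
    address = address.strip()
    # Very rough sanity check; this is not a full email address validator.
    if "@" not in address or any(ch.isspace() for ch in address):
        raise ValueError(f"not a plausible email address: {address!r}")
    return f"registeruser\nmail {address}"
```

For example, registration_body("me@example.edu") returns the text to paste into the email, where me@example.edu is a placeholder address.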

3. How to set up MOSS on your computer

After you email the registration request, you will receive a response that contains a Perl script and your unique user id, along with instructions on how to set up the software on your computer.

Steps to set up MOSS, based on that response email:

  1. Highlight and copy the Perl script, which is all the lines below the dashed line where it says to “cut here.”
  2. Open Terminal and create a new folder wherever you want. I created mine in my Applications directory and named it “MOSS.”
  3. In this folder, create a file named moss.pl and paste the Perl script into it. Save and close.
  4. Grant this file execution permission: chmod ug+x moss.pl

Terminal output:

me@Briennas-MBP /Applications % mkdir MOSS
me@Briennas-MBP /Applications % cd MOSS
me@Briennas-MBP MOSS % nano moss.pl
me@Briennas-MBP MOSS % ls -l
total 24
-rw-r--r-- 1 me admin 11154 Dec 22 14:39 moss.pl
me@Briennas-MBP MOSS % chmod ug+x moss.pl
me@Briennas-MBP MOSS % ls -l
total 24
-rwxr-xr-- 1 me admin 11154 Dec 22 14:39 moss.pl

I did this on a Mac. The workflow should be similar on Linux, and on Windows as long as Perl is installed, although the chmod step is specific to Unix-like systems.
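If you script your setup, step 4 can also be done from Python. The sketch below mirrors chmod ug+x using the standard os and stat modules; the function name is hypothetical, and it assumes a Unix-like system where permission bits behave as in the transcript above.

```python
import os
import stat


def add_user_group_exec(path: str) -> str:
    """Add execute permission for the owner and group, like `chmod ug+x path`,
    and return the resulting mode in `ls -l` notation."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP)
    return stat.filemode(os.stat(path).st_mode)
```

Starting from a file with mode -rw-r--r--, the returned string should match the second ls -l listing in the transcript.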

4. How to check an assignment for plagiarism

Let’s consider my midterm exam. To check this exam for plagiarism, we just need to 1) organize the files and 2) submit the request to the server.

Organize the files.

Each exam submission consists of one .java file. According to MOSS requirements, all of these files should be in the same directory. We can create a directory at /Applications/MOSS/Midterm_Exam/ and put all the submissions inside it.

Terminal output:

(base) me@Briennas-MBP MOSS % ls
Midterm_Exam moss.pl
(base) me@Briennas-MBP MOSS % cd Midterm_Exam
(base) me@Briennas-MBP Midterm_Exam % ls
A.java C.java E.java G.java I.java K.java
B.java D.java F.java H.java J.java L.java

For confidentiality, I changed each student’s name to an arbitrary letter ID, but I usually use their real names, e.g. jane_doe.java.
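If you want letter IDs like these without renaming files by hand, something along the following lines would work. This is a sketch under my own assumptions: the function name is hypothetical, letters are assigned in alphabetical order of the original file names (so they are not random), and the originals are copied rather than moved.

```python
import shutil
import string
from pathlib import Path


def anonymize_submissions(src_dir, dst_dir, suffix=".java"):
    """Copy each submission into dst_dir under a letter ID and return a
    mapping from the new file name to the original file name."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.iterdir() if p.suffix == suffix)
    if len(files) > len(string.ascii_uppercase):
        raise ValueError("more submissions than single-letter IDs")
    mapping = {}
    for letter, path in zip(string.ascii_uppercase, files):
        new_name = f"{letter}{suffix}"
        shutil.copyfile(path, dst / new_name)
        mapping[new_name] = path.name
    return mapping
```

Keep the returned mapping somewhere private, since it is the only link between a letter in the MOSS report and the student who submitted the file.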

Submit the request to the server.

From the Midterm_Exam directory, we can submit the request:

../moss.pl -l java -c "Midterm Exam" ./*.java

General syntax:

[path/to/moss/executable] -l [language] -c [name] [path/to/files]

The moss script contains detailed usage instructions that explain each possible option, but, as shown here, -l and -c do everything that I need.

Specify a language with -l. This option tells MOSS which language the files are in, so it can process them appropriately. Since the exam submissions are written in Java, we specify -l java.

MOSS supports the following languages:

@languages = ("c", "cc", "java", "ml", "pascal", "ada", "lisp", "scheme", "haskell", "fortran", "ascii", "vhdl", "perl", "matlab", "python", "mips", "prolog", "spice", "vb", "csharp", "modula2", "a8086", "javascript", "plsql", "verilog");

Provide a label with -c. This option attaches a comment string to the generated report, which makes it easier to tell reports apart in our own records.
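When there are many assignments to check, it can help to assemble the command in a script. The sketch below builds the same argument list as the command above, using only the -l and -c options described here. The function name, the MOSS_LANGUAGES constant, and the default "*.java" pattern are my own choices for illustration; the language names are copied from the @languages list.

```python
from pathlib import Path

# Language names copied from the @languages list in the moss script.
MOSS_LANGUAGES = (
    "c", "cc", "java", "ml", "pascal", "ada", "lisp", "scheme", "haskell",
    "fortran", "ascii", "vhdl", "perl", "matlab", "python", "mips", "prolog",
    "spice", "vb", "csharp", "modula2", "a8086", "javascript", "plsql",
    "verilog",
)


def build_moss_command(script, language, label, submissions_dir,
                       pattern="*.java"):
    """Return the argument list for one MOSS submission."""
    if language not in MOSS_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    files = sorted(str(p) for p in Path(submissions_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(
            f"no files matching {pattern} in {submissions_dir}")
    # A list of arguments needs no shell quoting, even for a label with spaces.
    return [str(script), "-l", language, "-c", label, *files]
```

The resulting list can be handed to subprocess.run(cmd, capture_output=True, text=True, check=True) from the standard library, which runs the script and captures what it prints.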

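Terminal output (this run was started from the MOSS directory, one level above Midterm_Exam, so the paths differ from the command above):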

(base) me@Briennas-MBP MOSS % ./moss.pl -l java -c "Midterm Exam" ./Midterm_Exam/*.java
Checking files . . .
OK
Uploading ./Midterm_Exam/A.java ...done.
Uploading ./Midterm_Exam/B.java ...done.
Uploading ./Midterm_Exam/C.java ...done.
Uploading ./Midterm_Exam/D.java ...done.
Uploading ./Midterm_Exam/E.java ...done.
Uploading ./Midterm_Exam/F.java ...done.
Uploading ./Midterm_Exam/G.java ...done.
Uploading ./Midterm_Exam/H.java ...done.
Uploading ./Midterm_Exam/I.java ...done.
Uploading ./Midterm_Exam/J.java ...done.
Uploading ./Midterm_Exam/K.java ...done.
Uploading ./Midterm_Exam/L.java ...done.
Query submitted. Waiting for the server's response.
http://moss.stanford.edu/results/7/8894930687706

Once MOSS finishes and returns the result as a URL, copy and paste it into a browser. The URL is valid for 14 days. After that, you will need to resubmit the query to see the results again.
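If you capture the script's output in a program, the results URL can be pulled out and dated so you know when to resubmit. This sketch assumes the URL appears on its own line and starts with http://moss.stanford.edu/results/, as in the transcript above; the function names are hypothetical, and the 14-day default comes from the validity period mentioned here.

```python
from datetime import date, timedelta

RESULTS_PREFIX = "http://moss.stanford.edu/results/"


def extract_result_url(output: str):
    """Return the last line of the output that looks like a results URL,
    or None if there is no such line."""
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.startswith(RESULTS_PREFIX):
            return line
    return None


def expires_on(submitted: date, valid_days: int = 14) -> date:
    """Date after which the report link should be treated as expired."""
    return submitted + timedelta(days=valid_days)
```

Storing the URL next to its expiry date in your grading notes makes it obvious when a report needs to be regenerated.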

The pairwise comparisons on the results page show that quite a few of the submissions are highly similar. Yikes. Let’s look at Students F and J.

Students F and J exemplify a strong case of code plagiarism, all the way down to the same mistakes in logic. MOSS is actually quite conservative about its similarity determinations. If it says that two files look similar, then they look quite similar.

Also, we can see that one student tried to change the variable names and comments, but unfortunately for them, this was not enough to fool the winnowing algorithm.

5. A note of caution

MOSS shouldn’t be given supreme authority on what constitutes plagiarism. It detects code similarity, but it doesn’t know why the code is similar. To decide whether plagiarism occurred, a human needs to look at the flagged sections. Stanford also emphasizes that regardless of who first discovered the flagged code, the human or MOSS, the case that code plagiarism happened should stand on its own.
