Automated Answer Grader

Ajay Raj Nelapudi · Published in Analytics Vidhya · 3 min read · Jan 25, 2020

Uses dissimilar vector clustering to learn answers and award marks.

Teachers evaluating students’ scripts manually

Manually evaluating scripts is essentially a huge Map-Reduce system. The answer scripts are first distributed among the teachers, and each teacher evaluates their share. The governing body takes steps to avoid any unwanted outcomes. Finally, all the grades are collected back so the results can be published. Imagine if we had a system that takes in the answer scripts, awards marks by itself, and generates a report with feedback for each answer, all in a split second. Our project does exactly that.

First, we need to prepare a system that understands answers written in natural language. To do this, we need to make the system learn. Our dataset is a JSON file in the following format.

{
    "Question": {
        "Answers": [
            "Answer 1",
            "Answer 2",
            ...
            "Answer N"
        ],
        "Marks": M
    }
}
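
Loading this dataset is a one-liner; a minimal sketch (the filename dataset.json is illustrative, not from the repo):

import json

# Load the per-question answers and marks; the filename is an assumption
with open('dataset.json') as f:
    data = json.load(f)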

Each question has a set of answers from which the model learns. Each answer is converted from natural language into an input vector through the following steps.

  1. Word and sentence count: counting the number of words and sentences.
  2. Computing correct ratio: the number of correctly spelled words divided by the total number of words.
  3. Stop word removal: the answer is stripped of all stop words.
  4. POS tagging: all interjections, prepositions, conjunctions and pronouns are filtered out.
  5. Synonym inclusion: synonyms of nouns and adjectives are included.
  6. Frequency distribution: occurrences of each word and its synonyms are counted.
>>> vector = prepare_vector("I love to code. Code is addictive.")
>>> vector
{'word_count': 7, 'sentence_count': 2, 'correct_ratio': 1.0, 'clean_words': 3, 'love': 1, 'beloved': 1, 'code': 2, 'addictive': 1}

The features of this vector are the word count, sentence count, correct ratio, and the words (along with their synonyms) with their frequencies. To accommodate the last feature, we use a dictionary instead of a fixed-length array. A sample input vector looks like the one in the snippet above.
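
As a rough illustration, the pipeline above could be implemented with NLTK along these lines. This is a sketch only; the repo's prepare_vector may differ in its details, and it assumes the punkt, stopwords, words, averaged_perceptron_tagger and wordnet corpora have been downloaded.

from collections import Counter

import nltk
from nltk.corpus import stopwords, wordnet, words as word_list

ENGLISH_WORDS = set(word_list.words())
STOP_WORDS = set(stopwords.words('english'))

def prepare_vector(answer):
    sentences = nltk.sent_tokenize(answer)
    tokens = [w.lower() for w in nltk.word_tokenize(answer) if w.isalpha()]

    # Steps 1 and 2: counts and a naive spelling-correctness ratio
    vector = {
        'word_count': len(tokens),
        'sentence_count': len(sentences),
        'correct_ratio': sum(w in ENGLISH_WORDS for w in tokens) / max(len(tokens), 1),
    }

    # Steps 3 and 4: drop stop words, then filter interjections,
    # prepositions, conjunctions and pronouns by POS tag
    content = [w for w in tokens if w not in STOP_WORDS]
    kept = [(w, tag) for w, tag in nltk.pos_tag(content)
            if tag not in ('UH', 'IN', 'CC', 'PRP', 'PRP$')]
    vector['clean_words'] = len(set(w for w, _ in kept))

    # Steps 5 and 6: frequency distribution, plus synonyms of nouns/adjectives
    tags = dict(kept)
    for word, count in Counter(w for w, _ in kept).items():
        vector[word] = count
        if tags[word].startswith(('NN', 'JJ')):
            for synset in wordnet.synsets(word):
                for lemma in synset.lemma_names():
                    synonym = lemma.lower().replace('_', ' ')
                    if synonym != word:
                        vector.setdefault(synonym, count)
    return vector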

The machine learning model is one I previously published on Medium in Analytics Vidhya. It takes input in the form of dictionaries rather than NumPy arrays, to accommodate dissimilar features, which here are the words themselves. The number of clusters, i.e., the value of K, equals the Marks attribute in the dataset file. Each question gets a separate instance of the model to ensure that there is no ambiguity between questions.
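
Because the vectors are plain dictionaries, distance can be measured over the union of their keys, with missing keys treated as zero. A minimal sketch of that idea (the published model's internals may differ):

import math

def dict_distance(a, b):
    # Treat each dictionary as a sparse vector; a key missing from one
    # side contributes zero on that side
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

With K equal to the marks, each cluster effectively represents one grade band.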

The obtained cluster centers are sorted using feature weights, in the priority order of content, presentation and correctness. In our vectors these correspond to clean_words, then word_count and sentence_count, and finally correct_ratio. The weight of each feature is given below.

feature_weights = {
    'clean_words': 0.5,
    'word_count': 0.2,
    'sentence_count': 0.2,
    'correct_ratio': 0.1,
}
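
Sorting the centers by a weighted score could then look like this (center_score, centers and sorted_centers are illustrative names, not the repo's exact identifiers):

def center_score(center):
    # Weighted mix of content, presentation and correctness features
    return sum(weight * center.get(feature, 0)
               for feature, weight in feature_weights.items())

# Best centers first, so index 0 corresponds to full marks
sorted_centers = sorted(centers, key=center_score, reverse=True)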

Now that our model is trained, we can evaluate the scripts. The answers are fed to the system as a JSON file in the format below.

{
    "Question 1": "Answer 1",
    "Question 2": "Answer 2",
    ...
    "Question N": "Answer N"
}

The question is used as the key to fetch the pre-trained model. The answer goes through the same steps as the answers in the dataset. The resulting vector is passed to the predict() function. From the returned label, the matching cluster center is identified, and the position of that center in the sorted list of centers gives the marks for that particular answer.
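
Put together, the grading step could look roughly like this (models, sorted_centers and the predict()/centers interface are assumptions pieced together from the description above, not the repo's exact code):

def grade(question, student_answer):
    model = models[question]                   # pre-trained per-question model
    vector = prepare_vector(student_answer)
    label = model.predict(vector)              # cluster label for this answer
    matching_center = model.centers[label]
    # Position among the sorted centers decides the marks:
    # the best cluster (index 0) earns full marks
    rank = sorted_centers[question].index(matching_center)
    return data[question]['Marks'] - rank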

For each answer, feedback is also generated: the best cluster center from the sorted list is taken along with the vector of the student's answer, and the weighted features used for sorting are compared. If there are fewer clean words, the student is asked to add content; if there are fewer sentences, to improve presentation; and if there are more spelling mistakes, to reduce them. The output is in the format below, with a sketch of these rules after it.

{
    "question": {
        "answer": "The answer to the question",
        "marks_awarded": m,
        "max_marks": n,
        "feedback": "Necessary feedback to improve answer."
    }
}
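
Those feedback rules could be expressed roughly as follows (a sketch; the repo's exact wording and thresholds may differ):

def give_feedback(best_center, vector):
    tips = []
    # Compare the student's vector against the best cluster center
    # on the same weighted features used for sorting
    if vector.get('clean_words', 0) < best_center.get('clean_words', 0):
        tips.append("Add more relevant content.")
    if vector.get('sentence_count', 0) < best_center.get('sentence_count', 0):
        tips.append("Use more sentences to improve presentation.")
    if vector.get('correct_ratio', 1.0) < best_center.get('correct_ratio', 1.0):
        tips.append("Reduce spelling mistakes.")
    return " ".join(tips) if tips else "Well written answer."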

Thus, by using this system, we can bring the complexity of evaluation down from that of a Map-Reduce-like system to that of a simple desktop application.

Code? It’s all at https://github.com/AjayRajNelapudi/Automated-Answer-Grading. Star my repo if you liked it.
