Maple 17 includes a new package for linguistic analysis and the grading of essays. The ability of a computer to grade essays successfully is inherently mathematical. Given a set of essays that have already been graded by hand, the computer looks for patterns in the essays and tries to weight those patterns according to the given scores. Features such as key words, sentence structure, length, and variation of words tend to correlate strongly with good or bad scores. Maple's scoring model can pick from up to 20 algorithms, each measuring dozens of properties, to formulate a model that can be used to predict scores for new essays. The EssayTools package contains functions for building and applying scoring models, reducing responses to their essential words and phrases, measuring the similarity of essays, and detecting plagiarism.
The grading commands are best used in an advisory capacity. They are great for giving insight into student responses on homework and practice exercises. They are also effective as a double-check in high-stakes testing where many markers are used: for example, any essay where the human score disagrees with the computer score by more than one or two points could be flagged and re-graded by an independent marker. To get good predictions it is best to seed your model with hundreds of scored essays - the more the better. For the purposes of this overview, we will use examples with insufficient data in order to illustrate the form of the commands as well as some of the pitfalls and limitations of this kind of technology.

Consider a wide-open short-answer question like "Why is the sky blue?". Gather several responses and put them in an array; here, we call the array Answers. Provide a second array with a grade for each of the answers; in this example, we call it Scores. In practice, these responses and grades could be read in from a .csv file using the ImportVector or ExcelTools:-Import commands.
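As an illustration, the Answers and Scores arrays might be set up as follows. The responses and grades below are invented for this sketch (the original worksheet used its own data), so the numeric results quoted later in this overview will not be reproduced exactly by them.

    > Answers := Array([
          "The sky is blue because molecules in the air scatter blue light from the sun more than they scatter red light.",
          "Blue light bounces off the air molecules more than other colours, so the sky looks blue.",
          "Because the ocean reflects onto the sky.",
          "The sky is blue because molecules in the air scatter blue light from the sun more than they scatter red light.",
          "I think it is blue because of the sun.",
          "Molecules in the atmosphere scatter the shorter blue wavelengths more than the longer red ones.",
          "The sky is blue because it reflects the sea.",
          "The sky is blue because that is its colour."
      ]):
    > Scores := Array([5, 4, 1, 5, 1, 5, 2, 0]):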
Use the BuildScoreModel command from the EssayTools package to generate a model.
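A minimal sketch of the call, using the arrays above; see the BuildScoreModel help page for the full set of options.

    > with(EssayTools):
    > model := BuildScoreModel(Answers, Scores):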
This model could be saved, for example, by calling LibraryTools:-Save(model, "blue_sky.mla");. Later, when you have more responses, you can simply append the archive to libname with libname := libname, "blue_sky.mla";, and the variable model will implicitly be available for use. Now, let's examine a previously unseen response:
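Scoring the new response might look like the following sketch. The response text is invented, and the argument order of Score is an assumption; consult the EssayTools:-Score help page for the exact calling sequence.

    > NewAnswer := "The sky appears blue because molecules in the air scatter blue sunlight more than red light.":
    > Score(NewAnswer, model);   # argument order assumed; see ?EssayTools,Score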
Other EssayTools commands can be used to get an idea of what goes on behind the scenes. First, let's look at the words that occur with high frequency in the model answers.
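The original worksheet output is not reproduced here. As a rough stand-in, word frequencies across the model answers can be tallied with StringTools; this is a generic sketch, not necessarily the command used in the original worksheet.

    > freq := table(sparse):   # unassigned entries default to 0
    > for a in Answers do
          for w in map(StringTools:-LowerCase, StringTools:-Split(a, " ,.?!")) do
              if w <> "" then freq[w] := freq[w] + 1; end if;
          end do;
      end do:
    > sort([indices(freq, 'pairs')], (x, y) -> rhs(x) > rhs(y));   # most frequent words first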
Let's use some of these high-frequency words to form nonsense answers.
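For instance, a grammatically meaningless string of high-frequency words can be scored against the model (again using the assumed Score calling sequence):

    > Nonsense := "blue molecules light sky scatter sun blue molecules":
    > Score(Nonsense, model);   # same assumed calling sequence as above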
This model gives disproportionate weight to the word "molecules", which appears in every good answer but in none of the low-scoring ones. This emphasizes the need for a large sample of data from which to build the model. Reduction techniques can be used to simplify an essay by dropping unimportant words, coalescing words with similar meanings, and splitting the response into smaller idea-phrases.
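The Reduce command discussed below performs this kind of simplification. A minimal call, assuming it accepts a single response string, might look like:

    > Reduce("The sky is blue because molecules in the air scatter blue light from the sun more than they scatter red light.");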
An important tool in this process is the use of word lemmas, the dictionary root forms of words.
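For example, the Lemma command maps an inflected word back to its root; the outputs in the comments are what one would expect, not verified results.

    > Lemma("scattering");   # expected to return the root form "scatter"
    > Lemma("molecules");    # expected to return "molecule"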
Knowing where to split a sentence requires part-of-speech information: each sub-part should usually contain at least a noun and a verb, and the split is typically made at a conjunction.
Part-of-speech information is not available for all words; for example, "scatters" is not recognized and the query returns FAIL. In that case, the Reduce command will attempt to use the Lemma command to get the root word.
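A sketch of that behaviour, assuming the package's IsVerb export works on single words as described (the exact results depend on the built-in dictionary):

    > IsVerb("scatters");         # per the discussion above, returns FAIL for an unknown word
    > Lemma("scatters");          # fall back to the root word, e.g. "scatter"
    > IsVerb(Lemma("scatters"));  # the root form is more likely to be recognized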
Another technique for scoring essays is to find the most similar essay or essays in the model set. There are many ways to compute similarity scores.
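For example, the new response could be compared against each model answer with the default metric discussed below; this sketch assumes BinaryJaccardCoefficient takes two strings and returns a number between 0 and 1.

    > sims := [seq(BinaryJaccardCoefficient(NewAnswer, Answers[i]), i = 1 .. numelems(Answers))];
    > max[index](sims);   # position of the most similar model answer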
According to the default similarity metric, BinaryJaccardCoefficient, the second essay is most similar, receiving a score of 0.22.
The CosineCoefficient metric instead picks essays one and four, while the DiceCoefficient metric picks essay eight. Each of these metrics, and their binary counterparts, measures similarity in a different way. When a similarity score exceeds a certain threshold, there is a good chance that the essays are partial or complete copies of each other.
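The metrics can be compared directly; the same two-string calling sequence is assumed here.

    > [seq(CosineCoefficient(NewAnswer, Answers[i]), i = 1 .. numelems(Answers))];
    > [seq(DiceCoefficient(NewAnswer, Answers[i]), i = 1 .. numelems(Answers))];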
The DetectPlagiarism command compares all given essays with each other and flags those that exceed a minimum similarity score. In this case we see that essays 1 and 4 are certainly copies of each other, as indicated by a similarity score of 1. Essays 7 and 8 are flagged as possible copies since they are both short and start with exactly the same five words.
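A minimal call might look like the following; the exact form of the returned report, and any option for adjusting the similarity threshold, should be checked in the DetectPlagiarism help page.

    > DetectPlagiarism(Answers);   # compare every pair of responses and flag likely copies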