TSE employs two sets of validators to help ensure the best possible data set and grader creation.
Data Set Validators indicate whether the data set is usable. They also help direct how best to prepare your training file. A data set file that meets validator standards will aid a smoother grader creation process.
Grader Validators provide a snapshot of your grader's scoring efficacy and of your data's ability to create accurate graders.
Validators exist in two forms:
This form of validation can occur after you upload your data set, if the data set does not meet the efficacy levels required to create a grader.
Graders cannot be created for data sets that do not pass validation.
Each report will illuminate a specific issue regarding your data set.
Follow validator descriptions to fix errors in your data set.
|Word Count|| |
|Duplicate Submissions||This validator ensures that each data set contains no duplicate essays. Duplicate texts will trip this validator.|
|Uneven Distribution||Rubric trait scores are not appropriately distributed because they are all the same.|
|Max Number of Scoring Dimensions||It appears that you have more than 10 score columns in your CSV. Please remove some so that you have 10 or fewer score columns and try again.|
|Min. Number of Scoring Dimensions||It appears that your file does not have any scoring columns. Our models need to be able to look at scores in order to learn how to score essays. Please include a scoring column in your csv and try again!|
|Duplicate Headers||It appears that two of your scoring columns have the same headers. Please change one so that they are all unique.|
|Values on All Columns||It appears that there are some missing scores or essays in your data set.|
|Scores and Numbers||At this time, we only allow for scores to be integers or decimals. It appears that your scores are not. Please convert your scores to a numeric scale and try again.|
|Headers are Alphanumeric||It appears that your headers contain tabs or linebreaks. Please remove these and try again.|
|Headers Exceed Character Limit||It appears that one of your headers is longer than 30 characters. Please shorten it to 30 characters or fewer and try again.|
|Headers aren't Empty||It appears that one of your column headers contains no text or description. Please give it a title and try again.|
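To check a training file against these rules before uploading, a rough local pre-check might look like the sketch below. It is not TSE's actual implementation. The function name `validate_data_set`, the `essay` column name, and the rule that every other column is a scoring column are assumptions made for illustration. Word Count is omitted because the table gives no description for it.

```python
import csv
import io

MAX_SCORE_COLUMNS = 10  # limit quoted in "Max Number of Scoring Dimensions"
MAX_HEADER_LENGTH = 30  # limit quoted in "Headers Exceed Character Limit"


def is_number(cell):
    """True if the cell parses as an integer or decimal."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def validate_data_set(csv_text, essay_column="essay"):
    """Return the names of the validators that the CSV text would trip.

    Hypothetical helper: assumes one essay column named `essay_column`
    and treats every other column as a scoring column.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["Min. Number of Scoring Dimensions"]
    headers, records = rows[0], rows[1:]
    essay_idx = headers.index(essay_column) if essay_column in headers else None
    score_idx = [i for i in range(len(headers)) if i != essay_idx]
    score_headers = [headers[i] for i in score_idx]
    problems = []

    # Header-level checks.
    if len(score_headers) > MAX_SCORE_COLUMNS:
        problems.append("Max Number of Scoring Dimensions")
    if not score_headers:
        problems.append("Min. Number of Scoring Dimensions")
    if len(set(score_headers)) != len(score_headers):
        problems.append("Duplicate Headers")
    if any("\t" in h or "\n" in h or "\r" in h for h in headers):
        problems.append("Headers are Alphanumeric")
    if any(len(h) > MAX_HEADER_LENGTH for h in headers):
        problems.append("Headers Exceed Character Limit")
    if any(h.strip() == "" for h in headers):
        problems.append("Headers aren't Empty")

    # Row-level checks run only on rows that have a value in every column.
    complete = [r for r in records
                if len(r) == len(headers) and all(c.strip() for c in r)]
    if len(complete) != len(records):
        problems.append("Values on All Columns")
    if any(not is_number(r[i]) for r in complete for i in score_idx):
        problems.append("Scores and Numbers")
    if essay_idx is not None:
        essays = [r[essay_idx].strip() for r in complete]
        if len(set(essays)) != len(essays):
            problems.append("Duplicate Submissions")
    # A scoring column where every essay has the same score is uneven.
    if len(complete) > 1 and any(
            len({r[i] for r in complete}) == 1 for i in score_idx):
        problems.append("Uneven Distribution")
    return problems


sample = "essay,ideas,style\nFirst essay text,3,4\nSecond essay text,2,4.5\n"
print(validate_data_set(sample))
```

An empty list means none of the sketched checks were tripped. The real validators may apply different or stricter rules.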
In-system validators provide a snapshot of your data's ability to create a usable grader. At a glance, data sets will be labeled with green checks, yellow exclamation points, and red x's.
Quadratic Weighted Kappa (QWK) is a measure of agreement. For The Scoring Engine, it measures how closely our predicted scores agree with the scores in the data set you provide.
The scale ranges from 0.0 to 1.0.
QWK of 0.6 and above
QWK from 0.5 to 0.6
QWK from 0.0 to 0.5
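For readers who want to see how such a number is produced, the sketch below computes the standard quadratic weighted kappa for two lists of integer scores and maps the result onto the three ranges above. It illustrates the general statistic and is not necessarily how The Scoring Engine computes it. The function names, the integer-score requirement, and the choice to place the boundary values 0.5 and 0.6 in the higher band are assumptions.

```python
def quadratic_weighted_kappa(human, predicted, min_score, max_score):
    """Standard QWK between two equal-length lists of integer scores."""
    n = max_score - min_score + 1
    if n < 2 or len(human) != len(predicted) or not human:
        raise ValueError("need two non-empty, equal-length lists and 2+ score levels")

    # observed[i][j] counts essays scored i by the data set and j by the model.
    observed = [[0] * n for _ in range(n)]
    for h, p in zip(human, predicted):
        observed[h - min_score][p - min_score] += 1

    total = len(human)
    hist_h = [sum(row) for row in observed]
    hist_p = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic penalty
            expected = hist_h[i] * hist_p[j] / total  # agreement by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both lists use one identical score level; treated here as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


def qwk_band(qwk):
    """Map a QWK value onto the three ranges listed above.

    Assumption: the boundary values 0.5 and 0.6 fall in the higher band.
    In general QWK can be negative; such values land in the lowest band here.
    """
    if qwk >= 0.6:
        return "0.6 and above"
    if qwk >= 0.5:
        return "0.5 - 0.6"
    return "0.0 - 0.5"


data_set_scores = [1, 2, 3, 4]
model_scores = [1, 2, 3, 3]
kappa = quadratic_weighted_kappa(data_set_scores, model_scores, 1, 4)
print(round(kappa, 3), qwk_band(kappa))
```

The quadratic weight penalizes a prediction more the further it lands from the data set's score, so a model that is off by one point loses far less agreement than one that is off by three.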
Depending on the evaluation results of your data set, you may be ready to create a grader or reevaluate your training data.