Understanding Validators

Turnitin Scoring Engine (TSE) uses two sets of validators to help you build the best possible data set and grader.


Data Set Validators tell you whether your data set is usable and how best to prepare your training file. A data set file that meets the validator standards makes grader creation smoother.


Grader Validators provide a snapshot of your grader's scoring efficacy and of your data's ability to produce an accurate grader.


Validators exist in two forms: 

Data Set Validators

Grader Validators 

Data Set Validators

This form of validation runs after you upload your data set and reports errors if the data set does not meet the standards needed to create a grader.

Graders cannot be created for data sets that do not pass validation.



  • If errors exist, you can access the data set validator report by clicking "Download error report" immediately following your upload.  





  • Access the data set validator report by clicking the link from your homepage. 



  • Validator reports are downloaded as .csv files, which most spreadsheet programs can open.

Each report identifies a specific issue with your data set.

Follow the validator descriptions to fix the errors in your data set.


  • When complete, update your data set by selecting the "Update Data Set" option from the corresponding drop-down menu. 




Data Set Report Validators

| Validator | Description |
|---|---|
| Word Count | Minimum word count is 2 and maximum word count is 5000. Any essay below the minimum or above the maximum fails, and the failure is listed in the error report. |
| Duplicate Submissions | This validator ensures that there are no duplicate essays in a data set. Duplicate texts trip this validator. |
| Uneven Distribution | Rubric traits are not appropriately distributed because they all have the same score. |
| Max Number of Scoring Dimensions | It appears that you have more than 10 score columns in your csv. Please remove some so that you have no more than 10 score columns and try again. |
| Min. Number of Scoring Dimensions | It appears that your file does not have any scoring columns. Our models need to be able to look at scores in order to learn how to score essays. Please include a scoring column in your csv and try again! |
| Duplicate Headers | It appears that two of your scoring columns have the same header. Please change one so that they are all unique. |
| Values on All Columns | It appears that there are some missing scores or essays in your data set. |
| Scores and Numbers | At this time, we only allow scores to be integers or decimals. It appears that your scores are not. Please convert your scores to a numeric scale and try again. |
| Headers are Alphanumeric | It appears that your headers contain tabs or line breaks. Please remove these and try again. |
| Headers Exceed Character Limit | It appears that one of your headers is longer than 30 characters. Please shorten it to 30 characters or fewer and try again. |
| Headers aren't Empty | It appears that one of your column headers contains no text or description. Please give it a title and try again. |
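The validators listed above can be approximated locally before you upload a file. The following is a minimal sketch, not TSE's own implementation: the function name `check_data_set`, the essay column name `essay`, and the rule that every other column is a score column are assumptions made for illustration. Only the limits (2 to 5000 words, 10 score columns, 30-character headers) come from the validator descriptions.

```python
import csv
import io

MIN_WORDS, MAX_WORDS = 2, 5000   # word-count limits from the validator list
MAX_SCORE_COLUMNS = 10           # scoring-dimension limit from the validator list
MAX_HEADER_CHARS = 30            # header length limit from the validator list


def check_data_set(csv_text, essay_column="essay"):
    """Return a list of problem strings for a training CSV.

    Assumption: one essay column named `essay_column`; every other
    column is treated as a score column.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    headers, records = rows[0], rows[1:]
    if essay_column not in headers:
        return [f"no {essay_column!r} column found"]
    essay_idx = headers.index(essay_column)
    score_idxs = [i for i in range(len(headers)) if i != essay_idx]
    problems = []

    # Header validators
    if len(set(headers)) != len(headers):
        problems.append("duplicate headers")
    for h in headers:
        if not h.strip():
            problems.append("empty header")
        if "\t" in h or "\n" in h:
            problems.append(f"header contains a tab or line break: {h!r}")
        if len(h) > MAX_HEADER_CHARS:
            problems.append(f"header exceeds {MAX_HEADER_CHARS} characters: {h!r}")

    # Scoring-dimension validators
    if not score_idxs:
        problems.append("no score columns")
    if len(score_idxs) > MAX_SCORE_COLUMNS:
        problems.append(f"more than {MAX_SCORE_COLUMNS} score columns")

    seen_essays = set()
    column_values = {i: set() for i in score_idxs}
    for n, row in enumerate(records, start=2):  # row 1 is the header row
        # Values on all columns
        if len(row) != len(headers) or any(not cell.strip() for cell in row):
            problems.append(f"row {n}: missing value")
            continue
        essay = row[essay_idx]
        words = len(essay.split())
        if not MIN_WORDS <= words <= MAX_WORDS:
            problems.append(f"row {n}: word count {words} out of range")
        if essay in seen_essays:
            problems.append(f"row {n}: duplicate submission")
        seen_essays.add(essay)
        # Scores must parse as integers or decimals
        for i in score_idxs:
            try:
                column_values[i].add(float(row[i]))
            except ValueError:
                problems.append(f"row {n}: non-numeric score in {headers[i]!r}")

    # Uneven distribution: every score in a column is the same
    for i, values in column_values.items():
        if len(records) > 1 and len(values) == 1:
            problems.append(f"all scores identical in {headers[i]!r}")
    return problems
```

Calling `check_data_set(open("training.csv", newline="").read())` on a hypothetical `training.csv` returns an empty list when none of these checks trip. The real validator report may word or group its findings differently.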


Grader Validators

In-system validators provide a snapshot of your data's ability to create a usable grader.  At a glance, data sets will be labeled with green checks, yellow exclamation points, and red x's.


Quadratic Weighted Kappa (QWK) is a measure of agreement. In the Scoring Engine, it measures how closely the grader's predicted scores agree with the scores in the data set you provide.


The scale ranges from 0.0 to 1.0.
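To make the statistic concrete, here is a minimal sketch of the standard quadratic weighted kappa calculation for integer scores. It is the textbook formula, not necessarily the exact computation TSE runs, and the function name and arguments are chosen for illustration. Note that the textbook statistic can fall below 0 when agreement is worse than chance.

```python
def quadratic_weighted_kappa(human, predicted, min_score, max_score):
    """Textbook QWK for two equal-length lists of integer scores."""
    if len(human) != len(predicted) or not human:
        raise ValueError("score lists must be non-empty and the same length")
    n = max_score - min_score + 1          # number of score categories
    if n < 2:
        raise ValueError("need at least two score categories")
    if any(not min_score <= s <= max_score for s in list(human) + list(predicted)):
        raise ValueError("score outside the min/max range")

    # Observed agreement matrix: rows are human scores, columns are predictions
    observed = [[0] * n for _ in range(n)]
    for h, p in zip(human, predicted):
        observed[h - min_score][p - min_score] += 1

    total = len(human)
    hist_h = [sum(row) for row in observed]
    hist_p = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2          # quadratic penalty
            num += weight * observed[i][j]                # observed disagreement
            den += weight * hist_h[i] * hist_p[j] / total # chance disagreement
    if den == 0:
        raise ValueError("QWK is undefined when all scores fall in one category")
    return 1.0 - num / den
```

For example, `quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 3], 1, 4)` evaluates to 0.875: one prediction is off by a single score point, so agreement is high but not perfect.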


Green check: QWK of 0.6 and above


Yellow exclamation point: QWK from 0.5 to 0.6


Red x: QWK from 0.0 to 0.5
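The three QWK ranges above can be expressed as a small lookup. This is a minimal sketch under two assumptions that the page does not state outright: that the ranges correspond, in order, to the green check, yellow exclamation point, and red x, and that a value exactly on a boundary (0.5 or 0.6) belongs to the higher band. The function name `validator_status` is hypothetical.

```python
def validator_status(qwk):
    """Map a QWK value to the label shown next to a data set.

    Assumptions: boundary values fall in the higher band, and the
    ranges map in order to green, yellow, and red.
    """
    if qwk >= 0.6:
        return "green check"
    if qwk >= 0.5:
        return "yellow exclamation point"
    return "red x"
```

Under these assumptions, a grader with a QWK of 0.55 would show a yellow exclamation point, which suggests reevaluating the training data before relying on the grader.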


  • More detailed information can be found by clicking the expand arrow.




  • Grader Validators measure efficacy using quadratic weighted kappa or QWK. 




Depending on the evaluation results of your data set, you may be ready to create a grader or reevaluate your training data. 

For more information about graders, visit our page on Using a Grader.

