Preparing Training Data Sets

The Turnitin Scoring Engine uses automated Graders to score writing submissions. These Graders predict the scores for submissions based on patterns learned from your experts. To teach a Grader to make accurate predictions, you need a sample set of valid, pre-scored essays.

For the Scoring Engine to parse and process your sample data correctly, the file of pre-scored essays you provide must be formatted as follows:

Required Formatting Standards

File Type

The file must be formatted using the CSV standard (comma-separated values) and should end with a “.csv” file extension (e.g., “your-file-name.csv”). Most common spreadsheet applications can save files in the CSV format. Use the “Save As…” option in your spreadsheet program to save the file as a CSV instead of the default format.

"Save as..." menu in Microsoft Excel 2011 (Mac)

"Save as..." menu in Microsoft Excel 2013 (Windows)

Encoding

The file must be ASCII or UTF-8 encoded. If you use a common spreadsheet application to create and save your training file, this requirement is usually met by default. If your computer’s operating system defaults to a non-Latin alphabet (e.g., Arabic, Chinese, Hebrew, Japanese, Korean, or Russian), check your spreadsheet application’s settings to ensure that UTF-8 is selected when saving the file. Other Unicode encodings, such as UTF-16, are not supported.
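
As a quick check before uploading, the encoding requirement can be sketched in Python. This is only an illustration and not part of the Scoring Engine: the function name looks_like_utf8 is hypothetical, and treating NUL bytes as a sign of UTF-16 text is a heuristic assumption.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 (ASCII is a subset of UTF-8)."""
    if b"\x00" in data:
        # Heuristic assumption: NUL bytes usually mean UTF-16 or UTF-32 text.
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

To check a saved file, read it in binary mode, for example open("your-file-name.csv", "rb").read(), and pass the resulting bytes to the function.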

Required Content Standards

“Submission ID” Column

You are not required to include identifiers for each essay in your training file. However, if you do include a column of data to store unique identifiers for your essays (e.g., a submission number), the header or label placed at the top of this column must be “ID”. Please DO NOT use values that reveal the identities of your students (e.g., names, email addresses, etc.).

“Responses” Column

You must have a column containing the written prompt responses or essays your experts scored. Responses cannot be longer than 5,000 words.
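
The length limit can be checked ahead of time with a short Python sketch. It assumes that a "word" is any run of non-whitespace characters; the Scoring Engine may count words differently, so treat the result as an estimate. The names count_words and over_limit are hypothetical.

```python
WORD_LIMIT = 5000  # maximum response length stated above


def count_words(response: str) -> int:
    # Assumption: words are separated by whitespace.
    return len(response.split())


def over_limit(responses, limit=WORD_LIMIT):
    """Return the 0-based positions of responses longer than `limit` words."""
    return [i for i, text in enumerate(responses) if count_words(text) > limit]
```

Responses reported by over_limit would need to be shortened or removed before the file is submitted.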

Score Column(s)

You must include at least ONE column containing a score for each essay. You may include up to TEN score columns in total. Every response or essay must have a score in every score column you include. In other words, essays with missing score values must either be scored or removed from the training file before you can submit it.

Score Column Values

The labels or headers you use for each score column MUST be unique, must be no more than 30 characters long, cannot be empty or blank, and cannot contain line breaks or tabs. Also, score values must be integers (e.g., 0, 1, 2, etc.) or decimal values (e.g., .5, 1.5, 2.75, etc.). Finally, you cannot assign the same score to every single essay for any of your score columns.
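
The score column rules above can be sketched as two small Python checks. This is a minimal illustration under stated assumptions, not the Scoring Engine's own validation: the function names are hypothetical, and values are tested with Python's float(), which may accept some forms (such as "1e3") that the Scoring Engine does not.

```python
import math


def check_score_headers(headers):
    """Return a list of problems found in the score column headers."""
    problems = []
    if not 1 <= len(headers) <= 10:
        problems.append("between 1 and 10 score columns are required")
    if len(set(headers)) != len(headers):
        problems.append("score headers must be unique")
    for header in headers:
        if not header.strip():
            problems.append("a score header is empty or blank")
        if len(header) > 30:
            problems.append(f"header longer than 30 characters: {header!r}")
        if any(ch in header for ch in "\n\r\t"):
            problems.append(f"header contains a line break or tab: {header!r}")
    return problems


def check_score_values(values):
    """Return a list of problems found in one score column's values."""
    problems = []
    numbers = []
    for value in values:
        try:
            number = float(value)  # accepts "2", ".5", "2.75"; rejects ""
        except ValueError:
            problems.append(f"missing or non-numeric score: {value!r}")
            continue
        if not math.isfinite(number):
            problems.append(f"score is not a finite number: {value!r}")
            continue
        numbers.append(number)
    if numbers and len(set(numbers)) == 1:
        problems.append("every essay has the same score in this column")
    return problems
```

An empty list from both functions means no problem was detected by this sketch; it does not guarantee that the file will be accepted.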

"Reader" Column

You are not required to include identifiers that indicate the expert who read and scored each essay in your training file. However, if you do include a column containing information such as each reader’s identifier or initials, the header or label placed at the top of this column must be “Reader”.
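
If the training data is assembled by a script rather than a spreadsheet, a file with the columns described above might be produced along these lines. This is a sketch under assumptions: only the “ID” and “Reader” headers come from this page, while the “Response” header, the “Content” score label, the sample rows, and the function name build_training_csv are hypothetical.

```python
import csv
import io


def build_training_csv(rows, fieldnames):
    """Return CSV text with a header row followed by one line per essay."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical column layout: "Response" and "Content" are placeholder labels.
fieldnames = ["ID", "Response", "Content", "Reader"]
rows = [
    {"ID": "1001", "Response": "First essay, with a comma.", "Content": 3, "Reader": "R1"},
    {"ID": "1002", "Response": "Second essay.", "Content": 1.5, "Reader": "R2"},
]
csv_text = build_training_csv(rows, fieldnames)
```

The csv module quotes any response that contains commas or line breaks. To save the result, write csv_text to a file opened with encoding="utf-8" and newline="".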

Last modified: 14:01, 26 Apr 2016
