This guide helps instructors understand how Turnitin calculates word count, where to find it in the product, and what factors, such as file type and character set, can affect it.
In this guide:
- Overview of word count
- How to view word count in Turnitin
- Word count and file type
- Word count and character sets
Overview of word count
Turnitin calculates word count to help instructors and students assess the length and scope of submissions. However, word count values may differ depending on file type, character set, and how the content is processed.
Word count is also an important component of Turnitin’s file requirements. Learn more about these file requirements.
How to view word count in Turnitin
Not sure which version of Turnitin you are using? Learn how to identify your version of the Similarity Report.
Feedback Studio/Originality Check
In the classic Similarity Report, word count can be found in the submission information.
To open the submission information, select the “i” icon at the bottom of the layers side panel.
Similarity/SimCheck
In the Similarity and SimCheck classic Similarity Report, word count can be found in the submission details.
To open the submission details, select the Submission Details option in the top right-hand corner of the report.
In the new, enhanced Similarity Report, word count can be found in the submission details.
To open the submission details, select the “i” icon in the top right-hand corner of the report.
Available information is organized into relevant tabs (depending on your license).
Word count is found under the File tab.
Word count and file type
Turnitin processes file formats differently, which can lead to variations in word count.
| File Type | Processing Method |
| --- | --- |
| .doc, .docx | Turnitin extracts word count directly from Microsoft Word. Formatting and hidden text are also parsed. |
| .pdf | Turnitin’s text extraction system calculates word count. Turnitin analyzes the visible text layer of the PDF, which may omit content that is not part of that layer (for example, text that exists only as an image). |
| .txt | Turnitin’s text extraction system calculates word count. Because these files contain only plain text, discrepancies should be minimal. |
Word count and character sets
Turnitin uses internal parsing logic to count words. Non-Latin scripts (e.g., Chinese, Japanese, Korean) and certain symbols may affect this count.
- Chinese, Japanese, Korean:
  - Turnitin often counts characters as individual words.
  - Punctuation and spacing rules can cause variation from traditional word processing tools.
- Thai, Arabic, and similar languages:
  - Microsoft Word may not count words reliably due to script-specific rules.
  - PDF files processed by Turnitin’s text extraction system may yield a different, and sometimes more consistent, count.
- Special characters or symbols:
  - May be excluded from the count unless attached to a standard word.
- Languages with compound words (e.g., German):
  - Long compound words are typically counted as one word.
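To illustrate why these rules produce different totals, here is a minimal Python sketch of a word counter built on assumed rules. It is not Turnitin’s actual parsing logic. The function name `count_words`, the Unicode ranges treated as CJK, and the handling of apostrophes and hyphens are all hypothetical choices made for illustration.

```python
import unicodedata

# Hypothetical CJK ranges for illustration only; not Turnitin's actual rules.
CJK_RANGES = (
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)

# Assumed joiners: characters that keep a word going, e.g. "don't", "well-known".
JOINERS = {"'", "\u2019", "-"}


def is_cjk(ch: str) -> bool:
    """Return True if the character falls in one of the assumed CJK ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def count_words(text: str) -> int:
    """Count words using a simplified, hypothetical rule set."""
    count = 0
    in_word = False
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("L") and is_cjk(ch):
            # Each CJK letter is counted as its own word.
            count += 1
            in_word = False
        elif category[0] in "LMN":
            # Letters, combining marks, and digits form space-delimited words,
            # so a compound such as "Donaudampfschifffahrt" counts once and an
            # unspaced Thai phrase also counts once.
            if not in_word:
                count += 1
                in_word = True
        elif ch in JOINERS and in_word:
            # An apostrophe or hyphen inside a word does not split it.
            continue
        else:
            # Whitespace, punctuation, and standalone symbols end the current
            # word and are not counted on their own.
            in_word = False
    return count


if __name__ == "__main__":
    print(count_words("Turnitin 报告"))  # 3
```

Under these assumed rules, a Chinese passage counts one word per character, while a Thai phrase written without spaces collapses into a single word. This shows how the same text can produce very different totals depending on which segmentation rules a tool applies.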
If your institution uses Turnitin for multiple languages, word count expectations may vary based on academic or language norms.