Content portal

The content portal is only available for Crossref Similarity Check customers.

Crossref Similarity Check users who are administrators will have access to the content portal. To access the portal, select Content from the side menu.

Using the portal

The content portal displays all the publishers connected to your account. It allows you to view how much of your content has been successfully indexed and is now searchable. It also allows you to self-diagnose content that has failed to be indexed by using the Error Report.

The columns for each publisher show the number of content items indexed, the number of items that are pending, the number of items that have failed, and the successfully indexed percentage.

Select the arrow to the left of the publisher’s name to expand the information.

The expanded information includes the DOI prefix unique to the publisher.

Select Download to download the Error Report to your device. The Error Report only contains the content items that failed to be indexed. The report will be a .csv file.

The report will only download up to 500,000 lines.

Understanding the report

To download the Error Report, select Content from the side menu. From here, select the drop-down for the publisher you’d like a report for.

Select Download to download the report to your device. The report will be a .csv file.

The table below details what is included in the report:

turnitin_internal_id The ID that Turnitin attributed to your content. This ID helps us keep track of the content on our end.
provider_item_id The iThenticate DOI (Digital Object Identifier).
doi_prefix The DOI prefix that is unique to each publisher.
publisher_name The name of the publisher.
state The current state of the file. The four states are:

  • INDEXED - The item has been successfully indexed.
  • PENDING - The item is in the process of being indexed.
  • ERRORED - The item could not be indexed. Check the error_code and error_msg columns to diagnose the problem.
  • INVALID - There is insufficient data to attempt to index the item.
article_url The full-text URL for the file.
final_url The full-text URL will sometimes redirect to a different URL than the one provided. If this occurs, this column will show the final URL from which Turnitin indexed the content.
mimetype Also known as the content type. This column identifies the type of content that has been indexed. It will also indicate when content has been incorrectly indexed.
error_code The error code will indicate what has caused an error with the indexing.
error_msg The error message will give more information about the error.
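As a rough illustration of how the report could be reviewed in bulk, the sketch below counts rows by state and error_code. It assumes the .csv file has a header row that uses the column names listed above; the sample rows, DOIs, and publisher name are hypothetical and are not taken from a real report.

```python
import csv
import io
from collections import Counter


def summarize_report(csv_file):
    """Count report rows by (state, error_code).

    Assumes a header row with at least the 'state' and 'error_code' columns.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["state"], row["error_code"])] += 1
    return counts


# Hypothetical sample data; a real report would be opened with
# open("error_report.csv", newline="", encoding="utf-8") instead.
SAMPLE = """turnitin_internal_id,provider_item_id,doi_prefix,publisher_name,state,article_url,final_url,mimetype,error_code,error_msg
1,10.5555/a,10.5555,Example Press,ERRORED,https://example.org/a.pdf,,,NO_SUCH_HOST,No such host
2,10.5555/b,10.5555,Example Press,ERRORED,https://example.org/b.pdf,,,NO_SUCH_HOST,No such host
3,10.5555/c,10.5555,Example Press,INVALID,,,,INVALID_CROSSREF,Item invalid: Full Text URL
"""

summary = summarize_report(io.StringIO(SAMPLE))
for (state, code), n in summary.most_common():
    print(f"{state:8} {code:20} {n}")
```

Grouping by error code first makes it easier to see whether one site-wide problem, such as a DNS or certificate issue, accounts for most of the failures.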

Content types

The mimetype column will show the content type of the item. Below is a legend to help you see what content type each item will be indexed as.

application/pdf PDF (.pdf)
text/plain Plain Text (.txt)
text/html HTML
application/vnd.oasis.opendocument.text OpenDocument Text (.odt)
application/vnd.openxmlformats-officedocument.wordprocessingml.document Word Document (.docx)
application/msword Word Document (.doc)
application/x-hwp Hancom’s Hangul Word Document (.hwp)
application/rtf Rich Text Format (.rtf)
text/rtf Rich Text Format (.rtf)
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet Excel Document (.xlsx)
application/vnd.openxmlformats-officedocument.presentationml.presentation PowerPoint Document (.pptx)

If a content item is not one of these types, it cannot be indexed.
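A publisher could check a Content-Type value against this list before supplying a URL. The sketch below is one way to do that. Ignoring parameters such as "; charset=utf-8" and comparing case-insensitively are assumptions made for the example, not documented Turnitin behavior, and the function name is hypothetical.

```python
# MIME types copied from the legend above.
SUPPORTED_MIMETYPES = {
    "application/pdf",
    "text/plain",
    "text/html",
    "application/vnd.oasis.opendocument.text",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "application/x-hwp",
    "application/rtf",
    "text/rtf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def is_supported(content_type: str) -> bool:
    """Return True if the Content-Type value names a MIME type in the legend."""
    # Drop any parameters (for example "; charset=utf-8") before comparing.
    mimetype = content_type.split(";", 1)[0].strip().lower()
    return mimetype in SUPPORTED_MIMETYPES


print(is_supported("text/html; charset=UTF-8"))
print(is_supported("image/png"))
```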

Error code meanings

The following are the error codes and messages, what they mean, and what action the publisher or Turnitin needs to take to resolve each error.

Error Code: CONNECTION_REFUSED
Error Message: Connection was refused
Error Meaning: We have attempted to download the file but the remote server has refused our connection.
What can you do? Ensure that our IP address and UserAgent are added to an allow list for your site. Otherwise, ask your webmaster to investigate further.

Error Code: CONNECTION_RESET
Error Message: Connection was reset
Error Meaning: We have attempted to download the file from the remote server but the server closed the connection during the download.
What can you do? Ensure that our IP address and UserAgent are added to an allow list for your site. Otherwise, ask your webmaster to investigate further.

Error Code: CONNECTION_TIMED_OUT
Error Message: Connection timed out
Error Meaning: We have attempted to download the file but the remote server has not responded quickly enough and so we have timed out.
What can you do? Ensure that your server is able to send data at a sufficient speed or check for any network problems.

Error Code: DUPLICATE_CONTENT
Error Message: Document not indexed as it is already indexed under item ID: 10.1524/9783486858280.119. (internal Turnitin ID: 233609665, hash of content: -7848972029731998828)
Error Meaning: We have received two separate items from the same provider which are identical to each other. We will not index an identical item and so any duplicates will display this error. The error message includes the item ID of the previously indexed item that is identical to this item. The causes of this error include:

  • We have been provided with the exact same full-text URL for two separate items
  • We have different URLs but they point to the same file
  • We have different URLs but we are sent to an error page which displays identical text
What can you do? Depending on what has caused the error, the provider may have to correct the full-text URL of the affected items.

In some cases this error is valid because multiple articles are contained within a single file. However, we will still not index the same file multiple times, so the error will remain on the duplicate items.
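The first two causes can often be spotted from metadata alone. The sketch below groups items by URL and reports any URL shared by more than one item. It assumes each row is a dictionary keyed by the report's column names; the function name and sample rows are hypothetical. It does not detect different URLs that return identical content, which would require comparing the downloaded files themselves.

```python
from collections import defaultdict


def group_shared_urls(rows):
    """Map each URL used by more than one item to the item IDs that use it."""
    by_url = defaultdict(list)
    for row in rows:
        # Prefer the URL reached after redirects; fall back to the supplied one.
        url = row.get("final_url") or row.get("article_url")
        if url:
            by_url[url].append(row["provider_item_id"])
    return {url: ids for url, ids in by_url.items() if len(ids) > 1}


# Hypothetical rows: two chapters pointing at the same book PDF.
rows = [
    {"provider_item_id": "10.5555/ch1", "article_url": "https://example.org/book.pdf", "final_url": ""},
    {"provider_item_id": "10.5555/ch2", "article_url": "https://example.org/book.pdf", "final_url": ""},
    {"provider_item_id": "10.5555/ch3", "article_url": "https://example.org/ch3.pdf", "final_url": ""},
]

for url, ids in group_shared_urls(rows).items():
    print(url, ids)
```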

Error Code: DUPLICATE_CROSSCHECK_ITEM
Error Message: The crosscheck item exists for more than one provider
Error Meaning: We have still processed the item but have duplicated it within our system due to an error within the Turnitin system.
What can you do? No action required.

Error Code: EXTRACT_FAILED
Error Message: Internal processing issue: {additional info}
Error Meaning: We have successfully downloaded a file but have failed to index it into our database. Causes of this can vary but usually the issue is that the file is in a format or encoding that we do not support.
What can you do? Check whether the file conforms with our supported file specifications. If not then update the file.

Error Code: FETCH_FAIL_GENERAL
Error Message: Unable to make connection {additional info}
Error Meaning: We have received a general error from the site when trying to download the file. The error message will provide a specific error code.
What can you do? The action the provider has to take depends on the error code given in the error message.


Error Code: FETCH_FAIL_2XX / FETCH_FAIL_4XX / FETCH_FAIL_5XX
Error Message: FETCH_FAILED_XXX: {additional info}
Error Meaning: We have received an error from the site when trying to download the file. The error message will provide a specific error code.
What can you do? The action the provider has to take depends on the error code given in the error message.


Error Code: HTTP_RESPONSE_TO_HTTPS_CLIENT
Error Message: Server gave HTTP response to HTTPS client
Error Meaning: When connecting to a secure HTTPS link the server has responded with an insecure HTTP response.
What can you do? Investigate why these incorrect responses are being sent.

Error Code: INTERNAL_ISSUE
Error Message: Varies depending on issue
Error Meaning: We have encountered some kind of internal issue while trying to process the item. Turnitin actively monitors and tries to resolve these issues but on occasion these errors may be presented to the provider.
What can you do? Contact Turnitin for more information.

Error Code: INVALID_CERTIFICATE
Error Message: Certificate was invalid
Error Meaning: When attempting to download the item from your site we have encountered an error with the security certificates used on your site. Security certificates are used in order to verify the identity of a website and create a secure connection between your site and a user. If a certificate is invalid for any reason then we are unable to connect to the site as it poses a security risk to our system. This could and probably will affect others trying to access your site as well. You can go to https://www.sslshopper.com/ssl-checker.html or https://www.digicert.com/help/ to check your website domain to see the errors with the certificates.
What can you do? Fix the certificates on your site.

Error Code: INVALID_CROSSREF
Error Message: Item invalid: Full Text URL
Error Meaning: The provider has not supplied a full-text URL within their metadata.
What can you do? Update the item with a valid full-text URL.

Error Code: MISSING_LOCATION_HEADER
Error Message: Missing Location header
Error Meaning: The link indicates that a redirect is required, but a new link to redirect to has not been supplied in the Location header.
What can you do? Either update the item with a valid full-text URL or contact your webmaster to investigate the error on your site.

Error Code: NETWORK_IS_UNREACHABLE
Error Message: Network is unreachable
Error Meaning: The link has returned an error stating that the site is unreachable.
What can you do? Either update the item with a valid full-text URL or contact your webmaster to investigate the error on your site.

Error Code: NOT_ENOUGH_TEXT
Error Message: Item invalid: Provided document length must be >=200
Error Meaning: The file we’ve downloaded has fewer than 200 characters, so we have not indexed it. This is usually because the file does not contain any extractable text, such as a PDF that contains an image of text rather than embedded text.
What can you do?

Either:


Error Code: NO_FILE
Error Message: Item invalid: No local path present
Error Meaning: We have successfully downloaded an item and saved it to our server. However, when we have attempted to index the file we cannot locate it on our server anymore.
What can you do? Contact Turnitin for more information.

Error Code: NO_ROUTE_TO_HOST
Error Message: No route to host
Error Meaning: The link has returned an error stating that there is no route to the host.
What can you do? Either update the item with a valid full-text URL or contact your webmaster to investigate the error on your site.

Error Code: NO_SUCH_HOST
Error Message: No such host
Error Meaning: The link has returned an error stating that the host is not reachable.
What can you do? Either update the item with a valid full-text URL or contact your webmaster to investigate the error on your site.

Error Code: TOO_MANY_REDIRECTS
Error Message: Too many redirects
Error Meaning: When following the link provided we have been redirected to a new link more than 10 times. This may indicate that we have entered a redirect loop and so we stop trying after the 10th redirect.
What can you do? Check the redirects on the link.
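To illustrate how a redirect limit behaves, the sketch below follows redirects through a simulated table of responses and stops once the limit of 10 described above has been reached. It is one possible reading of the rule, not Turnitin's crawler: the function name, the response table, and the set of status codes treated as redirects are all assumptions made for the example.

```python
MAX_REDIRECTS = 10  # limit stated in the error description
REDIRECT_STATUSES = {301, 302, 303, 307, 308}  # assumed set of redirect codes


def follow_redirects(start_url, responses, max_redirects=MAX_REDIRECTS):
    """Follow redirects through a simulated table of responses.

    `responses` maps a URL to (status_code, location_or_None).
    Returns (outcome, last_url). The outcome is "OK" when a non-redirect
    response is reached (other failure statuses are out of scope here),
    or one of the error codes from the report.
    """
    url = start_url
    followed = 0
    while True:
        status, location = responses[url]
        if status not in REDIRECT_STATUSES:
            return "OK", url
        if not location:
            return "MISSING_LOCATION_HEADER", url
        if followed == max_redirects:
            # An 11th redirect is requested: give up, possibly a loop.
            return "TOO_MANY_REDIRECTS", url
        followed += 1
        url = location


# Hypothetical redirect loop between two URLs.
loop = {
    "https://example.org/a": (302, "https://example.org/b"),
    "https://example.org/b": (302, "https://example.org/a"),
}
print(follow_redirects("https://example.org/a", loop))
```

Running the same check against your own redirect chain, for example by recording each hop's status and Location header, shows whether the chain ends or loops back on itself.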

Error Code: UNKNOWN_ERROR
Error Message: Varies depending on issue.
Error Meaning: We have encountered some kind of issue we are not able to identify. The issue could either be internal or on the provider side.
What can you do? The error message may provide sufficient information for the provider to identify and fix the problem. If not then they need to contact Turnitin for more information.

Error Code: UNREADABLE_FILE
Error Message: Varies depending on issue
Error Meaning: We have successfully downloaded a file but have failed to index it into our database. Causes of this can vary but usually the issue is that the file is in a format or encoding that we do not support.
What can you do? Check whether the file conforms with our supported file specifications. If not then update the file.

Error Code: UNSUPPORTED_MIME_TYPE
Error Message: Item invalid: Supplied content file has an unsupported mime type
Error Meaning: A mime type is used to tell computers what type of file they have downloaded so they know how to handle it. For example, a PDF should have the mime type ‘application/pdf’.

When we receive this error, we have successfully downloaded a file but the mime type (or content type) associated with that file is not supported by Turnitin. Sometimes the actual file might be in a format we support, but the mime type associated with it does not match, so we encounter an error.

What can you do? This error can occur in several ways:

  • The mime type in the HTTP header is not supported. This HTTP header is created by the provider. If the file is indeed a file type we do not support then they need to update it to a file type we do support.
  • If the mime type in the header does not match the actual file type then they need to update the mime type in the HTTP header to match the file type.
  • After we have downloaded the file we attempt to determine the mime type of the file ourselves by sending it through our system. If at this point we identify an invalid mime type then we will show this error. This normally occurs due to the file being generated incorrectly. The provider will have to examine the file to confirm where the issue lies and resolve it.
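A mismatch between the HTTP header and the actual file can sometimes be caught by comparing the declared mime type with the file's leading bytes. The sketch below checks a few well-known file signatures (PDF, RTF, ZIP-based Office and OpenDocument files, and legacy .doc files). It is only an illustration: it is not how Turnitin determines the mime type, the function name is hypothetical, and formats without a signature in the table are reported as undetermined.

```python
# Leading-byte signatures and the mime types from the legend that they fit.
SIGNATURES = [
    (b"%PDF-", {"application/pdf"}),
    (b"{\\rtf", {"application/rtf", "text/rtf"}),
    (b"PK\x03\x04", {  # ZIP container shared by these formats
        "application/vnd.oasis.opendocument.text",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    }),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", {"application/msword"}),  # legacy .doc
]


def header_matches_content(content_type, first_bytes):
    """Compare a declared Content-Type with the file's leading bytes.

    Returns True or False when a known signature is found, and None when
    the leading bytes match nothing in the table (undetermined).
    """
    declared = content_type.split(";", 1)[0].strip().lower()
    for magic, compatible_types in SIGNATURES:
        if first_bytes.startswith(magic):
            return declared in compatible_types
    return None


# A PDF served with an HTML content type would be flagged as a mismatch.
print(header_matches_content("text/html", b"%PDF-1.7\n"))
```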

Error Code: UNREADABLE_FILE_LOCAL_PATH_DOES_NOT_EXIST
Error Message: Varies depending on issue.
Error Meaning: We have successfully downloaded a file but have failed to index it into our database due to an internal issue on our system.
What can you do? No action required.
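When working through a large report, it can help to separate the errors that need provider action from those that do not. The sketch below sorts error codes into three groups based on the "What can you do?" entries above. The grouping and the function name are a convenience for this example rather than an official classification. Any code not listed, including UNKNOWN_ERROR, falls into the provider group as a starting point.

```python
# Codes whose entry above says "No action required."
NO_ACTION = {
    "DUPLICATE_CROSSCHECK_ITEM",
    "UNREADABLE_FILE_LOCAL_PATH_DOES_NOT_EXIST",
}

# Codes whose entry above says "Contact Turnitin for more information."
CONTACT_TURNITIN = {
    "INTERNAL_ISSUE",
    "NO_FILE",
}


def triage(error_code: str) -> str:
    """Return who should act on a given error_code value."""
    if error_code in NO_ACTION:
        return "no action required"
    if error_code in CONTACT_TURNITIN:
        return "contact Turnitin"
    # Everything else is something the provider can investigate first.
    return "provider action"


for code in ("NO_FILE", "INVALID_CERTIFICATE", "DUPLICATE_CROSSCHECK_ITEM"):
    print(code, "->", triage(code))
```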