We are always working to improve our services and provide users with the best possible experience. This page outlines pertinent information for API users about product changes.

November 2012 Release

Beta Upload Processing

This page contains:

  • A review of the API changes and how they may affect you
  • Examples of data returned by the API
  • How to test the new beta Upload Processing changes

Instructions for Integrating the Beta Upload Processing System

To give iThenticate’s custom-built or MTS integrators adequate time to test our new upload processing system, a beta testing period is available to our customers before we move everyone to the new process. This beta testing period will last until January 2014. We suggest testing the upload process with your current API setup. We believe that, for most customers, no change to the API process is required to use the new upload processing system successfully.

Please download the provided test files and follow the recommended test cases below to ensure that your integration works properly with the new upload processing system.

Setting the flag to test the new upload processing system

To test the new upload processing system, API integrations need to set an XMLRPC boolean flag named “non_blocking_upload” in the “document.add” method call:

<member>
  <name>non_blocking_upload</name>
  <value><boolean>1</boolean></value>
</member>

This is an XMLRPC structure member to include along with the existing “submit_to”, “folder”, and “uploads” members.
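If your integration calls the API from Python, the standard-library `xmlrpc.client` module marshals a Python `True` as `<boolean>1</boolean>`. This is the encoding shown above. The sketch below adds the flag to a copy of whatever struct your integration already sends. The helper name `with_non_blocking_upload` and the member values in the usage part are hypothetical placeholders, not values taken from iThenticate. The empty `uploads` list is only there to show the encoding and is not a valid upload.

```python
import xmlrpc.client


def with_non_blocking_upload(params: dict) -> dict:
    """Return a copy of an existing "document.add" struct with the beta flag set.

    `params` is assumed to already hold your usual members
    ("submit_to", "folder", "uploads", ...); only the flag is added here.
    """
    updated = dict(params)  # leave the caller's struct untouched
    # xmlrpc.client marshals Python's True as <boolean>1</boolean>.
    updated["non_blocking_upload"] = True
    return updated


if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    existing = {"submit_to": 1, "folder": 133344, "uploads": []}
    request_xml = xmlrpc.client.dumps(
        (with_non_blocking_upload(existing),), methodname="document.add"
    )
    print(request_xml)  # the struct now contains the non_blocking_upload member
```

Copying the struct rather than mutating it makes it easy to send the same request with and without the flag while comparing behavior during the beta period.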

Adding this flag makes the “document.add” method return more quickly. The XMLRPC response returned by “document.add” does not change.

Your code should then call the “document.get” method and inspect the “is_pending” flag in the response. Once this flag is no longer true, the response will contain one of two things. It will have either a “parts” array that includes the ID of the completed report, or an “error” string element that gives the reason the document could not be processed.
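The decision described above can be sketched in Python as follows. The sketch assumes the response has already been decoded (for example by `xmlrpc.client`) and that you pass in a single document struct shaped like the examples on this page. Where that struct sits within the full response depends on your client, so treat that as an assumption. The function names and status labels are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def _is_pending(doc: Dict[str, Any]) -> bool:
    # The examples on this page show the flag as the strings '1'/'0';
    # an XMLRPC client may also decode it as an int or a bool.
    return str(doc.get("is_pending", "0")).strip().lower() in ("1", "true")


def document_status(doc: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Classify one "document.get" document struct.

    Returns ("pending", None), ("ready", part_id), ("error", message)
    or ("unknown", None).
    """
    if _is_pending(doc):
        # Still processing: "parts" may be missing or incomplete, so ignore it.
        return "pending", None
    if doc.get("error"):
        return "error", str(doc["error"])
    parts = doc.get("parts") or []
    if parts:
        # "id" of the first part, as in the final example on this page.
        return "ready", str(parts[0]["id"])
    # Neither parts nor an error: not covered by this page, so flag it.
    return "unknown", None
```

Checking “is_pending” first means the code never relies on “parts” while the document is still being processed.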

The following examples are abstract representations of the XMLRPC data structures returned.

Response from the “document.add” method (unchanged):

{
  'sid' => 'db61dccf6333e34393fb032de89520c7bdb4a075',
  'messages' => [
    'Uploaded 1 document successfully'
  ],
  'uploaded' => [
    {
      'filename' => 'example.pdf',
      'id' => '9764583',
      'mime_type' => 'application/pdf',
      'folder' => {
        'name' => 'My Documents',
        'id' => '133344'
      }
    }
  ],
  'status' => '200',
  'response_timestamp' => '20121106T17:26:43',
  'api_status' => '200'
};
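The document ID needed for later “document.get” calls is the “id” of each entry in the “uploaded” array. In the examples on this page, the same value '9764583' appears as “id” in the subsequent “document.get” responses. Here is a minimal Python sketch for pulling those IDs out of a decoded response. The function name is hypothetical. Treating any “api_status” other than '200' as a failure is an assumption, not documented behavior.

```python
from typing import Any, Dict, List


def uploaded_document_ids(add_response: Dict[str, Any]) -> List[str]:
    """Return the IDs of the documents accepted by a "document.add" call."""
    # Assumption: any "api_status" other than '200' means the call failed.
    if str(add_response.get("api_status")) != "200":
        raise RuntimeError("document.add failed: %r" % add_response.get("messages"))
    # Each "uploaded" entry carries the document "id" to poll with "document.get".
    return [str(item["id"]) for item in add_response.get("uploaded", [])]


if __name__ == "__main__":
    response = {
        "messages": ["Uploaded 1 document successfully"],
        "uploaded": [{"filename": "example.pdf", "id": "9764583"}],
        "status": "200",
        "api_status": "200",
    }
    print(uploaded_document_ids(response))
```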

Here is a very early response from a “document.get” call made shortly after uploading with the “document.add” method. Note that “is_pending” is true and there is no “parts” array. Also, “percent_match” is an empty string, which indicates an undefined value.

{
  'uploaded_time' => '20121106T17:26:43',
  'author_last' => '',
  'is_pending' => '1',
  'mime_type' => 'application/pdf',
  'processed_time' => '20121106T17:26:43',
  'percent_match' => '',
  'id' => '9764583',
  'title' => 'example.pdf',
  'author_first' => ''
};
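“percent_match” is an empty string until a score exists. Converting it directly with `int()` would therefore raise an error on responses like the one above. This minimal Python sketch maps the empty string to `None`. The helper name is hypothetical, and it assumes whole-number values like the '26' in the final example on this page.

```python
from typing import Any, Optional


def parse_percent_match(value: Any) -> Optional[int]:
    """Convert a "percent_match" value to an int, or None while undefined."""
    # An empty string (or a missing value) means no score has been computed yet.
    if value is None or str(value).strip() == "":
        return None
    # Assumes whole-number percentages, as in the examples on this page.
    return int(value)
```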

As polling continues, the “document.get” method returns more data. Here the “parts” array is included, but the document is still pending.

{
  'uploaded_time' => '20121106T17:26:43',
  'author_last' => '',
  'is_pending' => '1',
  'processed_time' => '20121106T17:26:43',
  'percent_match' => '',
  'parts' => [
    {
      'doc_id' => '9764583',
      'max_percent_match' => '',
      'id' => '9811839',
      'score' => '',
      'words' => '844'
    }
  ],
  'id' => '9764583',
  'title' => 'example.pdf',
  'author_first' => ''
};

Finally, here is a “document.get” response in which “is_pending” is false and “percent_match” has a defined value. This indicates that the report is ready for viewing.

{
  'uploaded_time' => '20121106T17:21:14',
  'author_last' => '',
  'is_pending' => '0',
  'processed_time' => '20121106T17:21:24',
  'percent_match' => '26',
  'parts' => [
    {
      'doc_id' => '9764567',
      'id' => '9811823',
      'score' => '26',
      'words' => '844'
    }
  ],
  'id' => '9764567',
  'title' => 'example.pdf',
  'author_first' => ''
};

This is an example of a “document.get” response after iThenticate has determined that the PDF contains no text (for example, when the PDF is only an image of a document).

Note that “is_pending” is false and the response includes an “error” element that describes the problem in human-readable form. The error code at the end of the message can help iThenticate technical support diagnose the problem.

{
  'uploaded_time' => '20121106T17:28:01',
  'author_last' => '',
  'is_pending' => '0',
  'mime_type' => 'application/pdf',
  'processed_time' => '20121106T17:28:01',
  'error' => 'The document must contain at least 20 words of text to be accepted by the system. Error: -909',
  'percent_match' => '',
  'id' => '9764588',
  'title' => 'just_an_image.pdf',
  'author_first' => ''
};
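To log the numeric code separately, for example when contacting technical support, you can extract it from the end of the message. This minimal Python sketch assumes the code always appears as a trailing “Error: <integer>”, as in the example above. That format is inferred from this single example and is not a documented guarantee. The helper name is hypothetical.

```python
import re
from typing import Optional

# Assumption: the code is appended as "Error: <integer>" at the end of the message.
_ERROR_CODE = re.compile(r"Error:\s*(-?\d+)\s*$")


def error_code(message: str) -> Optional[int]:
    """Return the trailing error code from a "document.get" error, if present."""
    match = _ERROR_CODE.search(message)
    return int(match.group(1)) if match else None
```

When the pattern does not match, the function returns `None`. Keep the full message in your logs as well, since it is the human-readable part.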

How to test whether the new upload processing system functions properly with your integration

The following zip file contains documents that you can use to test the upload processing system. The files are labeled with the terms “Success” and “Fail” to indicate whether they should upload successfully or produce an error when processed by the new upload system. We recommend testing both types of documents to see how your integration handles successful and failed uploads.

Testing successful documents

If you see errors when uploading files that should succeed, the cause may be how soon your integration calls the “document.get” method. The new upload system accepts documents much more quickly than before, but text extraction no longer happens within the “document.add” method. Poll “document.get” until “is_pending” is 0. This avoids expecting a “parts” element that does not yet exist when “document.get” is called too soon after a successful “document.add”.
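Here is a minimal Python polling sketch. It assumes a hypothetical `fetch_document` callable that wraps your own “document.get” call for one document ID and returns the document struct. The starting interval, cap, and timeout are illustrative placeholders, not limits published by iThenticate. `sleep` is a parameter so the loop can be tested without actually waiting.

```python
import time
from typing import Any, Callable, Dict


def wait_until_processed(
    fetch_document: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    max_interval: float = 60.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until "is_pending" is 0, then return the final document struct."""
    waited = 0.0
    while True:
        doc = fetch_document()
        # Accept '1'/'0' strings (as on this page) as well as ints or bools.
        if str(doc.get("is_pending", "0")).strip().lower() not in ("1", "true"):
            return doc  # "parts" or "error" should be present now
        if waited >= timeout:
            raise TimeoutError("document is still pending")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)  # back off between polls
```

Doubling the delay keeps early checks responsive for small documents while reducing the number of calls when processing takes longer. Only inspect “parts” or “error” on the struct this function returns.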

Testing documents that produce errors

Many document errors will now be reported by the “document.get” method, because the system extracts the text after “document.add” has succeeded. The previous system performed a much more extensive check within the “document.add” method, which slowed our systems down considerably. We have therefore decoupled this check from “document.add”, and its results are now reported by the “document.get” method. Errors are returned once “is_pending” is 0.

About These Changes

We have been working to improve the document upload process for iThenticate’s users. We are pleased to provide our API customers with a beta testing period that ends in January 2014. At that point, all of our API integrators will be on the new upload processing system. The following FAQs provide more information about this upgrade.

What is the new upload processing system?

A number of improvements to the iThenticate upload process have been completed. They include faster document processing, improved system responsiveness and reliability, and more detailed error reporting. No changes are required for API customers at this time; however, we encourage API customers to beta test as described above as soon as possible.

Why are we making this change?

Upgrading our servers and introducing a new upload processing system have substantially increased the service’s speed. The new processing system is also required for our next product release, which will make the Document Viewer (DV) available within the Similarity Report. The DV allows users to view the uploaded document in its original format, including images, tables, and graphs, within the Similarity Report.

Here is an example of what the DV will look like in iThenticate:

How will the iThenticate API change?

The iThenticate upload process has always been asynchronous. Documents are uploaded via the XMLRPC “document.add” method. The “document.get” method is then polled to determine when the document is no longer pending, which indicates that a report has been generated and can be viewed. The changes move more processing into the background, so the “document.add” method will return more quickly.

Will we have to update our code?

If your API integration watches the “is_pending” flag included in the “document.get” XMLRPC response, you may not need to change your code. If your integration expects a “parts” element in the “document.get” response, you may need to update your code. Because the “document.add” method returns faster, the “parts” element may not yet exist when “document.get” is called shortly after “document.add”.

In addition, because more processing happens after the “document.add” call has returned, errors that would previously have been reported by the “document.add” method may now be reported by the “document.get” call.