Therefore Web API Programming Interface 1.0

Retrieving Files from Documents

A Therefore™ document may contain one or more files (e.g. two Word documents, a PDF file, and three JPEG images), usually belonging to the same topic or task. These individual files in a Therefore™ document are often referred to as streams.

 

The following table compares the file retrieval options available in the Therefore™ Web API.

 

| Therefore™ Web API Method Name | HTTP verb | JSON/REST Streamed Endpoint (restun-streamed): Transfer-Encoding: chunked | JSON/REST Endpoints (restun, restwin): Binary Payload | JSON/REST Endpoints (restun, restwin): Base64 String | JSON/REST Endpoints (restun, restwin): JSON Byte Array | SOAP Endpoints (soapun, soapwin): Binary Payload | SOAP Endpoints (soapun, soapwin): Base64 String | SOAP Endpoints (soapun, soapwin): JSON Byte Array |
|---|---|---|---|---|---|---|---|---|
| GetDocument | POST | - | - | YES, see IsStreamDataBase64JSONNeeded param in request | YES | - | YES | - |
| GetDocumentStream | POST | - | - | - (only in XML encoded response message, use Accept: application/xml) | YES | - | YES | - |
| GetConvertedDocStreams | POST | - | - | YES, see IsStreamDataBase64JSONNeeded param in request | YES | - | YES | - |
| GetThumbnail | POST | - | - | - (only in XML encoded response message, use Accept: application/xml) | YES |  | YES | - |
| GetDocumentStreamRaw | POST | YES | YES | - | - | YES | - | - |
| GetConvertedDocStreamsRaw | POST | YES | YES | - | - | YES | - | - |
| GetDocumentStreamFile | GET | YES | YES | - | - | YES | - | - |

 

 

Retrieving Files with a streaming data transfer (Transfer-Encoding: chunked)

The Therefore™ Web API provides a streamed endpoint. It differs from the other endpoints in that its methods return the HTTP response with Transfer-Encoding: chunked.

Methods on the streamed (chunked) endpoint consume less memory on both the server and the client. Use this endpoint to retrieve large files. Its performance is similar to retrieving files as binary data.

 

Note: Read more (with an example) in the "Endpoints\Streamed JSON endpoint to download big files" section of the Therefore™ Web API documentation.
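The memory benefit described above only materializes if the client also avoids buffering the whole response. The following is a minimal sketch of that idea, not the vendor's sample: the class name StreamedDownload, the url and jsonBody arguments, and the absence of authentication are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class StreamedDownload
{
    // Copies a response body to a file in fixed-size chunks so the whole
    // payload is never held in memory at once. Returns the bytes written.
    public static async Task<long> CopyToFileAsync(Stream body, string targetPath)
    {
        using FileStream file = File.Create(targetPath);
        byte[] buffer = new byte[81920];
        long total = 0;
        int read;
        while ((read = await body.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await file.WriteAsync(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Assumption: url, jsonBody and any authentication are supplied by the
    // caller; none of them are taken from the Therefore documentation.
    public static async Task<long> DownloadAsync(
        HttpClient client, string url, string jsonBody, string targetPath)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };
        // ResponseHeadersRead returns as soon as the headers arrive, so the
        // body can be consumed as a stream while it is still being sent.
        using HttpResponseMessage response =
            await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        using Stream body = await response.Content.ReadAsStreamAsync();
        return await CopyToFileAsync(body, targetPath);
    }
}
```

HttpClient decodes chunked transfer encoding transparently, so the copy loop sees plain file bytes.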

 

 

Retrieving Files as Binary Data

To retrieve a file as binary data, use the GetDocumentStreamRaw or GetConvertedDocStreamsRaw (HTTP POST) or GetDocumentStreamFile (HTTP GET) Web API methods instead of GetDocumentStream (HTTP POST, Base64 or byte array). This is the recommended approach because these three methods perform better than GetDocumentStream or GetDocument (when used for stream retrieval) and produce smaller messages. Performance is similar to the streaming data transfer option.

 

Here is an example of calling the GetConvertedDocStreamsRaw method in Postman. The binary file content is returned in the body of the response.

Notice two headers in the response: Content-Type and Content-Disposition.

 

[Screenshot: WebAPI.RetrieveFiles.GetConvertedDocStreamRaw.RequestZip]
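A client can use the Content-Disposition header to decide what to name the saved file, while Content-Type indicates the media type of the body. The sketch below shows one way to derive a local file name. The class name DownloadHeaders and the fallback argument are hypothetical, and the exact header values the server sends are not specified here.

```csharp
using System;
using System.IO;
using System.Net.Http.Headers;

public static class DownloadHeaders
{
    // Returns a local file name taken from a Content-Disposition header
    // value, or the fallback when the header is missing or unusable.
    public static string FileNameFrom(string contentDisposition, string fallback)
    {
        if (string.IsNullOrEmpty(contentDisposition))
            return fallback;
        if (!ContentDispositionHeaderValue.TryParse(contentDisposition, out var parsed))
            return fallback;

        // FileNameStar carries the extended (filename*) form when present.
        string name = parsed.FileNameStar ?? parsed.FileName;
        if (string.IsNullOrWhiteSpace(name))
            return fallback;

        // FileName keeps the surrounding quotes; strip them, then drop any
        // directory part so the name cannot point outside the target folder.
        name = Path.GetFileName(name.Trim('"'));
        return name.Length == 0 ? fallback : name;
    }
}
```

With an HttpResponseMessage, the raw header value is available through response.Content.Headers.ContentDisposition, which is already parsed into the same type.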

 

 

Retrieving Files as a Base64 string (XML, JSON) or a byte array (JSON)

 

To retrieve streams from a Therefore™ document, you can use the GetDocument Web API method (with the parameter IsStreamsInfoAndDataNeeded set to true), GetDocumentStream, or GetConvertedDocStreams. The GetThumbnail method returns a thumbnail of a document.

 

Data returned in the StreamData property is encoded as a byte array in JSON responses and as a Base64 string in XML responses.

Use the following HTTP header to receive the response as XML from the JSON endpoints (restun, restwin):

Accept: application/xml
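As an illustration, this is how the header could be set with HttpClient. It is a sketch under assumptions: the class name XmlResponseRequest is hypothetical, url and jsonBody are placeholders supplied by the caller, and the request body is assumed to stay JSON while only the response format changes.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class XmlResponseRequest
{
    // Builds a POST request that sends JSON but asks for an XML response.
    // Assumption: url and jsonBody are supplied by the caller.
    public static HttpRequestMessage Create(string url, string jsonBody)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };
        // Emits "Accept: application/xml" on the wire.
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/xml"));
        return request;
    }
}
```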

 

Retrieving files as a byte array in JSON

Set the IsStreamDataBase64JSONNeeded parameter to False (or omit this parameter) and get file content as a byte array in the StreamData property of the response.

 

[Screenshot: WebAPI.RetrieveFiles.GetDocument.ByteArray.Request]
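On the client side, a byte array in JSON needs explicit handling when System.Text.Json is used, because that library maps byte[] to Base64 strings rather than to arrays of numbers. The sketch below reads the array element by element. It assumes the response has a StreamsInfo array whose items carry StreamData as a list of numbers from 0 to 255; that shape is inferred from the property names used on this page, not confirmed. The class name JsonByteArray is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.Json;

public static class JsonByteArray
{
    // Reads StreamData of the first stream from a GetDocument-style JSON
    // response. Assumed shape: {"StreamsInfo":[{"StreamData":[1,2,3]}]}
    public static byte[] ReadFirstStream(string responseJson)
    {
        using JsonDocument doc = JsonDocument.Parse(responseJson);
        JsonElement stream = doc.RootElement.GetProperty("StreamsInfo")[0];
        return stream.GetProperty("StreamData")
                     .EnumerateArray()
                     .Select(item => item.GetByte()) // throws if a value is not 0..255
                     .ToArray();
    }
}
```

Each byte is transmitted as up to three digits plus a comma, which is one reason the binary and Base64 options produce smaller messages.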

 

Retrieving files as a Base64 string in JSON

Set the IsStreamDataBase64JSONNeeded parameter to True and get Base64 encoded file content in the StreamDataBase64JSON property of the response.

 

[Screenshot: WebAPI.RetrieveFiles.GetDocument.Base64.Request]
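Decoding the Base64 variant is shorter, since System.Text.Json can decode a Base64 string property directly. As before, this is a sketch: the StreamsInfo array layout is an assumption inferred from the property names on this page, and the class name JsonBase64 is hypothetical.

```csharp
using System;
using System.Text.Json;

public static class JsonBase64
{
    // Reads StreamDataBase64JSON of the first stream from a GetDocument-style
    // JSON response. Assumed shape:
    // {"StreamsInfo":[{"StreamDataBase64JSON":"SGkh"}]}
    public static byte[] ReadFirstStream(string responseJson)
    {
        using JsonDocument doc = JsonDocument.Parse(responseJson);
        JsonElement stream = doc.RootElement.GetProperty("StreamsInfo")[0];
        // Throws FormatException if the value is not valid Base64.
        return stream.GetProperty("StreamDataBase64JSON").GetBytesFromBase64();
    }
}
```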

 

C# Code Example. Using GetDocument Web API method

 

This step-by-step guide illustrates how to extract one or all of the streams from a document via the Web API.

 

 

 

 

 // Requires: using System; using System.IO; using System.ServiceModel;

 // 1. Create a channel factory to the web service endpoint
 ...
 ChannelFactory<IThereforeService> channelFactory = new ChannelFactory<IThereforeService>(binding, endpoint);

 // 2. Create a channel to the web service endpoint
 IThereforeService service = channelFactory.CreateChannel();

 // 3. Create request parameters
 GetDocumentParams parameters = new GetDocumentParams();
 parameters.DocNo = 123; // TODO: specify document number here
 parameters.IsStreamsInfoAndDataNeeded = true;

 // 4. Retrieve the document from the server
 GetDocumentResponse response = service.GetDocument(parameters);

 // 5. Close the channel and channel factory.
 ((IClientChannel)service).Close();
 channelFactory.Close();

 // 6. Extract all file streams to the specified directory
 string extractDir = "D:\\temp\\";
 string output = string.Empty;
 foreach (var streamInfo in response.StreamsInfo)
 {
     // Build the target path from the stream's own file name
     string extractFileName = Path.Combine(extractDir, streamInfo.FileName);
     File.WriteAllBytes(extractFileName, streamInfo.StreamData);
     output += String.Format("Document stream extracted to {0}{1}", extractFileName, Environment.NewLine);
 }
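The example above extracts every stream. To extract a single stream instead, the collection can be filtered by file name before writing. The sketch below uses a stand-in type, StreamInfoStub, that models only the FileName and StreamData members used above. The real type comes from the generated service proxy, and the helper name ExtractByName is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Stand-in for the generated stream info type; only the two members used
// in the guide's example are modelled here.
public sealed class StreamInfoStub
{
    public string FileName { get; set; }
    public byte[] StreamData { get; set; }
}

public static class SingleStreamExtractor
{
    // Writes the first stream whose file name matches (case-insensitive)
    // and returns the path written, or null when nothing matches.
    public static string ExtractByName(
        IEnumerable<StreamInfoStub> streams, string fileName, string extractDir)
    {
        StreamInfoStub match = streams.FirstOrDefault(
            s => string.Equals(s.FileName, fileName, StringComparison.OrdinalIgnoreCase));
        if (match == null)
            return null;

        // Path.GetFileName drops any directory part of the stored name.
        string target = Path.Combine(extractDir, Path.GetFileName(match.FileName));
        File.WriteAllBytes(target, match.StreamData);
        return target;
    }
}
```

Matching on the file name is only one option; selecting by position in the collection would work the same way.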

© 2023 Therefore Corporation, all rights reserved.