Documents

Translate documents

Translate a document's text and/or title into the desired language.

POST
https://api.textreveal.com/api/2.0/documents/translate/batch

This route lets users translate all documents, except the text of premium ones. For premium news, users can still translate the title.

See the Language Support page for more information.

Request

Request Body

  • documents*: object[]

    List of documents to translate

  • fields: (string (enum))[]

    Fields to translate

    Default: ["title", "text"]
    Values: "title", "text"
  • language: string

    The language into which you want the document to be translated

    Default: "english"
    Example: "italian"
Request
{
  "documents": [
    {
      "extracted": "2022-12-30T22:59:57.502Z",
      "id": "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add"
    },
    {
      "extracted": "2022-12-28T13:15:06.644Z",
      "id": "0d285bcb024438d022c91d75556c4159786e72b7f3b4b3a22562ca9d1dbabb4a"
    }
  ],
  "fields": [
    "title",
    "text"
  ],
  "language": "french"
}
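
A minimal sketch of sending this request from Python, using only the standard library. The helper names `build_translate_payload` and `translate_batch` are hypothetical, and `host` and `token` are assumed to come from the configuration helpers described in the "Quick start" section.

```python
import json
import urllib.request

ALLOWED_FIELDS = ("title", "text")


def build_translate_payload(documents, language="english", fields=ALLOWED_FIELDS):
    """Build the request body for the translate route (hypothetical helper)."""
    unknown = [field for field in fields if field not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError(f"Unsupported fields: {unknown}")
    return {
        # Each document is identified by its id and its extraction date
        "documents": [{"id": d["id"], "extracted": d["extracted"]} for d in documents],
        "fields": list(fields),
        "language": language,
    }


def translate_batch(host, token, payload):
    """POST the payload to the translate route and return the decoded JSON."""
    request = urllib.request.Request(
        f"{host}/api/2.0/documents/translate/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_translate_payload(
    [
        {
            "id": "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add",
            "extracted": "2022-12-30T22:59:57.502Z",
        }
    ],
    language="french",
)
# translated = translate_batch("https://api.textreveal.com", "YOUR_TOKEN", payload)
```

The network call is left commented out so the snippet can be run without credentials.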

Response

Response - 200

The list of translated documents, one entry per requested document.

  • error: object | object
  • extracted*: date-time

    The document's extraction date.

    Example: "2022-11-25T14:31:10.834Z"
  • id*: string

    The document identifier.

    Example: "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add"
  • partial: boolean

    Whether the document has been partially translated or not.

    Example: true
  • text: string

    The document's translated text. Undefined if 'text' is not in the requested fields.

    Example: "A translated text"
  • title: string

    The document's translated title. Undefined if 'title' is not in the requested fields.

    Example: "A translated title"
Response
[
  {
    "extracted": "2022-12-30T22:59:57.502Z",
    "id": "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add",
    "text": "the document's text translated",
    "title": "the document's title translated"
  },
  {
    "extracted": "2022-12-28T13:15:06.644Z",
    "id": "0d285bcb024438d022c91d75556c4159786e72b7f3b4b3a22562ca9d1dbabb4a",
    "partial": true,
    "text": "the document's text translated",
    "title": "the document's title translated"
  }
]
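
As a rough illustration of reading this response, the sketch below indexes the returned documents by id. The helper name `index_translations` is hypothetical, and it assumes that a missing "partial" key means the translation is complete, as the first document in the example suggests.

```python
def index_translations(translated_documents):
    """Map each document id to its translated fields (hypothetical helper)."""
    index = {}
    for document in translated_documents:
        index[document["id"]] = {
            # None when the field was not requested or could not be translated
            "title": document.get("title"),
            "text": document.get("text"),
            # Assumption: an absent "partial" key means a complete translation
            "partial": document.get("partial", False),
        }
    return index


sample_response = [
    {
        "extracted": "2022-12-30T22:59:57.502Z",
        "id": "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add",
        "text": "the document's text translated",
        "title": "the document's title translated",
    },
    {
        "extracted": "2022-12-28T13:15:06.644Z",
        "id": "0d285bcb024438d022c91d75556c4159786e72b7f3b4b3a22562ca9d1dbabb4a",
        "partial": True,
        "text": "the document's text translated",
        "title": "the document's title translated",
    },
]

translations = index_translations(sample_response)
partial_ids = [doc_id for doc_id, fields in translations.items() if fields["partial"]]
```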

Examples

With the POST /analyze/download endpoint

In this example, we'll use the POST /analyze/download endpoint to retrieve the top 500 documents of our analysis. Then we'll use the POST /documents endpoint to retrieve a summary for each of them.

example.py
import json
import requests
# Functions found in the section "Quick start" under "Getting started"
from connect_v2 import read_config, get_token
 
config = read_config()
host = config['api']['host']
token = get_token(config)
 
instance_id = 'INSTANCE_ID' # Replace with your instance id
headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {token}'
}
 
# Download the top 500 documents
payload = json.dumps({
    'instance': instance_id,
    'limit': 500, # The documents route is limited to 500 documents
    'sort': {
        'field': 'document_positive',
        'order': 'DESC'
    },
    'fields': ['title', 'extract_date'] # id is automatically added
})
 
endpoint = f'{host}/api/2.0/analyze/download'
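response = requests.post(endpoint, headers=headers, data=payload)
response.raise_for_status()  # Fail early on HTTP errors
# The response contains one JSON document per line; skip blank lines
lines = [json.loads(line) for line in response.text.splitlines() if line.strip()]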
 
# Now we can use the id and extract_date to generate a summary
payload = json.dumps(
    {
        "fields": ["summary"],
        "documents": list(
            map(lambda x: {"id": x["id"], "extracted": x["extract_date"]}, lines)
        ),
    }
)
 
endpoint = f"{host}/api/2.0/documents"
response = requests.post(endpoint, headers=headers, data=payload)
response.raise_for_status()  # Fail early on HTTP errors
documents_with_summaries = response.json()
 
# Now join the documents with their summaries
documents = {}
for line in lines:
    documents[line["id"]] = line
for document in documents_with_summaries:
    documents[document["id"]]["summary"] = document.get("summary")
 
# We now have a dictionary with the documents and their summaries
print(json.dumps(documents, indent=2))
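
The same id and extract_date pairs could also feed the translate route. The sketch below only builds the request body; the helper name `to_translate_payload` is hypothetical, and the sample `downloaded_lines` stands in for the lines returned by POST /analyze/download. Only the title is requested here, since that is the field that can be translated for every document, premium ones included.

```python
import json


def to_translate_payload(lines, language="english", fields=("title", "text")):
    """Turn downloaded lines into a translate request body (hypothetical helper)."""
    return {
        "documents": [
            # The download route names the date "extract_date",
            # while the translate route expects "extracted"
            {"id": line["id"], "extracted": line["extract_date"]}
            for line in lines
        ],
        "fields": list(fields),
        "language": language,
    }


# Stand-in for the lines parsed from the download response
downloaded_lines = [
    {
        "id": "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add",
        "extract_date": "2022-12-30T22:59:57.502Z",
        "title": "An example title",
    }
]

translate_payload = to_translate_payload(
    downloaded_lines, language="italian", fields=["title"]
)
body = json.dumps(translate_payload)

# Sending it would follow the same pattern as the requests above:
# endpoint = f"{host}/api/2.0/documents/translate/batch"
# response = requests.post(endpoint, headers=headers, data=body)
```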

Error handling

An error is raised in the following cases:

  • The user’s company is out of quota:
    • Each company can translate up to 10,000 documents. This quota is defined in the company license.
    • Each document counts as 1 translation.
    • In this case, a 403 error is returned, with all the information in the response headers.
  • A document is not found or its language is not supported:
    • This document is still counted in the quota.
    • Other documents are not impacted.
  • An error occurred when translating a field:
    • Other fields are not impacted.
    • An error containing helpful information is added to the document in the response.
Example response with errors
[
  {
    "id": "97563b96-eeb7-492f-b12c-fbfa03927d6d",
    "extracted": "2023-01-13T14:51:57.740Z",
    "error": {
      "message": "Document 97563b96-eeb7-492f-b12c-fbfa03927d6d not found on date 2023-01-13T14:51:57.740Z",
      "statusCode": 404
    }
  },
  {
    "id": "6ab99392-b8cb-4533-9c28-01718a2360fe",
    "extracted": "2023-06-27T13:49:14.740Z",
    "text": "the document's text translated",
    "error": {
      "field": {
        "title": {
          "message": "Failed to translate title",
          "statusCode": 500
        }
      }
    }
  }
]
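
One possible way to separate the two error shapes shown above is sketched here. The helper name `collect_errors` is hypothetical, and the key names are taken from the example response rather than from a formal schema.

```python
def collect_errors(translated_documents):
    """Split errors into document-level and field-level ones (hypothetical helper)."""
    document_errors = {}  # document id -> status code
    field_errors = {}     # (document id, field name) -> error message
    for document in translated_documents:
        error = document.get("error")
        if not error:
            continue
        if "field" in error:
            # Field-level failure: the other fields of the document are still usable
            for field_name, detail in error["field"].items():
                field_errors[(document["id"], field_name)] = detail["message"]
        else:
            # Document-level failure, such as a document that was not found
            document_errors[document["id"]] = error["statusCode"]
    return document_errors, field_errors


response_with_errors = [
    {
        "id": "97563b96-eeb7-492f-b12c-fbfa03927d6d",
        "extracted": "2023-01-13T14:51:57.740Z",
        "error": {
            "message": "Document 97563b96-eeb7-492f-b12c-fbfa03927d6d not found "
                       "on date 2023-01-13T14:51:57.740Z",
            "statusCode": 404,
        },
    },
    {
        "id": "6ab99392-b8cb-4533-9c28-01718a2360fe",
        "extracted": "2023-06-27T13:49:14.740Z",
        "text": "the document's text translated",
        "error": {
            "field": {
                "title": {"message": "Failed to translate title", "statusCode": 500}
            }
        },
    },
]

document_errors, field_errors = collect_errors(response_with_errors)
```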

Partial translation

When a document's text or title reaches a length of 10,000 bytes, only the first 10,000 bytes are translated.

  • This limit is lowered depending on the content of the document, to avoid sentence breaks.
  • If a sentence reaches the 10,000-byte limit, it is ignored.
  • If no sentence is shorter than 10,000 bytes, an error is returned.

When a document is only partially translated, the returned document contains "partial": true.
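
Because the limit is expressed in bytes rather than characters, accented or non-Latin text reaches it sooner than its character count suggests. The sketch below estimates on the client side whether a field is likely to be truncated. The helper name `may_be_truncated` is hypothetical, and measuring the length in UTF-8 is an assumption, as the encoding used by the service is not stated here.

```python
BYTE_LIMIT = 10_000  # Limit described above


def may_be_truncated(value, encoding="utf-8"):
    """Return True if the encoded value reaches the byte limit (hypothetical helper)."""
    # Assumption: the service counts bytes in UTF-8
    return len(value.encode(encoding)) >= BYTE_LIMIT


short_ascii = "a" * 9_999      # 9,999 bytes in UTF-8
accented = "é" * 6_000         # 6,000 characters, but 12,000 bytes in UTF-8

short_is_truncated = may_be_truncated(short_ascii)
accented_is_truncated = may_be_truncated(accented)
```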