Translate documents
Translate the text and/or title of documents into the desired language.
This route lets users translate all documents except the text of premium ones. For premium news, users can still translate the title.
See the Language Support page for more information.
Request
{
  "documents": [
    {
      "extracted": "2022-12-30T22:59:57.502Z",
      "id": "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add"
    },
    {
      "extracted": "2022-12-28T13:15:06.644Z",
      "id": "0d285bcb024438d022c91d75556c4159786e72b7f3b4b3a22562ca9d1dbabb4a"
    }
  ],
  "fields": [
    "title",
    "text"
  ],
  "language": "french"
}
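The request body above can be assembled from the id and extraction date of each document. The following is a minimal sketch rather than an official client: the helper name `build_translate_payload` is hypothetical, and the list of accepted fields is assumed to be limited to `title` and `text`, as in the example above.

```python
import json

# Assumption: "title" and "text" are the only translatable fields.
ALLOWED_FIELDS = {"title", "text"}


def build_translate_payload(documents, fields, language):
    """Build the JSON request body (hypothetical helper, not part of the API)."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"Unsupported fields: {sorted(unknown)}")
    return json.dumps({
        # Keep only the two keys the route expects for each document
        "documents": [
            {"id": doc["id"], "extracted": doc["extracted"]} for doc in documents
        ],
        "fields": list(fields),
        "language": language,
    })


payload = build_translate_payload(
    [
        {
            "id": "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add",
            "extracted": "2022-12-30T22:59:57.502Z",
        }
    ],
    ["title", "text"],
    "french",
)
```

The resulting string can then be sent as the `data` argument of `requests.post`, together with the authorization headers, to the URL of this route.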
Response
Response - 200
An identifier of the analysis to retrieve results
- error (object | object)
- extracted* (date-time): The document's extraction date. Example: "2022-11-25T14:31:10.834Z"
- id* (string): The document identifier. Example: "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add"
- partial (boolean): Whether the document has been partially translated or not. Example: true
- text (string): The document's translated text. Undefined if 'text' is not in the requested fields. Example: "A translated text"
- title (string): The document's translated title. Undefined if 'title' is not in the requested fields. Example: "A translated title"
[
  {
    "extracted": "2022-12-30T22:59:57.502Z",
    "id": "c34ac671a1b0b80078f9acd7e80217e28e8c554e14e1de707fb4370e52299add",
    "text": "the document's text translated",
    "title": "the document's title translated"
  },
  {
    "extracted": "2022-12-28T13:15:06.644Z",
    "id": "0d285bcb024438d022c91d75556c4159786e72b7f3b4b3a22562ca9d1dbabb4a",
    "partial": true,
    "text": "the document's text translated",
    "title": "the document's title translated"
  }
]
Examples
With the POST /analyze/download endpoint
In this example, we'll use the POST /analyze/download endpoint to retrieve the top 500 documents of our analysis.
Then we'll use the POST /documents endpoint to retrieve a summary for each of them.
import json
import requests

# Functions found in the section "Quick start" under "Getting started"
from connect_v2 import read_config, get_token

config = read_config()
host = config['api']['host']
token = get_token(config)
instance_id = 'INSTANCE_ID'  # Replace with your instance id
headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {token}'
}

# Download the top 500 documents
payload = json.dumps({
    'instance': instance_id,
    'limit': 500,  # The documents route is limited to 500 documents
    'sort': {
        'field': 'document_positive',
        'order': 'DESC'
    },
    'fields': ['title', 'extract_date']  # id is automatically added
})
endpoint = f'{host}/api/2.0/analyze/download'
response = requests.post(endpoint, headers=headers, data=payload)
lines = [json.loads(line) for line in response.text.splitlines()]

# Now we can use the id and extract_date to generate a summary
payload = json.dumps(
    {
        "fields": ["summary"],
        "documents": list(
            map(lambda x: {"id": x["id"], "extracted": x["extract_date"]}, lines)
        ),
    }
)
endpoint = f"{host}/api/2.0/documents"
response = requests.post(endpoint, headers=headers, data=payload)
documents_with_summaries = response.json()

# Now join the documents with their summaries
documents = {}
for line in lines:
    documents[line["id"]] = line
for document in documents_with_summaries:
    documents[document["id"]]["summary"] = document.get("summary")

# We now have a dictionary with the documents and their summaries
print(json.dumps(documents, indent=2))
Error handling
We raise an error in the following cases:
- The user’s company is out of quota:
  - Each company can translate up to 10,000 documents. This quota is defined in the company license.
  - Each document counts as 1 translation.
  - In this case, we return a 403 error with all information in the headers.
- A document is not found or its language is not supported:
  - This document is still counted in the quota.
  - Other documents are not impacted.
- An error occurred when translating a field:
  - Other fields are not impacted.
  - An error containing helpful information is added in the document response.
[
  {
    "id": "97563b96-eeb7-492f-b12c-fbfa03927d6d",
    "extracted": "2023-01-13T14:51:57.740Z",
    "error": {
      "message": "Document 97563b96-eeb7-492f-b12c-fbfa03927d6d not found on date 2023-01-13T14:51:57.740Z",
      "statusCode": 404
    }
  },
  {
    "id": "6ab99392-b8cb-4533-9c28-01718a2360fe",
    "extracted": "2023-06-27T13:49:14.740Z",
    "text": "the document's text translated",
    "error": {
      "field": {
        "title": {
          "message": "Failed to translate title",
          "statusCode": 500
        }
      }
    }
  }
]
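Because errors are reported per document and per field, a client usually has to sort the response items before using them. The sketch below shows one possible way to do so. It relies only on the response shape shown above; the function name `split_translation_results` is hypothetical, and the assumption is that a field-level failure always appears under `error.field` while a whole-document failure carries `message` and `statusCode` directly under `error`.

```python
def split_translation_results(results):
    """Sort response items into translated, field-level failures, and failed.

    Hypothetical helper based on the error shapes shown in this section.
    """
    ok, field_errors, failed = [], [], []
    for doc in results:
        error = doc.get("error")
        if error is None:
            ok.append(doc)
        elif "field" in error:
            # Some fields failed, the other requested fields are still present
            field_errors.append((doc, sorted(error["field"])))
        else:
            # Whole-document error, for example a 404 when it is not found
            failed.append(doc)
    return ok, field_errors, failed


results = [
    {
        "id": "97563b96-eeb7-492f-b12c-fbfa03927d6d",
        "extracted": "2023-01-13T14:51:57.740Z",
        "error": {"message": "Document not found", "statusCode": 404},
    },
    {
        "id": "6ab99392-b8cb-4533-9c28-01718a2360fe",
        "extracted": "2023-06-27T13:49:14.740Z",
        "text": "the document's text translated",
        "error": {
            "field": {
                "title": {"message": "Failed to translate title", "statusCode": 500}
            }
        },
    },
]
ok, field_errors, failed = split_translation_results(results)
```

Keeping field-level failures separate lets a caller retry only the missing field while still using the fields that were translated.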
Partial translation
When a document's text or title reaches a length of 10,000 bytes, only the first 10,000 bytes are translated.
- This limit is lowered depending on the content of the document, to avoid sentence breaks.
- If a sentence reaches the 10,000-byte limit, it is ignored.
- If there is no sentence shorter than 10,000 bytes, an error is thrown.
When a partial translation is done, the returned document will contain "partial": true.
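On the client side, partial translations can be detected from the "partial" flag, and long fields can be spotted before sending a request. The following is a rough sketch under stated assumptions: both helper names are hypothetical, and the byte length is measured in UTF-8, which this page does not confirm. Since the effective limit is lowered to avoid sentence breaks, `may_be_partial` is only an approximate predictor.

```python
LIMIT_BYTES = 10_000  # limit stated in this section


def may_be_partial(value, encoding="utf-8"):
    """Return True if a field reaches the byte limit (UTF-8 is an assumption)."""
    return len(value.encode(encoding)) >= LIMIT_BYTES


def partial_ids(results):
    """Return the ids of response documents flagged with "partial": true."""
    return [doc["id"] for doc in results if doc.get("partial")]


response_documents = [
    {"id": "c34ac671", "title": "a translated title"},
    {"id": "0d285bcb", "partial": True, "title": "a translated title"},
]
flagged = partial_ids(response_documents)
```

Note that the limit is expressed in bytes, so a text with many non-ASCII characters reaches it with fewer characters than a plain ASCII text.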