Data Dictionary
This page documents the input and output data available on the TextReveal® analyze resource. It presents data columns, their descriptions and their coverage.
POST /analyze/dataset
This route allows a client to launch a query to analyze data relevant to a list of entities.
Parameters
Body parameters
name | type | description | scope | required |
---|---|---|---|---|
entities | list[dict] | List of entities to be requested | global | Yes |
entity_of_interest | string | Unique id for the entity of interest | entity | Yes |
keywords | list | List of keywords to search. All keywords with a length strictly lower than 3 characters are filtered out, except for the Japanese, Chinese and Korean languages. | entity | Yes |
concepts | dict[str, list[str]] | List of concepts or risks to be analyzed. Each individual concept is defined by its own list of keywords. Punctuation is not handled in the concept labels. Each concept label must be unique (case insensitive). | global | No |
concepts_filter | dict[str, list[str]] | Same as concepts, but filters out documents that do not contain the concepts. Note: you can use either concepts or concepts_filter. | global | No |
sentiments_filter | dict[str, dict[str, int]] | Partial object containing min/max values for each sentiment. The final analysis will contain only documents that match these filters. Allowed keys: | global | No |
sites_excludes | list | List of websites to exclude from the search. N.B.: use the base domain of the websites. | global | No |
min_match | int | The message must contain at least min_match keywords. When used, each entity must have at least min_match keywords. | global | No |
min_repeat | int | The message must contain at least min_repeat occurrences of a keyword. | global | No |
start_date | date | Format: YYYY-MM-DD | global | Yes |
end_date | date | Format: YYYY-MM-DD | global | Yes |
site_type | list | Type of sites to search (field thread.site_type). Available options are: | global | No |
languages | list[str] | List of languages to search; see the Language Support page for more information. | global | No |
countries | list[str] | List of countries to search (field thread.country). N.B.: use the alpha-2 format. | global | No |
sites | list | List of websites to search. N.B.: use the base domain of the websites. | global | No |
co_mentions | list | List of keywords to search together with the keywords list. Works like a boolean AND. co_mentions is matched in full text and is case insensitive. | global | No |
keywords_exclude | list | List of keywords to exclude from the search. Works like a boolean AND NOT. keywords_exclude is matched in full text and is case insensitive. | global | No |
qscore | float | Quality threshold to filter out unreadable data. The default value is 50. No filtering is applied if the quality-score worker is not provided. | global | No |
neg_keywords | list | List of keywords not used for search but for the named entity resolution or annotation task. | entity | No |
workers | list | Workflow steps definition | global | Yes |
context | string | Context description of the entity | entity | Yes |
similarity_threshold | float | Similarity score threshold for recognized or matched entities. Filters out documents containing entities with a similarity score lower than the threshold. | global | No |
search_in | list[string] | Defines whether the document extraction searches for entity keywords in the title and/or in the text. | global | No |
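For illustration, a minimal sketch of launching a dataset analysis with Python's requests library. The base URL, the authentication header and the worker name are assumptions, not values documented on this page:

import requests

BASE_URL = "https://api.textreveal.com"            # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}      # hypothetical auth scheme

payload = {
    "entities": [
        {
            "entity_of_interest": "apple-inc",                # unique id (entity scope)
            "keywords": ["Apple", "AAPL", "iPhone"],          # keywords shorter than 3 characters would be filtered out
            "context": "Apple Inc. is an American consumer electronics company.",
        }
    ],
    "start_date": "2023-01-01",
    "end_date": "2023-01-31",
    "workers": ["sentiment"],                                 # hypothetical worker name
    "min_match": 1,                                           # each message must match at least one keyword
}

resp = requests.post(f"{BASE_URL}/analyze/dataset", json=payload, headers=HEADERS)
resp.raise_for_status()
instance_id = resp.json()["instance_id"]                      # unique identifier of the analysis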
Response
name | type | description |
---|---|---|
instance_id | string | The unique identifier of the analysis |
POST /analyze/tql
This route allows a client to launch an analysis of data relevant to a list of entities using the TextReveal Query Language (TQL).
The TextReveal Query Language (TQL) is a simple text-based query language for filtering data. It is composed of fields to which values are applied, e.g. site_type:"news". Filters can be combined into a boolean expression with the AND, OR and NOT operators.
Example: (text:"Apple TV" OR title:"Steve Jobs") AND NOT text:"apple tree"
Unlike the dataset route, the TQL route requests all types of sites. The news site type groups news, premium_news and licensed_news. Moreover, workers are implicitly enabled when their associated parameter is used. For example, the quality-score worker is enabled if the qscore parameter is used.
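As a sketch, TQL expressions can also be composed programmatically; the helper below only concatenates strings and assumes the field names shown above:

def tql_clause(field: str, value: str) -> str:
    # A single field filter, e.g. site_type:"news"
    return f'{field}:"{value}"'

def tql_or(*clauses: str) -> str:
    return "(" + " OR ".join(clauses) + ")"

def tql_and(*clauses: str) -> str:
    return "(" + " AND ".join(clauses) + ")"

query = tql_and(
    tql_or(tql_clause("text", "Apple TV"), tql_clause("title", "Steve Jobs")),
    "NOT " + tql_clause("text", "apple tree"),
)
# ((text:"Apple TV" OR title:"Steve Jobs") AND NOT text:"apple tree")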
Parameters
Body parameters
name | type | description | scope | required |
---|---|---|---|---|
entities | list[dict] | List of entities to be requested | global | Yes |
entity_of_interest | string | Unique id for the entity of interest | entity | Yes |
context | string | Context description of the entity. The context is mandatory if you use the similarity_threshold parameter. | entity | No |
query | string | A TQL query defining the entity of interest; it is used for the data extraction. Accepted fields are: Specific values for site_type: Example: ((title:"1&1" AND text:"1&1 DRILLISCH") OR (title:"DRILLISCH" AND text:"1&1 DRILLISCH") AND (ner:"1&1 DRILLISCH")) | entity | Yes |
annotate_keywords | list | List of keywords not used for search but for the named entity resolution or annotation task. | entity | Yes |
concepts | dict[str, list[str]] | List of concepts or risks to be analyzed. Each individual concept is defined by its own list of keywords. Punctuation is not handled in the concept labels. Each concept label must be unique (case insensitive). | global | No |
concepts_filter | dict[str, list[str]] | Same as concepts, but filters out documents that do not contain the concepts. Note: you can use either concepts or concepts_filter. | global | No |
sentiments_filter | dict[str, dict[str, int]] | Partial object containing min/max values for each sentiment. The final analysis will contain only documents that match these filters. Allowed keys: | global | No |
min_match | int | The message must contain at least min_match annotate keywords. When used, each entity must have at least min_match keywords. | global | No |
min_repeat | int | The message must contain at least min_repeat occurrences of an annotate keyword. | global | No |
start_date | date | Format: YYYY-MM-DD | global | Yes |
end_date | date | Format: YYYY-MM-DD | global | Yes |
language | string | Language to search; see the Language Support page for more information. The default value is english. | global | No |
qscore | float | Quality threshold to filter out unreadable data. No filtering is applied if the qscore parameter is not provided. | global | No |
similarity_threshold | float | Similarity score threshold for recognized or matched entities. Filters out documents containing entities with a similarity score lower than the threshold. | global | No |
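A hedged sketch of launching a TQL analysis, reusing the BASE_URL and HEADERS assumptions from the dataset example and the example query from the table above; the entity id is hypothetical:

payload = {
    "entities": [
        {
            "entity_of_interest": "1und1-drillisch",   # hypothetical id
            "query": '((title:"1&1" AND text:"1&1 DRILLISCH") OR (title:"DRILLISCH" AND text:"1&1 DRILLISCH") AND (ner:"1&1 DRILLISCH"))',
            "annotate_keywords": ["1&1", "DRILLISCH"],
        }
    ],
    "start_date": "2023-01-01",
    "end_date": "2023-01-31",
    "language": "english",
    "qscore": 50,   # implicitly enables the quality-score worker
}

resp = requests.post(f"{BASE_URL}/analyze/tql", json=payload, headers=HEADERS)
instance_id = resp.json()["instance_id"]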
Response
name | type | description |
---|---|---|
instance_id | string | The unique identifier of the analysis |
POST /analyze/download
This route allows a client to preview the results of a previously run analysis.
Parameters
Body parameters
name | type | description | required |
---|---|---|---|
instance | string | Id of the instance from which to retrieve the textual data | Yes |
limit | number|dict | When limit is a number (e.g., 100): When limit is a dictionary (e.g., {"by": "entity", "value": 3}): Textual fields (id, title, sentences, url or thread) are only returned if the total number of documents (the "computed limit") is ≤ 2000. | Yes |
date | string | Filter the documents on a given date, using the %Y-%m-%d format. The date must be included in the date range of the analysis. | No |
entity | string | Filter the documents on a given entity. The entity must be an entity of interest of the analysis. | No* |
concept | string | Filter the documents on a given concept. The concept must be present in the analysis. | No |
sort | dict | Sort the documents in ascending or descending order given a field | No |
sort >field | string | The field to sort the documents on. Available fields are: | Yes |
sort >order | string | The order of the sorting. Available values are: | Yes |
fields | list[str] | Collect only the fields you need. By default, all fields except summary are returned. The id field is always returned. Available keys for the fields parameter are: | No |
- If you use the sort parameter, the date parameter can become mandatory if your analysis has generated a large number of results (2,500,000 documents).
- When using the sort parameter with a field that has aggregation functions (e.g., min, max, median, mean), the mean value is used.
- When using one of the entity match fields (document_entity_polarity, document_entity_positive, document_entity_neutral, document_entity_negative) in the sort parameter, the entity parameter is mandatory.
- premium_news text cannot be retrieved. Each sentence is replaced by this placeholder: "The download of licensed text is not allowed."
- The summary field is an experimental feature; we recommend using the /documents route with the document id, as shown on the example page here.
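A sketch of previewing results, under the same BASE_URL/HEADERS assumptions as above; the sort field follows the notes above, but the "desc" order value and the response shape are assumptions:

payload = {
    "instance": instance_id,
    "limit": {"by": "entity", "value": 3},                            # up to 3 documents per entity
    "sort": {"field": "document_entity_polarity", "order": "desc"},   # entity match field, so entity is mandatory
    "entity": "apple-inc",
    "fields": ["title", "url", "sentences"],                          # id is always returned
}

resp = requests.post(f"{BASE_URL}/analyze/download", json=payload, headers=HEADERS)
documents = resp.json()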
Response
name | type | parent | description |
---|---|---|---|
extract_date | datetime | | Date of extraction of the article (YYYY-MM-DD) |
language | string | | Language of the article |
thread | dict | | Parent key for country, site, site_type and title |
country | string | thread | 2-letter ISO country code |
site | string | thread | Site of the article |
site_type | string | thread | Site type of the article. Available options are: |
title | string | thread | Title of the thread, mapped from sentences of type 2 (if there are no such sentences, the title is an empty string) |
url | string | | Url of the article |
id | string | | Id of the article |
title | string | | Title of the document, mapped from sentences of type 1 (if there are no such sentences, the title is an empty string) |
sentences | list[dict] | | List of sentences with their matches and indicators when available |
text | string | sentences | Text of the sentence |
entities | | | |
sentence_id | int | sentences | Id of the sentence |
type | int | sentences | Type of the sentence: |
matches | list[dict] | sentences | List of matched keywords or entities |
results | dict | sentences | List of indicators: |
negative | float | results | Negative sentiment probability |
positive | float | results | Positive sentiment probability |
neutral | float | results | Neutral sentiment probability |
polarity | float | results | The aggregate of the positive and negative sentiment scores at the sentence level. |
polarity_exp | float | results | The aggregate score, at the sentence level, of the difference between the negative and positive sentiment scores, passed through a sigmoid in order to smooth outliers. |
document_entity_polarity | dict | | Evaluates the sentiment level towards the entity of interest in all sentences mentioning the entity in a given document. Formula: |
document_entity_positive | dict | | Evaluates the level of positive sentiment towards an entity of interest in all sentences mentioning the entity in a given document. Formula: |
document_entity_neutral | dict | | Evaluates the level of neutral sentiment towards an entity of interest in all sentences mentioning the entity in a given document. Formula: |
document_entity_negative | dict | | Evaluates the level of negative sentiment towards an entity of interest in all sentences mentioning the entity in a given document. Formula: |
document_{sentiment} | dict | | Evaluates the desired sentiment (1) of a document. Formula: |
document_polarity | float | | Evaluates the sentiment level in all sentences in a given document. Formula: |
nb_sentences | int | | Number of sentences composing the article |
text | string | matches | Mention of the keyword or entity in the sentence |
entity | dict | matches | Identifier of the entity |
count | dict | matches | Prevalence of keywords for the matched concept |
similarity | float | matches | Cosine similarity score between the sentence and the context of the entity. Ranges in [0, 1] |
qscore | float | | Readability score of the document, calculated from KPIs such as the average sentence length and the ratio of non-alphanumeric characters within the document. |
concepts | dict | | Sum of occurrences of the keywords related to a given concept in each sentence. Available with the concept worker |
mentions | dict | | Sum of occurrences of the keywords related to a given mention in each sentence. Available with the raw-matcher worker |
entities | dict | | Sum of occurrences of the keywords related to a given entity in each sentence. Available with the ner-linking worker |
summary | string | | Summary in English of the document's text |
Summaries may exceptionally be empty for some texts that the model is not able to handle.
1. Available sentiment classes: positive, neutral, negative
*Deprecated: The field is deprecated and will be removed in future releases. Please consider updating your code as soon as possible.
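Assuming the preview response is a list of documents shaped like the table above, a sketch of walking the sentences and their sentence-level indicators:

for doc in documents:
    print(doc["id"], doc["language"], doc["thread"]["site_type"])
    for sentence in doc.get("sentences", []):
        results = sentence.get("results", {})
        if "polarity" in results:
            # sentence-level polarity, aggregated from the positive/negative probabilities
            print("  ", sentence["sentence_id"], results["polarity"], sentence["text"][:60])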
POST /analyze/status
This route allows a client to get the status of a previously run analysis.
Parameters
Body parameters
name | type | description | required |
---|---|---|---|
instance | string | The identifier of an analysis. This identifier has to be used to get the results. | Yes |
Response
name | type | description |
---|---|---|
count | | |
filtered | | |
globalSpeed | | |
handled | number | Number of documents in the analysis result set. |
lastErrorMessage | | |
startedAt | date | The time when the analysis started. |
status | string | The current status of the analysis. One of: |
updatedAt | date | The last time the analysis was updated. |
Pending: Your analysis is queued. The limit for concurrent analyses is reached and your analysis will start as soon as another already-running analysis finishes. See the limitation page for more information.
Starting: Your analysis is starting. Necessary resources are being gathered in order to run it.
*Deprecated: The field is deprecated and will be removed in future releases. Please consider updating your code as soon as possible.
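A polling sketch against /analyze/status, reusing the earlier BASE_URL/HEADERS assumptions; the terminal status names other than Pending and Starting are assumptions:

import time

def wait_for_analysis(instance_id: str, poll_seconds: int = 30) -> str:
    # Poll until the analysis leaves the queued/starting/running states.
    while True:
        resp = requests.post(f"{BASE_URL}/analyze/status",
                             json={"instance": instance_id}, headers=HEADERS)
        status = resp.json()["status"]
        if status.lower() not in {"pending", "starting", "running"}:   # assumed state names
            return status
        time.sleep(poll_seconds)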
POST /analyze/{id}/timeseries
This route allows a client to start the computation of a timeseries.
Parameters
Path parameters
name | type | description |
---|---|---|
id | string | The analysis id, as returned by the /analyze/dataset route. The analysis must be completed. |
Body parameters
name | type | description | required |
---|---|---|---|
operands | list[string] | The operators that will be used for aggregation. Must be a list composed of one or more of: min, max, median, mean | No |
output_format | string | The output format of the final result. Must be one of: json, csv. Note: with json as the output_format, concept names are returned in lowercase, while the csv format keeps their original case. | No |
pivots | list[string] | The pivots that will be used for aggregation (in addition to date and entity). Must be a list composed of one or more of: | No |
time_granularity | string | Aggregation granularity period. Must be one of: | No |
volume_only | boolean | Aggregation mode. Set to true to display only volumes | No |
Note that the output format is chosen when launching a timeseries, not when downloading it: you need to run a new timeseries in order to change the output format.
Response
name | type | description |
---|---|---|
hash | integer | The hash of the launched timeseries |
The table above only shows the successful HTTP API response (status code 200). You can expect multiple responses and status codes; please see here for more information.
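A sketch of launching a timeseries computation on a completed analysis, under the same BASE_URL/HEADERS assumptions; the "day" granularity value is an assumption consistent with the extract_day output column:

payload = {
    "operands": ["mean", "max"],     # aggregation operators
    "time_granularity": "day",       # assumed valid granularity
    "output_format": "csv",
    "volume_only": False,
}

resp = requests.post(f"{BASE_URL}/analyze/{instance_id}/timeseries",
                     json=payload, headers=HEADERS)
ts_hash = resp.json()["hash"]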
GET /analyze/{id}/timeseries/{hash}/status
This route allows a client to retrieve the timeseries status of a given instance using its hash.
Parameters
Path parameters
name | type | description |
---|---|---|
id | string | The analysis id. The analysis must be completed. (format: uuid ) |
hash | string | The timeseries hash. |
Response
name | type | description |
---|---|---|
status | string | The status of the timeseries. One of: |
GET /analyze/{id}/timeseries/{hash}/download
This route allows a client to download the timeseries results of a given instance using its hash.
Parameters
Path parameters
name | type | description |
---|---|---|
id | string | The analysis id. The analysis must be completed. (format: uuid ) |
hash | string | The timeseries hash. |
Response
name | type | description |
---|---|---|
{concept_label}_score | float | The percentage of documents containing at least one keyword related to the concept. |
entity | string | The detected entity |
extract_day | string | Extract day of the article. Format: date YYYY-MM-dd |
extract_hour | integer | Extract hour of the article |
extract_minute | integer | Extract minute of the article |
language | string | The language of the article |
{operator}_{sentiment_class} | float | {operator} (1) aggregation sentiment score (4) based on the {sentiment_class} (2) score (4) of all the sentences of all the documents matching the entity of interest for the selected aggregation period |
{operator}_{emotion_class} | float | {operator} (1) aggregation emotion score (4) based on the {emotion_class} (3) score (4) of all the sentences of all the documents matching the entity of interest for the selected aggregation period |
entity_{operator}_{sentiment_class} | float | {operator} (1) aggregation sentiment score (4) based on the {sentiment_class} (2) score (4) of the sentences matching the entity of interest for the selected aggregation period |
entity_{operator}_{emotion_class} | float | {operator} (1) aggregation emotion score (4) based on the {emotion_class} (3) score (4) of the sentences matching the entity of interest for the selected aggregation period |
volume_document | integer | The volume of documents where the entity of interest is matched for the aggregation period |
volume_sentence | integer | The volume of all sentences of all documents where the entity of interest is matched for the aggregation period |
entity_volume_sentence | integer | The volume of sentences where the entity of interest is matched for the aggregation period. |
volume_document_{concept_label} | integer | The volume of documents where the entity of interest AND the specified concept are matched for the aggregation period |
volume_sentence_{concept_label} | integer | The volume of all sentences of all documents where the entity of interest AND the specified concept are matched for the aggregation period |
{concept_label}_sentiment_polarity | integer | Average sentiment polarity of documents that match both the specified concept and the entity |
concepts_keywords_count | dict[str, dict[str, int]] | Represents the count of keywords matched per concepts in the document for the aggregation period. More info on the timeseries indicators page |
1. Available operators: min, max, median, mean. min: lowest value observed for the class over the defined period; max: highest value; median: middle value; mean: average value.
2. Available sentiment classes: positive, neutral, negative
3. Available emotion classes: anger, anticipation, fear, joy, sadness, surprise, trust
4. Sentiment and emotion scores are displayed using scientific notation, meaning that an exponent can appear at the end of the number.
Spaces surrounding the concept label are removed in the result.
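A sketch of checking the timeseries status and saving the result, continuing the earlier sketches; the "done" terminal status name is an assumption:

status = requests.get(
    f"{BASE_URL}/analyze/{instance_id}/timeseries/{ts_hash}/status",
    headers=HEADERS,
).json()["status"]

if status.lower() == "done":   # assumed terminal status name
    data = requests.get(
        f"{BASE_URL}/analyze/{instance_id}/timeseries/{ts_hash}/download",
        headers=HEADERS,
    )
    with open("timeseries.csv", "wb") as f:   # csv was requested at launch time
        f.write(data.content)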
GET /analyze/{id}
This route allows a client to retrieve the payload of a previously run instance using its id.
Parameters
Path parameters
name | type | description |
---|---|---|
id | string | The analysis id. (format: uuid ) |
Response
name | type | description | scope |
---|---|---|---|
entities | list[dict] | List of entities to be requested | global |
entity_of_interest | string | Unique id for the entity of interest | entity |
keywords | list | List of keywords to search | entity |
sites_excludes | list | List of websites to exclude from the search | global |
min_match | int | The message must contain at least min_match keywords. | global |
min_repeat | int | The message must contain at least min_repeat occurrences of a keyword. | global |
start_date | date | Format: YYYY-MM-DD | global |
end_date | date | Format: YYYY-MM-DD | global |
site_type | list | Type of sites to search (field thread.site_type). Available options are: | global |
languages | list | List of languages to search | global |
countries | list | List of countries to search (field thread.country) | global |
sites | list | List of websites to search | global |
co_mentions | list | List of keywords to search together with the keywords list. Works like a boolean AND. co_mentions is matched in full text and is case insensitive. | global |
keywords_exclude | list | List of keywords to exclude from the search. Works like a boolean AND NOT. keywords_exclude is matched in full text and is case insensitive. | global |
qscore | float | Quality threshold to filter out unreadable data. | global |
neg_keywords | list | List of keywords not used for search but for the named entity resolution or annotation task. | entity |
workers | list | Workflow steps definition | global |
context | string | Context description of the entity | entity |
precompute | boolean | Whether to query offline data: | global |
similarity_threshold | float | Similarity score threshold for recognized or matched entities. Filters out documents containing entities with a similarity score lower than the threshold. | global |
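For example, retrieving the original payload of an instance, under the same BASE_URL/HEADERS assumptions:

resp = requests.get(f"{BASE_URL}/analyze/{instance_id}", headers=HEADERS)
original_payload = resp.json()   # echoes the parameters described above
print(original_payload["start_date"], original_payload["end_date"])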
POST /analyze/{id}/stop
This route allows a client to stop a previously run instance using its id.
Parameters
Path parameters
name | type | description |
---|---|---|
id | string | The analysis id. (format: uuid ) |
POST /analyze/{id}/download
Prepare the download of your instance. The result will be available in the analyze/{id}/download/{hash} route once the process is completed.
Parameters
Body parameters
name | type | description | required |
---|---|---|---|
id | uuid | The instance id | Yes |
limit | number|dict | The number of documents to download, or a dictionary specifying a limit per resource | No |
limit >by | string | The resource to limit. Possible values: entity | Yes |
limit >value | number | The limit value | Yes |
fields | list[string] | Collect only the fields you need. By default, all fields except summary are returned. Available keys for the fields parameter are: | No |
date | daterange | Extract only the documents published between the two dates. | No |
date >start | date | The start date of the date range. The format is YYYY-MM-DD. | No |
date >end | date | The end date of the date range. The format is YYYY-MM-DD. | No |
concepts | list[string] | Extract only the documents that contain the concepts. Each concept must be present in the analysis. | No |
entities | list[string] | Extract only the documents that contain the entities. Each entity must be present in the analysis. | No |
sort | dict | Sort the documents in ascending or descending order given a field | No |
sort >field | string | The field to sort the documents on. Available fields are: | Yes |
sort >order | string | The order of the sorting. Available values are: | Yes |
Path parameters
name | type | description |
---|---|---|
id | string | The analysis id. The analysis must be completed. (format: uuid ) |
Response
name | type | description |
---|---|---|
hash | integer | The hash of the download |
GET /analyze/{id}/download/{hash}/status
This route allows a client to retrieve the download status of a given instance using its hash.
Parameters
Path parameters
name | type | description |
---|---|---|
id | string | The analysis id. The analysis must be completed. (format: uuid ) |
hash | string | The download hash. |
Response
name | type | description |
---|---|---|
status | string | The status of the download. One of: |
GET /analyze/{id}/download/{hash}
This route allows a client to download the results of a given instance using its hash.
Parameters
Path parameters
name | type | description |
---|---|---|
id | string | The analysis id. The analysis must be completed. (format: uuid ) |
hash | string | The download hash. |
Response
Array of urls that you can use to retrieve the result of the download. Example:
[
"https://files.textreveal.com/download/company=e8c8d3ba-4ca0-45d1-b4ba-c1b1f2364a12/instance=fabd78aa-5241-4842-8108-fd52ef805cde/download=03d8c58a31/output-0.parquet.gz",
"https://files.textreveal.com/download/company=e8c8d3ba-4ca0-45d1-b4ba-c1b1f2364a12/instance=fabd78aa-5241-4842-8108-fd52ef805cde/download=03d8c58a31/output-1.parquet.gz"
]
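Putting the three download routes together, a sketch that prepares a download, polls its status and fetches every parquet part; it reuses the earlier BASE_URL/HEADERS/time assumptions, and the "done" terminal status name is also an assumption:

payload = {
    "limit": 10000,
    "fields": ["title", "url", "sentences"],
    "date": {"start": "2023-01-01", "end": "2023-01-31"},
}

resp = requests.post(f"{BASE_URL}/analyze/{instance_id}/download",
                     json=payload, headers=HEADERS)
dl_hash = resp.json()["hash"]

while True:
    status = requests.get(
        f"{BASE_URL}/analyze/{instance_id}/download/{dl_hash}/status",
        headers=HEADERS,
    ).json()["status"]
    if status.lower() == "done":   # assumed terminal status name
        break
    time.sleep(30)

urls = requests.get(f"{BASE_URL}/analyze/{instance_id}/download/{dl_hash}",
                    headers=HEADERS).json()
for i, url in enumerate(urls):
    part = requests.get(url)
    with open(f"output-{i}.parquet.gz", "wb") as f:
        f.write(part.content)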
POST /analyze/timeserie
Deprecated: The route is deprecated and will be removed in future releases. Please consider updating your code as soon as possible.
Parameters
Body parameters
Response
name | type | description |
---|---|---|
extract_day | date | Day of extraction of the article (YYYY-MM-DD). Date: UTC+0 |
extract_date | datetime | Date of extraction of the article |
extract_hour | times | Available when time series are aggregated by hour |
extract_minute | times | Available when time series are aggregated by minute |
country | string | Country of the site, determined automatically from the site language, IP and TLD |
entity | string | Entity detected for the record |
id | string | Identifier of the document |
site_type | string | Type of data source for the document. Available options are: |
language | string | Language of the document |
site | string | Website of the document |
url | string | Url of the document |
volume_sentence | int | Number of sentences |
volume_document | int | Number of documents |
mean_<indicator> | float | Mean score calculated for the record |
max_<indicator> | float | Max score calculated for the record |
min_<indicator> | float | Min score calculated for the record |
median_<indicator> | float | Median score calculated for the record |