Start analyze

Create a granular dataset on a secure web server for multiple entities.

POST
https://api.textreveal.com/api/2.0/analyze/dataset

Request

Request Body

  • co_mentions string[]

    List of keywords to search for in combination with the keywords list. Works like a boolean AND. Example:

    • keywords : ["TotalEnergy"]
    • co_mentions: ["gas", "oil price"]

    Behavior: TextReveal® API will look for documents that match the keywords and at least one of the co_mentions. For the example above, the following combinations are relevant:

    • TotalEnergy and gas
    • TotalEnergy and oil price
    • TotalEnergy and oil price and gas

    N.B: Search of co_mentions is operated in full-text and is case insensitive.

    Example: ["tablets"]
  • concepts object

    List of concepts or risks that are to be analyzed. Each individual concept is defined by its own list of keywords.

    Punctuation is not handled in the concept labels. Each concept label must be unique (case insensitive).

  • concepts_filter object

    Same as concepts, but filters out documents that do not contain the concepts.

    Note: Use either concepts or concepts_filter, not both.

  • countries string[]

    List of countries to search (field thread.country).

    N.B: Use alpha-2 format.

    Example: ["US"]
  • end_date* date

    The date when the analysis should end.

    Example: "2019-02-01"
  • entities* object[]
  • keywords_exclude string[]

    List of keywords to exclude from the search. Works like a boolean AND NOT.

    Example:

    • keywords: ["apple", "iphone"]
    • keywords_exclude: ["Steve Jobs", "Tim Cook"]

    Behavior: TextReveal® API will look for documents relevant to apple (the company) or iphone but NOT containing either Steve Jobs or Tim Cook.

    N.B: Search of keywords_exclude is operated in full-text and is case insensitive.

    Example: ["Steve Jobs"]
  • languages (string (enum))[]

    List of languages to search, see Language Support page for more information.

    Note: We do not recommend using multiple values.

    Default: ["english"]

    Values: "english", "french", "italian", "german", "spanish", "portuguese", "romanian", "russian", "finnish", "danish", "norwegian", "czech", "slovak", "polish", "swedish", "dutch", "japanese", "chinese"
  • min_match number

    The message must contain at least min_match keywords.

    When used, each entity must have at least min_match keywords.

    Example:

    • keywords: ["apple", "iphone", "macbook"]
    • min_match: 2

    Behavior: TextReveal® API will keep the document if and only if at least 2 elements from the keywords list appear in the document.

    Default: 1

    Example: 1
  • min_repeat number

    The message must contain at least min_repeat occurrences of a keyword.

    Example:

    • keywords: ["apple", "iphone"]
    • min_repeat: 2

    Behavior: TextReveal® API will keep the document if and only if it contains at least 2 occurrences of either apple or iphone.

    Default: 1

    Example: 1
  • qscore float

    Quality threshold to filter out unreadable data.

    No filtering is applied if the quality-score worker is not provided.

    Default: 50

    Range: [0, 100]
  • search_in string[]

    Defines whether documents are extracted by searching for entity keywords in the title, in the text, or in both.

    Example:

    • search_in: ["title", "text"]

    Note:

    • This parameter applies only to the keywords of each entity, not to keywords_exclude, co_mentions, neg_keywords, min_repeat, or min_match.
    • Not available with the ner-linking worker.

    Default: ["title", "text"]

    Example: ["title", "text"]
  • sentiments_filter object

    Partial object containing min/max values for each sentiment. The final analysis will contain only documents that match these filters.

    Note: This can be compared to a filter on:

    • document_{sentiment}.mean key for positive, negative and neutral
    • document_{sentiment} key for polarity
  • similarity_threshold float

    Similarity score threshold for recognized or matched entities. Filters out documents containing entities with a similarity score lower than the threshold.

    Default: 0

    Range: [0, 1]
  • site_type (string (enum))[]

    Type of sites to search (field thread.site_type).

    Default: ["news", "blogs", "discussions"]

    Values: "news", "blogs", "discussions", "licensed_news", "premium_news"
  • sites string[]

    List of websites to search.

    N.B: Use the base domain of the websites.

    Example: ["apple.com"]
  • sites_exclude string[]

    A list of source sites to be excluded.

    Example: ["apple.com"]
  • start_date* date

    The date when the analysis should start.

    Example: "2019-01-31"
  • workers* (string (enum))[]

    List of the tasks that will be used for analysis. Should contain at least 'raw-matcher' or 'ner-linking'.

    Values: "quality-score", "ner-linking", "raw-matcher", "concept", "entity-similarity", "embedder-indicators"
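
The keyword rules described above (co_mentions, keywords_exclude, min_match, min_repeat) can be illustrated with a small local sketch. This is not the TextReveal® implementation: the function names are hypothetical, matching is plain case-insensitive substring search rather than a real full-text search, and min_repeat is assumed to count the total occurrences across all keywords.

```python
import re


def count_occurrences(text: str, term: str) -> int:
    """Count case-insensitive, non-overlapping occurrences of term in text."""
    return len(re.findall(re.escape(term), text, flags=re.IGNORECASE))


def keeps_document(text, keywords, co_mentions=(), keywords_exclude=(),
                   min_match=1, min_repeat=1):
    """Return True if the document passes the keyword rules (hypothetical helper)."""
    counts = {kw: count_occurrences(text, kw) for kw in keywords}

    # min_match: at least min_match distinct keywords must appear.
    if sum(1 for n in counts.values() if n > 0) < min_match:
        return False

    # min_repeat: assumed here to be the total number of keyword occurrences.
    if sum(counts.values()) < min_repeat:
        return False

    # co_mentions: boolean AND with the keywords, at least one must appear.
    if co_mentions and not any(count_occurrences(text, c) for c in co_mentions):
        return False

    # keywords_exclude: boolean AND NOT, none of them may appear.
    if any(count_occurrences(text, x) for x in keywords_exclude):
        return False

    return True
```

For instance, `keeps_document("TotalEnergy comments on the oil price", ["TotalEnergy"], co_mentions=["gas", "oil price"])` keeps the document, while the same call on a text that mentions neither gas nor oil price rejects it.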
Request
{
  "concepts": {
    "environment": [
      "environmental impact",
      "environmental controversy",
      "pesticide"
    ],
    "governance": [
      "offshore transaction",
      "dupery",
      "humbug"
    ],
    "pollution": [
      "fuel leakage",
      "greenhouse gases"
    ],
    "social": [
      "unscrupulous",
      "inequality",
      "malfeasance",
      "workplace violence"
    ]
  },
  "countries": [
    "US",
    "FR"
  ],
  "end_date": "2019-02-01",
  "entities": [
    {
      "context": "Apple is a technology company that designs, manufactures, and markets consumer electronics, personal computers, and software.",
      "entity_of_interest": "apple",
      "keywords": [
        "Apple Inc.",
        "Steve Wozniak",
        "Apple Computer",
        "Ron Wayne",
        "AC Wellness",
        "FileMaker",
        "Braeburn Capital",
        "David Pakman",
        "AAPL",
        "Apple",
        "Steve Jobs",
        "apple.com"
      ]
    },
    {
      "context": "SESAMm is a fintech company that specializes in big data and artificial intelligence for investment.",
      "entity_of_interest": "sesamm",
      "keywords": [
        "SESAMm SAS",
        "Florian Aubry",
        "SESAMm",
        "Pierre Rinaldi",
        "Sylvain Forté",
        "sesamm.com"
      ]
    }
  ],
  "languages": [
    "english"
  ],
  "qscore": 90,
  "sentiments_filter": {
    "positive": {
      "min": 0.5
    }
  },
  "similarity_threshold": 0.5,
  "site_type": [
    "news",
    "blogs",
    "discussions"
  ],
  "sites_exclude": [
    "apple.com"
  ],
  "start_date": "2019-01-31",
  "workers": [
    "quality-score",
    "concept",
    "raw-matcher",
    "ner-linking",
    "entity-similarity",
    "embedder-indicators"
  ]
}
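
A request like the one above could be checked and sent from Python roughly as follows. This is a minimal sketch using only the standard library. The authentication scheme is not described in this section, so the bearer token header is an assumption, and `validate_payload`, `build_request`, and `start_analysis` are hypothetical helper names. The checks mirror only the constraints stated above: the fields marked with `*`, the workers rule, and the concepts / concepts_filter exclusivity.

```python
import json
import urllib.request

ANALYZE_URL = "https://api.textreveal.com/api/2.0/analyze/dataset"
REQUIRED_FIELDS = ("start_date", "end_date", "entities", "workers")


def validate_payload(payload: dict) -> None:
    """Raise ValueError if the payload breaks a constraint from the parameter list."""
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if not {"raw-matcher", "ner-linking"} & set(payload["workers"]):
        raise ValueError("workers must include 'raw-matcher' or 'ner-linking'")
    if "concepts" in payload and "concepts_filter" in payload:
        raise ValueError("use either concepts or concepts_filter, not both")


def build_request(payload: dict, token: str) -> urllib.request.Request:
    """Build the POST request; the Authorization header is an assumed scheme."""
    validate_payload(payload)
    return urllib.request.Request(
        ANALYZE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumption, not from this page
        },
        method="POST",
    )


def start_analysis(payload: dict, token: str) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(payload, token)) as resp:
        return resp.read().decode("utf-8")
```

Validating before sending avoids a round trip for mistakes that can be detected locally, such as a workers list that contains neither 'raw-matcher' nor 'ner-linking'.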

Response

Response - 200

An identifier of the analysis to retrieve results

  • instance uuid

    The identifier of an analysis. This identifier must be used to get the results.

    Example: "a62caf56-5961-4fff-ba2e-6d4dcf98960f"
Response
{
  "instance": "a62caf56-5961-4fff-ba2e-6d4dcf98960f"
}
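
Since the instance value is needed to fetch results later, it can be worth extracting and validating it right away. The sketch below assumes the response body is the JSON document shown above; `parse_instance` is a hypothetical helper name.

```python
import json
import uuid


def parse_instance(body: str) -> uuid.UUID:
    """Extract the analysis identifier from a 200 response body."""
    data = json.loads(body)
    # uuid.UUID raises ValueError if the value is not a well-formed UUID.
    return uuid.UUID(data["instance"])
```

Calling `parse_instance` on the example response yields a `uuid.UUID` whose string form is the same identifier.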