NER Linking Worker
The NER Linking worker filters out documents that do not mention any relevant entity. It runs named entity recognition (NER) on each document, then uses the keywords provided in the request payload to annotate and link the entities found.
The `ner-linking` worker is restricted to certain languages; see the Language Support page for more information.
Definition
The NER Linking worker is a two-stage process:
- Named entity recognition (NER), which operates at the sentence level
- An entity linking function, which uses the keywords specified in the payload
First stage: Named Entity Recognition at the sentence level
TextReveal® NER is hybrid. It uses a combination of rule-based and machine learning approaches.
The NER supports the following classes:
Label | Type |
---|---|
ORG | Organization |
LOC | Location |
PERSON | Person |
PRODUCT | Product |
GPE | Geopolitical Entity |
The NER module annotates, within each sentence, the presumed mentions of organizations, locations, persons, products, and geopolitical entities.
This annotation takes place only once, before you run your analysis.
Examples
- NER output for the sentences `Tim Cook eats an apple in front of Apple. Bill Gates is delivering a speech on Windows`:
[
  {
    "text": "Tim Cook eats an apple in front of Apple",
    "entities": {
      "PERSON": [
        "Tim Cook"
      ],
      "ORG": [
        "Apple"
      ]
    }
  },
  {
    "text": "Bill Gates is delivering a speech on Windows",
    "entities": {
      "PERSON": [
        "Bill Gates"
      ],
      "PRODUCT": [
        "Windows"
      ]
    }
  }
]
- Example of a result from a NER system:
[Figure: sentence annotated with matched entities, with colour-coded recognised entities]
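To make the shape of the first-stage output concrete, here is a minimal Python sketch that groups the annotated mentions by NER class. It only consumes the JSON shown in the example above; the helper name `mentions_by_label` is hypothetical and is not part of the worker.

```python
import json
from collections import defaultdict

# NER output copied from the example above.
ner_output = json.loads("""
[
  {"text": "Tim Cook eats an apple in front of Apple",
   "entities": {"PERSON": ["Tim Cook"], "ORG": ["Apple"]}},
  {"text": "Bill Gates is delivering a speech on Windows",
   "entities": {"PERSON": ["Bill Gates"], "PRODUCT": ["Windows"]}}
]
""")


def mentions_by_label(sentences):
    """Group every annotated mention by its NER class across all sentences."""
    grouped = defaultdict(list)
    for sentence in sentences:
        for label, mentions in sentence["entities"].items():
            grouped[label].extend(mentions)
    return dict(grouped)


grouped = mentions_by_label(ner_output)
print(grouped)
```

Note that the lowercase "apple" (the fruit) in the first sentence is not annotated; only the capitalised organization mention is.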
Second stage: Entity Linking Function
The Entity Linking Function filters out documents that do not mention any entity of interest and keeps only those relevant to the analysis you ran.
To do so, the Entity Linking Function uses the NER result from the previous stage and the keywords you specified in the payload. For each sentence annotated in the previous stage, it checks whether an annotated mention matches one of the keywords of an entity of interest. If it does, the document is kept.
This function identifies and links sentences to an entity id, preserving the entity type. Identification is performed using lowercase matching. By default, only the first occurrence of a given entity of interest is retrieved. When a sentence does not contain any entity of interest, an empty array is returned.
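The matching logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the worker's actual implementation: the `keywords_by_entity` mapping is a hypothetical stand-in for the payload keywords (whose real format is not shown here), the function names are invented, and "first occurrence" is interpreted as keeping each matched label once per sentence.

```python
def link_sentence(ner_sentence, keywords_by_entity):
    """Return the matches for one NER-annotated sentence ([] if none)."""
    # Lowercase lookup table: keyword -> entity id (lowercase matching).
    lookup = {
        keyword.lower(): entity_id
        for entity_id, keywords in keywords_by_entity.items()
        for keyword in keywords
    }
    matches, seen = [], set()
    for entity_type, mentions in ner_sentence["entities"].items():
        for mention in mentions:
            key = mention.lower()
            entity_id = lookup.get(key)
            # Skip mentions that match no keyword, and repeated matches
            # (assumed reading of "only the first occurrence is retrieved").
            if entity_id is None or (entity_id, key) in seen:
                continue
            seen.add((entity_id, key))
            matches.append({
                "entity_id": entity_id,
                "entity_type": entity_type,  # NER class is preserved
                "label": mention,
            })
    return matches


def keep_document(ner_sentences, keywords_by_entity):
    """A document is kept if at least one sentence has a match."""
    return any(link_sentence(s, keywords_by_entity) for s in ner_sentences)


# Hypothetical keywords for two entities of interest.
keywords_by_entity = {
    "apple": ["Apple", "Tim Cook"],
    "microsoft": ["Windows", "Bill Gates"],
}
sentence = {
    "text": "Tim Cook eats an apple in front of Apple",
    "entities": {"PERSON": ["Tim Cook"], "ORG": ["Apple"]},
}
matches = link_sentence(sentence, keywords_by_entity)
unrelated = {"text": "Jane Doe went home", "entities": {"PERSON": ["Jane Doe"]}}
```

Here `link_sentence(unrelated, keywords_by_entity)` returns an empty list, and a document made only of such sentences would be filtered out by `keep_document`.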
Example
[
  {
    "entities": {
      "apple": [
        {
          "Apple": 1,
          "Tim Cook": 1
        }
      ],
      "microsoft": [
        {
          "Windows": 1,
          "Bill Gates": 1
        }
      ]
    },
    "extract_date": "2019-02-01 09:27:39.016",
    "id": "H36itWwBVJ4dixto1Mq89",
    "language": "english",
    "sentences": [
      {
        "entities": [
          "apple"
        ],
        "matches": [
          {
            "class": "entity",
            "count": null,
            "entity_id": "apple",
            "entity_type": "ORG",
            "label": "Apple"
          },
          {
            "class": "entity",
            "count": null,
            "entity_id": "apple",
            "entity_type": "PERSON",
            "label": "Tim Cook"
          }
        ],
        "sentence_id": 0,
        "text": "Tim Cook eats an apple in front of Apple",
        "type": 0
      },
      {
        "entities": [
          "microsoft"
        ],
        "matches": [
          {
            "class": "entity",
            "count": null,
            "entity_id": "microsoft",
            "entity_type": "PRODUCT",
            "label": "Windows"
          },
          {
            "class": "entity",
            "count": null,
            "entity_id": "microsoft",
            "entity_type": "PERSON",
            "label": "Bill Gates"
          }
        ],
        "sentence_id": 1,
        "text": "Bill Gates is delivering a speech on Windows",
        "type": 0
      }
    ]
  }
]
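As a possible way to consume this output, the sketch below collects, for each entity id, the sentences it was linked to and the labels that triggered the link. It uses a trimmed copy of the example above (only the fields it reads); the helper name `summarize_links` is hypothetical.

```python
import json

# Trimmed copy of the example output above, keeping only the fields used here.
linked_documents = json.loads("""
[{"id": "H36itWwBVJ4dixto1Mq89",
  "sentences": [
    {"sentence_id": 0, "entities": ["apple"],
     "matches": [
       {"entity_id": "apple", "entity_type": "ORG", "label": "Apple"},
       {"entity_id": "apple", "entity_type": "PERSON", "label": "Tim Cook"}]},
    {"sentence_id": 1, "entities": ["microsoft"],
     "matches": [
       {"entity_id": "microsoft", "entity_type": "PRODUCT", "label": "Windows"},
       {"entity_id": "microsoft", "entity_type": "PERSON", "label": "Bill Gates"}]}
  ]}]
""")


def summarize_links(documents):
    """Map each entity id to its linked sentence ids and matched labels."""
    summary = {}
    for document in documents:
        for sentence in document["sentences"]:
            for match in sentence["matches"]:
                entry = summary.setdefault(
                    match["entity_id"], {"sentence_ids": [], "labels": []}
                )
                if sentence["sentence_id"] not in entry["sentence_ids"]:
                    entry["sentence_ids"].append(sentence["sentence_id"])
                entry["labels"].append((match["label"], match["entity_type"]))
    return summary


summary = summarize_links(linked_documents)
print(summary["apple"])
```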
Covered languages
The `ner-linking` worker is restricted to certain languages; see the Language Support page for more information.