
Multiple NER tools combined in one output. Invoking multiple NER engines in parallel.


KBNLresearch/multiNER


MultiNER combines the output from five different (Dutch) named-entity recognition (https://en.wikipedia.org/wiki/Named-entity_recognition) packages into one answer.

This software is part of the dac (https://github.com/KBNLresearch/dac) project, an entity linker for the Dutch historical newspaper collection of the National Library of the Netherlands.

We noticed a lot of misclassifications in our NER setup, so we decided to combine the output of different NER packages. The following packages are used:

- Stanford NER (https://nlp.stanford.edu/software/CRF-NER.shtml)
- spaCy (https://spacy.io/)
- polyglot (https://polyglot.readthedocs.io/)
- DBpedia Spotlight (https://www.dbpedia-spotlight.org/)
- Flair (https://github.com/zalandoresearch/flair)

In our setup Stanford and Spotlight are the leading NER packages: any NE that either of them finds always shows up in the integrated results, even if no other package agrees. An NE found only by the other packages shows up in the integrated results only if 2 of them agree on it.
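The selection rule described above can be sketched in a few lines of Python. This is only an illustration of one reading of that rule, not the code in ner.py: the function name `keep_entity`, the constants, and the lowercase source labels `"spotlight"` and `"flair"` are assumptions (only `"stanford"`, `"spacy"` and `"polyglot"` appear in the example response), and "2 agree" is interpreted here as "at least 2".

```python
# Hypothetical sketch of the integration rule; names are illustrative only.
LEADING = {"stanford", "spotlight"}  # leading packages, per the text above
MIN_AGREEING = 2  # assumed threshold for the non-leading packages


def keep_entity(ner_src):
    """Return True if an NE reported by the packages in ner_src is kept."""
    sources = set(ner_src)  # de-duplicate, so a package counts only once
    if sources & LEADING:
        # A leading package found the NE: always keep it.
        return True
    # Otherwise require agreement between the remaining packages.
    return len(sources) >= MIN_AGREEING


print(keep_entity(["stanford"]))           # kept: leading package
print(keep_entity(["spacy"]))              # dropped: one non-leading package
print(keep_entity(["spacy", "polyglot"]))  # kept: two non-leading packages
```

Keeping the leading packages in a set means that changing which packages are trusted unconditionally is a one-line edit in this sketch.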

Example response:

"count": 3,
"type_certainty": 2,
"type": "person",
"right_context": "zich mij te vragen, of",
"pos": 1324,
"ne_context": "Manchon",
"ne": "Manchon",
"ner_src": [
    "stanford",
    "spacy",
    "polyglot"
],
"left_context": "op zijn allerlaatst verwaardigde mevrouw",
"types": [
    "person",
    "location"
]

In the example, 3 NER packages have figured out that "Manchon" is an NE; two of them agree that it is a person, and one thinks it's a location. Hence count: 3 and type_certainty: 2. ner_src shows which packages think the current NE is an NE (in this case all three of them). The context fields (left_context, ne_context, right_context) show the surroundings of the NE.
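To make the relation between these fields concrete, here is a minimal Python sketch that derives count, type, type_certainty, ner_src and types from per-package votes. It is not taken from the multiNER source: the function `summarize` and the `votes` mapping are hypothetical, and the response does not say which package voted for which type, so the assignment below is assumed purely for illustration.

```python
from collections import Counter


def summarize(votes):
    """Derive the summary fields from a {package: type} mapping (hypothetical)."""
    type_counts = Counter(votes.values())
    # The most frequent type wins; on a tie the type seen first is chosen.
    best_type, certainty = type_counts.most_common(1)[0]
    return {
        "count": len(votes),          # number of packages that found the NE
        "type": best_type,            # majority type
        "type_certainty": certainty,  # number of packages voting for it
        "ner_src": list(votes),       # packages that found the NE
        "types": list(type_counts),   # distinct types, in first-seen order
    }


# Assumed votes: the response only tells us two "person" and one "location".
votes = {"stanford": "person", "spacy": "person", "polyglot": "location"}
print(summarize(votes))
```

The sketch assumes at least one vote; an empty mapping would need separate handling.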

All this information is picked up by dac to weigh a possible match in Wikidata/DBpedia.

MultiNER has only been tested with Python 3, and is open source under the MIT License.

There is a live demo available here:

https://ner.kbresearch.nl/

==

Install notes:

See ner.py / Dockerfile and *.sh files for details.

Or run from Docker:

docker build -t multiner:latest .

docker run -i -p 8099:8099 multiner:latest run.sh