'ORGANIZATION' entity is not correctly identified #96

Open
Anusha-12 opened this issue Jan 8, 2024 · 5 comments

Comments

@Anusha-12

I tried the evaluation script to validate the synthetic_data present in the data directory. The evaluation score for 'ORGANIZATION' is NaN; the model is not able to predict this entity correctly.
Screenshot 2024-01-08 at 1 33 13 PM
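For context, a NaN score for a single entity type usually points to a zero denominator rather than a low score. The sketch below is a generic illustration of that effect, not the evaluation script's actual code: `precision_recall` is a hypothetical helper, the counts are made up, and the choice to return NaN on a zero denominator is an assumption.

```python
import math
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall), using NaN when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else math.nan
    recall = tp / (tp + fn) if (tp + fn) > 0 else math.nan
    return precision, recall


# Hypothetical counts: the model never predicts ORGANIZATION,
# although the dataset contains 12 annotated ORGANIZATION spans.
precision, recall = precision_recall(tp=0, fp=0, fn=12)
print(precision, recall)  # nan 0.0
```

If this is what is happening, precision is undefined because no token was ever labelled ORGANIZATION, which would suggest checking the model-to-Presidio label mapping before looking at the model's quality.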


@omri374
Contributor

omri374 commented Jan 28, 2024

Thanks, we'll look into this. Apologies for the delayed response.

@Anusha-12
Author

Hi, is there any update on this?

@omri374
Contributor

omri374 commented Feb 11, 2024

Apologies for the delay. Which evaluation script are you running, and with which model/NLP engine?
I tried running it as is (on the first 300 samples in the synthetic dataset), and this is what I got:

image

image

image

If you're getting other results, please share a reproducible example and make sure you're using the latest code in the repo.

@Anusha-12
Author

Yes, I am using the Presidio Analyzer with 'Jean-Baptiste/roberta-large-ner-english', and I got different results. I will upload the graph.

@omri374
Contributor

omri374 commented Feb 18, 2024

I tried to reproduce with this model. This is the configuration I used:

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import TransformersNlpEngine, NerModelConfiguration


# Here we define a transformers based NLP engine, 
# but you can use this cell to customize your Presidio Analyzer instance

# Define which model to use
model_config = [{"lang_code": "en", "model_name": {
    "spacy": "en_core_web_sm",  # use a small spaCy model for lemmas, tokens etc.
    "transformers": "Jean-Baptiste/roberta-large-ner-english"
    }
}]


# Map transformers model labels to Presidio's
model_to_presidio_entity_mapping = dict(
    PER="PERSON",
    LOC="LOCATION",
    ORG="ORGANIZATION"
)

ner_model_configuration = NerModelConfiguration(labels_to_ignore = ["O"], 
                                                model_to_presidio_entity_mapping=model_to_presidio_entity_mapping)

nlp_engine = TransformersNlpEngine(models=model_config,
                                   ner_model_configuration=ner_model_configuration)

# Set up the analyzer engine with the transformers-based NLP engine defined above
# and Presidio's default PII recognizers
analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine)

Results did show ORGANIZATION metrics, but please try to reproduce too and let me know if you were able to get this working.

image
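Before running the full evaluation, it may help to check whether the analyzer returns any ORGANIZATION results at all. The following is a minimal sketch, not part of the evaluation script: `count_detected_entities` is a hypothetical helper, and it only assumes that the analyzer exposes `analyze(text=..., language=...)` and that each result has an `entity_type` attribute, as `AnalyzerEngine` and `RecognizerResult` do.

```python
from collections import Counter
from typing import Iterable


def count_detected_entities(analyzer, texts: Iterable[str], language: str = "en") -> Counter:
    """Count the entity types returned by analyzer.analyze() over a few texts."""
    counts = Counter()
    for text in texts:
        for result in analyzer.analyze(text=text, language=language):
            counts[result.entity_type] += 1
    return counts


# Example usage with the analyzer_engine configured above (sample sentence is made up):
# counts = count_detected_entities(analyzer_engine, ["Jane Doe works at Contoso in Seattle."])
# print(counts)
```

If ORGANIZATION never appears in the counts for sentences that clearly mention a company, the label mapping or the model configuration is the more likely culprit than the evaluation code.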
