neRecognizer

1. Usage and version information

Usage: . run.sh

Settings in config.py:

input_data: Path to input tsv file, e.g. data/ontonotes_en_name_entity.tsv
string_col: Name of text column, e.g. 'string' 
entities_col: Name of entities column, e.g. 'type'
ent_min_n: Count threshold for removing scarce entities, e.g. 3
train: Run training stage (True/False)
test: Run testing stage (True/False)
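As an illustration, a config.py using these settings might look like the sketch below. The setting names and example values come from the list above; the True defaults for train and test, the `drop_scarce` helper, and the rule that an entity is scarce when it occurs fewer than ent_min_n times are assumptions and may differ from the actual code.

```python
from collections import Counter

# Hypothetical config.py: names follow the settings listed above,
# values are the documented examples or assumed defaults.
input_data = "data/ontonotes_en_name_entity.tsv"  # path to the input tsv file
string_col = "string"    # name of the text column
entities_col = "type"    # name of the entities column
ent_min_n = 3            # count threshold for removing scarce entities
train = True             # run the training stage (assumed default)
test = True              # run the testing stage (assumed default)


def drop_scarce(rows, min_n=ent_min_n):
    """Keep rows whose entity value occurs at least min_n times.

    Assumption: "scarce" means fewer than min_n occurrences in entities_col.
    """
    counts = Counter(row[entities_col] for row in rows)
    return [row for row in rows if counts[row[entities_col]] >= min_n]


rows = [{"string": "John", "type": "PERSON"}] * 3 + [{"string": "Oslo", "type": "GPE"}]
kept = drop_scarce(rows)  # the single GPE row falls below the threshold
```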

Version 1:

Sentence tokenization and iterative searching

Version 2:

Simple search for longest match
Output compatible with name_entity.tsv

Version 3:

Redesigned for iterative search over token.tsv

Version 4:

Scoring fixed
Calculating scores by entity

1. Data Preparation

The dev data is read line by line, and each completed sentence is added to a new csv. The part_of_speech column is used to detect the end of sentences and empty lines.

We can now read each sentence as one string and use the ids to search the train data for the true entity.
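The grouping step described above could be sketched roughly as follows. This is not the code in prep_data.py: the three-column layout (id, token, part_of_speech), the `group_sentences` name, and the rule that a "." tag closes a sentence are all assumptions made for illustration.

```python
def group_sentences(lines, end_tags=(".",)):
    """Group tab-separated token lines into sentences.

    Assumed layout per line: id <TAB> token <TAB> part_of_speech.
    A sentence closes on an empty line or after a token whose
    part_of_speech is in end_tags (assumed end-of-sentence rule).
    """
    sentences, ids, tokens = [], [], []

    def flush():
        # Store the sentence collected so far, if any, and reset the buffers.
        if tokens:
            sentences.append({"ids": list(ids), "sentence": " ".join(tokens)})
            ids.clear()
            tokens.clear()

    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():      # empty line: close any open sentence
            flush()
            continue
        token_id, token, pos = line.split("\t")
        ids.append(token_id)
        tokens.append(token)
        if pos in end_tags:       # end-of-sentence tag: close the sentence
            flush()
    flush()                       # keep a trailing sentence without an end tag
    return sentences


sample = [
    "1\tJohn\tNNP", "2\tlives\tVBZ", "3\tin\tIN", "4\tParis\tNNP", "5\t.\t.",
    "",
    "6\tHello\tUH",
]
sentences = group_sentences(sample)
```

Each result keeps the token ids next to the joined string, so the ids remain available for looking up the true entity later.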

2. Recognizer

First we extract all the possible n-grams from the sentence. We start searching from the longest ones and stop when we find a match.

We then update the sentence by removing the matched string and add the remaining sentence(s) to the queue. We continue until no n-grams are left.
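A minimal sketch of this longest-match search with a queue is shown below. It assumes whitespace tokenization and a plain dictionary (here called `gazetteer`, a hypothetical name) mapping known entity strings to their types; the actual recognizer.py may tokenize and look up entities differently.

```python
from collections import deque


def ngrams_longest_first(tokens):
    """Yield (start, end) spans for every n-gram, longest spans first."""
    n = len(tokens)
    for size in range(n, 0, -1):
        for start in range(n - size + 1):
            yield start, start + size


def recognize(sentence, gazetteer):
    """Return (matched string, entity type) pairs found in the sentence."""
    matches = []
    queue = deque([sentence.split()])  # assumed: whitespace tokenization
    while queue:
        tokens = queue.popleft()
        for start, end in ngrams_longest_first(tokens):
            candidate = " ".join(tokens[start:end])
            if candidate in gazetteer:
                matches.append((candidate, gazetteer[candidate]))
                # Remove the match and queue what is left on either side.
                if tokens[:start]:
                    queue.append(tokens[:start])
                if tokens[end:]:
                    queue.append(tokens[end:])
                break  # stop at the first (longest) match for this segment
    return matches


gazetteer = {"John Smith": "PERSON", "New York": "GPE", "York": "GPE"}
found = recognize("John Smith moved to New York", gazetteer)
```

Because longer n-grams are tried first, "New York" is matched as a whole and the shorter entry "York" is never reached for that span.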

2. Printing Results

The augmented sentences are printed with NOMATCH in place of empty tags. Below each sentence are its local scores.

The final results are then printed. The first table contains the results by entity, followed by the overall scores for the dev set.
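The README does not state which scores are reported, so the sketch below assumes precision, recall and F1 per entity, with true positives counted as exact tag matches. The `scores_by_entity` function and its input format (two aligned lists of tags) are hypothetical and not taken from scoring.py.

```python
from collections import Counter


def scores_by_entity(gold, predicted, empty="NOMATCH"):
    """Compute per-entity counts and scores from two aligned tag lists.

    Assumption: a true positive is a predicted tag equal to the gold tag,
    and the empty tag is never scored as an entity of its own.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if p == g and g != empty:
            tp[g] += 1
        else:
            if p != empty:
                fp[p] += 1  # predicted an entity that is not the gold tag
            if g != empty:
                fn[g] += 1  # missed or mislabelled a gold entity
    table = {}
    for ent in sorted(set(tp) | set(fp) | set(fn)):
        pred_total = tp[ent] + fp[ent]
        gold_total = tp[ent] + fn[ent]
        precision = tp[ent] / pred_total if pred_total else 0.0
        recall = tp[ent] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        table[ent] = {"tp": tp[ent], "precision": precision, "recall": recall, "f1": f1}
    return table


gold = ["PERSON", "GPE", "NOMATCH", "GPE"]
pred = ["PERSON", "NOMATCH", "GPE", "GPE"]
table = scores_by_entity(gold, pred)
```

Overall scores could be obtained the same way by summing the counters across entities before dividing.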

A boxplot shows the True Positives (Matched) counts per entity.

These are the most frequent matches in the subset.
